This project identifies faces in an image or video and classifies each face into one of 8 emotions: ['neutral', 'happy', 'sad', 'surprise', 'fear', 'disgust', 'anger', 'contempt']. The primary libraries used for modelling were PyTorch and OpenCV. The primary development tools were pytest, the black code formatter, GitHub Actions, and FastAPI.
Input: an image or a video
Output: the input with annotated faces [local usage] or a set of coordinates and annotations for each face [API]
The aim of the project is twofold. First, emotion recognition technology is used in a variety of real-world applications. In marketing, it is used to better understand the impact of marketing material during focus group sessions. In healthcare, it helps individuals with autism better identify the emotions and facial expressions they encounter. The automotive industry is experimenting with computer vision technology that monitors the driver's emotional state; an extreme emotional state or drowsiness could trigger an alert for the driver. This project therefore serves to show what emotion recognition systems are capable of today.
Second, the project was developed for learning purposes. Particular emphasis was put on development methods used in the workplace, such as testing, logging, continuous integration, and developing an API.
The code structure tree below shows the exact content of the project:
├── data
│ ├── raw_data
│ ├── train_val_test_split
│ └── predictions
├── requirements.txt
├── architectures
│ ├── ED_model.py
│ └── DAN.py
├── trained_models
│ └── (affecnet8.pth)
├── scripts
│ ├── config.yml
│ ├── etl.py
│ ├── train.py
│ ├── get_model.py
│ ├── predict.py
│ ├── live_predict.py
│ ├── tests
│ │ └── test_models.py
│ ├── utils
│ │ └── utils.py
│ └── train_utils
│ ├── create_dataloaders.py
│ └── dataloader.py
├── log
│ ├── etl.log
│ ├── predict.log
│ └── results.csv
├── prediction_api
│ ├── api_utility.py
│ └── main.py
├── README.md
└── .github
└── workflows
└── ed_app_workflow.yml
1. Clone the repo:
   `git clone https://github.com/JenAlchimowicz/Emotion-Recognition-Api.git`
2. Go to the project root folder:
   `cd Emotion-Recognition-Api`
3. Set up a virtual environment (venv + pip):
   `python -m venv venv`
   `source venv/bin/activate`
   `pip install -r requirements.txt`
4. Download the weights of the pretrained DAN model (training from scratch is also possible, see the section below):
   `python scripts/get_model.py`
5. Start a local server using uvicorn:
   `python prediction_api/main.py`
6. Go to the displayed URL (default is http://127.0.0.1:8000)
7. Make predictions. Example output: annotated-face images comparing the API output with the output from running the scripts locally.
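For API usage, the server returns a set of coordinates and annotations for each face. The exact endpoint and field names depend on `prediction_api/main.py`, so the response shape below is an illustrative assumption, not the real schema — it only shows how such a per-face payload could be consumed:

```python
import json

# Hypothetical API response: one entry per detected face, with a bounding
# box (x, y, width, height) and an emotion annotation.
response_body = """
{
  "faces": [
    {"box": [74, 32, 120, 120], "emotion": "happy"},
    {"box": [301, 58, 96, 96], "emotion": "neutral"}
  ]
}
"""

faces = json.loads(response_body)["faces"]
for face in faces:
    x, y, w, h = face["box"]
    print(f"{face['emotion']}: top-left=({x}, {y}), size={w}x{h}")
```

Check the FastAPI interactive docs at `/docs` on the running server for the actual request and response format.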
1. Complete steps 1-4 above
2. Set `file_path` in `config.yml` to your image or video path
3. Run predictions:
   `python scripts/predict.py`
1. Complete steps 1-3 above
2. Download the fer2013 dataset from here and place it in the `data/raw_data` directory
3. Prepare the fer2013 dataset for training:
   `python scripts/etl.py`
4. Run the training procedure. Arguments such as the number of epochs or the learning rate can be changed in the `config.yml` file:
   `python scripts/train.py`
5. Make sure the `config.yml` file points to your newly trained model, then make predictions as before:
   `python scripts/predict.py`
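All of the scripts above read their arguments from the single config file. A hypothetical fragment of what it might contain — the key names here are illustrative assumptions, check `scripts/config.yml` for the real schema:

```yaml
# Illustrative config fragment - key names are assumptions, not the real schema
file_path: data/raw_data/my_video.mp4      # input for predict.py
model_path: trained_models/affecnet8.pth   # weights used at prediction time
epochs: 20                                 # training arguments for train.py
learning_rate: 0.0001
```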
Emotion recognition is a two-step process:
- Detect faces in an image
- Classify each face into one of the emotions
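The two steps above compose into a simple pipeline: the detector produces bounding boxes, and the classifier labels each box. A minimal sketch of that control flow, with both steps stubbed out (in this project they are the Haar Cascade classifier and DAN respectively — the stubs below are placeholders, not the real models):

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height

def recognise_emotions(
    image: object,
    detect_faces: Callable[[object], List[Box]],
    classify_emotion: Callable[[object, Box], str],
) -> List[Tuple[Box, str]]:
    """Step 1: find face bounding boxes; step 2: classify each face crop."""
    return [(box, classify_emotion(image, box)) for box in detect_faces(image)]

# Stub detector and classifier to show the flow end to end
fake_image = object()
detections = recognise_emotions(
    fake_image,
    detect_faces=lambda img: [(10, 10, 50, 50), (80, 20, 40, 40)],
    classify_emotion=lambda img, box: "happy" if box[2] > 45 else "neutral",
)
print(detections)  # [((10, 10, 50, 50), 'happy'), ((80, 20, 40, 40), 'neutral')]
```

Keeping the two steps behind separate interfaces is what makes it easy to swap the detector (e.g. Haar Cascade for YOLO, as listed in the future work) without touching the classifier.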
In this project, I relied heavily on pre-trained models. For face detection I used the Haar Cascade classifier, and for emotion classification I used DAN pre-trained on the AffectNet dataset. I chose Haar Cascade because developing a face detection system was not the goal of this project, and Haar Cascade provides a stable, easy-to-implement solution. I chose DAN because it is pre-trained on a dataset that is inaccessible to me, and it provides better performance than my own implementations trained on the fer2013 dataset.
Why PyTorch?
- most semi-publicly accessible emotion classification datasets are restricted to academics, who mainly use PyTorch. Therefore, I expected to find more good pre-trained models in PyTorch than in e.g. TensorFlow.

Why config.yml?
- in machine learning, data processing, transformation, and splitting have as big an impact on the final result as the choice of model and training parameters. Since I use separate scripts for each of those steps, one would have to remember which arguments each script was run with in order to reproduce a specific result. A config.yml file puts all arguments in one place and ensures easy tracking and reproducibility.

Why logging?
- logging leaves an easy-to-track trace of what happened during a run. We could print the results to the command line, but if I want to run e.g. 20 models, the print outputs would quickly become messy. Logging is a clean way to save all the information in separate files.

Why pytest?
- testing is crucial for development. Pytest is easy to follow, easy to trace, and provides good error reporting.

Why GitHub Actions?
- this is a small project, so quick set-up and simplicity of use are big advantages. I believed it to be the right tool for the job.

Why black?
- clarity and standardization make it easier for everyone to read the code.
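The per-script log files under `log/` (e.g. `etl.log`, `predict.log`) can be produced with the standard library alone. A minimal sketch of this kind of setup — the file name and message format here are illustrative, not copied from the project's scripts:

```python
import logging

# One log file per script keeps separate runs and separate stages traceable
logger = logging.getLogger("etl")
handler = logging.FileHandler("etl.log", mode="w")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("etl started")
logger.info("train/val/test split: %s", (0.8, 0.1, 0.1))
handler.flush()
```

With one such logger per script, running e.g. 20 models leaves 20 clean, timestamped traces instead of an interleaved wall of print output.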
There are a few emotion recognition datasets out there. The three considered in the project were:
- fer2013 [publicly available] - a dataset of 35k 48x48 grayscale images, each annotated with one of 7 emotions. The dataset suffers from a large amount of mislabeled data and from the low quality of the input images (grayscale and small size).
- AffectNet [available to academics only] - a dataset of 440k RGB images annotated with one of 8 emotions along with the intensity of valence and arousal. State of the art. Currently unavailable to me.
- Real-world Affective Faces [available to academics only] - a dataset of 30k RGB images, each annotated with the two most relevant of the 7 emotions. I recently gained access to this dataset.
- Experiment with the Real-world Affective Faces dataset
- Replace Haar Cascade with e.g. YOLO
- Increase test coverage
- Deploy API online
- Any suggestions are welcome :)
- Structure of the project and key development tools: Geoffrey Hung's articles here and here
- DAN paper by Wen, Zhengyao and Lin, Wenzhong and Wang, Tao and Xu, Ge: https://arxiv.org/pdf/2109.07270.pdf
- Webinar on testing: https://www.youtube.com/watch?v=ytI4Xapvx1w
- Haar Cascade classifier implementation: https://www.youtube.com/watch?v=7IFhsbfby9s
- Readme design: https://github.com/ma-shamshiri/Human-Activity-Recognition#readme
- FastAPI tutorial: https://github.com/aniketmaurya/tensorflow-fastapi-starter-pack