Submitted: 22 March 2024
Posted: 27 March 2024
Abstract
Keywords:
1. Introduction
2. Methodology
2.1. Step 1: Feature Extraction
2.2. Step 2: Modal Feature Fusion
3. Experiments
3.1. Implementation details
3.2. Evaluation Metrics
3.3. Results
4. Conclusions

| EXPR Task | F1-Score |
|---|---|
| Neutral | 0.6063 |
| Anger | 1.0 |
| Disgust | 1.0 |
| Fear | 1.0 |
| Happiness | 0.2857 |
| Sadness | 0.0178 |
| Surprise | 0.0028 |
| Other | 0.4238 |
| Average | 0.5420 |

| AU Task | F1-Score |
|---|---|
| AU1 | 0.5909 |
| AU2 | 0.6365 |
| AU4 | 0.6285 |
| AU6 | 0.7126 |
| AU7 | 0.6841 |
| AU10 | 0.6930 |
| AU12 | 0.7017 |
| AU15 | 1.0 |
| AU23 | 1.0 |
| AU24 | 0.4886 |
| AU25 | 0.8148 |
| AU26 | 0.5423 |
| Average | 0.7077 |

| VA Task | CCC |
|---|---|
| Valence | 0.4328 |
| Arousal | 0.5906 |
| Average | 0.5117 |
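
The "Average" rows in the tables above appear to be unweighted (macro) means of the per-class scores. The sketch below recomputes them from the tabulated values under that assumption. The `ccc` helper gives the standard definition of the concordance correlation coefficient for reference only; it is not taken from the authors' code, and all function and variable names here are illustrative.

```python
# Per-class scores copied from the result tables above.
expr_f1 = {
    "Neutral": 0.6063, "Anger": 1.0, "Disgust": 1.0, "Fear": 1.0,
    "Happiness": 0.2857, "Sadness": 0.0178, "Surprise": 0.0028,
    "Other": 0.4238,
}
au_f1 = {
    "AU1": 0.5909, "AU2": 0.6365, "AU4": 0.6285, "AU6": 0.7126,
    "AU7": 0.6841, "AU10": 0.6930, "AU12": 0.7017, "AU15": 1.0,
    "AU23": 1.0, "AU24": 0.4886, "AU25": 0.8148, "AU26": 0.5423,
}
va_ccc = {"Valence": 0.4328, "Arousal": 0.5906}


def macro_average(scores):
    """Unweighted mean over classes (the assumed averaging scheme)."""
    return sum(scores.values()) / len(scores)


def ccc(pred, target):
    """Concordance correlation coefficient, using population statistics:
    2 * cov / (var_pred + var_target + (mean_pred - mean_target) ** 2).
    """
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_t = sum((t - mean_t) ** 2 for t in target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target)) / n
    return 2 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


if __name__ == "__main__":
    for name, scores in [("EXPR", expr_f1), ("AU", au_f1), ("VA", va_ccc)]:
        print(f"{name} macro average: {macro_average(scores):.5f}")
```

Under this assumption, each recomputed mean agrees with the tabulated "Average" to within rounding (a difference below 1e-4). Whether the authors averaged per class in exactly this way is not stated in the tables themselves.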
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).