Version 1: Received: 30 May 2023 / Approved: 1 June 2023 / Online: 1 June 2023 (10:58:46 CEST)
Version 2: Received: 8 June 2023 / Approved: 8 June 2023 / Online: 8 June 2023 (15:06:19 CEST)
He, E.; Chen, Q.; Zhong, Q. SL-Swin: A Transformer-Based Deep Learning Approach for Macro- and Micro-Expression Spotting on Small-Size Expression Datasets. Electronics 2023, 12, 2656.
Abstract
In recent years, the analysis of macro- and micro-expressions has drawn the attention of researchers, since these expressions provide visual cues to an individual's emotions for a broad range of potential applications such as lie detection and criminal detection. In this paper, we address the challenge of spotting facial macro- and micro-expressions in videos and present compelling results by using a deep learning approach to analyze optical flow features. Unlike other deep learning approaches, which are mainly based on Convolutional Neural Networks (CNNs), we propose a Transformer-based deep learning approach that predicts a score indicating the probability of a frame lying within an expression interval. In contrast to other Transformer-based models that achieve high performance by being pre-trained on large datasets, our model, called SL-Swin, incorporates Shifted Patch Tokenization and Locality Self-Attention into the Swin Transformer backbone and effectively spots macro- and micro-expressions when trained from scratch on small-size expression datasets. Our evaluation outcomes surpass the MEGC 2022 spotting baseline result, obtaining an overall F1-score of 0.1366. Our approach also performs well on the MEGC 2021 spotting task, with overall F1-scores of 0.1824 and 0.1357 on CAS(ME)^2 and SAMM Long Videos, respectively. The code is publicly available on GitHub (https://github.com/eddiehe99/pytorch-expression-spotting).
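The abstract names two components layered onto the Swin Transformer backbone: Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA). The sketch below is a minimal PyTorch illustration of how these two techniques are generally formulated (following the "Vision Transformer for Small-Size Datasets" line of work), not the authors' released SL-Swin implementation; the module names, dimensions, and the roll-based shift are assumptions made for brevity, and the actual code is in the linked GitHub repository.

# Minimal, self-contained PyTorch sketch of SPT and LSA (illustrative only;
# not the authors' SL-Swin code -- shapes and defaults are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShiftedPatchTokenization(nn.Module):
    """Concatenate the input with four half-patch diagonal shifts of itself,
    then split into non-overlapping patches and linearly embed them."""

    def __init__(self, in_channels=3, patch_size=4, embed_dim=96):
        super().__init__()
        self.patch_size = patch_size
        patch_dim = (in_channels * 5) * patch_size * patch_size  # original + 4 shifts
        self.norm = nn.LayerNorm(patch_dim)
        self.proj = nn.Linear(patch_dim, embed_dim)

    def forward(self, x):                                  # x: (B, C, H, W)
        s = self.patch_size // 2
        shifts = [(-s, -s), (-s, s), (s, -s), (s, s)]      # four diagonal directions
        # torch.roll is used here as a simple stand-in for the zero-padded shift.
        shifted = [torch.roll(x, shifts=(dy, dx), dims=(2, 3)) for dy, dx in shifts]
        x = torch.cat([x] + shifted, dim=1)                # (B, 5C, H, W)
        # Partition into patch_size x patch_size patches and flatten each patch.
        patches = F.unfold(x, kernel_size=self.patch_size, stride=self.patch_size)
        patches = patches.transpose(1, 2)                  # (B, num_patches, 5C*P*P)
        return self.proj(self.norm(patches))               # (B, num_patches, embed_dim)


class LocalitySelfAttention(nn.Module):
    """Multi-head self-attention with two changes: a learnable temperature
    instead of the fixed 1/sqrt(d) scale, and diagonal masking that suppresses
    each token's attention to itself."""

    def __init__(self, dim=96, num_heads=3):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.temperature = nn.Parameter(torch.tensor(head_dim ** -0.5))
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                                  # x: (B, N, dim)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)               # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        # Diagonal masking: remove self-token relations before the softmax.
        mask = torch.eye(N, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(mask, float("-inf")).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    frames = torch.randn(2, 3, 64, 64)                     # e.g. optical-flow-derived input maps
    tokens = ShiftedPatchTokenization()(frames)
    print(LocalitySelfAttention()(tokens).shape)           # torch.Size([2, 256, 96])

The intent of both pieces matches the abstract: SPT widens each token's effective receptive field by embedding spatially shifted copies of the input alongside the original, and LSA sharpens attention on small datasets via a learnable temperature and by masking out each token's attention to itself.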
Keywords
Macro- and Micro-Expression Spotting; Image Processing; Computer Vision; Artificial Intelligence; Deep Learning; Swin Transformer; Shifted Patch Tokenization; Locality Self-Attention
Subject
Computer Science and Mathematics, Computer Vision and Graphics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Received:
8 June 2023
Commenter:
Erheng He
Commenter's Conflict of Interests:
Author
Comment: 1- In the introduction section, we have added a discussion of Transformers and their prior uses in expression spotting. Although the amount of work employing Transformers for expression spotting is limited, we briefly describe three prior uses. Please kindly find the last paragraph on page 2.
2- We have redrawn Figure 1 with enlarged text. Please kindly see Figure 1 on page 3.
3- We have redrawn both Figures 2 and 3 with aligned text. Please kindly see Figure 2 on page 5 and Figure 3 on page 8.
4- In section 3.2 (Performance Metrics), we have added description text and equations of other evaluation metrics for the Macro-Expression (MaE) and Micro-Expression (ME) spotting experiments, including Precision, Recall, and F1-score; their standard definitions are summarized after this list.
5- We have deleted the redundant "(F1-score))" from the title of Table 5 in section 4.3.2. (Labeling) on page 14.
6- We have added the achieved results of our approach in the conclusion section (page 15).
7- We have uploaded our code to the given GitHub repository (https://github.com/eddiehe99/pytorch-expression-spotting).
8- In section 3.1.1. (MEGC 2022 Datasets), we have added more details about both the CAS(ME)^3 and SAMM Challenge datasets, including their composition, diversity, and size.
9- In section 3.1.2. (MEGC 2021 Datasets), we have added more details about both the CAS(ME)^2 and SAMM Long Videos datasets, including their composition, diversity, and size.
10- Based on the received review report, we have also made minor English language edits, mainly in the Abstract (page 1), Introduction (page 3), and Conclusion (page 15) sections.
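For reference, the metrics named in item 4 take their standard form. The LaTeX summary below is an illustrative sketch, assuming the usual MEGC spotting convention in which a predicted interval counts as a true positive when its Intersection-over-Union with a ground-truth interval is at least 0.5; the exact definitions used by the paper are those given in its section 3.2.

% TP, FP, FN denote true positives, false positives, and false negatives
% counted over spotted expression intervals.
\begin{align}
  \mathrm{Precision} &= \frac{TP}{TP + FP}, \\
  \mathrm{Recall}    &= \frac{TP}{TP + FN}, \\
  \text{F1-score}    &= \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
                      = \frac{2\,TP}{2\,TP + FP + FN}.
\end{align}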