Submitted: 10 May 2025
Posted: 12 May 2025
Abstract
Keywords:
I. Introduction
II. Related Work
III. Method
IV. Experiment
A. Datasets
B. Experimental Results
V. Conclusion
References




| Model | AUC (%) ↑ | EER (%) ↓ |
|---|---|---|
| Conv2D-AE | 72.4 | 31.2 |
| STAE | 74.8 | 29.5 |
| ConvLSTM-AE | 76.1 | 28.3 |
| TSC | 75.6 | 28.8 |
| MemAE | 78.2 | 26.7 |
| MNAD-AE | 79.1 | 25.6 |
| Stack-RNN | 77.5 | 27.2 |
| Ours | 80.5 | 24.5 |
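
The table reports AUC and EER but this excerpt does not say how they were computed, on which dataset, or at which level (frame or pixel). As a minimal sketch, assuming frame-level anomaly scores where a higher score means "more anomalous" and binary ground-truth labels (1 = anomalous frame), the hypothetical helpers `roc_curve_points`, `roc_auc` and `equal_error_rate` below show how the two metrics are commonly obtained. The first builds the ROC curve by sweeping the decision threshold. The second integrates that curve with the trapezoidal rule. The third takes the EER as the operating point where the false positive rate equals the false negative rate (1 − TPR). This is an illustration of the metrics, not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_curve_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Sweep the threshold from high to low and record (FPR, TPR) at each distinct score.

    Assumption: label 1 = anomalous frame, 0 = normal frame, and a higher
    score means "more anomalous".
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one normal and one anomalous frame")

    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points: List[Point] = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Frames with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points  # the last point is always (1.0, 1.0)


def roc_auc(points: Sequence[Point]) -> float:
    """Area under the ROC curve via the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def equal_error_rate(points: Sequence[Point]) -> float:
    """Error rate at the point where FPR equals FNR (= 1 - TPR).

    FNR - FPR never increases along the curve, so find the first segment
    where it drops to <= 0 and linearly interpolate the crossing.
    """
    prev_fpr, prev_tpr = points[0]
    prev_gap = (1.0 - prev_tpr) - prev_fpr
    for fpr, tpr in points[1:]:
        gap = (1.0 - tpr) - fpr
        if gap <= 0.0:
            # prev_gap > 0 >= gap here, so the denominator is never zero.
            t = prev_gap / (prev_gap - gap)
            return prev_fpr + t * (fpr - prev_fpr)
        prev_fpr, prev_gap = fpr, gap
    raise ValueError("ROC curve must end at (1, 1)")


if __name__ == "__main__":
    # Toy frame-level scores and labels (illustrative only, not from the paper).
    pts = roc_curve_points([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
    print(f"AUC = {100 * roc_auc(pts):.1f}%, EER = {100 * equal_error_rate(pts):.1f}%")
    # prints: AUC = 75.0%, EER = 50.0%
```

Multiplying by 100 puts the values on the same percentage scale as the table, where higher AUC and lower EER are better. Interpolating between ROC points is one common EER convention; some evaluation scripts instead report the rate at the nearest threshold, which can shift the result slightly.
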

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).