Submitted: 25 July 2024
Posted: 26 July 2024
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Human Body Posture Estimation Recognition
2.2. Recognition Based on Human-Object Interactions
- SRGAN (Super-Resolution Generative Adversarial Network) is used to reconstruct a high-resolution version of the original image, which enriches the detail of small targets, enhances the spatial features of human-object interactions, and improves recognition accuracy;
- The variable kernel convolution AKConv is added to the Backbone module of the YOLOv8s object detection algorithm. AKConv adjusts the network's initial regular sampling pattern, changing the shape and size of the sampling grid as needed, so that the network can adapt to different datasets and detect more targets;
- The LSKA attention mechanism is integrated into the SPPF of the YOLOv8s Backbone module. It expands the receptive field and captures wider contextual information, which significantly improves the multi-scale feature aggregation capability of the SPPF module. The network focuses more on target-related features, which in turn improves detection accuracy;
- To validate the proposed network, its accuracy and processing speed are evaluated on the publicly available SCB-datasets and on a self-constructed classroom video dataset from our university. The results show that the network achieves higher recognition accuracy and faster processing speed when the original images are unclear, targets are numerous, people overlap, and interactions are complex.
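
The multi-scale aggregation attributed to SPPF above can be illustrated with a small sketch. It assumes that SPPF follows the standard YOLOv8 design of three chained stride-1 max-pools with a 5-wide kernel, and it uses a 1D list instead of a 2D feature map for readability. The function names `max_pool_1d` and `sppf_1d` are hypothetical, and no attention module is modelled here. The point is only that each chained pool widens the effective window (5, 9, 13), which is where the multi-scale context comes from.

```python
def max_pool_1d(values, kernel):
    """Stride-1 max pool with 'same' padding; out-of-range positions are ignored."""
    radius = kernel // 2
    n = len(values)
    return [max(values[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def sppf_1d(values, kernel=5):
    """Chain three max-pools and return all four branches (input plus three pooled)."""
    y1 = max_pool_1d(values, kernel)
    y2 = max_pool_1d(y1, kernel)
    y3 = max_pool_1d(y2, kernel)
    # In a real SPPF block these branches are concatenated along the channel axis.
    return [values, y1, y2, y3]


# Hypothetical 1D "feature map" used only for illustration.
signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
branches = sppf_1d(signal)

# Two chained 5-wide pools see a 9-wide window; three chained pools see a 13-wide window.
assert branches[2] == max_pool_1d(signal, 9)
assert branches[3] == max_pool_1d(signal, 13)
```

Chaining small pools gives the same result as running separate large pools, but reuses the intermediate outputs. An attention module placed after the pooling stage would then reweight features that already carry context at several scales.
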
3. Recognition Network Model Used in this Study
3.1. System Architecture
3.2. Super-Resolution Generative Adversarial Networks
3.3. Improved YOLOv8s Network
3.3.1. YOLOv8s Network
3.3.2. YOLOv8s Network with Variable Kernel Convolution
3.3.3. A Spatial Pyramid Pooling with Attention (SPPF_LSKA)
3.3.4. Improved Feature Fusion Section
4. Experimental Results
4.1. Data Set
4.2. Experimental Environment and Configuration
4.3. Indicators for Evaluation
4.4. Experimental Results and Analysis
4.4.1. Variable Kernel Convolution Based Ablation Experiments
4.4.2. Ablation Experiments Based on LSKA Localization
4.4.3. Improved Feature Fusion Based Partial Ablation Experiments
| Attention | Precision(%) | GPU_mem(G) | Params(M) | FLOPs(B) |
|---|---|---|---|---|
| YOLOv8s(baseline) | 85.65 | 4.1 | 11.2 | 28.6 |
| GAM | 89.45 | 4.9 | 19.8 | 32.8 |
| ECA | 91.3 | 5.4 | 25.2 | 56.9 |
| SENet | 84.31 | 4.5 | 15.2 | 30.2 |
| ShuffleAttention | 88.90 | 4.7 | 17.8 | 34.5 |
| CBAM | 90.12 | 4.6 | 17.89 | 31.54 |
4.4.4. Comparative Experiments with Different Models
4.4.5. Results and Analysis of this Experiment
5. Discussion
6. Conclusion
Author Contributions
Funding
| Name | Parameter |
|---|---|
| CPU | 11th Gen Intel(R) Core(TM) i5-11400F @ 2.60GHz 2.59 GHz |
| GPU | NVIDIA GeForce RTX 3050 |
| Memory | 16G |
| Operating System | Windows11 |
| PyCharm | 2020.1 x64 |
| Python | 3.9.16 |
| Frameworks | Pytorch 1.12.1+cu113 |
| CUDA | Version: 12.0 |
| Metric | Description |
|---|---|
| Precision | Proportion of predicted positive samples that are actually positive |
| Recall | Proportion of actual positive samples that are correctly predicted as positive |
| F1-Score | Harmonic mean of precision and recall |
| IoU | Overlap between the predicted and ground-truth bounding boxes (intersection over union) |
| mAP | Mean Average Precision: the mean of the per-class AP values. mAP50 uses an IoU threshold of 0.5; range variants average over multiple IoU thresholds |
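
The metrics in the table above can be written out as a minimal sketch. The function names, the `(x1, y1, x2, y2)` box format, and all example numbers are assumptions made for illustration; they are not taken from the experiments, and the evaluation code used in the study may differ.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def mean_average_precision(ap_per_class):
    """mAP as the mean of per-class AP values (AP computation itself is not shown)."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical counts and boxes, for illustration only.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
overlap = box_iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```

A detection usually counts as a true positive only when its IoU with a ground-truth box of the same class reaches the chosen threshold (0.5 for mAP50), which is how IoU feeds into precision, recall, and AP.
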
| AKConv position | Precision(%) | Recall(%) | F1(%) | mAP50(%) | mAP50-90(%) |
|---|---|---|---|---|---|
| YOLOv8s | 85.65 | 84.78 | 85.35 | 87.74 | 67.55 |
| Backbone | 88.78 | 87.42 | 87.5 | 89.86 | 70.56 |
| SPPF | 86.11 | 86.31 | 87.14 | 89.52 | 67.67 |
| Neck | 85.68 | 85.16 | 87.06 | 89.24 | 69.17 |
| Backbone+Neck | 84.45 | 85.21 | 86.12 | 88.12 | 66.78 |
| Backbone+SPPF | 85.98 | 84.88 | 86.12 | 88.45 | 68.12 |
| LSKA position | Precision(%) | Recall(%) | F1(%) | mAP50(%) | mAP50-90(%) |
|---|---|---|---|---|---|
| YOLOv8s(baseline) | 85.65 | 84.78 | 85.35 | 87.74 | 67.55 |
| Conv_LSKA | 86.25 | 85.21 | 86.35 | 87.98 | 68.12 |
| MaxPool2d_LSKA | 89.46 | 88.65 | 87.43 | 90.15 | 69.97 |
| Concat_LSKA | 87.21 | 86.98 | 87.25 | 88.25 | 68.14 |
| Classroom behavior | Our YOLOv8s | Faster R-CNN | OpenPose | YOLOv5 | YOLOv7 |
|---|---|---|---|---|---|
| Hand-raising | 0.995 | 0.842 | 0.836 | 0.884 | 0.841 |
| Reading | 0.920 | 0.839 | 0.819 | 0.888 | 0.802 |
| Writing | 0.925 | 0.814 | 0.789 | 0.892 | 0.782 |
| Using phone | 0.962 | 0.959 | 0.958 | 0.961 | 0.966 |
| Bowing the head | 0.894 | 0.901 | 0.888 | 0.842 | 0.905 |
| Leaning over the table | 0.991 | 0.984 | 0.983 | 0.980 | 0.988 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).