Submitted: 21 August 2023
Posted: 21 August 2023
Abstract
Keywords:
1. Introduction
- A 3D skeletal dataset comprising 16 unit actions in Taekwondo poomsae was constructed from motion data collected with full-body motion-capture suits.
- This study proposes methods for generating 2D skeletons by projecting 3D skeletons from diverse viewpoints, and for generating synthetic joint and bone heatmaps, so that viewpoint-dependent action characteristics are incorporated into the training dataset. This ensures consistent and reliable recognition performance regardless of the viewpoint.
- The optimal camera viewpoint for action recognition of Taekwondo poomsae was determined via the analysis and evaluation of recognition performance.
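The projection step named in the second contribution can be illustrated with a minimal sketch. It assumes an orthographic camera, a y-up coordinate frame, and virtual viewpoints spaced evenly in azimuth around the performer; none of these details are confirmed by the text available here, and the function names `project_skeleton` and `multi_view_skeletons` are hypothetical.

```python
import numpy as np


def project_skeleton(joints_3d, azimuth_deg, elevation_deg=0.0):
    """Project (T, J, 3) joints to (T, J, 2) for one virtual camera.

    Assumptions (not taken from the paper): orthographic projection,
    y axis pointing up, camera orbiting the subject around the y axis.
    """
    az = np.deg2rad(azimuth_deg)
    el = np.deg2rad(elevation_deg)
    # Rotation about the vertical (y) axis selects the viewing direction.
    rot_y = np.array([[np.cos(az), 0.0, np.sin(az)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(az), 0.0, np.cos(az)]])
    # Rotation about the x axis tilts the camera up or down.
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(el), -np.sin(el)],
                      [0.0, np.sin(el), np.cos(el)]])
    rotated = joints_3d @ (rot_x @ rot_y).T
    # Dropping the depth (z) coordinate gives the 2D skeleton.
    return rotated[..., :2]


def multi_view_skeletons(joints_3d, num_views):
    """Stack projections from num_views evenly spaced azimuths: (V, T, J, 2)."""
    azimuths = np.arange(num_views) * 360.0 / num_views
    return np.stack([project_skeleton(joints_3d, a) for a in azimuths])
```

For example, `multi_view_skeletons(sequence, 8)` would turn one 3D sequence into eight 2D sequences, one per virtual viewpoint.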
2. Materials and Methods
2.1. Data Collection
2.2. Three-dimensional (3D) Convolutional Neural Network (CNN)-based Viewpoint-agnostic Action Recognition
2.2.1. Generation of Diverse Viewpoint Two-dimensional (2D) Skeletons from 3D Skeleton
2.2.2. Generation of Synthetic Heatmap Image from 2D Skeleton
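As a rough illustration of the joint and bone heatmaps named in this heading, the sketch below renders one Gaussian map per joint and one map per bone segment from a single 2D skeleton frame. The Gaussian form, the `sigma` value, the image size, and the function names `joint_heatmap` and `bone_heatmap` are assumptions made for illustration; the text available here does not specify them.

```python
import numpy as np


def joint_heatmap(joints_2d, height, width, sigma=2.0):
    """Return a (J, H, W) stack with one Gaussian blob per joint.

    joints_2d holds (x, y) pixel coordinates; sigma is an assumed value.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    dx = xs[None] - joints_2d[:, 0, None, None]
    dy = ys[None] - joints_2d[:, 1, None, None]
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))


def bone_heatmap(joints_2d, bones, height, width, sigma=2.0):
    """Return a (B, H, W) stack; each map decays with distance to a bone.

    bones is a list of (joint_index_a, joint_index_b) pairs (hypothetical layout).
    """
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2)
    maps = []
    for a, b in bones:
        start, end = joints_2d[a], joints_2d[b]
        seg = end - start
        length_sq = float(seg @ seg)
        if length_sq == 0.0:
            # Degenerate bone: both joints coincide, so treat it as a point.
            t = np.zeros((height, width))
        else:
            # Parameter of the closest point on the segment, clipped to [0, 1].
            t = np.clip(((pixels - start) @ seg) / length_sq, 0.0, 1.0)
        nearest = start + t[..., None] * seg
        dist_sq = ((pixels - nearest) ** 2).sum(axis=-1)
        maps.append(np.exp(-dist_sq / (2.0 * sigma ** 2)))
    return np.stack(maps)
```

Stacking such per-frame maps along the time axis would give a volume that a 3D CNN can consume, which is one plausible reading of the pipeline outlined in Section 2.2.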
2.2.4. Training Procedure
2.3. Evaluation Metrics
3. Results
3.1. Performance Evaluation using Synthetic 2D Skeleton Datasets
3.2. Performance Evaluation using 2D Skeletons Extracted from Front- and Side-view RGB Images
3.3. Performance Evaluation using 2D Skeletons Extracted from Random View RGB Images
3.4. Performance Comparison with Previous Models
4. Discussion
5. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Choi, C.-H.; Joo, H.-J. Motion recognition technology based remote Taekwondo Poomsae evaluation system. Multimed. Tools Appl. 2016, 75, 13135–13148. [Google Scholar] [CrossRef]
- Lee, J.; Jung, H. TUHAD: Taekwondo Unit Technique Human Action Dataset with Key Frame-Based CNN Action Recognition. Sensors 2020, 20, 4871. [Google Scholar] [CrossRef] [PubMed]
- Andó, B.; Baglio, S.; Lombardo, C.O.; Marletta, V. An Event Polarized Paradigm for ADL Detection in AAL Context. IEEE Trans. Instrum. Meas. 2015, 64, 1814–1825. [Google Scholar] [CrossRef]
- Hsieh, J.; Chuang, C.; Alghyaline, S.; Chiang, H.; Chiang, C. Abnormal Scene Change Detection From a Moving Camera Using Bags of Patches and Spider-Web Map. IEEE Sens. J. 2015, 15, 2866–2881. [Google Scholar] [CrossRef]
- Cosar, S.; Donatiello, G.; Bogorny, V.; Garate, C.; Alvares, L.O.; Brémond, F. Toward Abnormal Trajectory and Event Detection in Video Surveillance. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 683–695. [Google Scholar] [CrossRef]
- Ismail, S.J.; Rahman, M.A.A.; Mazlan, S.A.; Zamzuri, H. Human gesture recognition using a low cost stereo vision in rehab activities. In Proceedings of the 2015 IEEE International Symposium on Robotics and Intelligent Sensors (IRIS); 2015; pp. 220–225. [Google Scholar]
- Rafferty, J.; Nugent, C.D.; Liu, J.; Chen, L. From Activity Recognition to Intention Recognition for Assisted Living Within Smart Homes. IEEE Trans. Hum. Mach. Syst. 2017, 47, 368–379. [Google Scholar] [CrossRef]
- Zolfaghari, S.; Keyvanpour, M.R. SARF: Smart activity recognition framework in Ambient Assisted Living. In Proceedings of the 2016 Federated Conference on IEEE Computer Science and Information Systems (FedCSIS), Gdansk, Poland, 11–14 September 2016; pp. 1435–1443. [Google Scholar]
- Zhang, L.; Hsieh, J.-C.; Ting, T.-T.; Huang, Y.-C.; Ho, Y.-C.; Ku, L.-K. A Kinect based golf swing score and grade system using GMM and SVM. In Proceedings of the 5th International Congress on Image and Signal Processing (CISP 2012), Chongqing, China, 16–18 October 2012; pp. 711–715. [Google Scholar]
- Zhu, G.; Xu, C.; Huang, Q.; Gao, W.; Xing, L. Player action recognition in broadcast tennis video with applications to semantic analysis of sports game. In Proceedings of the 14th ACM International Conference on Multimedia, Santa Barbara, CA, USA, 23–27 October 2006; pp. 431–440. [Google Scholar]
- Martin, P.-E.; Benois-Pineau, J.; Péteri, R.; Morlier, J. Sport Action Recognition with Siamese Spatio-Temporal Cnns: Application to Table Tennis. In Proceedings of the 2018 International Conference on Content-Based Multimedia Indexing (CBMI), La Rochelle, France, 4–6 September 2018. [Google Scholar]
- Wang, S. A Deep Learning Algorithm for Special Action Recognition of Football. Mobile Information Systems 2022, 2022. [Google Scholar] [CrossRef]
- Leo, M.; D’Orazio, T.; Spagnolo, P.; Mazzeo, P.L.; Distante, A. Multi-view Player Action Recognition in Soccer Games. In Computer Vision/Computer Graphics Collaboration Techniques. MIRAGE 2009; Gagalowicz, A., Philips, W., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5496. [CrossRef]
- Lin, C.-H.; Tsai, M.-Y.; Chou, P.-Y. A Lightweight Fine-Grained Action Recognition Network for Basketball Foul Detection. In Proceedings of the 2021 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW); 2021; pp. 1–2. [Google Scholar]
- Ji, R. Research on Basketball Shooting Action Based on Image Feature Extraction and Machine Learning. IEEE Access 2020, 8, 138743–138751. [Google Scholar] [CrossRef]
- Mora, S.V.; Knottenbelt, W.J. Deep Learning for Domain-Specific Action Recognition in Tennis. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 170–178. [Google Scholar]
- Rahmad, N.; As’ari, M. The new Convolutional Neural Network (CNN) local feature extractor for automated badminton action recognition on vision based data. J. Phys. Conf. Ser. 2020, 1529, 022021. [Google Scholar] [CrossRef]
- Rahmad, N.; As’ari, M.; Soeed, K.; Zulkapri, I. Automated badminton smash recognition using convolutional neural network on the vision based data. In Proceedings of the IOP Conference Series: Materials Science and Engineering; IOP Publishing: Putrajaya, Malaysia, 2020; Volume 884, p. 012009. [Google Scholar]
- Ijjina, E.P.; Chalavadi, K.M. Human action recognition in RGB-D videos using motion sequence information and deep learning. Pattern Recognit. 2017, 72, 504–516. [Google Scholar] [CrossRef]
- Wang, P.; Wang, S.; Gao, Z.; Hou, Y.; Li, W. Structured Images for RGB-D Action Recognition. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017; pp. 1005–1014. [Google Scholar]
- Trivedi, N.; Kiran, R.S. PSUMNet: Unified Modality Part Streams are All You Need for Efficient Pose-based Action Recognition. arXiv 2022, arXiv:2208.05775. [Google Scholar]
- Duan, H.; Zhao, Y.; Chen, K.; Lin, D.; Dai, B. Revisiting skeleton-based action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2969–2978. [Google Scholar]
- Xia, H.; Gao, X. Multi-Scale Mixed Dense Graph Convolution Network for Skeleton-Based Action Recognition. IEEE Access 2021, 9, 36475–36484. [Google Scholar] [CrossRef]
- Gupta, P.; Thatipelli, A.; Aggarwal, A.; Maheshwari, S.; Trivedi, N.; Das, S.; Sarvadevabhatla, R.K. Quo vadis, skeleton action recognition? Int. J. Comput. Vis. 2021, 129, 2097–2112. [Google Scholar] [CrossRef]
- Song, Y.F.; Zhang, Z.; Shan, C.; Wang, L. Constructing stronger and faster baselines for skeleton-based action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1474–1488. [Google Scholar] [CrossRef] [PubMed]
- Chen, Y.; Zhang, Z.; Yuan, C.; Li, B.; Deng, Y.; Hu, W. Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition. arXiv 2021, arXiv:2107.12213. [Google Scholar]
- Wang, M.; Ni, B.; Yang, X. Learning Multi-View Interactional Skeleton Graph for Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 45, 6940–6954. [Google Scholar] [CrossRef] [PubMed]
- Du, Y.; Wang, W.; Wang, L. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 8–10 June 2015; pp. 1110–1118. [Google Scholar]
- Yan, S.; Xiong, Y.; Lin, D. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. arXiv 2018, arXiv:1801.07455. [Google Scholar] [CrossRef]
- Song, S.; Lan, C.; Xing, J.; Zeng, W.; Liu, J. An end-to-end spatio-temporal attention model for human action recognition from skeleton data. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar]
- Du, Y.; Wang, W.; Wang, L. Hierarchical recurrent neural network for skeleton based action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 8–10 June 2015; pp. 1110–1118. [Google Scholar]
- Yan, S.; Xiong, Y.; Lin, D. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition. arXiv 2018, arXiv:1801.07455. [Google Scholar] [CrossRef]
- Roetenberg, D.; Luinge, H.; Slycke, P. Xsens MVN: Full 6DOF human motion tracking using miniature inertial sensors. Xsens Motion Technologies BV. Tech. Rep. 2009, 1. [Google Scholar]
- Feichtenhofer, C.; Fan, H.; Malik, J.; He, K. Slowfast networks for video recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 6202–6211. [Google Scholar]
- Duan, H.; Wang, J.; Chen, K.; Lin, D. Pyskl: Towards good practices for skeleton action recognition. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 7351–7354. [Google Scholar]
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Skeleton-based action recognition with multi-stream adaptive graph convolutional networks. IEEE Trans. Image Process. 2020, 29, 9532–9545. [Google Scholar] [CrossRef] [PubMed]







| ID | Viewpoint configuration | Number of viewpoints | Number of training data |
|---|---|---|---|
| Model A | | 2 | 12,360 |
| Model B | | 4 | 24,720 |
| Model C | | 8 | 49,440 |
| Model D | | 36 | 222,480 |

| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Model A | 0.8611 | 0.8398 | 0.8373 | 0.7997 |
| Model B | 0.9680 | 0.9669 | 0.9670 | 0.9669 |
| Model C | 0.9769 | 0.9764 | 0.9765 | 0.9764 |
| Model D | 0.9803 | 0.9802 | 0.9802 | 0.9802 |

| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Model A | 0.8611 | 0.8398 | 0.8373 | 0.7997 |
| Model B | 0.9686 | 0.9682 | 0.9682 | 0.9682 |
| Model C | 0.9769 | 0.9764 | 0.9765 | 0.9764 |
| Model D | 0.9786 | 0.9783 | 0.9784 | 0.9783 |

| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Model A | 0.7638 | 0.7638 | 0.5854 | 0.5795 |
| Model B | 0.8761 | 0.8533 | 0.8500 | 0.8516 |
| Model C | 0.8705 | 0.7647 | 0.7763 | 0.7626 |
| Model D | 0.8998 | 0.8717 | 0.8706 | 0.8705 |

| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Model A | 0.7453 | 0.6559 | 0.6549 | 0.6549 |
| Model B | 0.8750 | 0.8303 | 0.8300 | 0.8294 |
| Model C | 0.8903 | 0.8766 | 0.8752 | 0.8761 |
| Model D | 0.8891 | 0.8743 | 0.8717 | 0.8732 |

| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Model A | 0.8298 | 0.8185 | 0.7944 | 0.8398 |
| Model B | 0.9217 | 0.9009 | 0.9037 | 0.9010 |
| Model C | 0.9432 | 0.9381 | 0.9384 | 0.9381 |
| Model D | 0.8977 | 0.8702 | 0.8682 | 0.8623 |

| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| Model A | 0.7142 | 0.6222 | 0.6095 | 0.6041 |
| Model B | 0.8665 | 0.7862 | 0.7930 | 0.7715 |
| Model C | 0.8848 | 0.8471 | 0.8482 | 0.8419 |
| Model D | 0.9040 | 0.8736 | 0.8764 | 0.8670 |

| Model | Precision | Recall | F1-score | Accuracy |
|---|---|---|---|---|
| stgcn [29] | 0.7257 | 0.6915 | 0.6715 | 0.6667 |
| stgcn++ [35] | 0.8144 | 0.7210 | 0.7417 | 0.7058 |
| ctrgcn [26] | 0.7944 | 0.6509 | 0.6587 | 0.6275 |
| aagcn [36] | 0.8040 | 0.7442 | 0.7541 | 0.7261 |
| posec3d [22] | 0.8825 | 0.7520 | 0.7865 | 0.7340 |
| Proposed (Model C) | 0.8848 | 0.8471 | 0.8482 | 0.8419 |
| Proposed (Model D) | 0.9040 | 0.8736 | 0.8764 | 0.8670 |
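The four columns reported in the tables above can be computed from predicted and true class labels as sketched below. The sketch assumes macro averaging over classes (per-class scores averaged with equal weight), which the text available here does not confirm; the function name `classification_metrics` is hypothetical. Macro averaging would be consistent with rows in which the F1-score is lower than both precision and recall, since the mean of per-class F1 values need not equal the harmonic mean of the averaged precision and recall.

```python
import numpy as np


def classification_metrics(y_true, y_pred, num_classes):
    """Macro precision, recall, F1 and overall accuracy from integer labels.

    Macro averaging is an assumption, not a detail confirmed by the source.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(float)
    pred_count = cm.sum(axis=0)
    true_count = cm.sum(axis=1)
    # Classes with no predictions (or no samples) score 0 instead of NaN.
    precision = np.divide(tp, pred_count, out=np.zeros_like(tp),
                          where=pred_count > 0)
    recall = np.divide(tp, true_count, out=np.zeros_like(tp),
                       where=true_count > 0)
    denom = precision + recall
    f1 = np.divide(2.0 * precision * recall, denom, out=np.zeros_like(tp),
                   where=denom > 0)
    return {
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
        "f1": float(f1.mean()),
        "accuracy": float(tp.sum() / cm.sum()),
    }
```

For the 16 unit actions in this dataset, the call would take `num_classes=16`.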
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).