Preprint Article, Version 2 (preserved in Portico). This version is not peer-reviewed.

Spatiotemporal Graph Autoencoder Network for Skeleton-Based Human Action Recognition

Version 1 : Received: 28 January 2024 / Approved: 29 January 2024 / Online: 29 January 2024 (08:19:58 CET)
Version 2 : Received: 18 February 2024 / Approved: 20 February 2024 / Online: 21 February 2024 (03:46:16 CET)

How to cite: Abduljalil, H.; Elhayek, A.; Marish Ali, A.; Alsolami, F. Spatiotemporal Graph Autoencoder Network for Skeleton-Based Human Action Recognition. Preprints 2024, 2024011998. https://doi.org/10.20944/preprints202401.1998.v2

Abstract

Human action recognition (HAR) based on skeleton data is a challenging yet important technique because of its wide range of applications in many fields, including patient monitoring, security surveillance, and human-machine interaction. Many algorithms have been proposed to distinguish between a wide variety of activities. However, most practical applications require highly accurate detection of specific types of activities. In this study, a novel and highly accurate spatiotemporal graph autoencoder network for HAR based on skeleton data is proposed. Furthermore, an extensive study was conducted using different modalities. For this purpose, a spatiotemporal graph autoencoder that automatically learns both spatial and temporal patterns from human skeleton datasets was built. The resulting graph convolutional network, named GA-GCN, notably outperforms most existing state-of-the-art methods on two common datasets, namely NTU RGB+D and NTU RGB+D 120. On the first dataset, we achieved accuracies of 92.3% and 96.7% on the cross-subject and cross-view evaluations, respectively. On the more challenging NTU RGB+D 120 dataset, GA-GCN achieved 88.8% and 90.4% on the cross-subject and cross-set evaluations, respectively.
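The abstract describes the architecture only at a high level. As an illustration of the general idea, the following is a minimal PyTorch sketch of a spatiotemporal graph autoencoder over skeleton sequences: a spatial graph convolution over joints, a temporal convolution over frames, and a reconstruction objective. The class names, layer sizes, learnable adjacency, and temporal kernel width are assumptions made for this sketch and are not taken from the GA-GCN implementation; only the 25-joint, 3D-coordinate input layout reflects the NTU RGB+D datasets the paper evaluates on.

```python
# Illustrative sketch only; not the authors' GA-GCN code. Shapes, layer
# choices, and the learnable adjacency below are assumptions.
import torch
import torch.nn as nn


class SpatioTemporalGCBlock(nn.Module):
    """Spatial graph convolution over joints, then a temporal convolution."""

    def __init__(self, in_channels, out_channels, num_joints):
        super().__init__()
        # Learnable adjacency over the skeleton graph, initialized to identity
        # (a fixed adjacency from bone links is another common choice).
        self.adj = nn.Parameter(torch.eye(num_joints))
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # Temporal convolution along the frame axis; kernel 9 is a common pick.
        self.temporal = nn.Conv2d(out_channels, out_channels,
                                  kernel_size=(9, 1), padding=(4, 0))
        self.relu = nn.ReLU()

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        x = torch.einsum('nctv,vw->nctw', x, self.adj)  # aggregate over joints
        x = self.relu(self.spatial(x))
        return self.relu(self.temporal(x))


class GraphAutoencoder(nn.Module):
    """Encoder-decoder over skeleton clips; reconstruction drives the
    unsupervised learning of spatial and temporal patterns."""

    def __init__(self, channels=3, hidden=64, num_joints=25):
        super().__init__()
        self.encoder = SpatioTemporalGCBlock(channels, hidden, num_joints)
        self.decoder = SpatioTemporalGCBlock(hidden, hidden, num_joints)
        # Linear projection back to coordinate space (no activation, since
        # joint coordinates may be negative).
        self.head = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x):
        z = self.encoder(x)                    # latent spatiotemporal features
        return self.head(self.decoder(z)), z   # reconstruction + embedding


if __name__ == "__main__":
    # NTU RGB+D skeletons: 25 joints, 3D coordinates per joint.
    model = GraphAutoencoder(channels=3, hidden=64, num_joints=25)
    clip = torch.randn(8, 3, 64, 25)  # batch of 8 clips, 64 frames each
    recon, z = model(clip)
    loss = nn.functional.mse_loss(recon, clip)  # reconstruction objective
    print(recon.shape, z.shape, loss.item())
```

In a setup like this, the learned embedding z could then feed a classifier head for the supervised action-recognition stage; the paper's actual design may differ.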

Keywords

graph convolutional networks; graph autoencoder; deep learning; human activity analysis; skeleton-based human action recognition

Subject

Computer Science and Mathematics, Computer Vision and Graphics
