Preprint Article, Version 1 (not peer-reviewed). Preserved in Portico.

End-to-end 3D Human Pose Estimation using Dual Decoders

Version 1: Received: 31 May 2023 / Approved: 1 June 2023 / Online: 1 June 2023 (05:00:33 CEST)

How to cite: Wang, Z.; Wang, T.; Song, M.; Jin, L. End-to-end 3D Human Pose Estimation using Dual Decoders. Preprints 2023, 2023060033. https://doi.org/10.20944/preprints202306.0033.v1

Abstract

Existing methods for 3D human pose estimation mainly divide the task into two stages. The first stage locates the 2D coordinates of the human joints in the input image. The second stage takes these 2D joint coordinates as input and recovers the depth of each joint to produce the 3D pose. However, the accuracy of such two-stage methods depends heavily on the quality of the first stage, and the pipeline contains many redundant processing steps that reduce inference efficiency. To address these issues, we propose EDD, a fully end-to-end 3D human pose estimation method based on a transformer architecture with dual decoders. By learning multiple human pose queries, the model directly infers all 3D human poses in the image with a pose decoder and then refines the result with a joint decoder that exploits the kinematic relations between joints. With the attention mechanism, the method adaptively focuses on the features most relevant to each target joint, effectively overcoming the feature misalignment problem in human pose estimation and substantially improving model performance. Complex post-processing steps such as non-maximum suppression are eliminated, further improving the efficiency of the model. The results show that the method achieves an accuracy of 87.4% on the MuPoTS-3D dataset, significantly improving the accuracy of end-to-end 3D human pose estimation methods.
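To make the dual-decoder idea concrete, the following is a minimal PyTorch sketch of the pipeline described in the abstract: a pose decoder that turns learned queries into coarse 3D poses for all people in the image, followed by a joint decoder that refines each joint by attending back to the image features. All module names, dimensions, and the specific refinement strategy (additive residual on the coarse joints) are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: architecture details are assumptions, not the paper's.
import torch
import torch.nn as nn

class DualDecoderPoseEstimator(nn.Module):
    def __init__(self, d_model=256, num_queries=20, num_joints=17,
                 num_layers=3, nhead=8):
        super().__init__()
        # Learnable pose queries; each query is assumed to represent one person.
        self.pose_queries = nn.Parameter(torch.randn(num_queries, d_model))
        # Pose decoder: attends to image features and predicts coarse 3D poses.
        self.pose_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        # Joint decoder: refines per-joint estimates using attention over features.
        self.joint_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.num_joints = num_joints
        self.coarse_head = nn.Linear(d_model, num_joints * 3)  # (x, y, z) per joint
        self.joint_embed = nn.Linear(3, d_model)
        self.refine_head = nn.Linear(d_model, 3)

    def forward(self, image_features):
        # image_features: (B, N, d_model) tokens from a backbone (assumed given).
        B = image_features.size(0)
        queries = self.pose_queries.unsqueeze(0).expand(B, -1, -1)
        # Stage 1: pose decoder produces one coarse 3D pose per query.
        pose_tokens = self.pose_decoder(queries, image_features)      # (B, Q, d)
        coarse = self.coarse_head(pose_tokens)                        # (B, Q, J*3)
        coarse = coarse.view(B, -1, self.num_joints, 3)               # (B, Q, J, 3)
        # Stage 2: joint decoder refines every joint, attending to image features
        # so that kinematic context between joints can be exploited.
        joint_tokens = self.joint_embed(coarse).flatten(1, 2)         # (B, Q*J, d)
        refined_tokens = self.joint_decoder(joint_tokens, image_features)
        refined = coarse + self.refine_head(refined_tokens).view_as(coarse)
        return refined

# Usage: a batch of 2 images whose backbone features are 64 tokens of width 256.
model = DualDecoderPoseEstimator()
feats = torch.randn(2, 64, 256)
poses = model(feats)   # (2, 20, 17, 3) candidate 3D poses, no NMS required
```

Because every query directly emits a full pose and the refinement is a per-joint residual, no non-maximum suppression or other post-processing is needed in this sketch, mirroring the end-to-end property claimed in the abstract.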

Keywords

Computer vision; 3D human pose estimation; Transformer

Subject

Computer Science and Mathematics, Computer Vision and Graphics
