Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

PVTReID: A Quick Person Re-Identification Based Pyramid Vision Transformer

Version 1: Received 3 August 2023 / Approved 3 August 2023 / Online 4 August 2023 (07:26:19 CEST)

A peer-reviewed article of this Preprint also exists.

Han, K.; Wang, Q.; Zhu, M.; Zhang, X. PVTReID: A Quick Person Reidentification-Based Pyramid Vision Transformer. Appl. Sci. 2023, 13, 9751.

Abstract

Because of background clutter, lighting changes, occlusion and low image resolution, extracting robust person features is one of the main difficulties in ReID research. The Vision Transformer (ViT) has achieved significant results in computer vision. However, its application to ReID is still limited by the slow extraction of person features and the difficulty of exploiting people's local features. To address these problems, we use the Pyramid Vision Transformer (PVT) as the feature-extraction backbone and, building on other studies, propose a PVT-based ReID method. First, we apply several ReID-oriented improvements to the PVT backbone and establish a baseline model using strong methods already verified on CNN-based ReID. Second, to further improve the robustness of the person features extracted by the PVT backbone, we design two new modules. (1) Local feature clustering (LFC) enhances the robustness of person features by computing the distance between each local feature and the global feature, selecting the most discrete local features, and clustering them. (2) Side information embeddings (SIE) encode non-visual information and feed it into the network during training to reduce its impact on person features. Finally, experiments show that PVTReID achieves excellent results on ReID datasets and is on average 20% faster than CNN-based ReID methods.
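The two modules named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. For LFC it assumes Euclidean distance, a fixed number `k` of selected local features, and that the "clustering" step is a simple average of the selected features. For SIE it assumes one embedding vector per camera ID, scaled and added to every token, in the style of TransReID. All names (`local_feature_clustering`, `SideInfoEmbedding`, `k`, `scale`) are hypothetical.

```python
import numpy as np


def local_feature_clustering(local_feats: np.ndarray, global_feat: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical LFC sketch.

    local_feats: (N, D) array of local (token/patch) features.
    global_feat: (D,) global feature.
    Returns a (D,) feature built from the k local features farthest
    from the global feature.
    """
    # Euclidean distance between each local feature and the global feature
    dists = np.linalg.norm(local_feats - global_feat[None, :], axis=1)
    # Indices of the k most "discrete" (farthest) local features
    idx = np.argsort(dists)[::-1][:k]
    # Assumed clustering step: average the selected features into one centroid
    return local_feats[idx].mean(axis=0)


class SideInfoEmbedding:
    """Hypothetical SIE sketch: one embedding vector per camera ID."""

    def __init__(self, num_cameras: int, dim: int, scale: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        # In a trained network this table would be learned; here it is random
        self.table = rng.normal(0.0, 0.02, size=(num_cameras, dim))
        self.scale = scale

    def __call__(self, tokens: np.ndarray, camera_id: int) -> np.ndarray:
        # Broadcast-add the scaled camera embedding to every token
        return tokens + self.scale * self.table[camera_id]


if __name__ == "__main__":
    local = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [4.0, 6.0]])
    g = np.array([1.0, 1.0])
    print(local_feature_clustering(local, g, k=2))  # [4.5 5.5]

    sie = SideInfoEmbedding(num_cameras=3, dim=2, scale=0.5)
    print(sie(np.zeros((4, 2)), camera_id=1).shape)  # (4, 2)
```

In the usage example the two tokens farthest from the global feature are `[5, 5]` and `[4, 6]`, so their average `[4.5, 5.5]` is returned. Averaging is only the simplest stand-in for a clustering step; a real module might keep several cluster centres or learn the aggregation.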

Keywords

ReID; Pyramid Vision Transformer; local feature clustering; side information embeddings

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
