Preprint · Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

CRTED: Few-Shot Object Detection via Correlation-RPN and Transformer Encoder-Decoder

Version 1 : Received: 9 April 2024 / Approved: 9 April 2024 / Online: 9 April 2024 (13:06:53 CEST)

How to cite: Chen, J.; Xu, K.; Ning, Y.; Jiang, L.; Xu, Z. CRTED: Few-Shot Object Detection via Correlation-RPN and Transformer Encoder-Decoder. Preprints 2024, 2024040633. https://doi.org/10.20944/preprints202404.0633.v1

Abstract

Few-shot object detection (FSOD) aims to address the heavy annotation burden of conventional object detection, which requires a substantial, labor-intensive amount of labeled data for training. However, existing few-shot methods either achieve high precision at the cost of time-consuming, exhaustive fine-tuning, or perform poorly when adapting to novel classes. We presume the major reason is that valuable correlation features among different categories are insufficiently exploited, hindering the generalization of knowledge from base to novel categories. In this paper, we propose Few-Shot Object Detection via Correlation-RPN and Transformer Encoder-Decoder (CRTED), a novel training network that learns object-relevant features of inter-class correlation and intra-class compactness while suppressing object-agnostic background features, using only a limited number of annotated samples. We also introduce a 4-way tuple-contrast training strategy to effectively accelerate the training of our object detector. Experiments on two few-shot benchmarks (Pascal VOC and MS-COCO) demonstrate that CRTED, without further fine-tuning, achieves performance comparable to current state-of-the-art fine-tuned methods. The code and pre-trained models will be released.
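The abstract does not specify how the Correlation-RPN computes correlation between support and query features. As a rough illustration only, the sketch below shows one common way such correlation-based proposal scoring can work: a pooled support-class prototype is compared against every spatial location of a query feature map via cosine similarity, producing a correlation map that a region proposal network could use to bias proposals toward class-relevant regions. The function name, shapes, and use of cosine similarity are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def correlation_map(query_feat, support_feat, eps=1e-8):
    """Hypothetical correlation scoring between a query feature map and a
    support prototype, as might feed a Correlation-RPN-style module.

    query_feat   : (C, H, W) feature map extracted from the query image
    support_feat : (C,) class prototype pooled from few-shot support images
    returns      : (H, W) cosine-similarity map, values in [-1, 1]
    """
    C, H, W = query_feat.shape
    q = query_feat.reshape(C, -1)                                 # (C, H*W)
    # Normalize each spatial location's feature vector and the prototype,
    # so the dot product below becomes cosine similarity.
    q_norm = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s_norm = support_feat / (np.linalg.norm(support_feat) + eps)
    corr = s_norm @ q_norm                                        # (H*W,)
    return corr.reshape(H, W)
```

High-correlation locations in the returned map would then indicate regions likely to contain an object of the support class, which is the kind of inter-class/intra-class signal the abstract argues should be exploited for base-to-novel generalization.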

Keywords

Few-shot object detection; Region proposal network; Transformer Encoder-Decoder; Training strategies

Subject

Computer Science and Mathematics, Computer Vision and Graphics
