Submitted: 15 June 2024
Posted: 17 June 2024
Abstract

Keywords:
1. Introduction
2. Related Work
2.1. Zero-Shot Classification
3. Method
3.1. Reference Images Generation
- (1) Feature Analysis
- (2) Image Synthesis
3.2. Alignment-Driven Image Classification
3.2.1. Text-Image Alignment
3.2.2. Image-Image Alignment
3.3. Logits Aggregation
4. Experiments
4.1. Datasets and Evaluation Metrics
4.2. Main Results
4.3. Ablation Study
4.3.1. Effectiveness of Different Components
4.3.2. Effects of Different Aggregation Strategies
4.3.3. Number of Reference Images
4.3.4. Visualization of Reference Images
5. Conclusion
References
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition; 2016; pp. 770–778.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv preprint 2017, arXiv:1704.04861.
- Paz-Argaman, T.; Atzmon, Y.; Chechik, G.; Tsarfaty, R. ZEST: Zero-Shot Learning from Text Descriptions Using Textual Similarity and Visual Summarization. arXiv preprint 2020, arXiv:2010.03276.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; others. Language models are few-shot learners. Advances in neural information processing systems 2020, 33, 1877–1901.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint 2018, arXiv:1810.04805.
- Ou, G.; Yu, G.; Domeniconi, C.; Lu, X.; Zhang, X. Multi-label zero-shot learning with graph convolutional networks. Neural Networks 2020, 132, 333–341.
- Gao, J.; Xu, C.S. CI-GNN: Building a category-instance graph for zero-shot video classification. IEEE Transactions on Multimedia 2020, 22, 3088–3100.
- Liu, S.C.; Long, M.S.; Wang, J.M.; Jordan, M.I. Generalized zero-shot learning with deep calibration network. Advances in neural information processing systems 2018, 31.
- Sankaranarayanan, S.; Balaji, Y. Meta learning for domain generalization. Elsevier, 2023; pp. 75–86.
- Xian, Y.Q.; Lorenz, T.; Schiele, B.; Akata, Z. Feature generating networks for zero-shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition; 2018; pp. 5542–5551.
- Ren, J.W.; Yu, C.J.; Ma, X.; Zhao, H.Y.; Yi, S.; others. Balanced meta-softmax for long-tailed visual recognition. Advances in neural information processing systems 2020, 33, 4175–4186.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; others. Learning transferable visual models from natural language supervision. International conference on machine learning. PMLR, 2021, pp. 8748–8763.
- Shipard, J.; Wiliem, A.; Thanh, K.N.; Xiang, W.; Fookes, C. Diversity is definitely needed: Improving model-agnostic zero-shot classification via stable diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023; pp. 769–778.
- Christensen, A.; Mancini, M.; Koepke, A.; Winther, O.; Akata, Z. Image-free Classifier Injection for Zero-Shot Classification. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023; pp. 19072–19081.
- Novack, Z.; McAuley, J.; Lipton, Z.C.; Garg, S. CHiLS: Zero-shot image classification with hierarchical label sets. International Conference on Machine Learning. PMLR, 2023, pp. 26342–26362.
- Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international conference on computer vision; 2021; pp. 9650–9660.
- Zhang, R.; Hu, X.; Li, B.; Huang, S.; Deng, H.; Qiao, Y.; Gao, P.; Li, H. Prompt, Generate, then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023; pp. 15211–15222.
- Chen, G.; Qiao, L.; Shi, Y.; Peng, P.; Li, J.; Huang, T.; Pu, S.; Tian, Y. Learning Open Set Network with Discriminative Reciprocal Points. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III. Springer, 2020, pp. 507–522.
- Zhang, H.; Li, A.; Guo, J.; Guo, Y. Hybrid Models for Open Set Recognition. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III. Springer, 2020, pp. 102–117.
- Lu, J.; Xu, Y.; Li, H.; Cheng, Z.; Niu, Y. PMAL: Open Set Recognition via Robust Prototype Mining. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, Vol. 36, pp. 1872–1880.
- Esmaeilpour, S.; Liu, B.; Robertson, E.; Shu, L. Zero-shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, Vol. 36, pp. 6568–6576.
- Vaze, S.; Han, K.; Vedaldi, A.; Zisserman, A. Open-Set Recognition: A Good Closed-Set Classifier Is All You Need? 2021.
- Moon, W.; Park, J.; Seong, H.S.; Cho, C.H.; Heo, J.P. Difficulty-Aware Simulator for Open Set Recognition. European Conference on Computer Vision. Springer, 2022, pp. 365–381.
- Cho, W.; Choo, J. Towards Accurate Open-Set Recognition via Background-Class Regularization. European Conference on Computer Vision. Springer, 2022, pp. 658–674.
- Liu, Z.g.; Fu, Y.m.; Pan, Q.; Zhang, Z.w. Orientational Distribution Learning with Hierarchical Spatial Attention for Open Set Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022.
- Huang, H.; Wang, Y.; Hu, Q.; Cheng, M.M. Class-Specific Semantic Reconstruction for Open Set Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022, 45, 4214–4228.



| Method | Venue | CIFAR10 | CIFAR+10 | TinyImageNet |
|---|---|---|---|---|
| Methods that involve a training process | | | | |
| RPL [18] | ECCV 2020 | | | |
| OpenHybrid [19] | ECCV 2020 | | | |
| PMAL [20] | AAAI 2022 | | | |
| ZOC [21] | AAAI 2022 | | | |
| MLS [22] | ICLR 2022 | | | |
| DIAS [23] | ECCV 2022 | | | |
| Class-inclusion [24] | ECCV 2022 | | | |
| ODL [25] | TPAMI 2022 | | | |
| ODL+ [25] | TPAMI 2022 | | | |
| CSSR [26] | TPAMI 2022 | | | |
| RCSSR [26] | TPAMI 2022 | | | |
| Methods that involve no extra training process | | | | |
| Ours | | | | |

| Method | CIFAR10 Top1 | CIFAR10 Top3 | CIFAR10 Top5 | CIFAR100 Top1 | CIFAR100 Top3 | CIFAR100 Top5 | TinyImageNet Top1 | TinyImageNet Top3 | TinyImageNet Top5 |
|---|---|---|---|---|---|---|---|---|---|
| M1+M2 | 83.83% | 94.85% | 98.22% | 48.00% | 66.41% | 73.27% | 43.52% | 63.97% | 71.79% |
| M1+M3 | 92.51% | 98.16% | 99.35% | 71.93% | 86.59% | 90.50% | 73.29% | 84.73% | 88.41% |
| M2+M3 | 79.99% | 93.22% | 97.41% | 64.19% | 79.29% | 84.63% | 72.74% | 84.33% | 87.30% |
| Ours (M1+M2+M3) | 92.96% | 98.32% | 99.33% | 72.17% | 86.55% | 90.36% | 73.52% | 84.95% | 88.55% |

| Method | CIFAR10 | CIFAR100 | TinyImageNet |
|---|---|---|---|
| M1+M2 | 97.94% | 92.99% | 87.03% |
| M1+M3 | 99.75% | 95.87% | 96.54% |
| M2+M3 | 97.26% | 94.70% | 96.09% |
| Ours (M1+M2+M3) | 99.78% | 96.03% | 96.48% |

| Weighting Method | Top1 | Top3 | Top5 |
|---|---|---|---|
| 1:1:1 | 92.36% | 98.22% | 99.31% |
| 3:3:4 | 92.29% | 98.21% | 99.26% |
| Max Similarity | 92.75% | 98.26% | 99.34% |
| Inverse Entropy | 92.96% | 98.32% | 99.33% |
| Negative Exponential of Entropy | 92.90% | 98.31% | 99.36% |

| Weighting Method | AUROC |
|---|---|
| 1:1:1 | 99.55% |
| 3:3:4 | 99.60% |
| Max Similarity | 99.68% |
| Inverse Entropy | 99.78% |
| Negative Exponential of Entropy | 99.73% |

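
The weighting schemes compared above can be illustrated with a minimal sketch. The exact formulas are not given in this text, so the code below is an assumption: it treats "Inverse Entropy" as a weight proportional to 1/H and "Negative Exponential of Entropy" as a weight proportional to exp(-H), where H is the Shannon entropy of each branch's softmax distribution, and it adds a uniform scheme standing in for the 1:1:1 row. The function names (`branch_weights`, `aggregate`) and the `eps` smoothing term are hypothetical, and "Max Similarity" is not covered.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def branch_weights(branch_logits, scheme="inverse_entropy", eps=1e-8):
    """Return one normalized weight per branch (assumed formulas, see lead-in)."""
    ents = [entropy(softmax(logits)) for logits in branch_logits]
    if scheme == "inverse_entropy":
        raw = [1.0 / (h + eps) for h in ents]   # confident branch -> large weight
    elif scheme == "neg_exp_entropy":
        raw = [math.exp(-h) for h in ents]      # bounded variant of the same idea
    elif scheme == "uniform":
        raw = [1.0] * len(branch_logits)        # analogue of the 1:1:1 row
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(branch_logits, scheme="inverse_entropy"):
    """Weighted sum of the per-branch logit vectors, class by class."""
    weights = branch_weights(branch_logits, scheme)
    n_classes = len(branch_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, branch_logits))
        for c in range(n_classes)
    ]


if __name__ == "__main__":
    # Two toy branches over three classes: one confident, one nearly flat.
    confident = [8.0, 0.0, 0.0]
    uncertain = [0.1, 0.0, 0.2]
    print(branch_weights([confident, uncertain], "inverse_entropy"))
    print(aggregate([confident, uncertain], "inverse_entropy"))
```

Under these assumptions, a branch whose prediction is sharply peaked dominates the fused logits, while a near-uniform branch contributes little; the same functions accept three branches, as in the M1+M2+M3 setting.
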
| Number of Reference Images | CIFAR10 (Top1 / AUROC) | CIFAR100 (Top1 / AUROC) | TinyImageNet (Top1 / AUROC) |
|---|---|---|---|
| One | 88.71% / 99.05% | 67.78% / 96.025% | 67.77% / 96.00% |
| Multiple | 92.96% / 99.78% | 72.17% / 96.026% | 73.52% / 96.48% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).