Submitted: 05 September 2024
Posted: 09 September 2024
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Fine-Grained Visual Classification
2.2. Visual Classification Based on Spatial Frequency Domain
2.3. Small Dataset Fine-Grained Visual Classification
3. Proposed Method
3.1. LSI Extraction
3.2. Information Preprocessing
3.3. Classification Network
4. Experiments
4.1. Experiment Setting
4.2. Parameter Settings
4.3. Experiment Results
5. Application
5.1. Patch Data and Preprocessing
5.2. Data Training
5.3. Inference and Change Detection
5.4. Change Detection Result
6. Conclusion

| Input images | Accuracy (%) | | | | | |
|---|---|---|---|---|---|---|
| | Cotton | CUB | CAR | AIR | FLO | PD |
| (image not recovered) | 64.89 | 86.63 | 92.77 | 91.07 | 95.49 | 97.46 |
| (image not recovered) | 65.33 | 86.63 | 92.32 | 91.23 | 96.16 | 96.58 |
| (image not recovered) | 65.42 | 85.29 | 92.89 | 90.79 | 95.49 | 98.72 |
| (image not recovered) | 64.37 | 85.13 | 92.26 | 90.47 | 97.18 | 96.46 |
| (image not recovered) | 64.63 | 86.16 | 92.13 | 91.43 | 96.84 | 97.58 |

| Method | Base Model | Accuracy (%) | | | | | |
|---|---|---|---|---|---|---|---|
| | | Cotton | CUB | CAR | AIR | FLO | PD |
| ResNet-50 | ResNet-50 | 48.24 | 84.20 | 90.92 | 89.74 | 95.35 | 96.33 |
| VGG-16 | VGG-16 | 40.19 | 82.18 | 87.55 | 96.32 | 94.37 | 95.17 |
| NTS-Net | ResNet-50 | 52.50 | 84.23 | 90.32 | 88.15 | 95.42 | 96.00 |
| fast-MPN-Cov | ResNet-50 | 50.73 | 85.12 | 88.61 | 90.26 | 96.33 | 95.78 |
| DCL | ResNet-50 | 60.08 | 85.47 | 92.18 | 90.58 | 96.49 | 96.19 |
| Cross-X | ResNet-50 | 52.71 | 85.22 | 92.18 | 89.84 | 96.12 | 93.63 |
| MOMN | ResNet-50 | 40.00 | 81.79 | 86.25 | 85.33 | 97.15 | 98.26 |
| ACNet | ResNet-50 | 53.42 | 85.31 | 92.29 | 88.65 | 96.88 | 96.68 |
| FVD | ResNet-50 | 57.69 | 84.20 | 91.06 | 88.52 | 96.62 | 95.43 |
| Ours | ResNet-50 | 65.42 | 86.63 | 92.89 | 91.43 | 97.18 | 98.72 |

| Method | Base Model | Accuracy (%) | | | | | |
|---|---|---|---|---|---|---|---|
| | | Cotton | CUB | CAR | AIR | FLO | PD |
| ResNet-50 | ResNet-50 | 57.92 | 88.23 | 95.02 | 94.03 | 96.13 | 97.14 |
| VGG-16 | VGG-16 | 48.94 | 85.36 | 90.25 | 95.13 | 95.33 | 96.23 |
| NTS-Net | ResNet-50 | 61.25 | 88.42 | 94.73 | 93.31 | 96.22 | 96.74 |
| fast-MPN-Cov | ResNet-50 | 59.72 | 89.01 | 91.33 | 94.41 | 96.88 | 96.71 |
| DCL | ResNet-50 | 69.91 | 89.92 | 96.17 | 94.53 | 97.09 | 96.97 |
| Cross-X | ResNet-50 | 61.77 | 89.22 | 96.14 | 94.10 | 96.73 | 94.88 |
| MOMN | ResNet-50 | 49.49 | 86.02 | 89.85 | 89.49 | 97.35 | 98.47 |
| ACNet | ResNet-50 | 61.49 | 89.13 | 96.22 | 93.47 | 97.10 | 97.27 |
| FVD | ResNet-50 | 65.91 | 88.95 | 95.36 | 93.77 | 97.01 | 96.53 |
| Ours | ResNet-50 | 74.15 | 90.05 | 96.77 | 95.53 | 97.49 | 98.88 |

| Method | Base Model | Accuracy (%) | | | | | |
|---|---|---|---|---|---|---|---|
| | | Cotton | CUB | CAR | AIR | FLO | PD |
| Lighting Changes | ResNet-50 | 44.35 | 88.34 | 90.14 | 91.92 | 94.18 | 92.11 |
| Colorizing Images | ResNet-50 | 43.52 | 86.32 | 89.36 | 90.11 | 93.92 | 92.16 |
| Image Rotations | ResNet-50 | 44.24 | 85.66 | 90.12 | 91.37 | 93.15 | 92.21 |
| Image Flips | ResNet-50 | 43.17 | 84.39 | 90.19 | 90.38 | 92.17 | 91.32 |
| Image Affine Transformations | ResNet-50 | 46.78 | 89.91 | 94.36 | 92.18 | 93.96 | 94.49 |
| Ours | ResNet-50 | 74.15 | 90.05 | 96.77 | 95.53 | 97.49 | 98.88 |

| Method | Accuracy (%) | | Performance improvement | |
|---|---|---|---|---|
| | Cotton | CUB | Cotton | CUB |
| Original NTS-Net | 52.50 | 84.23 | 11.71% | 8.45% |
| NTS-Net in our framework | 58.65 | 91.36 | | |
| Original fast-MPN-Cov | 50.73 | 85.12 | 13.97% | 9.38% |
| fast-MPN-Cov in our framework | 57.82 | 93.11 | | |
| Original DCL | 60.08 | 85.47 | 12.21% | 8.14% |
| DCL in our framework | 67.42 | 92.34 | | |
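
The "Performance improvement" columns appear to report the relative gain of each method inside the framework over its original accuracy. The following is a minimal Python sketch of that arithmetic, assuming the definition (new − old) / old × 100, which this section does not state explicitly. The helper name `relative_gain` is illustrative and not taken from the paper.

```python
def relative_gain(original: float, improved: float) -> float:
    """Relative improvement of `improved` over `original`, in percent.

    Assumed definition: (improved - original) / original * 100.
    """
    if original <= 0:
        raise ValueError("original accuracy must be positive")
    return (improved - original) / original * 100.0


# Cotton accuracies (%) copied from the table above: (original, in framework).
cotton = {
    "NTS-Net": (52.50, 58.65),
    "fast-MPN-Cov": (50.73, 57.82),
    "DCL": (60.08, 67.42),
}

# Relative gain per method under the assumed definition.
gains = {name: relative_gain(old, new) for name, (old, new) in cotton.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:.2f}% relative gain on Cotton")
```

Under this assumed definition, the Cotton values agree with the reported 11.71%, 13.97%, and 12.21% to within 0.01 percentage points.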
| Method | Base Model | Correct Prediction | Agricultural | Non-Agricultural | Accuracy |
|---|---|---|---|---|---|
| NTS-Net | ResNet-50 | 618 | 37 | 581 | 34.85% |
| fast-MPN-Cov | ResNet-50 | 479 | 16 | 463 | 27.01% |
| DCL | ResNet-50 | 1317 | 116 | 1201 | 74.28% |
| Cross-X | ResNet-50 | 1139 | 97 | 1042 | 64.24% |
| MOMN | ResNet-50 | 1263 | 132 | 1131 | 71.23% |
| Ours | ResNet-50 | 1443 | 189 | 1254 | 80.14% |
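
The columns of the change detection table can be cross-checked against each other. The sketch below, in Python, verifies that the agricultural and non-agricultural counts add up to the correct-prediction count for every method. It also estimates the number of evaluated patches implied by each row, assuming accuracy = correct predictions / total patches. The total is not stated in this section, so that relation and the helper names are assumptions made for illustration.

```python
# Rows copied from the table above:
# method -> (correct predictions, agricultural, non-agricultural, accuracy in %)
rows = {
    "NTS-Net": (618, 37, 581, 34.85),
    "fast-MPN-Cov": (479, 16, 463, 27.01),
    "DCL": (1317, 116, 1201, 74.28),
    "Cross-X": (1139, 97, 1042, 64.24),
    "MOMN": (1263, 132, 1131, 71.23),
    "Ours": (1443, 189, 1254, 80.14),
}


def counts_consistent(correct: int, agricultural: int, non_agricultural: int) -> bool:
    """True when the two class counts sum to the correct-prediction count."""
    return correct == agricultural + non_agricultural


def implied_total(correct: int, accuracy_percent: float) -> float:
    """Patch count implied by a row, assuming accuracy = correct / total."""
    return correct / (accuracy_percent / 100.0)


checks = {m: counts_consistent(c, a, n) for m, (c, a, n, _) in rows.items()}
totals = {m: implied_total(c, acc) for m, (c, _, _, acc) in rows.items()}

for method in rows:
    print(f"{method}: counts consistent = {checks[method]}, "
          f"implied total = {totals[method]:.1f}")
```

Comparing the implied totals across rows is a quick way to confirm that all methods were scored on the same evaluation set.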

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).