Submitted: 26 October 2024
Posted: 28 October 2024
Abstract
Keywords:
Introduction
Related Work
Food Datasets
Food Recognition
Fine-Grained Visual Classification
Methods
Dataset Reconstruction
Unbalanced Number of Images Across Categories in CNFOOD-241
Comparison with SOTA Methods
Visualization Analysis
Results
Recognition Performance on Other Datasets
Visualization Analysis
Food Image Generation
Discussion
Conclusion
Data Availability Statement
Conflicts of Interest
References
- Bossard, L.; Guillaumin, M.; Van Gool, L. Food-101 – Mining Discriminative Components with Random Forests. In Computer Vision – ECCV 2014; Springer: Cham, Switzerland, 2014; pp. 446–461.
- Chen, C.S.; Chen, G.Y.; Zhou, D.; Jiang, D.; Chen, D. Res-VMamba: Fine-Grained Food Category Visual Classification Using Selective State Space Models with Deep Residual Learning. arXiv 2024, arXiv:2402.15761.
- Chou, P.Y.; Kao, Y.Y.; Lin, C.H. Fine-Grained Visual Classification with High-Temperature Refinement and Background Suppression. arXiv 2023, arXiv:2303.06442.
- Dalakleidi, K.V.; Papadelli, M.; Kapolos, I.; Papadimitriou, K. Applying Image-Based Food-Recognition Systems on Dietary Assessment: A Systematic Review. Adv. Nutr. Int. Rev. J. 2022, 13, 2590–2619.
- De Toro-Martín, J.; Arsenault, B.J.; Després, J.-P.; Vohl, M.-C. Precision Nutrition: A Review of Personalized Nutritional Approaches for the Prevention and Management of Metabolic Syndrome. Nutrients 2017, 9, 913.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009; pp. 248–255.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; Uszkoreit, J.; Houlsby, N. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), 2021. Available online: https://openreview.net/forum?id=YicbFdNTTy.
- Fan, B.; Li, W.; Dong, L.; Li, J.; Nie, Z. Automatic Chinese Food Recognition Based on a Stacking Fusion Model. In Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), 2023; pp. 1–4.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269.
- Liu, D.; Zhao, L.; Wang, Y.; Kato, J. Learn from Each Other to Classify Better: Cross-Layer Mutual Attention Learning for Fine-Grained Visual Classification. Pattern Recognit. 2023, 140.
- Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Liu, Y. VMamba: Visual State Space Model. Technical report, 2024.
- Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11966–11976.
- Matsuda, Y.; Yanai, K. Automatic Expansion of a Food Image Dataset Leveraging Existing Categories with Domain Adaptation. In Computer Vision – ECCV 2014 Workshops; Springer: Cham, Switzerland, 2015; pp. 3–17.
- Min, W.; Jiang, S.; Liu, L.; Rui, Y.; Jain, R. A Survey on Food Computing. ACM Comput. Surv. 2019, 52, 1–36.
- Min, W.; Liu, L.; Wang, Z.; Luo, Z.; Wei, X.; Wei, X.; Jiang, S. ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network. In Proceedings of the 28th ACM International Conference on Multimedia (MM '20), 2020.
- Min, W.; Wang, Z.; Liu, Y.; Luo, M.; Kang, L.; Wei, X.; Wei, X.; Jiang, S. Large Scale Visual Food Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 9932–9949.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. arXiv 2017, arXiv:1610.02391.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Technical report, 2016.
- Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Technical report, 2019.
- Wang, A.; Chen, H.; Lin, Z.; Han, J.; Ding, G. RepViT: Revisiting Mobile CNN from ViT Perspective. Technical report, 2023.
- Zahisham, Z.; Lee, C.P.; Lim, K.M. Food Recognition with ResNet-50. In Proceedings of the 2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), Malaysia, 2020; pp. 1–5.
- Zheng, H.; Fu, J.; Mei, T.; Luo, J. Learning Multi-Attention Convolutional Neural Network for Fine-Grained Image Recognition. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), 2017; pp. 5219–5227.
- Zhou, L.; Zhang, C.; Liu, F.; Qiu, Z.; He, Y. Application of Deep Learning in Food: A Review. Compr. Rev. Food Sci. Food Saf. 2019, 18, 1793–1811.
- Zhuang, P.; Wang, Y.; Qiao, Y. Learning Attentive Pairwise Interaction for Fine-Grained Classification. Proc. AAAI Conf. Artif. Intell. 2020, 34, 13130–13137.




| Dataset | Year | Classes/Images | Category Entropy | Type | Public |
|---|---|---|---|---|---|
| PFID | 2009 | 101/4,545 | - | Western | - |
| Food50 | 2010 | 50/5,000 | - | Misc. | - |
| Food85 | 2010 | 85/8,500 | - | Misc. | - |
| UEC Food100 | 2012 | 100/14,361 | 0.9758 | Japanese | ✓ |
| UEC Food256 | 2014 | 256/25,088 | 0.9891 | Japanese | ✓ |
| ETH Food-101 | 2014 | 101/101,000 | 1.0000 | Western | ✓ |
| Diabetes | 2014 | 11/4,868 | - | Misc. | - |
| UPMC Food-101 | 2015 | 101/90,840 | 0.9999 | Western | ✓ |
| UNICT-FD889 | 2015 | 889/3,583 | 0.9832 | Misc. | ✓ |
| Vireo Food-172 | 2016 | 172/110,241 | 0.9876 | Chinese | ✓ |
| Food-975 | 2016 | 975/37,785 | - | Misc. | - |
| Food500 | 2016 | 508/148,408 | - | Misc. | - |
| Food11 | 2016 | 11/16,643 | 0.9565 | Misc. | ✓ |
| UNICT-FD1200 | 2016 | 1,200/4,754 | 0.9883 | Misc. | ✓ |
| Food524DB | 2017 | 524/247,636 | - | Misc. | - |
| ChineseFoodNet | 2017 | 208/192,000 | - | Chinese | - |
| Vegfru | 2017 | 292/160,000 | 0.9759 | Misc. | ✓ |
| Sushi-50 | 2019 | 50/3,963 | 0.9951 | Japanese | ✓ |
| FoodX-251 | 2019 | 251/158,846 | 0.9974 | Misc. | ✓ |
| ISIA Food-200 | 2019 | 200/197,323 | 0.9889 | Misc. | ✓ |
| FoodAI-756 | 2019 | 756/400,000 | - | Misc. | - |
| Taiwanese-Food-101 | 2020 | 101/20,200 | 1.0000 | Chinese | ✓ |
| ISIA Food-500 | 2020 | 500/399,726 | 0.9880 | Misc. | ✓ |
| Food2K | 2021 | 2,000/1,036,564 | 0.9821 | Misc. | ✓ |
| MyFoodRepo-273 | 2022 | 273/24,119 | - | Misc. | - |
| CNFOOD-241 | 2022 | 241/191,811 | 0.9780 | Chinese | ✓ |
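
The Category Entropy column summarizes how evenly a dataset's images are spread over its classes. The formula is not given in this section, so the sketch below assumes a common choice: the Shannon entropy of the class-frequency distribution normalized by its maximum, H = -(1/log K) Σ p_i log p_i with p_i = n_i / N, where n_i is the number of images in class i, N the total, and K the number of classes. Under that assumption a perfectly balanced dataset scores 1.0, which agrees with the 1.0000 listed for ETH Food-101 (1,000 images in each of 101 classes). The function name `category_entropy` is hypothetical, and the published values may have been computed differently.

```python
import math
from collections.abc import Sequence


def category_entropy(counts: Sequence[int]) -> float:
    """Normalized Shannon entropy of a dataset's class-frequency distribution.

    Returns 1.0 when every class has the same number of images and moves
    toward 0 as images concentrate in fewer classes. Assumes >= 2 classes
    and strictly positive per-class counts.
    """
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    if any(c <= 0 for c in counts):
        raise ValueError("per-class counts must be positive")
    total = sum(counts)
    # H = -sum(p_i * log p_i), with p_i = n_i / N
    h = -sum((c / total) * math.log(c / total) for c in counts)
    # Divide by log K, the maximum possible entropy for K classes
    return h / math.log(len(counts))


# Evenly split, like ETH Food-101 (101 classes x 1,000 images)
print(round(category_entropy([1000] * 101), 4))  # 1.0
# Hypothetical skewed 3-class dataset
print(round(category_entropy([900, 50, 50]), 4))  # 0.359
```

Normalizing by log K makes the score comparable across datasets with very different numbers of classes, which is what allows a single column to rank datasets from Food11 (11 classes) to Food2K (2,000 classes).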

| Model | Top-1 Val. Acc. (%) | Top-5 Val. Acc. (%) | Top-1 Test Acc. (%) | Top-5 Test Acc. (%) |
|---|---|---|---|---|
| VGG16 | 66.98 | 90.10 | 65.06 | 89.60 |
| ViT-B | 73.14 | 92.06 | 71.58 | 91.62 |
| ResNet101 | 74.42 | 93.62 | 72.59 | 93.16 |
| DenseNet121 | 76.46 | 94.57 | 74.77 | 94.29 |
| InceptionV4 | 77.30 | 94.28 | 75.70 | 93.89 |
| SENet154 | 77.47 | 94.86 | 76.02 | 94.61 |
| PRENet | 77.28 | 95.16 | 76.28 | 94.85 |
| RepViT | 78.08 | 95.41 | 76.86 | 95.02 |
| ConvNeXt-B | 78.30 | 94.36 | 76.76 | 93.90 |
| EfficientNet-B6 | 80.10 | 94.64 | 78.48 | 94.22 |
| CMAL-Net | 80.16 | 95.99 | 78.56 | 95.40 |
| VMamba-S | 82.15 | 96.91 | 80.58 | 96.71 |
| ResVMamba | 79.54# | 95.72# | 81.70 | 96.83 |
| HERBS | 83.56 | 97.31 | 82.72 | 97.19 |
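
Top-1 and Top-5 accuracy count a prediction as correct when the ground-truth class is, respectively, the single highest-scoring class or among the five highest-scoring classes. The sketch below computes the metric from per-sample class scores in plain Python. It assumes each model emits one score per class (logits or probabilities; only the ranking matters), uses a hypothetical helper name `top_k_accuracy`, and is an illustration of the metric rather than the evaluation code used to produce the table.

```python
from collections.abc import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Return the percentage of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels) or len(labels) == 0:
        raise ValueError("scores and labels must be non-empty and of equal length")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices by descending score; Python's stable sort keeps
        # lower indices first when two scores are equal.
        ranked = sorted(range(len(row)), key=row.__getitem__, reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example: 4 samples, 3 classes (hypothetical scores, not from the paper).
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.1, 0.3],
]
labels = [1, 1, 0, 2]
print(top_k_accuracy(scores, labels, k=1))  # 25.0
print(top_k_accuracy(scores, labels, k=2))  # 75.0
```

Because Top-k only checks membership in the k best-ranked classes, Top-5 is always at least Top-1 for the same model and split, which holds for every row of the table above.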

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).