Submitted: 11 July 2024
Posted: 11 July 2024
Abstract
Keywords:
1. Introduction
- the Variational Prototyping Encoder (VPE) was adapted for product recognition on retail shelves,
- various loss functions in the Variational Autoencoder (VAE) model were analyzed to enhance performance,
- the cosine metric was introduced to the nearest neighbor method to improve similarity measurement,
- the method was modified by incorporating prototypes as a signal at the encoder input,
- tests were conducted to select suitable prototypes for different classes,
- background removal and size uniformity were applied to prototypes and extracted products to optimize the recognition process and eliminate irrelevant disturbances,
- the optimal network parameters and latent space size were tested and selected to ensure effective performance.
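The nearest-neighbour step with a cosine metric, mentioned in the list above, can be illustrated with a minimal sketch. It assumes the encoder has already produced latent vectors for the prototypes and for each extracted product. The function names (`euclidean`, `cosine_distance`, `classify`) and the toy 2-D vectors are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: compares direction only (vectors must be non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def classify(
    query: Vector,
    prototypes: Dict[str, Vector],
    metric: Callable[[Vector, Vector], float],
) -> str:
    """Nearest-neighbour rule: return the class of the closest prototype embedding."""
    return min(prototypes, key=lambda label: metric(query, prototypes[label]))


# Hypothetical 2-D latent codes: the query points in almost the same direction
# as class_a but has a larger norm, so the two metrics pick different prototypes.
prototypes = {"class_a": [1.0, 0.0], "class_b": [3.0, 3.0]}
query = [4.0, 0.5]

print(classify(query, prototypes, euclidean))        # class_b
print(classify(query, prototypes, cosine_distance))  # class_a
```

Cosine distance ignores the length of a latent vector, so an embedding that differs from its prototype mainly in magnitude can still be matched to it. The toy example only shows that the two metrics can disagree; it does not by itself explain the results reported in this article.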
2. Related Works
3. Method
3.1. Variational Prototyping Encoder
3.2. Training and Testing Phases
3.3. Network Architecture
3.4. Loss Functions
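- Sum of two components, Binary Cross Entropy (BCE) and Kullback-Leibler Divergence (KLD) [15]:
  $$\mathrm{BCE} = -\sum_{i}\left[x_i \log \hat{x}_i + (1 - x_i)\log\left(1 - \hat{x}_i\right)\right],$$
  where $x$ are the original data and $\hat{x}$ are the reconstructed data;
  $$\mathrm{KLD} = -\frac{1}{2}\sum_{j=1}^{J}\left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right),$$
  where $\mu$ is the mean vector, $\sigma$ is the standard deviation vector and $J$ is the dimension of the latent space. The total loss function is the sum of these two components:
  $$\mathcal{L} = \mathrm{BCE} + \mathrm{KLD}.$$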
- Relative Average Spectral Error (RASE) is computed from the RMSE values using the following equation [29]:
  $$\mathrm{RASE} = \frac{100}{M}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\mathrm{RMSE}^2(B_i)},$$
  where $M$ is the mean radiance of the $N$ spectral bands and $B_i$ represents the $i$-th band of the input multispectral image. The desired value of this parameter is zero.
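- Root Mean Square Error (RMSE) measures the differences in pixel values between a band of the input multispectral image $R$ and the sharpened image $F$. It is determined using the following formula [29]:
  $$\mathrm{RMSE} = \sqrt{\frac{1}{mn}\sum_{x=1}^{m}\sum_{y=1}^{n}\left(R(x,y) - F(x,y)\right)^2},$$
  where $m$ and $n$ denote the image dimensions. The desired value of this error is zero.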
- Relative dimensionless global error in synthesis (ERGAS) is a global quality factor. It is sensitive to changes in both the mean pixel value of the image and its dynamic range, and can be expressed as [29]:
  $$\mathrm{ERGAS} = 100\,\frac{h}{l}\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{\mathrm{RMSE}(B_i)}{M_i}\right)^2},$$
  where $h/l$ is the ratio of the number of pixels of a panchromatic image to the number of pixels of a multispectral image, $M_i$ is the mean of the $i$-th band, and $N$ is the total number of bands. The optimal value for this error is close to zero.
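- Correlation coefficient (CC) shows the spectral correlation between two images. For the sharpened image $F$ and the input multispectral image $R$ it is calculated as [29]:
  $$\mathrm{CC} = \frac{\sum_{x=1}^{m}\sum_{y=1}^{n}\left(F(x,y) - \bar{F}\right)\left(R(x,y) - \bar{R}\right)}{\sqrt{\sum_{x=1}^{m}\sum_{y=1}^{n}\left(F(x,y) - \bar{F}\right)^2 \sum_{x=1}^{m}\sum_{y=1}^{n}\left(R(x,y) - \bar{R}\right)^2}},$$
  where $\bar{F}$ and $\bar{R}$ denote the mean values of the images $F$ and $R$, while $m$ and $n$ denote the image dimensions. The desired value of this coefficient is one.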
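The candidate loss functions listed above can be written out in plain Python as a minimal sketch. This is not the authors' code, and several details are assumptions: an image is a list of bands, each band a list of pixel rows; reconstructed values passed to BCE are expected in (0, 1) and are clamped to avoid `log(0)`; `M` and `M_i` are taken from the reference image $R$; and the ratio `h_over_l` for ERGAS must be supplied by the caller. How RASE, ERGAS and CC were turned into training objectives is not stated in this section, so the functions below only evaluate them.

```python
import math
from typing import List, Sequence

Band = List[List[float]]  # one band of m rows x n columns
Image = List[Band]        # a multispectral image with N bands


def _pixels(band: Band) -> List[float]:
    """Flatten an m x n band into a list of m*n pixel values."""
    return [p for row in band for p in row]


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def bce(x: Sequence[float], x_hat: Sequence[float], eps: float = 1e-12) -> float:
    """Binary cross entropy summed over elements; x_hat is clamped to avoid log(0)."""
    return -sum(
        xi * math.log(max(xh, eps)) + (1 - xi) * math.log(max(1 - xh, eps))
        for xi, xh in zip(x, x_hat)
    )


def kld(mu: Sequence[float], sigma: Sequence[float]) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1 + math.log(s * s) - m * m - s * s for m, s in zip(mu, sigma))


def vae_loss(x, x_hat, mu, sigma) -> float:
    """Total loss L = BCE + KLD."""
    return bce(x, x_hat) + kld(mu, sigma)


def rmse(r: Band, f: Band) -> float:
    """Root mean square error between reference band R and reconstructed band F."""
    pr, pf = _pixels(r), _pixels(f)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pr, pf)) / len(pr))


def rase(r: Image, f: Image) -> float:
    """RASE; M is assumed to be the mean of the N band means of R."""
    m = _mean([_mean(_pixels(band)) for band in r])
    return 100.0 / m * math.sqrt(_mean([rmse(rb, fb) ** 2 for rb, fb in zip(r, f)]))


def ergas(r: Image, f: Image, h_over_l: float) -> float:
    """ERGAS; M_i is assumed to be the mean of the i-th band of R."""
    terms = [(rmse(rb, fb) / _mean(_pixels(rb))) ** 2 for rb, fb in zip(r, f)]
    return 100.0 * h_over_l * math.sqrt(_mean(terms))


def cc(r: Band, f: Band) -> float:
    """Correlation coefficient between band F and band R."""
    pr, pf = _pixels(r), _pixels(f)
    mr, mf = _mean(pr), _mean(pf)
    num = sum((b - mf) * (a - mr) for a, b in zip(pr, pf))
    den = math.sqrt(sum((b - mf) ** 2 for b in pf) * sum((a - mr) ** 2 for a in pr))
    return num / den


# Hypothetical single-band 2 x 2 images.
r = [[[1.0, 2.0], [3.0, 4.0]]]
f = [[[1.0, 2.0], [3.0, 6.0]]]
print(rmse(r[0], f[0]), rase(r, f), ergas(r, f, h_over_l=0.25), cc(r[0], f[0]))
```

RMSE, RASE and ERGAS fall to zero for a perfect reconstruction, while CC rises to one, so a CC-based objective would have to be maximised (or its complement minimised).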
4. Experiments
4.1. Dataset Overview
4.2. Implementation Overview
4.3. Results
5. Conclusion
Author Contributions
Funding
References
- Merler, M.; Galleguillos, C.; Belongie, S. Recognizing Groceries in situ Using in vitro Training Data. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition; 2007; pp. 1–8.
- Marder, M.; Harary, S.; Ribak, A.; Tzur, Y.; Alpert, S.; Tzadok, A. Using image analytics to monitor retail store shelves. IBM Journal of Research and Development 2015, 59, 3:1–3:11.
- Kurzejamski, G.; Zawistowski, J.; Sarwas, G. A framework for robust object multi-detection with a vote aggregation and a cascade filtering. In WSCG 2015: Short Communications Proceedings, The 23rd International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision 2015 in co-operation with EUROGRAPHICS; University of West Bohemia, 2015.
- Kurzejamski, G.; Zawistowski, J.; Sarwas, G. Robust Method of Vote Aggregation and Proposition Verification for Invariant Local Features. In Proceedings of the 10th International Conference on Computer Vision Theory and Applications, Volume 2: VISAPP (VISIGRAPP 2015); INSTICC, SciTePress, 2015; pp. 252–259.
- George, M.; Mircic, D.; Sörös, G.; Floerkemeier, C.; Mattern, F. Fine-Grained Product Class Recognition for Assisted Shopping. In Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop (ICCVW); 2015; pp. 546–554.
- Melek, C.G.; Sonmez, E.B.; Albayrak, S. A survey of product recognition in shelf images. In Proceedings of the 2017 International Conference on Computer Science and Engineering (UBMK); 2017; pp. 145–150.
- Tonioni, A.; Serra, E.; Di Stefano, L. A deep learning pipeline for product recognition on store shelves. In Proceedings of the 2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS); 2018; pp. 25–31.
- Geng, W.; Han, F.; Lin, J.; Zhu, L.; Bai, J.; Wang, S.; He, L.; Xiao, Q.; Lai, Z. Fine-Grained Grocery Product Recognition by One-Shot Learning. In Proceedings of the 26th ACM International Conference on Multimedia (MM ’18); New York, NY, USA, 2018; pp. 1706–1714.
- Sun, H.; Hanata, K.; Sato, H.; Tsuchitani, I.; Akashi, T. Segmentation based Non-learning Product Detection for Product Recognition on Store Shelves. In Proceedings of the 2019 Nicograph International (NicoInt); 2019; pp. 9–16.
- Leo, M.; Carcagnì, P.; Distante, C. A Systematic Investigation on end-to-end Deep Recognition of Grocery Products in the Wild. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR); 2021; pp. 7234–7241.
- Chen, S.; Liu, D.; Pu, Y.; Zhong, Y. Advances in deep learning-based image recognition of product packaging. Image and Vision Computing 2022, 128, 104571.
- Selvam, P.; Faheem, M.; Dakshinamurthi, V.; Nevgi, A.; Bhuvaneswari, R.; Deepak, K.; Abraham Sundar, J. Batch Normalization Free Rigorous Feature Flow Neural Network for Grocery Product Recognition. IEEE Access 2024, 12, 68364–68381.
- Goldman, E.; Herzig, R.; Eisenschtat, A.; Goldberger, J.; Hassner, T. Precise Detection in Densely Packed Scenes. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019; pp. 5222–5231.
- Melek, C.G.; Battini Sönmez, E.; Varlı, S. Datasets and methods of product recognition on grocery shelf images using computer vision and machine learning approaches: An exhaustive literature review. Engineering Applications of Artificial Intelligence 2024, 133, 108452.
- Kim, J.; Oh, T.H.; Lee, S.; Pan, F.; Kweon, I.S. Variational Prototyping-Encoder: One-Shot Learning With Prototypical Images. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019; pp. 9454–9462.
- Fei-Fei, L.; Fergus, R.; Perona, P. A Bayesian approach to unsupervised one-shot learning of object categories. In Proceedings of the Ninth IEEE International Conference on Computer Vision; 2003; pp. 1134–1141.
- Lake, B.M.; Salakhutdinov, R.; Tenenbaum, J.B. Human-level concept learning through probabilistic program induction. Science 2015, 350, 1332–1338.
- Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. In Proceedings of the Advances in Neural Information Processing Systems; Lee, D.; Sugiyama, M.; Luxburg, U.; Guyon, I.; Garnett, R., Eds.; Curran Associates, Inc., 2016; Vol. 29.
- Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.; Hospedales, T.M. Learning to Compare: Relation Network for Few-Shot Learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018; pp. 1199–1208.
- Li, Z.; Zhou, F.; Chen, F.; Li, H. Meta-SGD: Learning to Learn Quickly for Few Shot Learning. arXiv 2017, abs/1707.09835.
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Volume 70; 2017; pp. 1126–1135.
- Chen, T.; Xie, G.S.; Yao, Y.; Wang, Q.; Shen, F.; Tang, Z.; Zhang, J. Semantically Meaningful Class Prototype Learning for One-Shot Image Segmentation. IEEE Transactions on Multimedia 2022, 24, 968–980.
- Snell, J.; Swersky, K.; Zemel, R. Prototypical Networks for Few-shot Learning. In Proceedings of the Advances in Neural Information Processing Systems; Guyon, I.; Luxburg, U.V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; Garnett, R., Eds. Curran Associates, Inc., Vol. 30. 2017.
- Wang, C.; Huang, C.; Zhu, X.; Zhao, L. One-Shot Retail Product Identification Based on Improved Siamese Neural Networks. Circuits, Systems, and Signal Processing 2022, 41, 1–15.
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations (ICLR 2014), Conference Track Proceedings, Banff, AB, Canada, 14–16 April 2014. http://arxiv.org/abs/1312.6114v10.
- Kang, J.S.; Ahn, S.C. Variational Multi-Prototype Encoder for Object Recognition Using Multiple Prototype Images. IEEE Access 2022, 10, 19586–19598.
- Liu, Y.; Shi, D. SS-VPE: Semi-Supervised Variational Prototyping Encoder With Student’s-t Mixture Model. IEEE Transactions on Instrumentation and Measurement 2023, 72, 1–9.
- Xiao, C.; Madapana, N.; Wachs, J. One-Shot Image Recognition Using Prototypical Encoders with Reduced Hubness. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV); 2021; pp. 2251–2260.
- Panchal, S. Implementation and Comparative Quantitative Assessment of Different Multispectral Image Pansharpening Approaches. Signal & Image Processing: An International Journal 2015, 6, 35.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); October 2023; pp. 4015–4026.
- Hu, R.; Hu, W.; Li, J. Saliency Driven Nonlinear Diffusion Filtering for Object Recognition. In Proceedings of the 2013 2nd IAPR Asian Conference on Pattern Recognition; 2013; pp. 381–385.
Short Biography of Authors
Aleksandra Kowalczyk received her B.S. degree in Computer Science in 2023 and M.Sc. in Computer Science in 2024, both from the Faculty of Electrical Engineering at the Warsaw University of Technology. She works professionally as a Data Engineer. Her research interests include machine learning, deep learning, and computer vision.

Grzegorze Sarwas received his M.Sc. degree in Electrical Engineering, majoring in Control and Computer Engineering, from Warsaw University of Technology in 2007 and his Ph.D. in Automation and Robotics in 2013. He has been actively engaged in R&D projects in computer vision and data analysis with several companies. Since 2016, he has been an assistant professor at the Warsaw University of Technology, focusing his research on image processing, computer vision, and data modeling.

| Distance | Method | Recall (all) | Recall (train) | Recall (test) | Top-nn (2-nn) | Top-nn (3-nn) |
|---|---|---|---|---|---|---|
| Euclidean | Reach defined number of epochs | 0.888 | 0.894 | 0.883 | 0.972 | 0.986 |
| Euclidean | Trigger after validation accuracy is achieved | 0.769 | 0.939 | 0.623 | 0.825 | 0.839 |
| Cosine | Reach defined number of epochs | 0.916 | 0.909 | 0.922 | 0.986 | 0.993 |
| Cosine | Trigger after validation accuracy is achieved | 0.888 | 0.955 | 0.831 | 0.986 | 0.986 |
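The Top-nn columns in the table above are read here as top-k recall: a query counts as correct when its true class is among the k nearest prototypes. The table does not define these columns, so this reading is an assumption, and the helper `top_k_recall` and its distances are hypothetical.

```python
from typing import Dict, List, Sequence


def top_k_recall(
    distances: Sequence[Dict[str, float]], labels: Sequence[str], k: int
) -> float:
    """Fraction of queries whose true label is among the k nearest prototypes.

    distances[i] maps each prototype class to its distance from query i
    (any metric, e.g. Euclidean or cosine); labels[i] is the true class.
    """
    hits = 0
    for dist, label in zip(distances, labels):
        ranked: List[str] = sorted(dist, key=dist.get)  # nearest prototype first
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical distances for three queries and three prototype classes.
distances = [
    {"a": 0.1, "b": 0.5, "c": 0.9},  # true class a is ranked 1st
    {"a": 0.2, "b": 0.3, "c": 0.8},  # true class b is ranked 2nd
    {"a": 0.4, "b": 0.6, "c": 0.7},  # true class c is ranked 3rd
]
labels = ["a", "b", "c"]

for k in (1, 2, 3):
    print(k, top_k_recall(distances, labels, k))
```

In this toy case recall grows from 1/3 at k = 1 to 2/3 at k = 2 and 1.0 at k = 3, since top-k recall can never decrease as k grows.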

| Image size | Algorithm’s version | One-shot classification recall, classes seen | One-shot classification recall, classes unseen |
|---|---|---|---|
|  | VPE | 0.939 | 0.961 |
|  | VPE + aug | 0.939 | 0.896 |
|  | VPE + aug + rotate | 0.576 | 0.818 |
|  | VPE + stn | 0.939 | 0.948 |
|  | VPE + aug + stn | 0.955 | 0.896 |
|  | VPE | 0.924 | 0.740 |
|  | VPE + aug | 0.970 | 0.909 |
|  | VPE + aug + rotate | 0.712 | 0.909 |
|  | VPE + stn | 0.939 | 0.935 |
|  | VPE + aug + stn | 0.909 | 0.922 |

| Loss function | One-shot classification recall, classes seen | One-shot classification recall, classes unseen |
|---|---|---|
|  | 0.970 | 0.949 |
|  | 0.970 | 0.949 |
|  | 0.939 | 0.970 |
|  | 0.955 | 0.949 |
|  | 0.924 | 0.929 |

| Classes | Recall | Accuracy | Precision |
|---|---|---|---|
| Seen | | | |
| Class 1, Black, orange | 1.000 | 1.000 | 1.000 |
| Class 2, Coca cola, bottle | 1.000 | 1.000 | 1.000 |
| Class 8, Easy boost, pink | 1.000 | 0.994 | 0.917 |
| Class 9, Easy boost, purple | 1.000 | 1.000 | 1.000 |
| Class 10, Level up, blue | 1.000 | 1.000 | 1.000 |
| Class 11, Dzik, green | 1.000 | 1.000 | 1.000 |
| Unseen | | | |
| Class 0, Black, light-blue | 1.000 | 1.000 | 1.000 |
| Class 3, Tiger, light-yellow | 0.909 | 0.987 | 0.909 |
| Class 4, Tiger, pink | 0.909 | 0.987 | 0.909 |
| Class 5, Black, green | 1.000 | 1.000 | 1.000 |
| Class 6, Red-bull, purple | 0.909 | 0.994 | 1.000 |
| Class 7, Lipton, bottle | 1.000 | 1.000 | 1.000 |
| Class 12, Oshee, narrow bottle, blue | 1.000 | 1.000 | 1.000 |
| Class 13, Oshee, bottle, blue | 1.000 | 1.000 | 1.000 |
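In a per-class table like the one above, accuracy is presumably computed one-vs-rest as (TP + TN) / total, which keeps it close to 1 even when recall or precision drops. That interpretation is an assumption. The sketch below computes per-class recall, one-vs-rest accuracy and precision from true and predicted labels; the function `per_class_metrics` and the labels are hypothetical.

```python
from typing import Dict, Sequence


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Per-class recall, one-vs-rest accuracy and precision."""
    total = len(y_true)
    pairs = list(zip(y_true, y_pred))
    results: Dict[str, Dict[str, float]] = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        tn = total - tp - fn - fp  # every sample not involving cls on either side
        results[cls] = {
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "accuracy": (tp + tn) / total,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        }
    return results


# Hypothetical labels: one "x" image is mistaken for "y".
y_true = ["x", "x", "x", "y", "y", "z", "z"]
y_pred = ["x", "x", "y", "y", "y", "z", "z"]
for cls, m in per_class_metrics(y_true, y_pred).items():
    print(cls, {name: round(v, 3) for name, v in m.items()})
```

A single misclassification lowers the recall of the true class and the precision of the predicted class, but it removes only one correct decision out of all samples from each class's one-vs-rest accuracy.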

| Category | One-shot classification recall, classes seen | One-shot classification recall, classes unseen |
|---|---|---|
|  | 0.939 | 0.725 |
|  | 0.924 | 0.613 |
|  | 0.954 | 0.754 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

