1. Introduction
Food origin labeling is an important factor when consumers judge the quality and safety of a product. In Japan, labeling of the origin of ingredients for processed foods became mandatory in April 2022, and interest in the accuracy of origin labeling is growing. This trend is not limited to Japan but is observed worldwide. For example, the European Union (EU) requires origin labeling according to food category, and the United States mandates origin labeling for certain agricultural and livestock products under the COOL Act. These regulations have improved transparency and reliability in the food industry, enabling consumers to make informed choices.
However, despite the growing prevalence of origin labeling, origin fraud remains a significant issue. Fraudulent practices, such as mislabeling the origin of high-value products, pose risks to consumer trust and safety. Consequently, there is a need for reliable methods to verify food origins and support accurate labeling.
Various techniques have been explored for food origin identification. Near-infrared spectroscopy, as used by Chen et al. (2019) [
1], demonstrated its effectiveness in identifying Japanese tea origins with machine learning algorithms. Similarly, fluorescence fingerprinting has emerged as a promising method for addressing these challenges. For instance, Wang et al. (2020) [
2] combined fluorescence spectroscopy with CNN models to classify coffee beans from different origins, achieving high accuracy. Wu et al. (2023) [
3] demonstrated the utility of CNNs in detecting adulteration of vegetable oils, highlighting the power of deep learning in processing high-dimensional fluorescence data.
Research on fluorescence fingerprinting has expanded to various food categories. Yamazaki et al. (2013) [
4] analyzed fluorescence properties of Pu-erh tea to identify regional differences, providing insights into the unique chemical profiles associated with specific tea varieties. Tan et al. (2019) [
5] applied fluorescence spectroscopy to distinguish rice grains from Malaysia and Indonesia, showcasing its adaptability for agricultural products. Similarly, Li et al. (2020) [
6] used fluorescence spectroscopy to detect milk adulteration, demonstrating its versatility across diverse food matrices. Ahmed et al. (2022) [
7] extended its application to fish, revealing that fluorescence properties could reliably assess fish quality and origin.
In the context of tea, Zhao et al. (2022) [
8] combined ultraviolet-visible absorption spectroscopy with partial least squares regression (PLS) to identify the geographical origin of Chinese tea. Their findings underscore the importance of combining spectroscopic data with chemometric methods to enhance classification accuracy. Similarly, Hu et al. (2023) [
9] used fluorescence spectroscopy and chemometric approaches to determine the origin and detect adulteration in rice, further validating the utility of fluorescence fingerprinting for origin identification.
Recent advancements in machine learning have enhanced the applicability of fluorescence fingerprinting. Hybrid modeling approaches, as explored by Jones et al. (2023) [
10], have shown promise in fruit origin identification by integrating fluorescence data with other analytical methods. Tanaka et al. (2024) [
11] demonstrated the potential of fluorescence fingerprinting and deep learning for tuna origin identification, indicating its broader applicability to other food categories. Huang et al. (2023) [
12] investigated the fluorescence properties of dairy products under varying processing conditions, highlighting the robustness of fluorescence techniques in diverse contexts.
Despite these advancements, challenges remain in preprocessing fluorescence spectra to minimize variability caused by environmental factors and measurement conditions. Zandomeneghi et al. (2005) [
13] reported that the oxidation products of natural fluorescent substances significantly influence fluorescence intensity, emphasizing the need for robust preprocessing techniques. Hu et al. (2017) [
14] pointed out that standardizing measurement conditions is crucial to ensure consistent results. Furthermore, Chen et al. (2018) [
15] emphasized the importance of nonlinear machine learning algorithms, such as artificial neural networks (ANN), for analyzing fluorescence data.
In addition, recent studies have further expanded the applications of fluorescence spectroscopy. Smith et al. (2021) [
16] applied fluorescence techniques to regional wine identification, providing insights into the relationship between chemical composition and geographic origin. Khan et al. (2022) [
17] explored fluorescence spectroscopy for grain quality evaluation, demonstrating its applicability in differentiating cultivars. Liu et al. (2023) [
18] utilized fluorescence fingerprinting for multi-component analysis of processed foods, expanding its role in quality assurance. Patel et al. (2023) [
19] investigated fluorescence spectroscopy in identifying adulterants in spice products, highlighting its role in ensuring food authenticity. Roy et al. (2024) [
20] applied fluorescence techniques to identify geographic origins of honey, showing its versatility across natural products.
Moreover, new methodologies have emerged to enhance fluorescence-based analyses. Singh et al. (2023) [
21] demonstrated the use of fluorescence spectroscopy in combination with Raman spectroscopy for comprehensive profiling of fruit juices. Lee et al. (2022) [
22] applied machine learning to fluorescence spectral data to assess coffee bean roasting levels and their relationship to origin. Gupta et al. (2023) [
23] integrated fluorescence fingerprinting with near-infrared spectroscopy to improve classification accuracy for dairy products. Park et al. (2024) [
24] studied the effect of environmental conditions on fluorescence variability in grains, proposing advanced normalization methods. Finally, Wang et al. (2023) [
25] utilized deep learning to analyze fluorescence spectral data for food fraud detection, achieving enhanced performance in multi-class classification tasks.
Building on these insights, this study developed a high-accuracy identification system for green tea origins by combining fluorescence fingerprint data with CNN models. The objectives were to (1) collect fluorescence fingerprint data from green tea samples using a fluorescence spectrophotometer, (2) preprocess the data for machine learning analysis, (3) construct and train a CNN model for origin identification, and (4) evaluate the model's performance. By leveraging the unique fluorescence characteristics of green tea, this study aims to contribute to food traceability and fraud prevention while advancing the application of fluorescence fingerprinting and deep learning in the food industry.
3. Results
3.1. Development of the Origin Identification System
The origin identification system was developed through a three-step process: data acquisition, preprocessing, and model training. A total of 1,200 pieces of fluorescence fingerprint data were used to construct an origin identification model for four types of green tea collected from different production areas in Japan. For each sample, the fluorescence characteristics were measured using a spectrofluorometer, and the FD3 files were converted into text format data. These text format data files were subsequently transformed into image format data for analysis with the deep learning model.
In the preprocessing of the fluorescence fingerprint data, the maximum and minimum values of the fluorescence intensity scale were unified for all image data to reduce data variation and provide the image data for learning in a more consistent state. This processing reduced the effects of noise and was expected to improve the accuracy of the model.
This approach allowed unnecessary color information to be eliminated, which reduced the computational cost and improved processing speed compared to RGB format. Moreover, it was expected that the sensitivity of the model to features important for origin identification would be improved because it became easier to focus on changes in fluorescence intensity.
Additionally, the scale and unnecessary white space common to all images were cropped to reduce information unnecessary for origin identification, allowing for efficient training.
The dataset was divided into training data (80%) and test data (20%), and the training data was further divided into five subsets (folds) using the stratified k-fold cross-validation method to train the model. Through feature extraction by the convolutional layer and classification by the fully connected layer, a CNN architecture specialized for green tea origin identification was designed.
3.2. Results of Green Tea Origin Identification
Table 1 shows the accuracy and loss of the test data when the weighted average ensemble method based on the validation accuracy of each fold of 10 trials was applied. As a result of model evaluation, the average accuracy of the test data using the weighted average ensemble method based on the accuracy of each fold of 10 trials was 92.83%, confirming high prediction accuracy in origin prediction. In addition, the standard deviation of accuracy was small at 0.01329, indicating that the accuracy of the model between trials was stable. Furthermore, the average loss of the test data using the ensemble method was 0.8487, making it clear that there is some room for improvement in the quality of the prediction, but the standard deviation of loss was small at 0.01376, confirming that the prediction performance of the model between trials was generally stable.
Analysis of the confusion matrix based on the test data revealed the numerical values of the classification performance evaluation indexes and the tendency of misclassification for each production area (
Table 2,
Table 3).
The precision, recall, and F1-score of Chiran tea and Yame tea were all over 95%, and the tea was classified with high accuracy.
For Kakegawa tea and Sayama tea, the recall of Kakegawa tea and the precision of Sayama tea were over 93%, which were lower than those of Chiran tea and Yame tea, but the classification performance of the model was good.
On the other hand, the precision of Kakegawa tea was about 85%, the F1-score was about 87%, and the recall of Sayama tea was about 81%, and the F1-score was about 88%.
These indices were below 90%, and the classification performance of the model for Kakegawa tea and Sayama tea was poor compared to Chiran tea and Yame tea.
Precision: The percentage of green tea predicted to be from a particular origin that is actually from that origin.
(True Positive) / (True Positive + False Positive)
Recall: The percentage of green tea samples that are correctly predicted to come from a certain region.
(True Positive) / (True Positive + False Negative)
F1-Score: The percentage of green tea samples that are correctly predicted to come from a certain region.
2 × (Precision × Recall) / (Precision + Recall)
In terms of misclassification trends, Sayama tea was most frequently misclassified as Kakegawa tea, at 10.3 times per trial. Next, Kakegawa tea was misclassified as Sayama tea, at 3.4 times per trial. Misclassification of Chiran tea and Yame tea into other producing areas was less frequent than Kakegawa tea and Sayama tea, with the most frequent misclassification of Yame tea into Sayama tea occurring 2.0 times per trial. Chiran tea was never misclassified into other producing areas in any of the 10 trials. In terms of misclassification trends in this study, it was confirmed that misclassification between Kakegawa tea and Sayama tea accounted for the majority of cases, and misclassification of Chiran tea and Yame tea was rare.
In addition, in the analysis of the learning curve, it was observed that the training loss and validation loss decreased while taking close values (the training accuracy and validation accuracy increased while taking close values) as the number of epochs increased, and the performance of the model improved. As an example, the learning curves of each fold in Trial Number 1 are shown in
Figure 1. It is generally known that overlearning begins to occur when the number of epochs exceeds a certain level. In this study, by introducing early termination, it was possible to detect signs of overlearning and stop learning at the optimal number of epochs. The number of epochs in model training was 35 epochs on average, 51 epochs at maximum, and 20 epochs at minimum, and the computational load required to converge the model was about 1.5 times higher (
Table 4). Although the number of epochs for each training varied slightly, the validation accuracy was within a certain range, with an average of 0.9530 (rounded to the fifth decimal place) and a standard deviation of 0.02041 (rounded to the sixth decimal place). It was confirmed that the best weights were obtained in an average of 25 epochs, since the Patience (waiting period) was set to 10. In addition, the more epochs in training the model, the higher the validation accuracy tended to be (
Figure 2).
Furthermore, the analysis of the fluorescence fingerprint images revealed similarities and differences in the fluorescence characteristics of each production area (
Figure 3). Chiran tea and Yame tea had clear fluorescent characteristics in certain products, making it possible to distinguish them visually. On the other hand, the fluorescent characteristics of Kakegawa tea and Sayama tea were very similar, making it difficult to distinguish them visually. In addition, it was confirmed that the fluorescent characteristics differ from product to product even within the same production area, and there was also variation within the same product.
Fluorescent components in green tea include catechin and tea polyphenols. In the fluorescence fingerprint images of green tea, there was an increase in fluorescence intensity in the ranges corresponding to the excitation wavelength and fluorescence wavelength of these fluorescent substances, which revealed that this was the cause of the differences and similarities in the fluorescent characteristics of green tea. However, the fluorescent components and wavelengths in green tea shown in
Table 5 are data for green tea extract, so attention should be paid to differences from this experiment, which used crushed green tea.
These results indicate that the combination of fluorescence fingerprinting and deep learning can provide an effective method for identifying the origin of green tea. In future research, it is expected that the versatility of the identification model will be improved by targeting more production areas and different types of food. In addition, further improvements in the accuracy of identification should be aimed at by improving data preprocessing and introducing new feature extraction methods.
4. Discussion
In this study, the origin of green tea from various parts of Japan was identified by combining fluorescence fingerprint data with a convolutional neural network (CNN). The accuracy of identification for the test data exceeded 92% (average of 10 trials), demonstrating high accuracy. This shows that the fluorescence fingerprint method is a promising approach for food origin identification, consistent with findings by Chen et al. (2019) [
1], who used near-infrared spectroscopy and machine learning for tea origin identification. Similarly, studies by Wang et al. (2020) [
2] and Smith et al. (2021) [
16] on coffee beans and wine origins, respectively, demonstrated the efficacy of spectroscopic methods combined with machine learning. While the test data loss (0.8487) indicates acceptable performance, further improvements in the model's predictive capability are warranted.
The accuracy of identification varied by origin, with lower accuracy observed for Sayama tea and Kakegawa tea. This is presumably due to the similarity in fluorescence peak patterns for these teas, providing limited features for origin differentiation. Similar challenges were reported by Zhao et al. (2022) [
8] and Tanaka et al. (2024) [
11], where overlapping spectral characteristics reduced classification accuracy in tea and tuna origin identification, respectively. Conversely, Yame tea and Chiran tea, with distinctive peak patterns, achieved high accuracy, aligning with findings by Wu et al. (2023) [
3], who reported that fluorescence spectroscopy effectively distinguished vegetable oils from different origins. This suggests that improving feature extraction methods or combining fluorescence data with additional analytical methods, such as chemical composition analysis or Raman spectroscopy (Singh et al., 2023 [
21]), may enhance classification accuracy.
Data preprocessing played a key role in this study, as normalizing fluorescence intensity reduced noise and variability. However, as Hu et al. (2023) [
9] and Park et al. (2024) [
22] noted, excessive normalization may obscure subtle spectral differences, particularly in datasets with minimal variability. This trade-off highlights the need for optimized preprocessing techniques that balance noise reduction with retention of distinctive spectral features. To further improve classification, especially for Sayama tea and Kakegawa tea, exploring preprocessing methods that preserve fluorescence intensity variation and account for environmental variability could be beneficial.
A notable feature of this study is the use of CNNs, which are well-suited for capturing patterns in image data such as EEM spectra. Compared to conventional methods like KNN and SVM, CNNs have demonstrated superior performance in extracting complex features, as shown in Wu et al. (2023) [
3] and Gupta et al. (2023) [
23]. However, CNNs require substantial data and computational resources. Introducing hybrid models, which combine CNNs with other algorithms, or employing lightweight architectures may address these challenges while maintaining high performance.
The fluorescence characteristics of green tea are influenced by fluorescent components such as catechins and tea polyphenols, which vary across production areas. Zandomeneghi et al. (2005) [
13] demonstrated that variations in catechin content significantly affect fluorescence properties, which align with the results of this study. Misclassifications between Kakegawa tea and Sayama tea could stem from similarities in their fluorescent component profiles, while the high accuracy for Yame tea and Chiran tea may reflect unique chemical compositions. Component analysis integrated with fluorescence fingerprinting could further elucidate these differences and improve model performance. Additionally, studies by Roy et al. (2024) [
20] on honey and Patel et al. (2023) [
19] on spices support the integration of fluorescence fingerprinting with multi-component analysis to enhance accuracy.
The number of epochs was found to influence verification accuracy. An increase in epochs improved accuracy, suggesting that the model training may have stopped before full convergence. Similar observations were made by Chen et al. (2018) [
15] and Khan et al. (2022) [
17], who noted that patience settings and advanced architectures played a critical role in optimizing performance. In this study, a patience of 10 was used, but future research should explore extended patience values, transfer learning, and alternative architectures such as EfficientNet or ResNet to achieve better performance.
Expanding the dataset to include green tea from more regions or other food categories could further enhance the versatility of this approach. Previous studies, such as Liu et al. (2023) [
18], demonstrated the applicability of fluorescence fingerprinting for processed foods, while Singh et al. (2023) [
21] highlighted its potential when combined with Raman spectroscopy. These studies suggest that integrating fluorescence fingerprinting with additional analytical technologies, such as chemometric methods or mass spectrometry, could improve robustness and practicality for broader applications in food quality control.
This study provides a new approach to quality control and prevention of origin fraud in the food manufacturing industry. Future efforts should focus on refining the model through advanced preprocessing techniques, incorporating chemical component data, and exploring lightweight or hybrid architectures. These advancements will enhance the reliability of food origin identification, contributing to improved food traceability and consumer trust.
5. Conclusion
This study demonstrated the effectiveness of a method combining fluorescence fingerprint data and deep learning for identifying the origin of Japanese green teas. The model achieved an identification accuracy exceeding 92%, highlighting the potential of fluorescence fingerprinting as a novel approach for food traceability and fraud prevention. However, challenges remain, particularly in distinguishing samples with similar fluorescence characteristics, such as Sayama tea and Kakegawa tea. Data preprocessing methods and feature extraction techniques must be further refined to address these issues.
Future research should focus on expanding the dataset to include green tea from additional regions, as well as other food categories, such as coffee beans and sake, where the origin significantly impacts quality. Incorporating data on variations in fluorescence characteristics due to harvest times and processing methods could further enhance the model’s versatility. Additionally, combining fluorescence fingerprinting with other analytical methods, such as ingredient analysis and spectral data, holds promise for improving model accuracy by providing a more comprehensive evaluation of food characteristics.
Technological advancements, including exploring alternative architectures like ResNet or EfficientNet and utilizing transfer learning, are critical for optimizing the model's performance and reducing dependency on large datasets. These efforts aim to improve the reliability and practicality of this method, making it a valuable tool for preventing origin fraud and strengthening quality control across the food manufacturing industry.
Through these advancements, the proposed method can contribute to enhancing consumer trust by ensuring the integrity of origin labeling and supporting robust quality assurance practices. This study thus provides a foundation for the broader application of fluorescence fingerprinting and machine learning in food science and industry.