2. Materials and Methods
The experiment began on December 14, 2022 and ended on January 12, 2023. Ten control samples were measured first with an electronic nose, followed by the 0.5% and 0.1% sodium benzoate samples. The treatments are shown in Table 1. Lemons of the ‘Fino’ and ‘Verna’ varieties were used, provided by the company Citrus Gea Belmonte S.L., located in Torreagüera (Murcia), Spain, which cultivated these varieties on a commercial plot in the north of Murcia.
For the application of the treatments in the field (Table 1), 8 lemon trees were randomly selected. The treatments were applied by foliar spraying (15 L in total per treatment) up to 4 times at one-month intervals. The last applications were carried out one week before harvest, which took place at the optimal stage of commercial ripening.
The treatments were as follows: CNT (control), distilled water only; BS0,5 (0.5% sodium benzoate: 75 g BS, 1.5 L water, 15 mL wetting agent); BS0,1 (0.1% sodium benzoate: 150 g BS, 1.5 L water, 15 mL wetting agent). The solutions were prepared for application in the field to the previously selected trees, except for the control, which consisted only of distilled water. The specific concentrations are detailed in Section 2.2.
2.1. Experimental Design and Storage Conditions
The lemons were transferred to the laboratory immediately after harvesting, where they were organized into batches of 10 lemons according to the applied treatment. This classification was carried out visually based on size, color and appearance, in order to obtain a homogeneous sample free of visually perceptible defects. The fruits were kept in cold rooms at 8 °C and 80-85% relative humidity (RH). The fruits were characterized with the low-cost eNose prototype on the days following their receipt in the laboratory, until one week of analysis had been completed for each of the varieties (approximately 1 month of testing).
For the application of the treatments, 6 trees were randomly selected for the ‘Fino’ variety and the same number for the ‘Verna’ variety. The treatments were applied by foliar spraying (15 L in total per treatment), 4 times at intervals of one month. The last applications were carried out 7 days before harvest, which took place at the commercial maturation stage: on February 17, 2022 for the ‘Fino’ variety and on May 26, 2022 for the ‘Verna’ variety.
The lemons were transferred to the laboratory immediately after harvest, where they were organized into batches according to variety and applied treatment. At the time of harvest, the lemons of the ‘Fino’ variety were randomly classified into 7 batches of 10 pieces per treatment, as were those of the ‘Verna’ variety. This classification was carried out visually based on size, color and appearance, in order to obtain a homogeneous sample with no visible defects. Cold storage conditions were 8 °C and 80-85% relative humidity (RH). The volatile organic compounds (VOCs) were determined with the low-cost eNose prototype on the days following receipt of the fruits in the laboratory, until one week of analysis had been completed for each variety.
2.2. Formulation of Treatments
The treatments were applied to the lemon trees in the field, with a total of 6 different treatments for each variety, in addition to the control. The solutions were prepared by dissolving 75, 150 and 450 g of sodium benzoate in 1.5 L of water and 75, 150 and 450 g of potassium sorbate in 1.5 L of water, in order to obtain 15 L of solution in total at concentrations of 0.5, 1 and 3%, respectively. In addition, 15 mL of ecological wetting agent was added to the solutions to improve adhesion. The solutions were applied in the field freshly prepared, although the control trees of both varieties were not sprayed.
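The stated concentrations can be checked with a short calculation. The sketch below assumes that the percentages are expressed as weight per volume (grams of solute per 100 mL of final solution) and that the final volume is the 15 L mentioned above; the function name `percent_w_v` is introduced here for illustration only.

```python
def percent_w_v(mass_g: float, total_volume_l: float) -> float:
    """Concentration in % (w/v): grams of solute per 100 mL of solution."""
    total_volume_ml = total_volume_l * 1000.0
    return mass_g * 100.0 / total_volume_ml


# Values taken from the text: 75, 150 and 450 g in a 15 L final volume.
TOTAL_VOLUME_L = 15.0
masses_g = [75.0, 150.0, 450.0]

concentrations = [percent_w_v(m, TOTAL_VOLUME_L) for m in masses_g]
print(concentrations)  # [0.5, 1.0, 3.0]
```

Under this assumption, the three masses correspond to 0.5, 1 and 3%, in agreement with the concentrations given for the 15 L solutions.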
Table 2
Where: CNT (control), distilled water only; BS0,5 (0.5% sodium benzoate: 75 g BS, 1.5 L water, 15 mL wetting agent); BS0,1 (0.1% sodium benzoate: 150 g BS, 1.5 L water, 15 mL wetting agent). The solutions were prepared for application in the field to the previously selected trees, except for the control, which consisted only of distilled water. The specific concentrations are detailed in Section 2.2.
2.3. Determination of Volatile Organic Compounds Using a Low-Cost Prototype of an eNose
For each lemon variety, the tests were carried out on the days following receipt of the fruits in the laboratory, using a low-cost electronic nose (EN) prototype. The prototype was designed by the technology-based company TeleNatura, in collaboration with the working group of the Engineering Department of the Miguel Hernández University (UMH), Spain. The device included a simple sample delivery system, consisting of a chamber in which the samples were placed together with a fan, as well as a sensor array and a microcontroller unit (Arduino Nano with USB serial connection; Figure 1).
The same design has been used in other works, such as the identification of La Rioja wine varieties [4], the detection of lethal bronzing disease in palm trees [5] and the discrimination between different types of olive oil [6]. The sensor array consisted of eight MQ sensors (MQ-135, MQ-2, MQ-3, MQ-4, MQ-5, MQ-7, MQ-8, and MQ-9), manufactured by Hanwei Electronics Co., Ltd. (Zhengzhou, China), which had resistors (RL) whose values changed depending on the mixture of gases to which they were exposed (Table 3).
Varying the voltage across these resistances modulates the heating of the sensor. The device uses an Arduino Nano microcontroller to generate the voltage signals applied to the sensors and to measure their responses, together with an analog circuit, made up of a DAC and operational amplifiers, that controls the heating of the sensors. Data sheets for some sensors (MQ-7 and MQ-9 in particular) recommend switching between two heater voltages (5.0 V and 1.4 V) in a 60 + 90 s cycle, with the sensor response read at the end of the 90 s interval. This can significantly improve sensitivity, since the sensor response can be read at different operating temperatures. The EN was built so that the voltage input to the sensors can be varied. In the present investigation, the voltage was varied sinusoidally with a period of 128 s, with values ranging from 1.6 to 4.8 V and a total of 256 steps per period. The sample chamber of the device was made of glass and was connected by a 6 mm PVC tube, through PG7 nylon cable glands, to a separate PP5 (food-grade polypropylene) detection chamber, which contained the sensor assembly. A second tube returned to the sample chamber, completing a hermetically closed circuit.
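The sinusoidal heater modulation described above (period of 128 s, 1.6 to 4.8 V, 256 steps per period) can be illustrated with a minimal sketch. The starting phase and the helper name `heater_voltage` are assumptions made here; the firmware of the prototype is not described in this section and may differ.

```python
import math

# Values taken from the text.
PERIOD_S = 128.0
STEPS = 256
V_MIN, V_MAX = 1.6, 4.8


def heater_voltage(step: int) -> float:
    """Heater voltage (V) at a given step of the sinusoidal cycle.

    Assumption: the cycle starts at the mid-point voltage with rising slope.
    """
    offset = (V_MAX + V_MIN) / 2.0     # mid-point of the voltage range
    amplitude = (V_MAX - V_MIN) / 2.0  # half of the peak-to-peak range
    phase = 2.0 * math.pi * (step % STEPS) / STEPS
    return offset + amplitude * math.sin(phase)


step_duration_s = PERIOD_S / STEPS  # time spent at each voltage step
profile = [heater_voltage(k) for k in range(STEPS)]
```

With these values each step lasts 0.5 s, and one full period yields 256 voltage set points between the two limits.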
To normalize the outputs of the MQ sensors, 50 kΩ trim pots were used as load resistors, with the pot values adjusted until the sensor channels gave a voltage difference of less than 100 mV at least half an hour after introducing the sample. In this way, the impedance characteristics of the sensors were balanced, which provided a degree of normalization to counteract the variability introduced during their manufacture.
The analysis of the samples with the EN began on February 21, 2022 for the ‘Fino’ variety and on May 30, 2022 for the ‘Verna’ variety, these dates being determined by the harvest season of each variety. For both lemon varieties, the investigation ran from the beginning of the analyses until one week had been completed, ensuring that the 10 samples from each batch were measured at least once.
All experiments were carried out in a clean and disinfected environment. First, the EN sensors were exposed to ambient air for 30 minutes. Next, the analysis of the samples began, which consisted of introducing each of the ten pieces of the chosen batch into a glass jar of X mL capacity (sample chamber). Once the fruit was inside the sample chamber, the chamber was hermetically closed, leaving the sensors mounted in the lid exposed to the sample for 10 minutes (Figure 2: lid and sensors). After this time, and before a second sample was introduced, the device chamber was cleaned by leaving it exposed to ambient air for 20 minutes. At all times, the sensors were kept from direct contact with the sample to avoid possible saturation in the measurements. All analyzed samples and ambient air exposures of the sensors were manually recorded in self-developed software before measurement started. This software connects the low-cost device (EN) to a 64-bit computer in order to collect the data acquired during the day, generate Excel files (.xls) containing the data of each daily measurement, and analyze the raw data for feature extraction.
2.4. Electronic Nose Data Analysis
To analyze the raw data, the first step is feature extraction. The Discrete Fourier Transform (DFT) was used to extract the frequency coefficients of the temporal signal from each sensor, using the same in-house software mentioned above. A total of 80 coefficients were extracted for each sample, forming a feature vector that is stored together with a label and later used for classification. The entire data analysis process is shown in Figure 2.
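The DFT-based feature extraction can be sketched as follows. The text gives only the totals (eight sensors, 80 coefficients per sample); the split into 10 coefficients per sensor, the use of coefficient magnitudes, the 256 readings per signal and the synthetic input are all assumptions made for illustration, and `extract_features` is a hypothetical name, not the in-house software.

```python
import numpy as np

N_SENSORS = 8           # from the text: eight MQ sensors
N_SAMPLES = 256         # assumption: one reading per heater voltage step
COEFFS_PER_SENSOR = 10  # assumption: 8 sensors x 10 coefficients = 80


def extract_features(signals: np.ndarray) -> np.ndarray:
    """Return an 80-element feature vector from an (8, 256) signal array."""
    if signals.shape != (N_SENSORS, N_SAMPLES):
        raise ValueError("expected one row of readings per sensor")
    spectrum = np.fft.rfft(signals, axis=1)  # DFT of each real-valued signal
    # Keep the magnitudes of the lowest-frequency coefficients, scaled by N.
    magnitudes = np.abs(spectrum[:, :COEFFS_PER_SENSOR]) / N_SAMPLES
    return magnitudes.ravel()


# Synthetic stand-in for one measurement: offset + one cycle + small noise.
rng = np.random.default_rng(0)
t = np.arange(N_SAMPLES)
noise = 0.01 * rng.standard_normal((N_SENSORS, N_SAMPLES))
signals = 2.0 + 0.5 * np.sin(2.0 * np.pi * t / N_SAMPLES) + noise

features = extract_features(signals)
```

In this sketch the first coefficient of each sensor reflects the mean level of the signal and the second reflects the component at the modulation frequency, which is why low-order coefficients are a natural choice for periodically heated sensors.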
Second, Principal Component Analysis (PCA) was used to reduce the dimensionality of the data from 80 to 2, so that 2D plots can be created in which each point represents one of the analyzed samples. This tool makes it possible to inspect the sample groupings and to select a subset of coefficients to be used for classification. In this case, 40 coefficients were used, which retained >= 95% of the variability. The PCA and the graphical representation were carried out in Python, using the sklearn and matplotlib libraries.
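A minimal sketch of the two uses of PCA mentioned above (projection to two components for plotting, and selection of enough components to retain at least 95% of the variance) is given below with scikit-learn. The random placeholder data and the standardization step are assumptions; the number of components retained on the real data set is the one reported in the text, not the one obtained here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data: 60 samples x 80 DFT coefficients (not the real data).
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 80))

# Assumption: features are standardized before PCA.
X_scaled = StandardScaler().fit_transform(X)

# Projection to 2 components for the 2D scatter plots.
pca_2d = PCA(n_components=2)
X_2d = pca_2d.fit_transform(X_scaled)

# Smallest number of components that retains at least 95% of the variance.
pca_95 = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca_95.fit_transform(X_scaled)
retained = pca_95.explained_variance_ratio_.sum()
```

The array `X_2d` can be passed directly to a matplotlib scatter plot, with one point per sample, while `X_reduced` is the input to the classifiers.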
The results of three classification algorithms are compared below. One of them, k-nearest-neighbors (k-nn), is unsupervised, and the other two, Sequential Neural Networks (SNN) and support vector machines (SVM), are supervised. The objective is to determine which algorithms classify the results best, under the hypothesis that the supervised models will outperform the unsupervised one, and that the best of them will provide a fast data classification tool which, together with the analysis device, will make it possible to analyze large data sets easily, reliably and quickly.
The k-nearest-neighbors (k-nn) algorithm is an approach to data classification that estimates the probability that a data point belongs to one group or another based on the group to which the closest data points belong. As used here, the algorithm classifies in an unsupervised way, that is, it does not make use of the labels that we generated. Instead, it generates labels by grouping the data according to the distance of each sample to a series of clusters. By comparing the labels generated by the algorithm with the original labels of the samples, the classification capacity of the k-nn can be measured.
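The comparison between labels generated without supervision and the original labels can be illustrated as follows. The text names the method k-nn; because the procedure it describes assigns samples by their distance to a series of clusters, this sketch substitutes scikit-learn's `KMeans` as a stand-in, which is an assumption on our part and not necessarily the implementation used. The data are synthetic, and mapping each cluster to its majority label is one possible way of scoring the agreement.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, well-separated groups standing in for three treatments.
centers = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]]
X, y_true = make_blobs(n_samples=90, centers=centers,
                       cluster_std=0.5, random_state=0)

# Unsupervised grouping: the original labels are not used for fitting.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Map each cluster to the most frequent original label among its members.
y_pred = np.empty_like(y_true)
for c in np.unique(cluster_ids):
    members = cluster_ids == c
    y_pred[members] = np.bincount(y_true[members]).argmax()

accuracy = float(np.mean(y_pred == y_true))
```

The majority mapping is needed because cluster indices are arbitrary: cluster 0 of the algorithm does not necessarily correspond to label 0 of the samples.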
Models based on supervised learning were trained in order to obtain a tool for rapid data classification. The first group of models is based on sequential neural networks (SNN). Hyperparameter tuning is an essential step in machine learning to ensure that the model performs well and is suitable for real-world applications. The process can be time-consuming, but it can significantly improve model performance and robustness to new data. The hyperparameters were tuned by random search and chosen based on the results on a validation set. Of the data from all the analyses carried out so far, 80% were used for training and the remaining 20% for validation of the model. The models were trained to return an estimate of the probability that a sample belongs to each of the categories into which the samples were divided. The group with the highest probability is taken as the output of the model.
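The training procedure described above (80/20 split, random hyperparameter search scored on the validation set, class probabilities as output) can be sketched as follows. The framework used for the sequential networks is not stated in this section, so scikit-learn's `MLPClassifier` is used here as a stand-in; the synthetic data, the candidate hyperparameter values and the number of trials are likewise assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterSampler, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 40-coefficient feature vectors, three classes.
X, y = make_classification(n_samples=200, n_features=40, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hypothetical search space for the random hyperparameter search.
param_distributions = {
    "hidden_layer_sizes": [(16,), (32,), (32, 16)],
    "alpha": [1e-4, 1e-3, 1e-2],
    "learning_rate_init": [1e-3, 1e-2],
}

best_score, best_model = -1.0, None
for params in ParameterSampler(param_distributions, n_iter=5, random_state=0):
    model = MLPClassifier(max_iter=500, random_state=0, **params)
    model.fit(X_train, y_train)
    score = model.score(X_val, y_val)  # accuracy on the validation set
    if score > best_score:
        best_score, best_model = score, model

# Estimated class probabilities; the most probable class is the output.
probabilities = best_model.predict_proba(X_val)
predicted = best_model.classes_[np.argmax(probabilities, axis=1)]
```

Because the validation set is also used to choose the hyperparameters, the validation accuracy obtained this way tends to be optimistic, and an additional held-out test set gives a less biased estimate.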
Finally, the feature vectors extracted and processed by PCA were classified using models based on support vector machines (SVMs), a type of machine learning algorithm used for classification and regression analysis. C and gamma are two important parameters of SVMs. C is a regularization parameter that controls the trade-off between a low training error and a wide margin, which favors a low test error, while gamma controls the shape of the decision boundary. A small value of gamma results in a decision boundary with a larger radius and a smoother decision surface, while a large value of gamma results in a more complex decision surface that may overfit the training data. The optimal values of C and gamma can be determined using a grid search or random search, which tests different combinations of C and gamma, evaluates the performance of the model on a validation data set, and selects the values that produce the highest accuracy or lowest error rate. To generate and train the models, optimal values of C and gamma were sought in this way. Using a recursive code with 5-fold cross-validation and 20% trial values, C and gamma values were obtained with a score of 0.94. The data were then divided into training and test groups in proportions of 80% and 20%, respectively. For the binary classification between infected and non-infected, no particular classifier was specified. For classification among the four types of infection, a one-vs-one (OvO) multiclass classifier was used.
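A search for C and gamma of the kind described above can be sketched with scikit-learn's `GridSearchCV`. The RBF kernel, the candidate values, the synthetic four-class data and the choice to split the data before searching are assumptions made for illustration; the score of 0.94 reported in the text refers to the real data and is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the PCA-reduced feature vectors, four classes.
X, y = make_classification(n_samples=200, n_features=40, n_informative=10,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hypothetical grid of candidate values for C and gamma.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}

# 5-fold cross-validated search; one-vs-one decision function for 4 classes.
search = GridSearchCV(SVC(kernel="rbf", decision_function_shape="ovo"),
                      param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_                 # selected C and gamma
cv_score = search.best_score_                     # mean accuracy over folds
test_accuracy = search.score(X_test, y_test)      # accuracy on the 20% split
ovo_scores = search.decision_function(X_test)     # one column per class pair
```

With four classes, the one-vs-one scheme compares every pair of classes, so the decision function has 4 x 3 / 2 = 6 columns.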
For the analysis of the results, a confusion matrix was computed for each model, from which the overall accuracy, sensitivity and specificity of the model were calculated. The maximum, minimum and mean values were obtained in a 10-fold testing process.
Accuracy = number of hits / total number of predictions
For binary classification, the following measures are also used:
Sensitivity = number of positive hits / number of actual positive samples
Specificity = number of negative hits / number of actual negative samples
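These measures can be computed from a binary confusion matrix as in the sketch below, which uses the standard definitions: sensitivity as true positives over all actual positives, and specificity as true negatives over all actual negatives. The label vectors are invented for illustration and are not results of this study.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative labels only (1 = positive class, 0 = negative class).
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])

# With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # hits / total predictions
sensitivity = tp / (tp + fn)                # positive hits / actual positives
specificity = tn / (tn + fp)                # negative hits / actual negatives

print(accuracy, sensitivity, specificity)   # 0.75 0.75 0.75
```

In this example there are three true positives, one false negative, three true negatives and one false positive, so all three measures equal 0.75.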