Submitted:
26 April 2023
Posted:
27 April 2023
You are already at the latest version
Abstract
Keywords:
1. Introduction
2. Literature review
2.1. Risk control of imported food in Taiwan
2.2. Food cloud big data
2.3. International use of big data for food risk prediction
2.4. Development of food risk prediction in the EU
2.5. Imported food risk prediction in the US
2.6. Improvement of ensemble learning model
3. Materials and methods
3.1. Data sources and analytical tools
3.2. Research methodology
3.2.1. Data collection
3.2.2. Integration and data pre-processing
3.2.3. Establishment of risk prediction model
- Data processing:
- Selection of characteristic risk factors:
- Data exploration and modeling
- Establish the optimum prediction model
- ○
- Training set resampling
- ○
- Repeated modeling
- ○
- Selection of the optimal model
3.2.4. Evaluation of the prediction effectiveness
3.2.5. Evaluation of the prediction effectiveness
4. Results
4.1. Resampling method and optimal ratio
4.2. Generation of the optimum prediction model
4.3. Model prediction effectiveness
5. Discussion
5.1. Fβ was employed to regulate the sampling inspection rate
5.2. Comparison between single algorithm and ensemble algorithm
5.3. Comparison of prediction effectiveness between EL V.2 and EL V.1 models
- The AUC of EL V.1 ranged from 53.43% to 69.03%, while the AUC of EL V.2 ranged from 49.40% to 63.39%. After a majority decision, the Bagging-CART model of EL V.2 with AUC less than 50% was considered unsuitable. By adopting a majority decision strategy through ensemble learning, the influence of the Bagging-CART model was diluted by the other six models. Thus, EL V.2 exhibited better robustness than EL V.1. The advantage of ensemble learning was that when a small number of algorithms were not suitable (worse than random sampling), there was a mechanism for eliminating or weakening influence. The performance of AUC showed that EL V.1 and EL V.2 had a greater prediction probability than randomly selecting unqualified cases (Table 10).
- The predictive evaluation index F1 (8.14%) and PPV (4.38%) of EL V.2 had better results compared to F1 (4.49%) and PPV (2.47%) of EL V.1, indicating that EL V.2 had better predictive effects than EL V.1 (Table 11).
5.4. Evaluation of the effectiveness of the prediction model after its launch
- EL V.2 is better than random sampling. After the ensemble learning model EL V.2, developed in this study, was launched online, the predicted results from 2020 to 2022 were reviewed. Based on the overall general sampling cases throughout the year, it was determined that the unqualified rate was 3.74% in 2020, 4.16% in 2021, and 3.01% in 2022, all of which were significantly higher than the 2.09% in 2019. Further observation showed that the unqualified rates of cases recommended for sampling inspection through ensemble learning in 2020, 2021, and 2022 were 5.10%, 6.36%, and 4.39%, respectively, which were significantly higher than the 2.09% under random sampling inspection in 2019.
- The ensemble learning model should be periodically re-modeled. Based on Table 12, it can be observed that the unqualified rate showed a growing trend from 2019 to 2021, but a slight decrease in 2022 (Figure 5). The results of the further chi-square test showed that the unqualified rate in 2022 was still significantly higher than that in 2019 (p value = 0.000***) (Table 11). However, for ensemble learning prediction models constructed using various machine learning algorithms, the factors and data required for modeling often changed with factors such as the external environment and policies. Re-modeling was necessary to make the best adjustments to "data drift" or "concept drift" in the real world to prevent model failure. Drift referred to the degradation of predictive performance over time due to hidden external environmental factors. Due to the fact that data changed over time, the model's capability to make accurate predictions may decrease. Therefore, it was necessary to monitor data drift and conduct timely reviews of modeling factors. When collecting new data, the data predicted by the model should be avoided to prevent the new model from overfitting when making predictions. The goal of this study is to enable the new model to adjust to changes in the external environment, which will be a sustained effort in the future.
5.5. Research limitations
6. Conclusion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Food and Drug Administration, Ministry of Health and Welfare of Taiwan. Analysis of Import Inspection Data of Food and Related Products at the Taiwan Border for the Year 107. Annual Rep. Food Drug Res. 2019, 10, 404-408. https://www.fda.gov.tw( accessed on 20 July 2020).
- Food Safety Information Network. Available online: https://www.ey.gov.tw/ofs/A236031D34F78DCF (accessed on 22 July 2022).
- Bouzembrak, Y.; Marvin, H.J.P. Prediction of food fraud type using data from rapid alert system for food and feed (RASFF) and Bayesian network modelling. Food Control 2016, 61, 180–187. [CrossRef]
- Brandao, M.P.; Neto, M.G.; Anjos, V.C.; Bell, M.J.V. Detection of adulteration of goat milk powder with bovine milk powder by front-face and time resolved fluorescence. Food Control 2017, 81, 168–172. [CrossRef]
- Lin, Y. Food Safety in the Age of Big Data. Hum. Soc. Sci. Newslett. 2017, 19, 1.
- Feng, L.; Zhang, Z.; Ma, Y.; Du, Q.; Williams, P.; Drewry, J.; Luck, B. Alfalfa Yield Prediction Using UAV-Based Hyperspectral Imagery and Ensemble Learning. Remote Sens. 2020, 12, 2028. [CrossRef]
- Neto, H.A.; Tavares, W.L.F.; Ribeiro, D.C.S.Z.; Alves, R.C.O.; Fonseca, L.M.; Campos, S.V.A. On the utilization of deep and ensemble learning to detect milk adulteration. BioData Min. 2019, 12, 13. [CrossRef]
- Park, M.S.; Kim, H.N.; Bahk, G.J. The analysis of food safety incidents in South Korea, 1998–2016. Food Control 2017, 81, 196-199. [CrossRef]
- Liu, Y.; Zhou, S.; Han, W.; Li, C.; Liu, W.; Qiu, Z.; Chen, H. Detection of Adulteration in Infant Formula Based on Ensemble Convolutional Neural Network and Near-Infrared Spectroscopy. Foods 2021, 10, 785. [CrossRef]
- Wang, Z.; Wu, Z.; Zou, M.; Wen, X.; Wang, Z.; Li, Y.; Zhang, Q.A. Voting-Based Ensemble Deep Learning Method Focused on Multi-Step Prediction of Food Safety Risk Levels: Applications in Hazard Analysis of Heavy Metals in Grain Processing Products. Foods 2022, 11, 823. [CrossRef]
- Parastar, H.; Kollenburg, G.V.; Weesepoel, Y.; Doel, A.V.D.; Buydens, L.; Jansen, J. Integration of handheld NIR and machine learning to “Measure & Monitor” chicken meat authenticity. Food Control 2020, 112, 107149. [CrossRef]
- Marvin, H.J.P.; Bouzembrak, Y.; Janssen, E.M.; van der Fels-Klerx, H.J.; van Asselt, E.D.; Kleter, G.A. A holistic approach to food safety risks: Food fraud as an example. Food Res. Int. 2016, 89, 463-470. [CrossRef]
- Bouzembrak, Y.; Klüche, M.; Gavai, A.; Marvin, H.J.P. Internet of Things in food safety: Literature review and a bibliometric analysis. Trends Food Sci. Technol. 2019, 94, 54-64. [CrossRef]
- Marvin, H.J.P.; Janssen, E.M.; Bouzembrak, Y.; Hendriksen, P.J.M.; Staats, M. Big data in food safety: An overview. Crit. Rev. Food Sci. Nutr. 2017, 57(11), 2286-2295. [CrossRef]
- U.S. Government Accountability Office. Imported Food Safety: FDA’s Targeting Tool has Enhanced Screening, But Further Improvements are Possible. GAO: Washington D.C., WA, USA, 2016. Available online: https://www.gao.gov/products/gao-16-399 (accessed on 1 November 2021).
- Dasarathy, B.V.; Sheela, B.V. A composite classifier system design: Concepts and methodology. Proc. IEEE Inst Electr Electron Eng. 1979, 67(5), 708-713. [CrossRef]
- Hansen, L.K.; Salamon, P. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12(10), 993-1001. [CrossRef]
- Polikar, R. Ensemble Based Systems in Decision Making. Circuits Syst. Mag. IEEE 2006, 6(3), 21-45. [CrossRef]
- Sugsnyadevi, K.; Malmurugan, N.; Sivakumar, R. OF-SMED: An Optimal Foreground Detection Method in Surveillance System for Traffic Monitoring. In Proceedings of the 2012 International Conference on Cyber Security, Cyber Warfare and Digital Forensic, CyberSec 2012, 12-17. [CrossRef]
- Pagano, C.; Granger, E.; Sabourin, R.; Gorodnichy, D.O. Detector Ensembles for Face Recognition in Video Surveillance. In Proceedings of the 2012 International Joint Conference on Neural Networks (IJCNN), 1-8. [CrossRef]
- Wang, R.; Bunyak, F.; Seetharaman, G.; Palaniappan, K. Static and Moving Object Detection Using Flux Tensor with Split Gaussian Models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2014, 414-418. [CrossRef]
- Wang, T.; Sonoussi, H. Detection of Abnormal Visual Events via Global Optical Flow Orientation Histogram. IEEE Trans. Inf. Forensics Secur. 2014, 9(6), 998-998. [CrossRef]
- Tsai, C.J. New feature selection and voting scheme to improve classification accuracy. Methodologies Appl. 2019, 23(1), 12017-12030. [CrossRef]
- Ogutu, J.O.; Schulz-Streeck, T.; Piepho, H.P. Genomic selection using regularized linear regression models: Ridge regression, Lasso, elastic net and their extensions. BMC Proc. 2012, 6, S10. [CrossRef]





| Characteristic factor | Times | Characteristic factor | Times | Characteristic factor | Times | Characteristic factor | Times |
| Country of production | 70 | Declaration acceptance unit | 17 | Whether there is a trademark | 5 | Frequency of business registration changes | 1 |
| Inspection method | 64 | Advance release | 17 | Product registration location | 5 | Is it an import broker? | 1 |
| Product classification code | 64 | Cumulative sampling number of imports | 17 | Regional political risk index | 5 | Average price per kilogram | 1 |
| Blacklist vendor | 47 | Human development index | 17 | Tax registration data available? | 5 | Acceptance month | 1 |
| Cumulative number of unqualified imports in sampling inspection | 45 | Packaging method | 15 | Non-punctual declaration rate of delivery | 5 | Acceptance season | 1 |
| Dutiable price in Taiwan dollars | 44 | Capital | 14 | Input/output mode | 4 | Acceptance year | 1 |
| Total net weight | 35 | Import cumulative number of new classification | 13 | Percentage of remaining validity period for acceptance | 4 | Overdue delivery | 1 |
| Global food security indicator | 29 | Years of importer establishment | 10 | Whether there is factory registration? | 4 | Any business registration change? | 1 |
| Type of Obligatory inspection applicant | 26 | Number of companies in the same group | 10 | New company established within the past three months | 3 | Product classification code | 1 |
| Legal rights index | 26 | Is it a pure input industry? | 10 | Number of GHP inspection | 3 | Number of GHP inspection failures | 1 |
| Blacklist product | 25 | Customs classification | 10 | Accumulated number of unqualified imports | 3 | Number of HACCP inspection | 1 |
| Storage and transportation conditions | 23 | County and city level | 10 | Transportation time | 2 | Number of HACCP inspection | 1 |
| Packaging materials | 21 | Total number of imported product lines | 8 | GDP growth rate | 2 | Manufacturing date later than effective date | 1 |
| Cumulative number of reports | 20 | Number of downstream manufacturers | 7 | Number of branch companies | 2 | Valid for more than 5 years | 1 |
| Total classification number of imports | 20 | Is it a branch company? | 7 | Number of overdue deliveries | 2 | Acceptance date later than manufacturing date | 1 |
| Corruption perceptions index | 18 | Number of non-review inspections | 7 | Any intermediary trade? | 2 | Acceptance date later than the effective date | 1 |
| GDP | 17 | Rate of non-timely declaration of goods received | 7 | Cumulative number of imports not released | 2 | Number of business projects in the food industry | 1 |
| Type | Definition |
| True Positive, TP | Each batch of inspection application was predicted as unqualified by the model, and it was actually unqualified. |
| False Positive, FP | Each batch of inspection application was predicted as unqualified by the model, but it was actually qualified. |
| True Negative | Each batch of inspection application was predicted as qualified by the model, and it was actually qualified. |
| False Negative | Each batch of inspection application was predicted as qualified by the model, but it was actually unqualified. |
| Imbalanced data processing methods and sampling ratio# | Precision (PPV) |
Recall | F1 |
| SMOTE 7:3 | 6.03% | 64.91% | 11.03% |
| SMOTE 6:4 | 5.68% | 66.15% | 10.46% |
| SMOTE 5:5 | 5.48% | 75.16% | 10.22% |
| SMOTE 4:6 | 4.94% | 77.33% | 9.28% |
| SMOTE 3:7 | 4.80% | 81.68% | 9.06% |
| Equal magnification 7:3 | 4.62% | 87.89% | 8.77% |
| Data set | Combination number | Algorithm | ACR | Recall | PPV | F1 | AUC | TN | FP | TP | FN | Sampling rate | Rejection rate | |
| Validation set | A1 | Bagging-C5.0 | 92.3% | 2.2% | 4.1% | 2.8% | 60.6% | 2451 | 70 | 3 | 135 | 2.75 | 4.11 | |
| A2 | Bagging-CART | 86.6% | 25.4% | 12.2% | 16.5% | 69.5% | 2269 | 252 | 35 | 103 | 10.79 | 12.20 | ||
| A3 | Bagging-EN | 27.9% | 90.6% | 6.2% | 11.5% | 68.0% | 617 | 1904 | 125 | 13 | 76.31 | 6.16 | ||
| A4 | Bagging-GBM | 84.7% | 37.7% | 13.9% | 20.4% | 72.3% | 2200 | 321 | 52 | 86 | 14.03 | 13.94 | ||
| A5 | Bagging-LR | 83.0% | 31.2% | 10.8% | 16.0% | 68.0% | 2164 | 357 | 43 | 95 | 15.04 | 10.75 | ||
| A6 | Bagging-NB | 69.7% | 60.1% | 9.9% | 17.1% | 73.2% | 1769 | 752 | 83 | 55 | 31.40 | 9.94 | ||
| A7 | Bagging-RF | 93.4% | 0.7% | 2.6% | 1.1% | 71.0% | 2483 | 38 | 1 | 137 | 1.47 | 2.56 | ||
| A8 | EL | 85.5% | 28.3% | 12.0% | 16.8% | 72.5% | 2235 | 286 | 39 | 99 | 12.22 | 12.00 | ||
| B1 | Bagging-C5.0 | 89.7% | 22.5% | 15.6% | 18.4% | 72.7% | 2353 | 168 | 31 | 107 | 7.48 | 15.58 | ||
| B2 | Bagging-CART | 88.4% | 19.6% | 12.0% | 14.9% | 68.7% | 2323 | 198 | 27 | 111 | 8.46 | 12.00 | ||
| B3 | Bagging-EN | 7.7% | 97.8% | 5.2% | 9.9% | 69.7% | 71 | 2450 | 135 | 3 | 97.22 | 5.22 | ||
| B4 | Bagging-GBM | 90.1% | 26.8% | 18.6% | 22.0% | 73.1% | 2359 | 162 | 37 | 101 | 7.48 | 18.59 | ||
| B5 | Bagging-LR | 87.9% | 28.3% | 14.9% | 19.5% | 71.2% | 2299 | 222 | 39 | 99 | 9.82 | 14.94 | ||
| B6 | Bagging-NB | 79.6% | 50.0% | 12.7% | 20.3% | 73.3% | 2048 | 473 | 69 | 69 | 20.38 | 12.73 | ||
| B7 | Bagging-RF | 90.6% | 24.6% | 18.9% | 21.4% | 75.2% | 2375 | 146 | 34 | 104 | 6.77 | 18.89 | ||
| B8 | EL | 88.2% | 31.9% | 16.6% | 21.8% | 74.0% | 2300 | 221 | 44 | 94 | 9.97 | 16.60 | ||
| C1 | Bagging-C5.0 | 93.4% | 11.6% | 23.2% | 15.5% | 67.2% | 2468 | 53 | 16 | 122 | 2.59 | 23.19 | ||
| C2 | Bagging-CART | 86.6% | 39.9% | 16.8% | 23.6% | 69.7% | 2248 | 273 | 55 | 83 | 12.34 | 16.77 | ||
| C3 | Bagging-EN | 81.3% | 11.6% | 4.1% | 6.0% | 50.1% | 2145 | 376 | 16 | 122 | 14.74 | 4.08 | ||
| C4 | Bagging-GBM | 88.9% | 33.3% | 18.5% | 23.8% | 73.0% | 2318 | 203 | 46 | 92 | 9.36 | 18.47 | ||
| C5 | Bagging-LR | 86.1% | 33.3% | 14.2% | 20.0% | 69.0% | 2244 | 277 | 46 | 92 | 12.15 | 14.24 | ||
| C6 | Bagging-NB | 72.5% | 58.7% | 10.7% | 18.1% | 73.7% | 1847 | 674 | 81 | 57 | 28.39 | 10.73 | ||
| C7 | Bagging-RF | 94.6% | 7.2% | 38.5% | 12.2% | 75.4% | 2505 | 16 | 10 | 128 | 0.98 | 38.46 | ||
| C8 | EL | 92.0% | 23.2% | 22.9% | 23.0% | 73.6% | 2413 | 108 | 32 | 106 | 5.27 | 22.86 | ||
| D1 | Bagging-C5.0 | 92.2% | 21.7% | 23.1% | 22.4% | 73.2% | 2421 | 100 | 30 | 108 | 4.89 | 23.08 | ||
| D2 | Bagging-CART | 87.3% | 28.3% | 14.0% | 18.8% | 72.9% | 2282 | 239 | 39 | 99 | 10.46 | 14.03 | ||
| D3 | Bagging-EN | 54.2% | 50.7% | 5.7% | 10.3% | 52.6% | 1370 | 1151 | 70 | 68 | 45.92 | 5.73 | ||
| D4 | Bagging-GBM | 91.1% | 20.3% | 18.2% | 19.2% | 74.2% | 2395 | 126 | 28 | 110 | 5.79 | 18.18 | ||
| D5 | Bagging-LR | 90.1% | 22.5% | 16.7% | 19.1% | 70.1% | 2366 | 155 | 31 | 107 | 7.00 | 16.67 | ||
| D6 | Bagging-NB | 77.4% | 52.2% | 11.9% | 19.3% | 73.7% | 1986 | 535 | 72 | 66 | 22.83 | 11.86 | ||
| D7 | Bagging-RF | 91.0% | 29.0% | 22.0% | 25.0% | 76.4% | 2379 | 142 | 40 | 98 | 6.84 | 21.98 | ||
| D8 | EL | 90.2% | 28.3% | 19.4% | 23.0% | 75.1% | 2359 | 162 | 39 | 99 | 7.56 | 19.40 | ||
| E1 | Bagging-C5.0 | 92.3% | 9.4% | 14.1% | 11.3% | 66.3% | 2442 | 79 | 13 | 125 | 3.46 | 14.13 | ||
| E2 | Bagging-CART | 84.8% | 33.3% | 12.9% | 18.6% | 68.4% | 2210 | 311 | 46 | 92 | 13.43 | 12.89 | ||
| E3 | Bagging-EN | 88.2% | 3.6% | 2.7% | 3.1% | 58.1% | 2341 | 180 | 5 | 133 | 6.96 | 2.70 | ||
| E4 | Bagging-GBM | 88.1% | 27.5% | 14.9% | 19.3% | 71.5% | 2304 | 217 | 38 | 100 | 9.59 | 14.90 | ||
| E5 | Bagging-LR | 85.9% | 32.6% | 13.8% | 19.4% | 69.1% | 2239 | 282 | 45 | 93 | 12.30 | 13.76 | ||
| E6 | Bagging-NB | 73.2% | 58.0% | 10.9% | 18.3% | 73.7% | 1867 | 654 | 80 | 58 | 27.60 | 10.90 | ||
| E7 | Bagging-RF | 94.5% | 2.9% | 26.7% | 5.2% | 73.1% | 2510 | 11 | 4 | 134 | 0.56 | 26.67 | ||
| E8 | EL | 91.1% | 16.7% | 15.9% | 16.3% | 70.9% | 2399 | 122 | 23 | 115 | 5.45 | 15.86 | ||
| F1 | Bagging-C5.0 | 92.4% | 7.2% | 11.9% | 9.0% | 71.1% | 2447 | 74 | 10 | 128 | 3.16 | 11.90 | ||
| F2 | Bagging-CART | 89.9% | 9.4% | 8.3% | 8.8% | 67.2% | 2377 | 144 | 13 | 125 | 5.90 | 8.28 | ||
| F3 | Bagging-EN | 62.3% | 19.6% | 2.9% | 5.1% | 55.6% | 1629 | 892 | 27 | 111 | 34.56 | 2.94 | ||
| F4 | Bagging-GBM | 91.1% | 16.7% | 16.0% | 16.3% | 72.9% | 2400 | 121 | 23 | 115 | 5.42 | 15.97 | ||
| F5 | Bagging-LR | 90.4% | 18.8% | 15.3% | 16.9% | 68.4% | 2377 | 144 | 26 | 112 | 6.39 | 15.29 | ||
| F6 | Bagging-NB | 79.7% | 47.8% | 12.4% | 19.6% | 73.6% | 2053 | 468 | 66 | 72 | 20.08 | 12.36 | ||
| F7 | Bagging-RF | 90.9% | 17.4% | 15.9% | 16.6% | 74.3% | 2394 | 127 | 24 | 114 | 5.68 | 15.89 | ||
| F8 | EL | 91.7% | 10.1% | 12.5% | 11.2% | 72.1% | 2423 | 98 | 14 | 124 | 4.21 | 12.50 |
| Data set | Combination number | Algorithm | ACR | Recall | PPV | F1 | AUC | TN | FP | TP | FN | Sampling rate | Rejection rate |
| Test set | C1 | Bagging-C5.0 | 94.7% | 8.0% | 15.7% | 10.6% | 68.0% | 3921 | 70 | 13 | 150 | 2.00 | 15.66 |
| C2 | Bagging-CART | 89.4% | 38.7% | 15.6% | 22.2% | 71.6% | 3649 | 342 | 63 | 100 | 9.75 | 15.56 | |
| C3 | Bagging-EN | 88.9% | 9.2% | 4.6% | 6.1% | 53.2% | 3678 | 313 | 15 | 148 | 7.90 | 4.57 | |
| C4 | Bagging-GBM | 87.8% | 39.9% | 13.8% | 20.5% | 72.0% | 3584 | 407 | 65 | 98 | 11.36 | 13.77 | |
| C5 | Bagging-LR | 49.7% | 64.4% | 4.9% | 9.1% | 51.6% | 1960 | 2031 | 105 | 58 | 51.42 | 4.92 | |
| C6 | Bagging-NB | 74.3% | 52.8% | 8.0% | 13.9% | 66.8% | 2999 | 992 | 86 | 77 | 25.95 | 7.98 | |
| C7 | Bagging-RF | 95.8% | 1.8% | 18.8% | 3.4% | 72.5% | 3978 | 13 | 3 | 160 | 0.39 | 18.75 | |
| C8 | EL | 90.9% | 31.9% | 16.4% | 21.6% | 69.9% | 3725 | 266 | 52 | 111 | 7.66 | 16.35 | |
| D1 | Bagging-C5.0 | 91.2% | 12.3% | 8.2% | 9.8% | 69.3% | 3767 | 224 | 20 | 143 | 5.87% | 8.20% | |
| D2 | Bagging-CART | 89.5% | 14.7% | 7.5% | 9.9% | 67.6% | 3693 | 298 | 24 | 139 | 7.75% | 7.45% | |
| D3 | Bagging-EN | 56.7% | 52.1% | 4.7% | 8.6% | 57.1% | 2272 | 1719 | 85 | 78 | 43.43% | 4.71% | |
| D4 | Bagging-GBM | 93.1% | 13.5% | 13.0% | 13.3% | 71.0% | 3844 | 147 | 22 | 141 | 4.07% | 13.02% | |
| D5 | Bagging-LR | 92.4% | 16.6% | 13.0% | 14.6% | 65.3% | 3811 | 180 | 27 | 136 | 4.98% | 13.04% | |
| D6 | Bagging-NB | 81.3% | 39.9% | 8.7% | 14.3% | 66.8% | 3313 | 678 | 65 | 98 | 17.89% | 8.75% | |
| D7 | Bagging-RF | 86.2% | 33.1% | 10.4% | 15.8% | 68.5% | 3525 | 466 | 54 | 109 | 12.52% | 10.38% | |
| D8 | EL | 91.4% | 19.6% | 12.3% | 15.1% | 69.0% | 3763 | 228 | 32 | 131 | 6.26% | 12.31% |
| DataYear | Overall sampling inspection | EL V.2 sampling inspection | ||||||
| Number of inspection application batches | Sampling rate | Rejection rate | Prediction batch number | Suggested number of inspection batches | Sampling rate | Number of hit batches | Hit rate | |
| 2019 | 29573 | 10.68% | 2. 09% | 4154 | 318 | 7.66% | 52 | 16.35% |
| Model differences | EL V.1 | EL V.2 | Description |
| Screening of characteristic risk factors | Single-factor analysis and stepwise regression were used to screen characteristic factors using simple statistical methods. |
|
Prevent factor collinearity. Make the remaining factors more independent and important. |
| Add algorithms | 5 algorithms | 7 algorithms | When the prediction effect of multiple models was reduced, the AUC>50% can still be retained for integration to improve the robustness of the model. |
| Adjust model parameters | Fβ regulated the sampling inspection rate.Five models had consistent values. | Fβ regulated the sampling inspection rate.Seven models were independently adjusted. | The sampling rate was regulated and the elasticity was set at 2– 8%. |
| Beta | PPV (or sampling inspection rejection rate) |
Recall (or hit rate of unqualified products) |
Number of border inspection application batches | Number of sampling inspection batches | Sampling rate | Fβ threshold | ||||
| Bagging-NB | Bagging-C5.0 | Bagging-CART | Bagging-LR | Bagging-RF | ||||||
| 1 | 20.00% | 33.33% | 249 | 5 | 2.01% | 0.94 | 0.76 | 0.48 | 0.46 | 0.68 |
| 1.2 | 20.00% | 33.33% | 249 | 5 | 2.01% | 0.94 | 0.76 | 0.48 | 0.46 | 0.68 |
| 1.4 | 15.38% | 66.67% | 249 | 13 | 5.22% | 0.94 | 0.46 | 0.48 | 0.46 | 0.64 |
| 1.6 | 21.43% | 100.00% | 249 | 14 | 5.62% | 0.94 | 0.46 | 0.48 | 0.46 | 0.6 |
| 1.8 | 21.43% | 100.00% | 249 | 14 | 5.62% | 0.94 | 0.46 | 0.48 | 0.46 | 0.6 |
| 2 | 21.43% | 100.00% | 249 | 14 | 5.62% | 0.31 | 0.46 | 0.48 | 0.46 | 0.6 |
| 2.2 | 21.43% | 100.00% | 249 | 14 | 5.62% | 0.31 | 0.46 | 0.48 | 0.46 | 0.6 |
| 2.4 | 17.65% | 100.00% | 249 | 17 | 6.83% | 0.31 | 0.46 | 0.43 | 0.46 | 0.6 |
| 2.6 | 16.67% | 100.00% | 249 | 18 | 7.23% | 0.31 | 0.33 | 0.43 | 0.46 | 0.6 |
| 2.8 | 16.67% | 100.00% | 249 | 18 | 7.23% | 0.31 | 0.33 | 0.43 | 0.46 | 0.6 |
| 3 | 8.82% | 100.00% | 249 | 34 | 13.65% | 0.31 | 0.33 | 0.28 | 0.12 | 0.6 |
| 3.2 | 5.45% | 100.00% | 249 | 55 | 22.09% | 0.31 | 0.18 | 0.28 | 0.12 | 0.25 |
| Recommended threshold | PPV Unqualified rate of sampling inspection |
Recall Identification rate of unqualified sampling |
Sampling rate |
| 0.39 | 13.29% | 100.00% | 9.29% |
| 0.40 | 13.29% | 100.00% | 9.29% |
| 0.41 | 14.79% | 100.00% | 8.41% |
| 0.42 | 15.16% | 100.00% | 7.96% |
| 0.43 | 15.45% | 100.00% | 7.56% |
| 0.44 | 15.45% | 100.00% | 7.56% |
| 0.45 | 20.43% | 100.00% | 6.19% |
| Model Revision |
AUC of algorithm | ||||||
| Bagging- EN |
Bagging- LR |
Bagging-GBM | Bagging- BN |
Bagging- RF |
Bagging-C5.0 | Bagging- CART |
|
| EL V.1 | - | 69.03% | - | 53.43% | 57.40% | 63.20% | 63.17% |
| EL V.2 | 63.39% | 63.13% | 62.67% | 62.13% | 61.41% | 57.72% | 49.40% |
| Year of analysis | Model revision | Number of algorithms | Recall | PPV | F1 |
| 2020 | EL V.1 | 5 | 25.00% | 2.47% | 4.49% |
| EL V.2 | 7 | 58.33% | 4.38% | 8.14% |
| General sampling inspection items for each year | Number of inspection application batches | Overall sampling rate | EL sampling rate | Overall rejection rate | EL rejection rate |
| 2022 | 27074 | 10.90% (2952/27074) | 6.48% (1754/27074) | 3.01% (89/2952) | 4.39% (77/1754) |
| 2021 | 23670 | 9.14% (2163/23670) | 6.24% (1478/23670) | 4.16% (90/2163) | 6.36% (84/1478) |
| 2020 | 26823 | 6.07% (1629/26823) | 2.78% (745/26823) | 3.74% (61/1629) | 5.10% (38/ 745) |
| 2019 | 29573 | 10.68% (3157/29573) | - | 2.09%(66/3157) | - |
| General sampling inspection and evaluation items for each year | Annual overall sampling inspection | EL sampling inspection | |||
| Annual rejection rate (Number of unqualified pieces / total number of sampled pieces) |
p value | EL rejection rate (Number of EL unqualified pieces / number of EL sampled pieces) |
p value | ||
| 2022 | 3.01% (89/2952) | 0.022* | 4.39% (77/1754) | 0.000*** | |
| 2021 | 4.16% (90/2163) | 0.000*** | 6.36% (84/1478) | 0.000*** | |
| 2020 | 3.74% (61/1629) | 0.001** | 5.10% (38/ 745) | 0.000*** | |
| 2019 | 2.09% (66/3157) | - | - | ||
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).