Submitted: 11 March 2024
Posted: 15 March 2024
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Data Preparation and Machine Learning Models
2.2. Dataset Sources
2.3. Data Preprocessing
2.4. Dataset Balancing
2.6. Interpretability in Machine Learning Models
3. Results
3.1. Prediction Performance
3.2. Interpretability
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Cost of FN | TP rate | FP rate | Precision | Recall | F1 score | ROC area | PRC area | Accuracy | Category |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.853 | 0.095 | 0.899 | 0.853 | 0.875 | 0.952 | 0.945 | 0.879 | Non-Recurrence |
|  | 0.905 | 0.147 | 0.860 | 0.905 | 0.882 | 0.952 | 0.954 |  | Recurrence |
| 2 | 0.799 | 0.066 | 0.924 | 0.799 | 0.857 | 0.954 | 0.948 | 0.866 | Non-Recurrence |
|  | 0.934 | 0.201 | 0.823 | 0.934 | 0.875 | 0.954 | 0.955 |  | Recurrence |
| 3 | 0.743 | 0.058 | 0.928 | 0.743 | 0.825 | 0.953 | 0.947 | 0.842 | Non-Recurrence |
|  | 0.942 | 0.257 | 0.785 | 0.942 | 0.857 | 0.953 | 0.953 |  | Recurrence |
| 5 | 0.666 | 0.039 | 0.945 | 0.666 | 0.782 | 0.953 | 0.947 | 0.814 | Non-Recurrence |
|  | 0.961 | 0.334 | 0.742 | 0.961 | 0.838 | 0.953 | 0.954 |  | Recurrence |

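
The table above varies the cost assigned to a false negative (a missed recurrence) from 1 to 5. The source does not state how this cost enters the classifier. One common mechanism, shown here purely as an assumption, is threshold moving: with a false-positive cost of 1, the cutoff that minimises expected misclassification cost is FP cost / (FP cost + FN cost). The function names and the example probabilities below are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def cost_sensitive_threshold(cost_fn: float, cost_fp: float = 1.0) -> float:
    """Probability cutoff that minimises expected misclassification cost.

    Predicting "recurrence" is cheaper in expectation when
    p * cost_fn >= (1 - p) * cost_fp, i.e. p >= cost_fp / (cost_fp + cost_fn).
    """
    if cost_fn <= 0 or cost_fp <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def predict_recurrence(
    prob_recurrence: Sequence[float], cost_fn: float, cost_fp: float = 1.0
) -> List[bool]:
    """Flag a case as recurrence when its probability reaches the cutoff."""
    threshold = cost_sensitive_threshold(cost_fn, cost_fp)
    return [p >= threshold for p in prob_recurrence]


# Hypothetical predicted recurrence probabilities for four patients.
example_probs = [0.10, 0.30, 0.45, 0.80]

for cost in (1, 2, 3, 5):  # the FN costs listed in the table
    cutoff = cost_sensitive_threshold(cost)
    flagged = sum(predict_recurrence(example_probs, cost))
    print(f"FN cost {cost}: cutoff {cutoff:.3f}, flagged {flagged} of {len(example_probs)}")
```

Under this assumption, raising the FN cost lowers the cutoff, so more cases are flagged as recurrence. That matches the direction of the table, where the Recurrence TP rate rises from 0.905 to 0.961 while the Non-Recurrence TP rate falls from 0.853 to 0.666.
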
| Algorithm | TP rate | FP rate | Precision | Recall | F1 score | ROC area | PRC area | Accuracy | Category |
|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.835 | 0.112 | 0.882 | 0.835 | 0.858 | 0.909 | 0.91 | 0.862 | Non-Recurrence |
|  | 0.888 | 0.165 | 0.843 | 0.888 | 0.865 | 0.909 | 0.883 |  | Recurrence |
| C4.5 | 0.812 | 0.123 | 0.869 | 0.812 | 0.839 | 0.874 | 0.849 | 0.844 | Non-Recurrence |
|  | 0.877 | 0.188 | 0.823 | 0.877 | 0.849 | 0.874 | 0.826 |  | Recurrence |
| AdaBoost C4.5 | 0.859 | 0.115 | 0.882 | 0.859 | 0.87 | 0.933 | 0.924 | 0.872 | Non-Recurrence |
|  | 0.885 | 0.141 | 0.863 | 0.885 | 0.873 | 0.933 | 0.937 |  | Recurrence |
| Bagging C4.5 | 0.829 | 0.111 | 0.882 | 0.829 | 0.855 | 0.941 | 0.932 | 0.859 | Non-Recurrence |
|  | 0.889 | 0.171 | 0.839 | 0.889 | 0.863 | 0.941 | 0.945 |  | Recurrence |
| Random Forest | 0.853 | 0.095 | 0.899 | 0.853 | 0.875 | 0.952 | 0.945 | 0.879 | Non-Recurrence |
|  | 0.905 | 0.147 | 0.860 | 0.905 | 0.882 | 0.952 | 0.954 |  | Recurrence |

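
The F1 column can be cross-checked against the Precision and Recall columns, since F1 is the harmonic mean of the two. The sketch below does this for four rows copied from the algorithm comparison table. The helper names and the 0.001 tolerance (chosen to allow for three-decimal rounding) are assumptions made here, not part of the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (algorithm, category, precision, recall, reported F1), copied from the table.
rows = [
    ("Random Forest", "Non-Recurrence", 0.899, 0.853, 0.875),
    ("Random Forest", "Recurrence", 0.860, 0.905, 0.882),
    ("MLP", "Non-Recurrence", 0.882, 0.835, 0.858),
    ("C4.5", "Recurrence", 0.823, 0.877, 0.849),
]

# Assumed tolerance: the inputs are rounded to three decimals.
TOLERANCE = 0.001


def max_f1_deviation(table_rows) -> float:
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - reported) for _, _, p, r, reported in table_rows)


for algorithm, category, p, r, reported in rows:
    print(f"{algorithm:13s} {category:14s} "
          f"computed={f1_score(p, r):.4f} reported={reported:.3f}")

print("all within tolerance:", max_f1_deviation(rows) < TOLERANCE)
```

The same pattern extends to the remaining rows. In this two-class setting, a further sanity check is that each category's FP rate equals one minus the other category's TP rate (for Random Forest, 0.095 = 1 − 0.905).
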

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).