Submitted:
13 September 2025
Posted:
15 September 2025
Abstract
Keywords:
1. Introduction
1.1. Background on Chronic Kidney Disease (CKD)
1.2. Importance of Early Detection
1.3. Role of Machine Learning in Medical Diagnosis
1.4. Motivation for Using Gradient Boosting and Feature Selection
2. Literature Review
2.1. Machine Learning Applications in Healthcare
2.2. Previous Studies Using Gradient Boosting
2.3. Feature Selection Techniques in Medical Datasets
2.4. Summary and Gap Identification
3. Gradient Boosting–Based Chronic Kidney Disease Detection Framework with Integrated Feature Selection
3.1. Source of Data
3.2. Data Preprocessing and Cleaning
- Clinical data often contain many missing values, outliers, and categorical variables.
- Missing values: K-Nearest Neighbours (KNN) imputation estimates each missing value by averaging the values of the closest neighbours in the feature space. This preserves data integrity without discarding samples.
- Outliers: To avoid distorting model training, outliers are detected using the Interquartile Range (IQR) and are capped and/or removed.
- Categorical encoding: Categorical features, such as whether a patient has diabetes or hypertension, are converted to numeric form using label encoding or one-hot encoding.
- Normalization: Continuous features are normalized with Z-score standardization to bring values to a common scale:
  $$z = \frac{x - \mu}{\sigma}$$
  where $x$ is the original feature value, $\mu$ is the mean, and $\sigma$ is the standard deviation.
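The imputation, outlier-capping, and standardization steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy matrix and its column meanings, the helper `cap_outliers_iqr`, the fence multiplier `k=1.5`, and `n_neighbors=2` are all hypothetical choices, and categorical encoding is omitted for brevity.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler


def cap_outliers_iqr(X, k=1.5):
    """Cap each column to the IQR fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    return np.clip(X, q1 - k * iqr, q3 + k * iqr)


# Hypothetical toy data: rows are patients, columns are three continuous
# measurements; np.nan marks a missing value.
X = np.array([
    [48.0, 1.2, 15.4],
    [53.0, np.nan, 11.2],
    [63.0, 3.8, 10.8],
    [51.0, 1.4, np.nan],
    [60.0, 24.0, 9.6],   # 24.0 is an extreme value in the second column
    [45.0, 0.9, 16.1],
])

# 1) KNN imputation: each gap is filled with the mean of the 2 nearest rows.
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)

# 2) IQR-based capping (capping rather than removing keeps every sample).
X_capped = cap_outliers_iqr(X_imputed)

# 3) Z-score standardization: z = (x - mean) / std, column by column.
X_scaled = StandardScaler().fit_transform(X_capped)
```

In a real study the imputer and scaler would be fitted on the training split only and then applied to the test split, so that no test information leaks into preprocessing.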
3.3. Feature Selection Techniques
- Filter Methods:
  ○ Chi-square test evaluates the statistical dependence between categorical features and CKD labels, enabling the removal of irrelevant features.
  ○ Correlation Analysis (Pearson's $r$ or Spearman's $\rho$) identifies continuous features that are strongly correlated with the target variable to guide selection.
- Wrapper Methods:
  ○ Recursive Feature Elimination (RFE) trains a model iteratively and, at each step, removes the features that contribute least to the prediction, until an optimal feature subset is found. In effect, RFE balances model complexity against accuracy.
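- Embedded Methods:
  ○ Lasso Regression applies L1 regularization that shrinks the coefficients of less important features to zero during model fitting by minimizing:
    $$\min_{\beta} \; \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$
    where $\lambda \ge 0$ controls the strength of the penalty.
  ○ Tree-Based Importance uses the feature importance scores built into gradient boosting decision trees, computed from the reduction in loss achieved by splits on each feature.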
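As a rough illustration of the three families above, the following sketch applies one filter, one wrapper, and two embedded selectors with scikit-learn. It assumes synthetic data from `make_classification` in place of the CKD dataset; the subset size of 5, the Lasso strength `alpha=0.05`, and the median binarization used to obtain categorical inputs for the chi-square test are hypothetical choices, not values from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a clinical table (assumption, not the paper's data).
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_redundant=2, random_state=0)

# Filter: chi-square expects non-negative categorical/count inputs, so each
# feature is binarized at its median purely for this illustration.
X_cat = (X > np.median(X, axis=0)).astype(int)
chi2_mask = SelectKBest(chi2, k=5).fit(X_cat, y).get_support()

# Filter: absolute Pearson correlation between each feature and the label.
pearson = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# Wrapper: RFE refits the model and drops the weakest feature each round.
gb = GradientBoostingClassifier(n_estimators=50, random_state=0)
rfe_mask = RFE(gb, n_features_to_select=5).fit(X, y).support_

# Embedded: Lasso on standardized features; a zero coefficient means "dropped".
X_std = StandardScaler().fit_transform(X)
lasso_mask = Lasso(alpha=0.05).fit(X_std, y).coef_ != 0

# Embedded: loss-reduction importances from the fitted boosted trees.
importances = gb.fit(X, y).feature_importances_

print("chi-square keeps:", np.flatnonzero(chi2_mask))
print("RFE keeps:       ", np.flatnonzero(rfe_mask))
print("Lasso keeps:     ", np.flatnonzero(lasso_mask))
print("top by importance:", np.argsort(importances)[::-1][:5])
```

The selectors need not agree; comparing their chosen subsets is one way to see which features are consistently retained.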
3.4. Gradient Boosting Algorithm
3.5. Parameter Tuning and Optimization
- Learning rate: Controls the contribution of each tree; typically set between 0.01 and 0.3.
- Number of trees: More trees can improve accuracy but increase training time and the risk of overfitting.
- Maximum tree depth: Controls model complexity; deeper trees capture more feature interactions but are more prone to overfitting.
- Subsampling rate and column sampling: Introduce randomness to reduce overfitting.
Grid search or randomized search with cross-validation is used to select the hyperparameters.
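A cross-validated grid search over these hyperparameters might look like the sketch below. It assumes scikit-learn's `GradientBoostingClassifier` and synthetic data; the grid values, the 3-fold split, and the `roc_auc` scoring choice are illustrative assumptions rather than the settings used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the training data (assumption).
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)

# Hypothetical search grid covering the hyperparameters listed above.
# Column sampling could be added through the `max_features` parameter.
param_grid = {
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "subsample": [0.8, 1.0],
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid=param_grid,
    scoring="roc_auc",
    cv=cv,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Swapping `GridSearchCV` for `RandomizedSearchCV` (with `param_distributions` and `n_iter`) samples the grid instead of enumerating it, which is usually cheaper when the grid is large.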

3.6. Experimental Setup
- Train-Test Split / Cross-Validation:
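- Evaluation Metrics (TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives):
  ○ Accuracy: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
  ○ Precision: $\text{Precision} = \frac{TP}{TP + FP}$
  ○ Recall (Sensitivity): $\text{Recall} = \frac{TP}{TP + FN}$
  ○ F1-Score: $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
  ○ Area Under the Receiver Operating Characteristic Curve (AUC-ROC): $\text{AUC} = \int_0^1 \text{TPR} \, d(\text{FPR})$, where TPR is the true positive rate (recall) and FPR is the false positive rate, $\frac{FP}{FP + TN}$.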
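The metrics above can be computed directly from confusion-matrix counts, as in this small plain-Python sketch. The function names and the example counts are hypothetical; the AUC helper uses the equivalent rank interpretation (the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as one half).

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts: 40 CKD cases detected, 5 missed, 10 false alarms.
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
auc = auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(m, auc)
```

In a screening setting, recall is often the metric to watch most closely, because a missed CKD case (a false negative) is typically costlier than a false alarm.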
4. Results and Analysis
4.1. Feature Selection Outcomes
4.2. Model Performance Comparison
4.2.1. With and Without Feature Selection
4.2.2. Across Different Feature Selection Techniques
4.3. Evaluation Metrics
4.4. Visualization of Results
4.4.1. Confusion Matrix
4.4.2. ROC Curves
4.4.3. Feature Importance Plots
4.5. Discussion of Findings
5. Discussion
5.1. Interpretation of Results in Clinical Context
5.2. Benefits of the Proposed Approach
5.3. Comparison with Existing Methods
5.4. Limitations of the Study
5.5. Potential for Real-World Implementation
6. Conclusion
Future Enhancements
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).