Preprint Article, Version 2. Preserved in Portico. This version is not peer-reviewed.

Cost-sensitive Ensemble Feature Ranking and Automatic Threshold Selection for Chronic Kidney Disease Diagnosis

Version 1 : Received: 27 May 2020 / Approved: 28 May 2020 / Online: 28 May 2020 (11:50:02 CEST)
Version 2 : Received: 5 July 2020 / Approved: 6 July 2020 / Online: 6 July 2020 (09:56:15 CEST)

A peer-reviewed article of this Preprint also exists.

Imran Ali, S.; Ali, B.; Hussain, J.; Hussain, M.; Satti, F.A.; Park, G.H.; Lee, S. Cost-Sensitive Ensemble Feature Ranking and Automatic Threshold Selection for Chronic Kidney Disease Diagnosis. Appl. Sci. 2020, 10, 5663.

Abstract

Automated medical diagnosis is an important application of machine learning in healthcare. Most approaches focus primarily on optimizing the accuracy of classification models. In this research, we argue that, unlike general-purpose classification problems, medical applications such as chronic kidney disease (CKD) diagnosis require special treatment. In the case of CKD, factors other than model performance, such as the cost of data acquisition, may also be taken into account to enhance the applicability of an automated diagnosis system. We propose two techniques for cost-sensitive feature ranking. Both techniques employ an ensemble of decision tree models to compute the worth of each feature in the CKD dataset. We also introduce an automatic threshold selection heuristic based on the intersection of the features’ worth and their accumulated cost. A set of experiments is conducted to evaluate the efficacy of the proposed techniques on both tree-based and non-tree-based classification models. The proposed approaches are also evaluated against several comparative techniques. Furthermore, it is demonstrated that the proposed techniques select around one quarter of the original CKD features while reducing the cost by a factor of 7.42 relative to the original feature set. Based on extensive experimentation, it is concluded that the proposed techniques, which employ a feature-cost interaction heuristic, tend to select feature subsets that are both useful and cost-effective.
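
The threshold idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the ranking rule (worth divided by cost), the rescaling of both curves to [0, 1], the function name `select_cost_sensitive_features`, and the example numbers are all assumptions made for illustration. The worth scores are taken as given; in practice they might come from averaging the feature importances of several tree ensembles.

```python
def select_cost_sensitive_features(worth, cost):
    """Rank features by worth per unit cost and cut at the worth/cost crossing.

    worth: non-empty dict mapping feature name -> non-negative importance
           score, with at least one positive value.
    cost:  dict mapping feature name -> strictly positive acquisition cost.
    Returns the selected feature names, best first.
    """
    # Assumed ranking rule: importance divided by acquisition cost.
    score = {f: worth[f] / cost[f] for f in worth}
    ranked = sorted(score, key=score.get, reverse=True)

    top_score = score[ranked[0]]
    total_cost = sum(cost[f] for f in ranked)

    selected, accumulated = [], 0.0
    for f in ranked:
        accumulated += cost[f]
        # Both curves are rescaled to [0, 1] so they can be compared.
        if score[f] / top_score < accumulated / total_cost:
            break  # the accumulated-cost curve has crossed the worth curve
        selected.append(f)
    return selected


# Hypothetical worth scores and costs, not taken from the CKD dataset.
worth = {"f1": 0.4, "f2": 0.3, "f3": 0.2, "f4": 0.1}
cost = {"f1": 1.0, "f2": 2.0, "f3": 10.0, "f4": 20.0}
print(select_cost_sensitive_features(worth, cost))
```

With these made-up inputs the sketch keeps `f1` and `f2`, which together cost 3 units out of 33. The cut-off falls out of the data instead of being a hand-tuned parameter, which is the appeal of an intersection-based threshold.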

Keywords

Cost-sensitive feature selection; ensemble models; decision tree classifiers; chronic kidney disease; random forests; gradient boosted trees

Subject

Computer Science and Mathematics, Computer Science

Comments (1)

Comment 1
Received: 6 July 2020
Commenter: ALI SYED IMRAN
Commenter's Conflict of Interests: Author
Comment: The manuscript has been extensively revised and updated, in both substance and form, based on several valuable comments and suggestions from the reviewers. The key updates incorporated into the current version are:
1. Section 1, 'Introduction', has been revised and extended to rectify issues of logical coherence; the objectives of the study are restated and elaborated.
2. Section 2, 'Literature Review', now includes a discussion of the types of feature selection methods, a table summarizing these types, and mention of a couple of similar approaches.
3. Section 3, 'Proposed Methodology', now includes algorithms/pseudo-code for the proposed approaches and a few changes in presentation.
4. Section 4, 'Experimentation', has been extensively revised and extended. A set of seven classifiers is used to evaluate the methodologies (previously only decision tree-based classifiers were used). New experiments are added to demonstrate the effectiveness of the ensemble models and to show that, although the ensemble models are based on the decision tree family, they are not redundant. The evaluation metrics are revised and extended, a couple of new methods are added for the comparative analysis, and all figures are updated for better presentation.
5. 'Abstract' and 'Conclusion' have been updated in line with the revised and extended experimentation.
6. Miscellaneous changes: issues related to language, presentation, and exposition have been thoroughly addressed in the current revised version.