Submitted: 27 October 2023
Posted: 30 October 2023
Abstract
Keywords:
1. Introduction
2. Literature Review
2.1. Machine Learning Approaches
2.2. Ensemble Learning Approaches
2.3. Feature Selection Algorithms
2.4. Our Contributions Compared with Literature
3. Data Gathering
4. Proposed EHMFFL Algorithm
4.1. Feature Selection using PCC-GWO
| Algorithm 1. Feature Selection using PCC-GWO algorithm. |
| Input: Full heart disease dataset |
| Output: Optimal Feature Subset for Machine Learning Model |
| Heuristic Feature Selection: Calculation of Importance Scores using PCC |
| Metaheuristic Feature Selection: Final Feature Subset Selection using GWO |
4.1.1. Calculating Importance Score of Features using PCC
1) The correlation coefficient of each feature $i$ with the class is computed as $CC_i$.
2) The correlation coefficient of each feature $i$ in relation to the other features is calculated as $CF_i$.
3) The importance score of each feature $i$ is calculated as $IS_i = CC_i / CF_i$.
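The three steps above can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the helper names `pearson` and `importance_scores` are hypothetical, and two details are assumptions because this section does not state them, namely that absolute correlations are used and that $CF_i$ is the mean absolute correlation of feature $i$ with every other feature.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / (sx * sy)

def importance_scores(columns, target, eps=1e-12):
    """Return IS_i = CC_i / CF_i for each feature column.

    Assumptions (not from the source): absolute correlations are used, and
    CF_i is the mean absolute correlation of feature i with the others.
    """
    scores = []
    for i, col in enumerate(columns):
        cc = abs(pearson(col, target))  # step 1: feature vs. class
        others = [abs(pearson(col, o)) for j, o in enumerate(columns) if j != i]
        cf = sum(others) / len(others) if others else 1.0  # step 2
        scores.append(cc / max(cf, eps))  # step 3, guarded against CF_i = 0
    return scores

# Made-up toy data: three feature columns and a binary class label.
f0 = [1, 2, 3, 4, 5, 6]
f1 = [1, 2, 3, 4, 6, 5]
f2 = [2, 1, 2, 1, 2, 1]
label = [0, 0, 0, 1, 1, 1]
scores = importance_scores([f0, f1, f2], label)
print([round(s, 3) for s in scores])
```

Under this definition a feature scores highly when it tracks the class but is only loosely related to the other features, so redundant features are penalized.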
4.1.2. Feature Subset Selection using GWO
- Alpha (α): the best solution
- Beta (β): the second-best solution
- Delta (δ): the third-best solution
- Omega (ω): the remaining grey wolves
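The hierarchy above can be illustrated with a small binary GWO sketch for feature subset selection. The defaults `pop_size=30` and `max_iter=100` and the {0,1} search domain follow the parameter table in this document. Everything else is an assumption: the function name `binary_gwo`, the sigmoid-based binarisation step, the fixed random seed, and the toy fitness function (the fitness used by the authors is not given in this section).

```python
import math
import random

def binary_gwo(fitness, dim, pop_size=30, max_iter=100, seed=0):
    """Maximise fitness(mask) over binary masks of length dim (sketch)."""
    rng = random.Random(seed)
    wolves = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    best_mask, best_fit = None, -math.inf

    for it in range(max_iter):
        # Rank the pack: alpha, beta and delta lead, the rest are omega wolves.
        ranked = sorted(wolves, key=fitness, reverse=True)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        f_alpha = fitness(alpha)
        if f_alpha > best_fit:
            best_mask, best_fit = alpha[:], f_alpha

        a = 2.0 * (1.0 - it / max_iter)  # decreases linearly from 2 towards 0
        new_wolves = []
        for w in wolves:
            new_w = []
            for d in range(dim):
                estimate = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    dist = abs(C * leader[d] - w[d])
                    estimate += leader[d] - A * dist
                estimate /= 3.0  # average of the three leader-guided moves
                # Assumed binarisation: sigmoid transfer plus random threshold.
                prob = 1.0 / (1.0 + math.exp(-10.0 * (estimate - 0.5)))
                new_w.append(1 if rng.random() < prob else 0)
            new_wolves.append(new_w)
        wolves = new_wolves

    # The last generation has not been ranked yet, so check it once more.
    final_best = max(wolves, key=fitness)
    if fitness(final_best) > best_fit:
        best_mask, best_fit = final_best[:], fitness(final_best)
    return best_mask, best_fit

# Hypothetical fitness: reward masks that agree with a known "useful" subset.
useful = [1, 0, 1, 1, 0, 0, 1, 0]

def toy_fitness(mask):
    return sum(1 for m, u in zip(mask, useful) if m == u)

best_mask, best_fit = binary_gwo(toy_fitness, dim=len(useful))
print(best_mask, best_fit)
```

In a real pipeline the fitness would typically be a cross-validated classifier score on the selected columns, possibly with a penalty on subset size; that choice is left open here.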
4.2. Ensemble Learning Model
5. Evaluation and Findings
5.1. Performance Metrics
- True Positive (TP): the number of correctly identified positive instances inside the desired class.
- True Negative (TN): the number of correctly identified negative instances outside the desired class.
- False Positive (FP): the number of incorrectly predicted positive samples when the actual target was negative.
- False Negative (FN): the number of incorrectly predicted negative samples when the actual target was positive.
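The five metrics reported in the result tables (accuracy, precision, recall, specificity, F1-score) follow from these four counts by their standard definitions. The sketch below is illustrative: the function name is hypothetical, and the counts passed to it are made-up values chosen to be consistent with the EHMFFL row of the first results table; the source does not state the underlying confusion matrix.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts (fractions in [0, 1])."""
    def ratio(num, den):
        return num / den if den else 0.0  # assumption: report 0 when undefined

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical counts for a 61-sample test set (not stated in the source).
metrics = classification_metrics(tp=32, tn=24, fp=3, fn=2)
percent = {name: round(100 * value, 1) for name, value in metrics.items()}
print(percent)
# accuracy 91.8, precision 91.4, recall 94.1, specificity 88.9, f1 92.8
```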
5.2. Experimental Findings
5.2.1. Analysis of Correlation Heat Map (CHM)
- The first section of the matrix compares age, sex, and blood pressure. The correlation between age and blood pressure is weakly positive (0.28), while the correlation between sex and blood pressure is weakly negative (-0.098).
- The second section compares cholesterol and blood sugar. Cholesterol and blood sugar have a weakly negative correlation (-0.057).
- The third section compares restecg, thalach, exang, and oldpeak. Resting electrocardiogram results (restecg) and exercise-induced angina (exang) have a weakly positive correlation (0.14), while maximum heart rate achieved during exercise (thalach) has a weakly negative correlation (-0.044) with ST depression induced by exercise relative to rest (oldpeak).
- The fourth section compares the number of major vessels colored by fluoroscopy (ca) with the other variables. There is a weakly positive correlation between ca and age (0.12), and a weakly positive correlation between ca and cholesterol (0.097).
- The fifth section compares the different types of chest pain (cp) and their correlations with the other variables. Chest pain type 0 (cp_0) has a weakly positive correlation with ca (0.14), while chest pain type 1 (cp_1) has a weakly negative correlation with thal2 (-0.15). Chest pain type 2 (cp_2) has a weakly positive correlation with fbs (0.084), and chest pain type 3 (cp_3) has a weakly positive correlation with age (0.048).
- The final section of the matrix compares the slope of the peak exercise ST segment (slope) and the two types of thalassemia (thal and thal2). There is a weakly positive correlation between slope and thal2 (0.18), and a moderately negative correlation between slope and thal (-0.42).
- The values in the matrix represent the correlations between each pair of variables. A positive value indicates a positive correlation (as one variable increases, so does the other), while a negative value indicates a negative correlation (as one variable increases, the other decreases).
- For example, every variable is perfectly positively correlated with itself, so the diagonal entries of the matrix (such as age versus age) equal 1.00. Sex is negatively correlated with BP and positively correlated with cholesterol levels. We can also see that ST depression is positively correlated with exercise-induced angina, thallium stress test results, and chest pain types 3 and 4.
- Some notable correlations include a positive correlation between age and BP (r=0.27), a negative correlation between age and max HR (r=-0.4), and a positive correlation between chest pain type 3 and ST depression (r=0.35). There also appear to be some negative correlations between certain variables, such as sex and chest pain type 3 (r=-0.26) and slope of ST 3 and thal2 (r=-0.24).
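A correlation matrix of the kind summarized above can be computed pairwise with the Pearson coefficient. The sketch below uses made-up values for three of the dataset's column names (they are not taken from the heart disease data) and a hypothetical helper `correlation_matrix`; it is meant only to show the structural properties of such a matrix: it is symmetric, every entry lies in [-1, 1], and the diagonal equals 1 because each variable is compared with itself.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither column is constant

def correlation_matrix(table):
    """Nested dict with corr[a][b] = Pearson correlation of columns a and b."""
    names = list(table)
    return {a: {b: pearson(table[a], table[b]) for b in names} for a in names}

# Made-up illustrative values, not rows from the actual dataset.
toy = {
    "age": [30, 42, 51, 58, 64, 70],
    "trestbps": [118, 125, 122, 140, 135, 150],
    "thalach": [190, 176, 181, 160, 155, 142],
}
corr = correlation_matrix(toy)
for name in toy:
    print(name, [round(corr[name][other], 2) for other in toy])
```

Plotting this nested dictionary as a colored grid gives the heat map; the sign of each entry indicates the direction of the relationship and its magnitude indicates the strength.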
5.2.2. Analysis of Receiver Operating Characteristic (ROC)
5.3. Comparison With Existing Techniques
6. Conclusion
Author Contributions
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Feature | Description | Type | Values |
|---|---|---|---|
| Age | Age of the patient | Numeric | Years |
| Sex | Gender of the patient | Categorical | Male=1, Female=0 |
| Ca | Number of major vessels | Categorical | 0-4 |
| Chol | Serum cholesterol | Numeric | mg/dl |
| Exang | Exercise-induced angina | Categorical | Yes=1, No=0 |
| Cp | Chest pain type | Categorical | 0, 1, 2, 3 |
| Oldpeak | ST depression induced by exercise relative to rest | Numeric | 0-6.2 |
| Fbs | Fasting blood sugar > 120 mg/dl | Categorical | Yes=1, No=0 |
| Restecg | Resting electrocardiographic results | Categorical | 0, 1, 2 |
| Thal | Thallium stress test result (normal; fixed defect; reversible defect) | Categorical | 0, 1, 2, 3 |
| Thalach | Maximum heart rate achieved | Numeric | 71-202 |
| Slope | Slope of the peak exercise ST segment | Categorical | 0, 1, 2 |
| Trestbps | Resting blood pressure | Numeric | 94-200 |
| Num | Heart disease status | Categorical | Yes/No |

| Parameter | Value |
|---|---|
| Number of grey wolves (PopSize) | 30 |
| Number of iterations (MaxIter) | 100 |
| Search domain | {0,1} |
| Solution dimension | Number of features |

| Algorithm | Accuracy (%) | Precision (%) | Recall (%) | Specificity (%) | F1-score (%) |
|---|---|---|---|---|---|
| LR | 85.2 | 90.3 | 82.4 | 88.9 | 86.2 |
| DT | 82 | 84.8 | 82.4 | 81.5 | 83.6 |
| RF | 90.2 | 96.7 | 85.3 | 96.3 | 90.6 |
| NB | 85.2 | 87.9 | 85.3 | 85.2 | 86.6 |
| SVM | 86.9 | 88.2 | 88.2 | 85.2 | 88.2 |
| KNN | 83.6 | 87.5 | 82.4 | 85.2 | 84.8 |
| XGBoost | 88.5 | 96.6 | 82.4 | 96.3 | 88.9 |
| EHMFFL (Proposed) | 91.8 | 91.4 | 94.1 | 88.9 | 92.8 |

| Algorithm | Accuracy (%) | Precision (%) | Recall (%) | Specificity (%) | F1-score (%) |
|---|---|---|---|---|---|
| LR | 79.6 | 80 | 82.8 | 76 | 81.4 |
| DT | 81.5 | 85.2 | 79.3 | 84 | 82.1 |
| RF | 84.4 | 86.2 | 85.5 | 76 | 87.4 |
| NB | 77.8 | 79.3 | 79.3 | 76 | 79.3 |
| SVM | 83.3 | 81.3 | 89.7 | 76 | 85.2 |
| KNN | 80.8 | 84.5 | 78.9 | 83 | 81.6 |
| XGBoost | 85.2 | 88.9 | 82.8 | 88 | 85.7 |
| EHMFFL (Proposed) | 88.9 | 92.6 | 86.2 | 92 | 89.3 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).