Submitted:
01 February 2025
Posted:
03 February 2025
Abstract
Cardiovascular disease (CVD) is a major global health challenge; lifestyle choices, lipid profiles, and genetic predispositions all contribute to its development. Early and accurate prediction of CVD risk is critical for effective prevention and management. This study aims to develop robust predictive models for CVD risk by integrating demographic, health, genetic, and imaging data using machine learning (ML) and deep learning (DL). The primary objective is to analyze X-ray images with convolutional neural networks (CNNs) to improve the accuracy of risk prediction. A secondary objective is to combine structured patient data, including demographic information, lipid profiles, and genetic markers, with imaging features for a more comprehensive CVD risk assessment, and to identify key features and patterns that could support early detection and personalized interventions for individuals at risk of CVD. The dataset comprises 264 patients and includes demographic details, health indicators, lipid profiles, genetic markers, lifestyle factors, and X-ray images used to detect heart-related abnormalities. Preprocessing included normalization, resizing, and augmentation to ensure consistency across the dataset. Feature selection used LASSO and correlation analysis to identify the most predictive variables. ML models, including Support Vector Machines (SVM), Random Forest (RF), and Decision Trees (DT), were trained on the curated data. CNNs were used for image analysis, and Grad-CAM was used to visualize and interpret the model's decision-making process. ML models achieved up to 95% accuracy, and CNNs using X-ray images achieved 100% accuracy. This research highlights the potential of integrating ML and DL algorithms for early detection and personalized healthcare strategies in CVD.
Keywords:
1. Introduction
Literature Review
2. Material and Methods Selection
2.1. Dataset
2.2. Methods Selection
2.3. Pre-Processing and Training
3. Results
3.1. Classification using Machine Learning Algorithms
- The first row represents the actual instances of class 0. The model correctly classified 96 of them as class 0 (true positives when class 0 is treated as the positive class; row a, column a: 96) and misclassified 1 as class 1 (a false negative; row a, column b: 1).
- The second row represents the actual instances of class 1. The model correctly classified 166 of them as class 1 (true negatives; row b, column b: 166) and misclassified 1 as class 0 (a false positive; row b, column a: 1).

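The counts described above can be turned into summary metrics directly. The following is a minimal sketch, assuming the layout given in the text (rows are actual classes, columns are predicted classes) and class 0 as the positive class; the function name `summarize` is illustrative and not taken from the study.

```python
# Confusion matrix as described in the text: rows are actual classes,
# columns are predicted classes (a = class 0, b = class 1).
confusion = [
    [96, 1],    # actual class 0: 96 predicted as 0, 1 predicted as 1
    [1, 166],   # actual class 1: 1 predicted as 0, 166 predicted as 1
]


def summarize(matrix, positive=0):
    """Return total, accuracy, precision, and recall for a 2x2 matrix.

    `positive` selects which class index is treated as the positive class
    (assumed to be class 0 here, following the wording in the text).
    """
    negative = 1 - positive
    tp = matrix[positive][positive]
    fn = matrix[positive][negative]
    fp = matrix[negative][positive]
    tn = matrix[negative][negative]
    total = tp + fn + fp + tn
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


stats = summarize(confusion)
print(stats["total"])                # 264
print(round(stats["accuracy"], 4))   # 0.9924
```

The four counts sum to 264, the number of patients in the dataset, and 262 correct predictions out of 264 correspond to an accuracy of about 99.2%.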
- Support Vector Machine (SVM): Achieved the highest accuracy, 100%, in predicting cardiovascular disease risk from personal lifestyle factors and lipid profiles.
- Decision Tree: Achieved an accuracy of 99.2%, closely trailing the SVM.

- Random Forest: Achieved an accuracy of 98.6%, making it a strong contender for accurate risk prediction.
- Convolutional Neural Network (CNN): Although effective, the CNN recorded the lowest accuracy among the algorithms, 92%.

3.2. Classification with CNN
| TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
|---|---|---|---|---|---|---|---|---|
| 0.950 | 0.050 | 0.950 | 0.950 | 0.950 | 0.900 | 0.975 | 0.975 | 0 |
| 0.950 | 0.050 | 0.950 | 0.950 | 0.950 | 0.900 | 0.975 | 0.975 | 1 |
| 0.950 | 0.050 | 0.950 | 0.950 | 0.950 | 0.900 | 0.975 | 0.975 | Weighted Avg. |
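
The columns in the table above are related through the underlying confusion matrix. The table does not report the raw counts, so the sketch below assumes a balanced test set of 20 images per class with one error in each class. These counts are hypothetical, chosen only because they reproduce the reported rates; the function name `binary_metrics` is also illustrative. ROC and PRC areas need per-image scores and are not covered.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute rate-based metrics from the four cells of a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # identical to the TP rate
    fp_rate = fp / (fp + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient (MCC)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts (not reported in the source): 20 images per class,
# one misclassified image in each class.
metrics = binary_metrics(tp=19, fn=1, fp=1, tn=19)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```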
Conclusion
Future Scope
Conflict of Interest



| Fields | Description |
|---|---|
| Patient | ID number assigned to each individual. |
| Patient_Age | Age of the individual in years. |
| Patient Gender | Gender of the individual. Values: 1: Women, 2: Men |
| Patient Height | Height of the individual in cm. |
| Patient_Weight | Weight of the individual in kg. |
| ap_hi | Systolic blood pressure. |
| ap_lo | Diastolic blood pressure. |
| Cholesterol | Cholesterol level. Values: 1: Normal, 2: Above normal, 3: Well above normal. |
| Gluc | Glucose level. Values: 1: Normal, 2: Above normal, 3: Well above normal |
| Smoking | Indicates whether the patient smokes. Values: 0: No, 1: Yes |
| Cardiovascular Risk Score | Calculated cardiovascular risk score based on lipid profile and other risk factors. Values: Moderate, High, Low |
| Family History | Family history of cardiovascular diseases related to lipid disorders. Values: Yes, No |
| Lab Tests | Results of additional laboratory tests relevant to lipid profile. Values: Fasting glucose: 100, Liver function tests normal, Normal lipid panel |
| Genetic Markers | Genetic variants associated with lipid metabolism or cardiovascular risk. Values: rs7412(T; T), rs1800562(A; G), rs429358(C; T) |
| Comorbidities | Presence of other medical conditions affecting lipid metabolism or cardiovascular health. Values: Hypertension, Type 2 Diabetes, Obesity, Hyperlipidemia, None |
| Dietary Habits | Dietary intake relevant to lipid metabolism. Values: Balanced diet, Low in saturated fats, High consumption of processed foods, Mediterranean diet. |
| Physical Activity Level | The individual's level of physical activity. Values: Sedentary, Lightly active, Moderately active, Very active. |
| Alcohol Consumption | Indicates the individual's alcohol consumption. Values: None, Moderate, Heavy. |
| Medication | Medications taken by the individual relevant to lipid or cardiovascular health. Values: Statins, Blood pressure medication, None. |
| Dietary Supplements | Use of dietary supplements relevant to cardiovascular health. Values: Fish oil, Fiber supplements, Multivitamins, None |
| Body Mass Index (BMI) | Calculated Body Mass Index. Values: Numeric value calculated from height and weight. |
| Waist Circumference | Waist circumference in cm. |
| Physical Exam Findings | Results of physical examinations relevant to cardiovascular health. Values: Normal, Abnormal findings (e.g., heart murmurs, edema). |
| Mental Health Status | Information on mental health, which can impact cardiovascular health. Values: No mental health issues, Anxiety, Depression, Stress |
| Sleep Quality | Quality and duration of sleep. Values: Poor, Average, Good |
| Socioeconomic Status | Socioeconomic factors that might influence health. Values: Low, Middle, High. |
| Dietary Patterns | More detailed breakdown of dietary patterns. Values: Vegetarian, Vegan, Paleo, Keto, High carb, Low carb. |
| Ethnicity | Ethnic background of the individual. Values: Specific ethnic categories (e.g., Caucasian, African American, Asian, Hispanic, etc.). |
| Blood Lipid Levels | Detailed lipid profile including LDL, HDL, and triglycerides. Values: Numeric values for LDL, HDL, Triglycerides. |
| Heart Rate | Resting heart rate in beats per minute. Values: Numeric value |
| Previous Cardiovascular Events | History of previous cardiovascular events like heart attack or stroke. Values: None, Heart attack, Stroke, Other. |
| Inflammatory Markers | Levels of inflammatory markers relevant to cardiovascular health. Values: CRP (C-reactive protein) levels, ESR (erythrocyte sedimentation rate). |
| Occupational Risk Factors | Occupational hazards that might impact cardiovascular health. Values: None, Exposure to stress, Sedentary job, Exposure to pollutants. |
| Environmental Factors | Environmental influences on health. Values: Urban, Suburban, Rural. |
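
Several fields in the table above are stored as numeric codes, and BMI is derived from height and weight. The sketch below shows one way to decode a record and compute BMI. The code-to-label mappings come from the field descriptions; the example record and the helper names (`body_mass_index`, `describe`) are invented for illustration and do not come from the dataset.

```python
# Code-to-label mappings copied from the field descriptions in the table.
GENDER = {1: "Women", 2: "Men"}
LEVEL = {1: "Normal", 2: "Above normal", 3: "Well above normal"}
YES_NO = {0: "No", 1: "Yes"}


def body_mass_index(weight_kg, height_cm):
    """BMI = weight in kg divided by the square of height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def describe(record):
    """Return a human-readable view of a coded patient record."""
    return {
        "Patient Gender": GENDER[record["Patient Gender"]],
        "Cholesterol": LEVEL[record["Cholesterol"]],
        "Gluc": LEVEL[record["Gluc"]],
        "Smoking": YES_NO[record["Smoking"]],
        "Body Mass Index (BMI)": round(
            body_mass_index(record["Patient_Weight"], record["Patient Height"]), 1
        ),
    }


# Hypothetical record, invented for illustration only (not from the dataset).
example = {
    "Patient Gender": 2,
    "Patient Height": 170,   # cm
    "Patient_Weight": 70,    # kg
    "Cholesterol": 3,
    "Gluc": 1,
    "Smoking": 0,
}
readable = describe(example)
print(readable)
```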

| Method | Description |
|---|---|
| Correlation Analysis | Identifies relationships between variables by examining the correlation matrix to pinpoint strong correlations with the target variable and among features. |
| Variance Thresholding | Removes features with low variance, which are less likely to be useful for prediction. |
| Recursive Feature Elimination (RFE) | Iteratively selects the most significant predictors by fitting a model and removing the least important features until the desired number is reached. |
| LASSO (Least Absolute Shrinkage and Selection Operator) | Adds a penalty to the magnitude of coefficients, effectively performing variable selection and regularization. |
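
Two of the methods in the table above, variance thresholding and correlation analysis, are simple enough to sketch in plain Python. The toy columns, the variance threshold of 0.01, and `top_k=2` are assumptions for illustration only. RFE and LASSO would normally rely on a modelling library and are not shown.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def select_features(columns, target, min_variance=0.01, top_k=2):
    """Drop near-constant features, then keep the top_k by |correlation| with target."""
    # Step 1: variance thresholding removes features that barely vary.
    kept = {
        name: vals for name, vals in columns.items()
        if variance(vals) > min_variance
    }
    # Step 2: rank the remaining features by absolute correlation with the target.
    ranked = sorted(
        kept, key=lambda name: abs(pearson(kept[name], target)), reverse=True
    )
    return ranked[:top_k]


# Toy data, invented for illustration only (not from the dataset).
columns = {
    "ap_hi": [120, 140, 130, 160, 150, 110],
    "Smoking": [0, 1, 0, 1, 1, 0],
    "constant_flag": [1, 1, 1, 1, 1, 1],   # zero variance, removed in step 1
    "Heart Rate": [72, 70, 75, 71, 74, 73],
}
target = [0, 1, 0, 1, 1, 0]

selected = select_features(columns, target)
print(selected)
```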

| Preprocessing Step | Description |
|---|---|
| Normalization | Datasets (demographic data, health indicators, genetic markers, personal lifestyle factors, and X-ray images) were normalized to ensure uniformity across features. |
| Resizing | X-ray images were resized for consistency in model input. |
| Augmentation | Data augmentation techniques were applied to X-ray images to enhance model robustness. |
| Feature Selection | Utilized methods like correlation analysis and LASSO to identify predictive variables for cardiovascular disease risk. |
| Data Split | Dataset of 264 patients was split into training (e.g., 80%) and testing (e.g., 20%) subsets. |
| ML & DL Models | Machine learning (ML) and deep learning (DL) models (e.g., SVM, RF, DT, CNN) were trained and tested. |
| X-ray Image Preprocessing | CNNs analyzed X-ray images after normalization, resizing, and augmentation to ensure uniformity in inputs. |
| Interpretability | Grad-CAM was applied to CNNs for visualizing and interpreting model decisions. |
| Performance Metrics | Accuracy and F1-score were used to evaluate model performance for cardiovascular disease risk prediction. |
| Cross-validation | Models were optimized using cross-validation to improve robustness. |

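The normalization, data split, and cross-validation steps in the table above can be sketched as follows. This is a minimal illustration under stated assumptions: min-max scaling is one common choice of normalization (the source does not name the method), the 80/20 ratio is the example ratio given in the table, and the seed, the number of folds (`k=5`), and all function names are hypothetical.

```python
import random


def min_max_normalize(values):
    """Scale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def train_test_split_ids(ids, train_fraction=0.8, seed=42):
    """Shuffle the ids reproducibly and split them into train and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n_items, k=5):
    """Yield (train_idx, val_idx) index lists for k roughly equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_items))
        yield train_idx, val_idx
        start += size


patient_ids = range(1, 265)   # 264 patients, as stated in the table
train_ids, test_ids = train_test_split_ids(patient_ids)
print(len(train_ids), len(test_ids))   # 211 53

folds = list(k_fold_indices(len(train_ids)))
print([len(val_idx) for _, val_idx in folds])   # [43, 42, 42, 42, 42]

print(min_max_normalize([100, 120, 140]))   # [0.0, 0.5, 1.0]
```

In practice the scaling bounds would usually be computed on the training subset only and then reused for the test subset, so that no information from the test data influences preprocessing.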


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).