Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Clustering of Cardiovascular Disease Patients Using Data Mining Techniques with Principal Component Analysis and K-Medoids

Version 1 : Received: 2 August 2020 / Approved: 4 August 2020 / Online: 4 August 2020 (03:56:19 CEST)

How to cite: Irwansyah, E.; Salim Pratama, E.; Ohyver, M. Clustering of Cardiovascular Disease Patients Using Data Mining Techniques with Principal Component Analysis and K-Medoids. Preprints 2020, 2020080074. https://doi.org/10.20944/preprints202008.0074.v1 Irwansyah, E.; Salim Pratama, E.; Ohyver, M. Clustering of Cardiovascular Disease Patients Using Data Mining Techniques with Principal Component Analysis and K-Medoids. Preprints 2020, 2020080074. https://doi.org/10.20944/preprints202008.0074.v1

Abstract

Cardiovascular disease is the number one cause of death in the world and Quoting from WHO, around 31% of deaths in the world are caused by cardiovascular diseases and more than 75% of deaths occur in developing countries. The results of patients with cardiovascular disease produce many medical records that can be used for further patient management. This study aims to develop a method of data mining by grouping patients with cardiovascular disease to determine the level of patient complications in the two clusters. The method applied is principal component analysis (PCA) which aims to reduce the dimensions of the large data available and the techniques of data mining in the form of cluster analysis which implements the K-Medoids algorithm. The results of data reduction with PCA resulted in five new components with a cumulative proportion variance of 0.8311. The five new components are implemented for cluster formation using the K-Medoids algorithm which results in the form of two clusters with a silhouette coefficient of 0.35. Combination of techniques of Data reduction by PCA and the application of the K-Medoids clustering algorithm are new ways for grouping data of patients with cardiovascular disease based on the level of patient complications in each cluster of data generated.

Keywords

data mining; cardiovascular diseases; cluster analysis; principle component analysis

Subject

Computer Science and Mathematics, Probability and Statistics

Comments (0)

We encourage comments and feedback from a broad range of readers. See criteria for comments and our Diversity statement.

Leave a public comment
Send a private comment to the author(s)
* All users must log in before leaving a comment
Views 0
Downloads 0
Comments 0
Metrics 0


×
Alerts
Notify me about updates to this article or when a peer-reviewed version is published.
We use cookies on our website to ensure you get the best experience.
Read more about our cookies here.