Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Machine Learning in Apache Spark Environment for Diagnosis of Diabetes

Version 1: Received: 9 November 2021 / Approved: 10 November 2021 / Online: 10 November 2021 (09:00:39 CET)

How to cite: Saravi, F.B.; Moghanian, S.; Javidi, G.; Sheybani, E.O. Machine Learning in Apache Spark Environment for Diagnosis of Diabetes. Preprints 2021, 2021110200 (doi: 10.20944/preprints202111.0200.v1).

Abstract

Disease-related data and information collected by physicians, patients, and researchers may seem insignificant at first glance, yet this unorganized data contains valuable information that is often hidden. The task of data mining techniques is to extract patterns that allow the data to be classified accurately, and such methods have often been used to diagnose various diseases. In this study, machine learning (ML) techniques based on distributed computing in the Apache Spark environment are used to diagnose diabetes, or to uncover hidden patterns of the illness, from a large dataset in real time. Three ML techniques, Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), were implemented in the Apache Spark computing environment using the Scala programming language and WEKA. The implementation results show that RF is more efficient and faster than the other two at diagnosing diabetes in big data.

Keywords

Diabetes; Diagnosis; Machine Learning; Wireless Body Area Networks; Apache Spark; Feature Selection

Subject

ENGINEERING, Biomedical & Chemical Engineering
