Diabetes is a potentially fatal disease that contributes to the development of other diseases in the human body. Controlling and curing diabetes in its early stages is the most effective way to avoid its effects. However, lack of awareness and expensive clinical tests are the primary reasons people in lower-income countries such as Bangladesh, Pakistan, and India skip clinical diagnosis and preventive measures. From this perspective, the study aims to build an automated machine learning (ML) model that predicts diabetes at an early stage using socio-demographic characteristics rather than clinical attributes, because clinical features are not always known to people in lower-income countries. To find the best-fit supervised ML classifier for the model, we applied six classification algorithms and found that random forest (RF) outperformed the others, with an accuracy of 99.36%. In addition, the most significant risk factors were identified from SHAP (SHapley Additive exPlanations) values across all the applied classifiers. The study reveals that polyuria, polydipsia, and delayed healing are the most significant risk factors for developing diabetes. The findings indicate that the proposed model is highly capable of predicting diabetes in its early stages.
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.