Submitted: 21 May 2025
Posted: 22 May 2025
Abstract
Keywords:
1. Introduction
- A comprehensive architecture for big data analytics in H-IoT environments, featuring four integrated layers: Data Acquisition, Edge Processing, Cloud Analytics, and Application.
- A hybrid edge-cloud architecture that balances computational requirements with privacy preservation and latency constraints.
- Novel data processing algorithms that efficiently handle heterogeneous health data streams while preserving privacy and ensuring HIPAA compliance.
- Novel data integration techniques for harmonizing heterogeneous health data streams.
- Advanced machine learning models optimized for multi-disease prognosis, with particular emphasis on cardiovascular diseases, diabetes, and respiratory disorders.
- Privacy-preserving analytics methods suitable for sensitive healthcare data.
- Experimental validation of the proposed system using real-world datasets, demonstrating significant improvements in prediction accuracy and system performance.
2. Related Work
2.1. Big Data Analytics in Healthcare
2.2. Healthcare Internet of Things (H-IoT)
2.3. Edge-Cloud Computing in H-IoT
2.4. Multi-Disease Prognosis Using Machine Learning
3. Proposed System Architecture and Methodology
3.1. System Architecture Overview
3.2. Data Acquisition and Preprocessing Layer
- Wearable Sensors: Devices that continuously monitor vital signs such as heart rate, blood pressure, oxygen saturation, and physical activity levels.
- Implantable Devices: Advanced sensors that track internal physiological parameters, including blood glucose levels, cardiac rhythm, and respiratory function.
- Environmental Monitors: Devices that capture contextual information such as temperature, humidity, air quality, and other environmental factors that may impact patient health.
- Smart Medical Devices: Specialized equipment used in clinical settings, including smart inhalers, connected glucometers, and digital stethoscopes.
3.3. Edge Processing Layer
- Edge Gateways: Intermediate computing nodes that aggregate data from multiple IoT devices, perform initial processing, and manage communication with the cloud infrastructure.
- Local Analytics Modules: Lightweight machine learning models deployed at the edge for real-time anomaly detection and preliminary risk assessment.
- Privacy Preservation Unit: Components responsible for implementing privacy-preserving techniques such as data minimization, differential privacy, and secure multi-party computation.
- Edge Storage: Temporary storage that retains recent data for immediate access and analysis, using circular buffers to manage storage constraints.
- Task Partitioning: An intelligent scheduler determines which analytical tasks should be executed at the edge versus in the cloud. Time-critical tasks (e.g., anomaly detection) are prioritized for edge processing, while computationally intensive tasks (e.g., model training) are offloaded to the cloud.
- Incremental Learning: We implement incremental learning algorithms that enable models to be partially trained at the edge and then refined in the cloud. This approach reduces communication overhead while maintaining model accuracy.
- Federated Learning: To preserve privacy while leveraging data across multiple sources, we employ federated learning techniques in which model updates, rather than raw data, are shared between edge nodes and the cloud.
- Adaptive Resource Allocation: The proposed system dynamically allocates computational resources based on workload patterns, patient criticality, and available bandwidth to optimize overall system performance.
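The federated learning item above says that model updates, not raw data, travel from edge nodes to the cloud. The source does not name the aggregation rule, so the following is only a minimal sketch that assumes sample-weighted averaging (FedAvg-style). The function name `federated_average`, the parameter layout, and the example values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout: parameter name -> flat list of values (hypothetical).
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Average edge-node weights, weighting each node by its sample count."""
    if not updates:
        raise ValueError("at least one edge update is required")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("sample counts must sum to a positive number")
    first_weights, _ = updates[0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        acc = [0.0] * len(values)
        for weights, n in updates:
            for i, v in enumerate(weights[name]):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


# Hypothetical updates from two edge gateways: (weights, local sample count).
# Only these weights are sent to the cloud; the patient records stay local.
edge_a = ({"w": [1.0, 2.0], "b": [0.0]}, 100)
edge_b = ({"w": [3.0, 4.0], "b": [1.0]}, 300)
global_weights = federated_average([edge_a, edge_b])
```

Weighting by sample count lets a gateway that has seen more patients contribute proportionally more to the global model; other aggregation rules are possible.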
3.4. Cloud Analytics Layer
- Data Integration Hub: Components responsible for aggregating and harmonizing data from multiple edge nodes, resolving inconsistencies, and creating unified patient profiles.
- Advanced Analytics Engine: High-performance computing infrastructure that executes complex machine learning algorithms for multi-disease prognosis.
- Knowledge Repository: A semantic database that stores clinical guidelines, disease models, and historical patterns for reference by analytical processes.
- Distributed Computing Framework: A scalable processing environment based on Apache Spark that enables parallel execution of data-intensive tasks.
3.5. Application Layer
- Visualization Dashboard: An interactive interface that presents risk assessments, trend analyses, and intervention recommendations to healthcare providers.
- Alert Management System: A component that generates and delivers notifications based on risk thresholds and clinical guidelines.
- Decision Support Modules: Advisory systems that provide evidence-based recommendations for patient management based on predictive insights.
- Integration APIs: Standardized interfaces that enable integration with electronic health record systems, hospital information systems, and other healthcare IT infrastructure.
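To make the Alert Management System item concrete, the sketch below shows one way a predicted risk score could be mapped to a notification level. The cutoffs, level names, and the `Alert` and `make_alert` names are assumptions for illustration only; the source states that thresholds follow clinical guidelines but does not give values.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, ordered from highest to lowest cutoff.
# Real values would be derived from clinical guidelines.
THRESHOLDS = [(0.8, "critical"), (0.5, "warning")]


@dataclass
class Alert:
    patient_id: str
    disease: str
    risk: float
    level: str


def make_alert(patient_id: str, disease: str, risk: float) -> Optional[Alert]:
    """Return an alert for the highest threshold the risk meets, else None."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must be a probability in [0, 1]")
    for cutoff, level in THRESHOLDS:
        if risk >= cutoff:
            return Alert(patient_id, disease, risk, level)
    return None  # below every threshold: no notification


alert = make_alert("patient-001", "diabetes", 0.85)
```

Keeping the thresholds in a single ordered table means they can be adjusted per disease or per guideline without touching the dispatch logic.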
3.6. Multi-Disease Prognosis Models
- Hierarchical Disease Ontology: We organized diseases into a hierarchical structure based on medical ontologies (e.g., Disease Ontology) to capture relationships between conditions and enable transfer learning between related diseases.
- Multi-Task Deep Learning: Our primary model employed a multi-task deep neural network architecture with shared lower layers that capture common features across diseases and specialized upper layers for disease-specific predictions.
- Temporal Modeling: To capture disease progression patterns, we incorporated recurrent neural network components (LSTM and GRU) that model temporal dependencies in patient data over varying time scales.
- Attention Mechanisms: We implemented attention mechanisms that automatically identify the most relevant features and time periods for each disease, improving model interpretability and accuracy.
- Ensemble Integration: Predictions from multiple specialized models are integrated using an ensemble approach that weights models based on their performance for specific patient subgroups and disease categories.
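The ensemble integration item describes weighting models by their performance on a patient subgroup and disease category. A minimal sketch of that idea follows, assuming weights proportional to a per-subgroup validation score. The score table, model names, subgroup label, and the `ensemble_predict` function are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

# Hypothetical validation scores per (subgroup, disease) for each model.
VALIDATION_SCORE: Dict[Tuple[str, str], Dict[str, float]] = {
    ("age>=65", "cardiovascular"): {"lstm": 0.9, "gru": 0.6},
}


def ensemble_predict(predictions: Dict[str, float],
                     subgroup: str, disease: str) -> float:
    """Combine per-model risks with weights proportional to validation score."""
    scores = VALIDATION_SCORE[(subgroup, disease)]
    total = sum(scores[model] for model in predictions)
    if total <= 0:
        raise ValueError("validation scores must sum to a positive number")
    return sum(risk * scores[model] / total
               for model, risk in predictions.items())


# Weights are 0.9/1.5 = 0.6 and 0.6/1.5 = 0.4, so the result is
# 0.8 * 0.6 + 0.6 * 0.4 = 0.72 (up to floating-point rounding).
risk = ensemble_predict({"lstm": 0.8, "gru": 0.6}, "age>=65", "cardiovascular")
```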
3.7. Privacy Preservation Module
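Section 3.3 lists differential privacy among the techniques of the Privacy Preservation Unit, and the reference list includes the Laplace-mechanism paper by Dwork et al. As an illustration only, the sketch below releases a differentially private mean of a bounded vital sign. The bounds, the epsilon value, and the function names are assumptions, and the number of records is treated as public.

```python
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_mean(values: List[float], lower: float, upper: float,
                 epsilon: float, rng: random.Random) -> float:
    """Release an epsilon-differentially-private mean of bounded values."""
    if not values:
        raise ValueError("values must be non-empty")
    if epsilon <= 0 or upper <= lower:
        raise ValueError("need epsilon > 0 and upper > lower")
    # Clipping bounds how much any single record can move the mean.
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical heart-rate readings (bpm) with assumed bounds of 40 to 180.
rng = random.Random(0)
noisy = private_mean([60.0, 70.0, 80.0], 40.0, 180.0, epsilon=1.0, rng=rng)
```

A smaller epsilon gives stronger privacy but a noisier result; the noise scale is the sensitivity divided by epsilon.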
4. Implementation and Experimental Setup
4.1. Implementation Details
4.2. Datasets
- MIMIC-III Clinical Database [48]: A large, freely available database comprising de-identified health data associated with approximately 40,000 critical care patients. We extracted vital signs, laboratory measurements, medication records, and diagnostic codes for patients with cardiovascular diseases, diabetes, and respiratory disorders.
- PhysioNet Wearable Health Dataset [49]: A collection of continuous physiological measurements from wearable devices, including heart rate, electrocardiogram (ECG), blood oxygen levels, and activity metrics from 10,000 subjects over six months.
- UK Biobank [50]: A large-scale biomedical database containing genetic and health data from half a million UK participants. We used a subset focusing on cardiovascular diseases, diabetes, and respiratory conditions.
4.3. Evaluation Metrics
- A cloud-only architecture without edge processing.
- Single-disease prediction models without multi-task learning.
- Traditional machine learning approaches without deep learning components.
4.4. Experimental Setup and Scenarios
- Baseline Scenario: All data processing and analytics performed exclusively in the cloud, representing traditional approaches to health data analytics.
- Edge-Only Scenario: Maximum processing performed at the edge, with minimal cloud involvement, representing extreme edge computing approaches.
- Hybrid Scenario: Our proposed hybrid edge-cloud architecture with dynamic task allocation based on computational requirements and urgency.
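The hybrid scenario relies on dynamic task allocation based on computational requirements and urgency, matching the Task Partitioning item in Section 3.3. The rule below is a deliberately simple sketch of such a placement decision. The `Task` fields, the capacity and deadline constants, and the `place` function are hypothetical; the actual scheduler is not described in this excerpt.

```python
from dataclasses import dataclass

EDGE_CAPACITY = 10.0        # hypothetical compute budget of an edge gateway
URGENT_DEADLINE_MS = 500.0  # hypothetical cutoff for "time-critical" tasks


@dataclass
class Task:
    name: str
    deadline_ms: float   # how soon a result is needed
    compute_cost: float  # estimated cost in the same units as EDGE_CAPACITY


def place(task: Task) -> str:
    """Return 'edge' for urgent tasks that fit the edge budget, else 'cloud'."""
    urgent = task.deadline_ms <= URGENT_DEADLINE_MS
    fits_edge = task.compute_cost <= EDGE_CAPACITY
    return "edge" if urgent and fits_edge else "cloud"


# Examples mirroring the paper's own cases: anomaly detection is
# time-critical, model training is computationally intensive.
placements = {
    t.name: place(t)
    for t in (Task("anomaly_detection", 100.0, 2.0),
              Task("model_training", 3_600_000.0, 5_000.0))
}
```

Under these assumed constants, the cloud-only baseline corresponds to always returning `"cloud"` and the edge-only scenario to always returning `"edge"`.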
5. Results and Discussion
5.1. Prediction Performance
5.2. System Performance
5.3. Edge-Cloud Performance Analysis
5.4. Privacy Preservation Effectiveness
5.5. Scalability Analysis
5.6. Clinical Relevance and Interpretability
6. Conclusion and Future Work
References
- Manogaran, G., et al., A new architecture of Internet of Things and big data ecosystem for secured smart healthcare monitoring and alerting system. Future Generation Computer Systems, 2018. 82: p. 375-387. [CrossRef]
- Islam, S.R., et al., The internet of things for health care: a comprehensive survey. IEEE Access, 2015. 3: p. 678-708. [CrossRef]
- Ahmadi, H., et al., The application of internet of things in healthcare: a systematic literature review and classification. Universal Access in the Information Society, 2019. 18: p. 837-869. [CrossRef]
- Badawy, M., N. Ramadan, and H.A. Hefny, Big data analytics in healthcare: data sources, tools, challenges, and opportunities. Journal of Electrical Systems Information Technology, 2024. 11(1): p. 63.
- Pramanik, P.K.D., S. Pal, and M. Mukhopadhyay, Healthcare big data: A comprehensive overview. Research anthology on big data analytics, architectures, applications, 2022: p. 119-147.
- Islam, M.S., et al. A systematic review on healthcare analytics: application and theoretical perspective of data mining. in Healthcare. 2018. MDPI.
- Batko, K. and A. Ślęzak, The use of Big Data Analytics in healthcare. Journal of Big Data, 2022. 9(1): p. 3. [CrossRef]
- de Gomez, M.R.C., A Comprehensive Introduction to Healthcare Data Analytics. Journal of Biomedical Sustainable Healthcare Applications, 2024: p. 44-53.
- Raghupathi, W. and V. Raghupathi, Big data analytics in healthcare: promise and potential. Health information science systems, 2014. 2: p. 1-10. [CrossRef]
- Divyashree, N. and N.P. KS, Improved clinical diagnosis using predictive analytics. IEEE Access, 2022. 10: p. 75158-75175.
- Chen, M., et al., 5G-smart diabetes: Toward personalized diabetes diagnosis with healthcare big data clouds. IEEE Communications Magazine, 2018. 56(4): p. 16-23. [CrossRef]
- Dang, L.M., et al., A survey on internet of things and cloud computing for healthcare. Electronics, 2019. 8(7): p. 768. [CrossRef]
- Adeghe, E.P., C.A. Okolo, and O.T. Ojeyinka, The role of big data in healthcare: A review of implications for patient outcomes and treatment personalization. World Journal of Biology Pharmacy Health Sciences, 2024. 17(3): p. 198-204. [CrossRef]
- Wang, Y., L. Kung, and T.A. Byrd, Big data analytics: Understanding its capabilities and potential benefits for healthcare organizations. Technological forecasting social change, 2018. 126: p. 3-13. [CrossRef]
- Šajnović, U., et al., Internet of things and big data analytics in preventive healthcare: a synthetic review. Electronics, 2024. 13(18): p. 3642.
- Qi, K., Advancing hospital healthcare: achieving IoT-based secure health monitoring through multilayer machine learning. Journal of Big Data, 2025. 12(1): p. 1. [CrossRef]
- Adeoye, S. and R. Adams, Leveraging Artificial Intelligence for Predictive Healthcare: A Data-Driven Approach to Early Diagnosis and Personalized Treatment. Cogniz. J. Multidiscip. Stud, 2024. 4: p. 80-97. [CrossRef]
- Beam, A.L. and I.S. Kohane, Big data and machine learning in health care. JAMA, 2018. 319(13): p. 1317-1318. [CrossRef]
- Ding, X., et al., Wearable sensing and telehealth technology with potential applications in the coronavirus pandemic. IEEE reviews in biomedical engineering, 2020. 14: p. 48-70. [CrossRef]
- Mia, M., A.F.N. Masruriyah, and A.R. Pratama, The Utilization of Decision Tree Algorithm In Order to Predict Heart Disease. Jurnal Sisfotek Global, 2022. 12(2): p. 138-142. [CrossRef]
- Orcutt, M., The Rocket Fuel for Biden’s “Cancer Moonshot”? Big Data. Retrieved April 29, 2018. 2016.
- Brindha, P.G., et al. Brain tumor detection from MRI images using deep learning techniques. in IOP conference series: materials science and engineering. 2021. IOP Publishing.
- Hossain, T., et al. Brain tumor detection using convolutional neural network. in 2019 1st international conference on advances in science, engineering and robotics technology (ICASERT). 2019. IEEE.
- Khairandish, M.O., et al., A hybrid CNN-SVM threshold segmentation approach for tumor detection and classification of MRI brain images. IRBM, 2022. 43(4): p. 290-299. [CrossRef]
- Abd El-Latif, A.A., A.A.A.-E.-A.A. El-Latif, and S.E. Venegas-Andraca, Security and Privacy Preserving for IoT and 5G Networks. 2022: Springer.
- Kumar, M., et al., Healthcare Internet of Things (H-IoT): Current trends, future prospects, applications, challenges, and security issues. Electronics, 2023. 12(9): p. 2050. [CrossRef]
- Ali, T.E., et al., Trends, prospects, challenges, and security in the healthcare internet of things. Computing, 2025. 107(1): p. 28.
- Awad, N., et al., Publishing anonymized set-valued data via disassociation towards analysis. Future Internet, 2020. 12(4): p. 71. [CrossRef]
- Melki, R., H.N. Noura, and A. Chehab. Lightweight and secure D2D authentication & key management based on PLS. in 2019 IEEE 90th Vehicular Technology Conference (VTC2019-Fall). 2019. IEEE.
- Noura, H., et al. Lightweight stream cipher scheme for resource-constrained IoT devices. in 2019 International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob). 2019. IEEE.
- Shi, W., et al., Edge computing: Vision and challenges. IEEE internet of things journal, 2016. 3(5): p. 637-646. [CrossRef]
- Singh, A. and K. Chatterjee, Edge computing based secure health monitoring framework for electronic healthcare system. Cluster Computing, 2023. 26(2): p. 1205-1220. [CrossRef]
- Rong, G., et al., An edge-cloud collaborative computing platform for building AIoT applications efficiently. Journal of Cloud computing, 2021. 10(1): p. 36. [CrossRef]
- Xu, G., IoT-assisted ECG monitoring framework with secure data transmission for health care applications. IEEE Access, 2020. 8: p. 74586-74594. [CrossRef]
- Prabhu, M., S.S. NB, and S.N. Rao. Rescutrack: An edge computing-enabled Vitals Monitoring System for first responders. in 2022 IEEE 3rd Global Conference for Advancement in Technology (GCAT). 2022. IEEE.
- Gia, T.N., et al. Fog computing in healthcare internet of things: A case study on ecg feature extraction. in 2015 IEEE international conference on computer and information technology; ubiquitous computing and communications; dependable, autonomic and secure computing; pervasive intelligence and computing. 2015. IEEE.
- Kumari, A., et al., Multimedia big data computing and Internet of Things applications: A taxonomy and process model. Journal of Network Computer Applications, 2018. 124: p. 169-195. [CrossRef]
- Jagatheesaperumal, S.K., et al., An IoT-based framework for personalized health assessment and recommendations using machine learning. Mathematics, 2023. 11(12): p. 2758. [CrossRef]
- Esteva, A., et al., Dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017. 542(7639): p. 115-118.
- Dutta, A., et al., Early prediction of diabetes using an ensemble of machine learning models. International Journal of Environmental Research Public Health, 2022. 19(19): p. 12378. [CrossRef]
- Miotto, R., et al., Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Scientific reports, 2016. 6(1): p. 26094. [CrossRef]
- Choi, E., et al. GRAM: graph-based attention model for healthcare representation learning. in Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. 2017.
- Wang, X., F. Wang, and J. Hu. A multi-task learning framework for joint disease risk prediction and comorbidity discovery. in 2014 22nd International Conference on Pattern Recognition. 2014. IEEE.
- Alnaim, A.K. and A.M. Alwakeel, Machine-learning-based IoT–edge computing healthcare solutions. Electronics, 2023. 12(4): p. 1027. [CrossRef]
- Braunstein, M.L., Healthcare in the age of interoperability: the promise of fast healthcare interoperability resources. IEEE pulse, 2018. 9(6): p. 24-27. [CrossRef]
- Dwork, C., et al. Calibrating noise to sensitivity in private data analysis. in Theory of Cryptography: Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006. Proceedings 3. 2006. Springer.
- Lundberg, S.M. and S.-I. Lee, A unified approach to interpreting model predictions. Advances in neural information processing systems, 2017. 30.
- Johnson, A.E., et al., MIMIC-III, a freely accessible critical care database. Scientific data, 2016. 3(1): p. 1-9. [CrossRef]
- Goldberger, A.L., et al., PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 2000. 101(23): p. e215-e220.
- Sudlow, C., et al., UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS medicine, 2015. 12(3): p. e1001779. [CrossRef]




| Disease Category | Proposed Approach | Single-Disease Models | Traditional ML |
|---|---|---|---|
| Cardiovascular | 0.912 ± 0.015 | 0.851 ± 0.022 | 0.823 ± 0.019 |
| Diabetes | 0.935 ± 0.018 | 0.847 ± 0.017 | 0.831 ± 0.023 |
| Respiratory | 0.893 ± 0.021 | 0.825 ± 0.025 | 0.809 ± 0.024 |
| Renal | 0.882 ± 0.016 | 0.839 ± 0.019 | 0.817 ± 0.022 |
| Neurological | 0.845 ± 0.023 | 0.815 ± 0.026 | 0.789 ± 0.025 |

| Disease Category | Metric | Baseline (Cloud-Only) | Edge-Only | Hybrid (Proposed) |
|---|---|---|---|---|
| Cardiovascular | AUC-ROC | 0.84 | 0.79 | 0.91 |
| Cardiovascular | Sensitivity | 0.82 | 0.77 | 0.89 |
| Cardiovascular | Specificity | 0.85 | 0.80 | 0.92 |
| Cardiovascular | F1 Score | 0.83 | 0.78 | 0.90 |
| Diabetes | AUC-ROC | 0.86 | 0.81 | 0.93 |
| Diabetes | Sensitivity | 0.83 | 0.79 | 0.91 |
| Diabetes | Specificity | 0.87 | 0.82 | 0.94 |
| Diabetes | F1 Score | 0.85 | 0.80 | 0.92 |
| Respiratory | AUC-ROC | 0.82 | 0.76 | 0.86 |
| Respiratory | Sensitivity | 0.80 | 0.74 | 0.87 |
| Respiratory | Specificity | 0.83 | 0.77 | 0.90 |
| Respiratory | F1 Score | 0.81 | 0.75 | 0.88 |

| Metric | Baseline (Cloud-Only) | Edge-Only | Hybrid (Proposed) |
|---|---|---|---|
| Average Latency (ms) | 870 | 120 | 210 |
| Throughput (events/s) | 12,500 | 8,200 | 18,600 |
| CPU Utilization (%) | 85 | 92 | 76 |
| Memory Utilization (%) | 78 | 89 | 72 |
| Bandwidth Consumption (MB/s) | 42 | 5 | 12 |
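
As a reading aid for the system performance table above, the following snippet computes the relative change of the hybrid configuration against the cloud-only baseline. The numbers are copied from the table; the helper name `percent_change` is an assumption for illustration.

```python
# (baseline cloud-only, hybrid) pairs copied from the table above.
METRICS = {
    "Average Latency (ms)": (870, 210),
    "Throughput (events/s)": (12_500, 18_600),
    "Bandwidth Consumption (MB/s)": (42, 12),
}


def percent_change(baseline: float, proposed: float) -> float:
    """Relative change of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / baseline


for name, (baseline, hybrid) in METRICS.items():
    print(f"{name}: {percent_change(baseline, hybrid):+.1f}%")
# Average Latency (ms): -75.9%
# Throughput (events/s): +48.8%
# Bandwidth Consumption (MB/s): -71.4%
```

The same table shows that the edge-only configuration has lower latency (120 ms) and bandwidth use (5 MB/s) than the hybrid one, so the hybrid gains are relative to the cloud-only baseline rather than across all metrics.
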
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).