Submitted: 21 December 2024
Posted: 23 December 2024
Abstract
The growing complexity of medical data is a significant challenge for electronic health record (EHR) systems, which must integrate multimodal data to improve diagnostic precision and treatment efficacy. This paper proposes a GPT-4-based approach to strengthen medical data processing in EHR systems. First, GPT-4 is used to encode clinical text, and a pre-training and fine-tuning strategy based on domain knowledge is devised to improve its comprehension of medical terminology and context in line with the requirements of the medical field. Second, to address incomplete text data in EHRs, we propose a context-based adaptive filling strategy that dynamically completes missing information by analyzing analogous historical records, improving the integrity of the text data. Third, to improve the model's handling of lengthy texts, we employ a hierarchical attention mechanism that partitions them into multiple sub-blocks; by adjusting the hierarchical attention weights, GPT-4 can capture cross-paragraph and cross-topic relationships. Finally, data augmentation, based on semantic extension and data augmentation models, is used to generate diverse training data and thereby improve GPT-4's adaptability to varied inputs. The experimental results indicate that the proposed method improves prediction accuracy, text generation capability, and data processing efficiency.
Keywords:
1. Introduction
2. Related Work
3. Methodologies
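The abstract describes a context-based adaptive filling strategy that completes missing EHR text by analyzing analogous historical records. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes records are flat dictionaries of text fields, uses Jaccard token overlap as a stand-in similarity measure, and copies the missing field from the most similar past record. The names `fill_missing`, `record_similarity`, and the `min_sim` threshold are hypothetical, and the paper's method presumably relies on GPT-4 representations rather than token overlap.

```python
from typing import Dict, List, Optional

# Assumption: a record is a flat mapping from field name to free text (None if missing).
Record = Dict[str, Optional[str]]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings, in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def record_similarity(target: Record, candidate: Record) -> float:
    """Average field similarity over the fields present in both records."""
    shared = [k for k in target if target[k] and candidate.get(k)]
    if not shared:
        return 0.0
    return sum(jaccard(target[k], candidate[k]) for k in shared) / len(shared)


def fill_missing(target: Record, history: List[Record], min_sim: float = 0.2) -> Record:
    """Return a copy of `target` with missing fields filled from similar records.

    A field stays missing when no historical record that has the field
    reaches the (hypothetical) similarity threshold `min_sim`.
    """
    filled = dict(target)
    for field, value in target.items():
        if value:
            continue  # field is already present
        best, best_sim = None, min_sim
        for past in history:
            if not past.get(field):
                continue  # this past record cannot supply the field
            sim = record_similarity(target, past)
            if sim >= best_sim:
                best, best_sim = past, sim
        if best is not None:
            filled[field] = best[field]
    return filled
```

In this sketch the threshold keeps the fill conservative: a field is left empty rather than copied from a weakly related record, which matters for clinical text where a wrong completion is worse than a gap.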
3.1. Text Encoding and Pre-Training
3.2. Hierarchical Attention Mechanisms
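As summarized in the abstract, lengthy texts are partitioned into sub-blocks and combined through hierarchical attention weights. The sketch below illustrates the general two-level pattern under stated assumptions: token embeddings are plain lists of floats, a single fixed query vector scores both levels, and attention is used as a pooling step. It is a generic illustration, not the architecture used in the paper; the function names and the `block_size` parameter are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, vectors: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention pooling; returns (pooled vector, weights)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, v)) / scale for v in vectors]
    weights = softmax(scores)
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(query))]
    return pooled, weights


def split_blocks(vectors: List[Vector], block_size: int) -> List[List[Vector]]:
    """Partition a long sequence into consecutive sub-blocks."""
    return [vectors[i:i + block_size] for i in range(0, len(vectors), block_size)]


def hierarchical_attention(
    token_vectors: List[Vector], query: Vector, block_size: int
) -> Tuple[Vector, List[float]]:
    """Two-level pooling: tokens -> block vectors -> document vector.

    Returns the document vector and the block-level attention weights.
    """
    if not token_vectors:
        raise ValueError("token_vectors must not be empty")
    blocks = split_blocks(token_vectors, block_size)
    # Level 1: attention within each sub-block.
    block_vectors = [attend(query, block)[0] for block in blocks]
    # Level 2: attention across sub-blocks.
    return attend(query, block_vectors)
```

The block-level weights returned by `hierarchical_attention` show which sub-blocks dominate the document representation, which is the quantity one would adjust or inspect when tuning cross-paragraph behavior in a setup like this.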
4. Experiments
4.1. Experimental Setup
4.2. Experimental Analysis
5. Conclusions


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).