Submitted: 20 June 2025
Posted: 20 June 2025
Abstract
Keywords:
1. Introduction
2. Related Work
3. Method
3.1. Data Preprocessing
3.2. Entity Representation
3.2.1. Patient Entity
3.2.2. Physician Entity
3.3. CDLD Model
3.3.1. Cyclic Dual-Network Training Mechanism
3.3.2. Two-Stage Model Architecture
3.3.3. Model Training Detail
3.4. Comparison Models
3.5. Evaluation Metrics
- CV: Coefficient of variation
- σ: Standard deviation
- μ: Mean
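
The three symbols above combine as CV = σ / μ, reported as a percentage in the results tables. The following is a minimal sketch of that calculation, not the authors' code. The function name `coefficient_of_variation` is hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the source does not say which was used.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Return the coefficient of variation (sigma / mu) as a percentage.

    Assumption: sigma is the population standard deviation. Swap in
    statistics.stdev for the sample standard deviation if that is preferred.
    """
    mu = statistics.fmean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    sigma = statistics.pstdev(values)
    return sigma / mu * 100.0


if __name__ == "__main__":
    # Hypothetical per-run RMSE values, used only to illustrate the call.
    example_rmse = [0.0212, 0.0219, 0.0226]
    print(f"CV = {coefficient_of_variation(example_rmse):.2f}%")
```

A low CV across repeated runs indicates that the error metric varies little relative to its mean, which is how the CV (%) column in the results can be read.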
4. Results
4.1. CDLD Results
4.1.1. Full-LoS Results
4.1.2. Short-LoS (Between 2 Hours and 5 Days) Results
4.1.3. Long-LoS (Between 5 Days and 28 Days) Results
4.2. Comparative Results
4.2.1. With or Without Latent Trait Comparison
4.2.2. SHAP Analysis: Feature-Only vs Feature+Latent
5. Discussion and Limitations
5.1. Discussion
5.2. Limitations and Future Research Directions
6. Conclusion
Funding
Ethical Consideration
Acknowledgement
Data availability statement





| Dataset | Column | Count | Values |
|---|---|---|---|
| admissions.csv | insurance | 6 | 1. Medicaid; 2. Medicare; 3. Private; 4. Other; 5. No charge; 6. UNKNOWN |
| admissions.csv | language | 29→8 | 1. ‘English’: ‘English’; 2. ‘Spanish-Portuguese’: ‘Spanish’, ‘Portuguese’; 3. ‘East Asia’: ‘Chinese’, ‘Japanese’, ‘Korean’; 4. ‘Southeast Asia’: ‘Vietnamese’, ‘Khmer’, ‘Thai’; 5. ‘Europe’: ‘Russian’, ‘French’, ‘Modern Greek (1453-)’, ‘Polish’, ‘Italian’, ‘Armenian’; 6. ‘Middle East Asia’: ‘Arabic’, ‘Persian’, ‘Hindi’, ‘Bengali’; 7. ‘Africa’: ‘Amharic’, ‘Somali’, ‘Haitian’; 8. ‘Other’: all others |
| admissions.csv | marital_status | 5 | 1. WIDOWED; 2. MARRIED; 3. SINGLE; 4. DIVORCED; 5. UNKNOWN |
| admissions.csv | race | 33→7 | 1. ‘White’: ‘WHITE’, ‘WHITE - RUSSIAN’, ‘WHITE - OTHER EUROPEAN’, ‘WHITE - BRAZILIAN’, ‘WHITE - EASTERN EUROPEAN’; 2. ‘Black/African’: ‘BLACK/AFRICAN AMERICAN’, ‘BLACK/CAPE VERDEAN’, ‘BLACK/AFRICAN’, ‘BLACK/CARIBBEAN ISLAND’; 3. ‘Asian’: ‘ASIAN’, ‘ASIAN - CHINESE’, ‘ASIAN - SOUTHEAST ASIAN’, ‘ASIAN - KOREAN’, ‘ASIAN - ASIAN INDIAN’; 4. ‘Hispanic/Latino’: ‘HISPANIC/LATINO - SALVADORAN’, ‘HISPANIC/LATINO - PUERTO RICAN’, ‘HISPANIC/LATINO - GUATEMALAN’, ‘HISPANIC/LATINO - DOMINICAN’, ‘HISPANIC/LATINO - MEXICAN’, ‘HISPANIC OR LATINO’, ‘HISPANIC/LATINO - CUBAN’, ‘HISPANIC/LATINO - HONDURAN’, ‘HISPANIC/LATINO - CENTRAL AMERICAN’, ‘HISPANIC/LATINO - COLOMBIAN’, ‘SOUTH AMERICAN’; 5. ‘Native American’: ‘AMERICAN INDIAN/ALASKA NATIVE’; 6. ‘Multiple Race/Ethnicity’: ‘MULTIPLE RACE/ETHNICITY’; 7. ‘Declined/Unknown’: ‘UNKNOWN’, ‘UNABLE TO OBTAIN’, ‘PATIENT DECLINED TO ANSWER’, missing values |
| patients.csv | gender | 2 | Female: 0, Male: 1 |
| patients.csv | anchor_age | - | Normalized values (0-1 range) |
| microbiologyevents.csv | isolate_num | - | Normalized values (0-1 range) |
| omr.csv | high_BP (mmHg) | - | Normalized mean values of ‘high_BP’, ‘high_BP_lying’, ‘high_BP_Sitting’, ‘high_BP_Standing’, ‘high_BP_1’, and ‘high_BP_3’ (0-1 range) |
| omr.csv | low_BP (mmHg) | - | Normalized mean values of ‘low_BP’, ‘low_BP_lying’, ‘low_BP_Sitting’, ‘low_BP_Standing’, ‘low_BP_1’, and ‘low_BP_3’ (0-1 range) |
| omr.csv | BMI (kg/m²) | - | Normalized mean values (0-1 range) |
| omr.csv | height (inches) | - | Normalized mean values (0-1 range) |
| omr.csv | weight (lbs) | - | Normalized mean values (0-1 range) |
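
The table above describes two kinds of preprocessing: collapsing raw categorical labels into coarser groups (for example, 29 languages into 8) and rescaling numeric columns to the 0-1 range. The sketch below illustrates both steps under stated assumptions. The group names and members are copied from the language row of the table; the helper names `group_language` and `min_max_scale` are hypothetical, and min-max scaling is an assumed reading of "Normalized values (0-1 range)", since the source does not name the exact scaler.

```python
from typing import Dict, List, Sequence

# Grouping copied from the "language" row of the table above.
LANGUAGE_GROUPS: Dict[str, List[str]] = {
    "English": ["English"],
    "Spanish-Portuguese": ["Spanish", "Portuguese"],
    "East Asia": ["Chinese", "Japanese", "Korean"],
    "Southeast Asia": ["Vietnamese", "Khmer", "Thai"],
    "Europe": ["Russian", "French", "Modern Greek (1453-)",
               "Polish", "Italian", "Armenian"],
    "Middle East Asia": ["Arabic", "Persian", "Hindi", "Bengali"],
    "Africa": ["Amharic", "Somali", "Haitian"],
}

# Invert the grouping once so each raw label maps directly to its group.
RAW_TO_GROUP: Dict[str, str] = {
    raw: group for group, members in LANGUAGE_GROUPS.items() for raw in members
}


def group_language(raw_label: str) -> str:
    """Map a raw language label to its group; unlisted labels become 'Other'."""
    return RAW_TO_GROUP.get(raw_label, "Other")


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to [0, 1] (assumed normalization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    print(group_language("Korean"))
    # Hypothetical ages, used only to illustrate the call.
    print(min_max_scale([18, 54, 90]))
```

The same dictionary-inversion pattern would apply to the race column (33 raw labels into 7 groups), with missing values routed to the ‘Declined/Unknown’ group as the table specifies.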

| LoS range | Average RMSE | Best RMSE | RMSE standard deviation | CV (%) |
|---|---|---|---|---|
| Full-scope LoS | 0.0219 | 0.0212 | 0.0007 | 3.18 |
| Short LoS (between 2 hours and 5 days) | 0.1801 | 0.1767 | 0.0023 | 1.25 |
| Long LoS (between 5 days and 28 days) | 0.1597 | 0.1561 | 0.0023 | 1.43 |

| Model | Feature only | Feature and latent trait | Decrease rate (%) |
|---|---|---|---|
| Simple DNN | 0.0228 | 0.0214 | 6.1404 |
| XGBoost | 0.0226 | 0.0217 | 3.9823 |
| LightGBM | 0.0226 | 0.0216 | 4.4248 |
| CatBoost | 0.0227 | 0.0218 | 3.9648 |
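
The decrease rate column in the comparison table appears to be the relative reduction from the feature-only value to the feature-and-latent-trait value. The source does not state this formula; it is inferred because it reproduces the published figures. A minimal sketch, with the hypothetical helper name `decrease_rate`:

```python
from typing import Dict, Tuple


def decrease_rate(feature_only: float, with_latent: float) -> float:
    """Relative reduction, in percent, from feature-only to feature+latent."""
    if feature_only == 0:
        raise ValueError("decrease rate is undefined for a zero baseline")
    return (feature_only - with_latent) / feature_only * 100.0


# Values copied from the comparison table: (feature only, feature and latent trait).
COMPARISON: Dict[str, Tuple[float, float]] = {
    "simple DNN": (0.0228, 0.0214),
    "XGBoost": (0.0226, 0.0217),
    "LightGBM": (0.0226, 0.0216),
    "CatBoost": (0.0227, 0.0218),
}

if __name__ == "__main__":
    for model, (baseline, latent) in COMPARISON.items():
        print(f"{model}: {decrease_rate(baseline, latent):.4f}%")
```

Because the inputs are rounded to four decimal places, small changes in the last digit shift the rate noticeably, so the differences between models should be read with that precision in mind.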

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).