Submitted:
16 March 2026
Posted:
17 March 2026
Abstract
Keywords:
1. Introduction
The remainder of this paper is organized as follows: Section 2 reviews related work in hypotension prediction; Section 3 describes the MOVER dataset and the methodology, including data preprocessing, feature engineering, and model development; Section 4 presents experimental results and analysis; Section 5 discusses findings, limitations, and clinical implications; and Section 6 concludes the paper with recommendations for future work.
2. Literature Review
2.1. Intraoperative Hypotension: Definition and Clinical Impact
2.2. Machine Learning in Hypotension Prediction
2.3. Deep Learning for Time Series Prediction
2.4. The MOVER Dataset
2.5. Research Gap
3. Methodology
3.1. Dataset Description
- Patient Demographics: Age, sex, height, weight, ASA physical status classification
- Vital Signs: Heart rate (HR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), oxygen saturation (SpO2), respiratory rate (RR), end-tidal CO2 (EtCO2), temperature
- Ventilation Parameters: Tidal volume, peak inspiratory pressure, positive end-expiratory pressure (PEEP), minute ventilation
- Medication Administration: Timestamps and doses of anesthetics, vasopressors, and other medications
- Clinical Events: Hypotension, hypertension, tachycardia, bradycardia, desaturation events with timestamps
- Surgical Information: Procedure type, duration, urgency status
3.2. Data Preprocessing
- 1) Data Cleaning: The raw MOVER dataset underwent extensive cleaning procedures:
  - 1) Missing Value Handling: Vital sign measurements with missing values were handled using forward-fill for gaps ≤ 5 minutes and linear interpolation for longer gaps. Records with >30% missing data were excluded.
  - 2) Outlier Detection and Removal: Physiologically implausible values were removed using clinically validated thresholds: HR (20-200 bpm), SBP (40-250 mmHg), DBP (20-150 mmHg), MAP (30-180 mmHg), SpO2 (50-100%), RR (4-40 breaths/min).
  - 3) Artifact Removal: Motion artifacts and measurement errors were identified using median filtering and local outlier factor algorithms.
  - 4) Temporal Alignment: All vital sign measurements were aligned to uniform 1-minute intervals with timestamps synchronized to procedure start time.
- 2) Event Definition and Labeling: Hypotension was defined as MAP < 65 mmHg for at least 1 minute, consistent with recent clinical guidelines [4]. For each hypotensive event, prediction windows were created at 5, 10, and 15 minutes before onset. Non-hypotensive periods were sampled from time points at least 30 minutes away from any hypotensive event to ensure clean negative samples.
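The cleaning and labeling rules above can be sketched on a single MAP series sampled at 1-minute intervals. This is a minimal illustration, not the pipeline used in the study: the function names, the use of `None` for missing samples, and the example values are all assumptions, and the 30% exclusion rule and the artifact filters are left out.

```python
from typing import List, Optional

Series = List[Optional[float]]

MAP_RANGE = (30.0, 180.0)  # plausible MAP range (mmHg) from the outlier thresholds
MAX_FFILL_GAP = 5          # gaps of up to 5 minutes are forward-filled
MAP_THRESHOLD = 65.0       # hypotension: MAP < 65 mmHg


def clip_implausible(values: Series, low: float, high: float) -> Series:
    """Mark samples outside [low, high] as missing (None)."""
    return [v if v is not None and low <= v <= high else None for v in values]


def fill_gaps(values: Series, max_ffill: int = MAX_FFILL_GAP) -> Series:
    """Forward-fill short gaps; linearly interpolate longer interior gaps."""
    out = list(values)
    i, n = 0, len(out)
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < n and out[j] is None:
            j += 1  # j ends at the first index after the gap
        gap = j - i
        if i > 0 and gap <= max_ffill:
            for k in range(i, j):
                out[k] = out[i - 1]
        elif i > 0 and j < n:
            left, right = out[i - 1], out[j]
            for k in range(i, j):
                out[k] = left + (k - i + 1) / (gap + 1) * (right - left)
        # leading gaps and long trailing gaps are left as missing
        i = j
    return out


def hypotension_onsets(values: Series, threshold: float = MAP_THRESHOLD) -> List[int]:
    """Minute indices where MAP first drops below the threshold."""
    onsets, previous_low = [], False
    for t, v in enumerate(values):
        low = v is not None and v < threshold
        if low and not previous_low:
            onsets.append(t)
        previous_low = low
    return onsets


def prediction_times(onset: int, horizons=(5, 10, 15)) -> List[int]:
    """Minute indices at which a positive sample is taken before an onset."""
    return [onset - h for h in horizons if onset - h >= 0]


raw = [80, 78, None, None, 76, 300, 74, 70, 66, 64, 63, 68]  # made-up MAP samples
clean = fill_gaps(clip_implausible(raw, *MAP_RANGE))
onsets = hypotension_onsets(clean)
```

With 1-minute sampling, a single sample below 65 mmHg already satisfies the "at least 1 minute" criterion, which is why the sketch treats the first low sample as the onset.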
3.3. Feature Engineering
- 1) Statistical Features: For each vital sign, rolling statistics were calculated over windows of 5, 10, 15, and 30 minutes:
- Mean, median, standard deviation
- Minimum, maximum, range
3.4. Machine Learning Models
- 1) Tree-Based Ensemble Methods: XGBoost (Extreme Gradient Boosting): XGBoost implements gradient boosted decision trees with regularization to prevent overfitting. The objective function is:
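For reference, the regularized objective given in the XGBoost paper by Chen and Guestrin has the following standard form. It is shown here on the assumption that this is the formulation meant above.

$$\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}$$

Here $l$ is a differentiable loss (the logistic loss for binary classification), $\hat{y}_i$ is the ensemble prediction for sample $i$, $f_k$ is the $k$-th tree, $T$ is its number of leaves, $w$ is its vector of leaf weights, and $\gamma$ and $\lambda$ are regularization hyperparameters.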
- Skewness, kurtosis
- 25th, 50th, 75th percentiles
- Rate of change (first derivative)
- Acceleration (second derivative)
- 2) Temporal Features:
- Time since procedure start
- Time since last medication administration
- Time since last clinical event
- Cumulative duration of hypotension in last hour
- Trends using linear regression slopes over multiple windows
- 3) Interaction Features:
- Heart rate - blood pressure product (rate-pressure product)
- Shock index (HR/SBP)
- Pulse pressure (SBP - DBP)
- Mean arterial pressure variation
- Oxygen delivery index (MAP × SpO2)
- 4) Contextual Features:
- Patient demographics (age, sex, ASA class)
- Surgical procedure type and duration
- Anesthetic agents used
- Vasopressor administration history
- Fluid administration rates
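A few of the feature families listed in this section can be sketched in plain Python as follows. The function names and sample values are illustrative assumptions. The sketch also assumes the population standard deviation, SBP as the pressure term of the rate-pressure product, and SpO2 expressed in percent, none of which the text specifies.

```python
import statistics
from typing import Dict, Sequence


def rolling_stats(values: Sequence[float], window: int) -> Dict[str, float]:
    """Summary statistics over the most recent `window` one-minute samples."""
    recent = list(values[-window:])
    return {
        "mean": statistics.fmean(recent),
        "median": statistics.median(recent),
        "std": statistics.pstdev(recent),  # population SD (assumption)
        "min": min(recent),
        "max": max(recent),
        "range": max(recent) - min(recent),
    }


def slope_per_minute(values: Sequence[float], window: int) -> float:
    """Least-squares slope of the last `window` samples, in units per minute."""
    recent = list(values[-window:])
    n = len(recent)
    if n < 2:
        raise ValueError("need at least two samples to fit a slope")
    t_mean = (n - 1) / 2
    v_mean = statistics.fmean(recent)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(recent))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def interaction_features(hr: float, sbp: float, dbp: float,
                         map_: float, spo2: float) -> Dict[str, float]:
    """Derived hemodynamic indices from one time point."""
    return {
        "shock_index": hr / sbp,
        "pulse_pressure": sbp - dbp,
        "rate_pressure_product": hr * sbp,  # SBP assumed as the pressure term
        "oxygen_delivery_index": map_ * spo2,
    }


map_series = [82, 80, 79, 77, 74, 72]  # made-up MAP samples (mmHg)
stats_5min = rolling_stats(map_series, window=5)
trend_5min = slope_per_minute(map_series, window=5)
indices = interaction_features(hr=90, sbp=100, dbp=60, map_=73, spo2=98)
```

In this toy series the 5-minute MAP slope is about -2.1 mmHg per minute, the kind of downward trajectory that the trend features are meant to capture.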
- 2) Gradient Boosting Machines: Histogram-based Gradient Boosting (HGB): Optimized implementation that bins continuous features to improve computational efficiency. Parameters: max_iter=200, learning_rate=0.1, max_depth=5.
- 3) Support Vector Machines: SVM with RBF Kernel: Maps input features to a high-dimensional space for nonlinear classification. I used C=1.0, gamma='scale', and class weight balancing.
- 4) Neural Networks: Long Short-Term Memory (LSTM) Networks: Designed to capture temporal dependencies in vital sign sequences. Architecture:
- Input layer: sequence length = 60 minutes
- LSTM layer 1: 128 units, return_sequences=True
- Dropout: 0.3
- LSTM layer 2: 64 units
- Dropout: 0.3
- Dense layer: 32 units, ReLU activation
- Output layer: 1 unit, sigmoid activation
- 5) K-Nearest Neighbors: KNN with k=15 after empirical optimization, using Euclidean distance and uniform weights.
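To make the KNN configuration concrete, the following is a minimal from-scratch sketch of majority voting with Euclidean distance and uniform weights. It is not the implementation used in the study; the function name and the toy data are assumptions, and the toy call uses k=3 only because the example set is tiny.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


def knn_predict(train: Sequence[Sample], query: Sequence[float], k: int = 15) -> int:
    """Majority vote among the k nearest training points (uniform weights)."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D data: label 1 stands in for "hypotension within the horizon".
toy_train = [
    ((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 0),
    ((5.0, 5.0), 1), ((5.0, 6.0), 1), ((6.0, 5.0), 1),
]
near_origin = knn_predict(toy_train, (0.5, 0.5), k=3)
near_cluster = knn_predict(toy_train, (5.5, 5.5), k=3)
```

Because Euclidean distance is scale-sensitive, features measured in different units (for example bpm and mmHg) would normally be standardized before a neighbor search of this kind.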
3.5. Experimental Setup
- 1) Train-Test Split: The dataset was partitioned as follows:
- Training set (70%): 3,739 procedures
- Validation set (15%): 801 procedures
- Test set (15%): 802 procedures
- 2) Cross-Validation Strategy: I employed 5-fold cross-validation on the training set for hyperparameter tuning. Time-series cross-validation with expanding windows was used to prevent look-ahead bias.
- 3) Class Imbalance Handling: The dataset exhibited imbalance between hypotensive and non-hypotensive periods. I addressed this using:
- SMOTE (Synthetic Minority Over-sampling Technique)
- Class weight adjustment (balanced mode)
- Undersampling of majority class in training
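Two of the imbalance strategies listed above are simple enough to sketch directly. The "balanced" weighting below follows the common heuristic n_samples / (n_classes × class count); the function names and the 900:100 class ratio are hypothetical and not taken from the dataset. SMOTE is not reproduced here.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def undersample_indices(labels: Sequence[Hashable], seed: int = 0) -> List[int]:
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = min(len(idxs) for idxs in by_class.values())
    kept: List[int] = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    return sorted(kept)


labels = [0] * 900 + [1] * 100        # hypothetical training labels
weights = balanced_class_weights(labels)
kept = undersample_indices(labels)
kept_counts = Counter(labels[i] for i in kept)
```

Resampling of this kind is applied to training data only, so that validation and test sets keep the original class distribution.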
- 4) Evaluation Metrics: Models were evaluated using:
- Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$
- Precision: $\frac{TP}{TP + FP}$
- 4) Patient factors: ASA status and age remained important, suggesting baseline vulnerability influences hypotension risk.
- 5) Temporal patterns: Time since last vasopressor administration captured medication effects and washout periods.
3.6. Error Analysis
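- Recall (Sensitivity): $\frac{TP}{TP + FN}$
- Specificity: $\frac{TN}{TN + FP}$
- F1-Score: $2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
- AUC-ROC: Area under the Receiver Operating Characteristic curve
- AUC-PR: Area under the Precision-Recall curve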
- Time-to-event prediction accuracy
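Unlike the threshold-based metrics, AUC-ROC has no closed-form expression in terms of TP, TN, FP, and FN. One way to read it is as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way; the function name and the example scores are assumptions, and the quadratic pairwise loop is meant for illustration rather than for large test sets.

```python
from typing import Sequence


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]              # made-up labels
scores = [0.1, 0.4, 0.35, 0.8]     # made-up predicted probabilities
auc = auc_roc(y_true, scores)      # 3 of 4 pairs ranked correctly -> 0.75
```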
3.7. Implementation Details
4. Results
4.1. Model Performance Comparison
4.2. Prediction Window Analysis
- 5-minute: AUC-ROC = 0.973 ± 0.005
- 10-minute: AUC-ROC = 0.942 ± 0.008
- 15-minute: AUC-ROC = 0.908 ± 0.011
4.3. Confusion Matrix Analysis
- True Positives: 1,847
- True Negatives: 1,891
- False Positives: 119
- False Negatives: 107
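As a worked check, the threshold-based metrics can be recomputed from these four counts. The helper below is a sketch with an assumed function name; only the counts come from the list above.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


metrics = classification_metrics(tp=1847, tn=1891, fp=119, fn=107)
rounded = {name: round(value, 3) for name, value in metrics.items()}
# rounded -> accuracy 0.943, precision 0.939, recall 0.945,
#            specificity 0.941, f1 0.942
```

These values are within rounding distance of the XGBoost row in the model comparison table, which suggests the counts refer to that model on the held-out data.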
4.4. Feature Importance Analysis
- 1) MAP trends: Recent MAP trends (last 5-10 minutes) were the strongest predictors, highlighting the importance of trajectory rather than absolute values.
- 2) Heart rate variability: Increased variability often preceded hypotensive events, possibly reflecting autonomic instability.
- 3) Shock index: The combination of heart rate and blood pressure proved more predictive than either parameter alone.

- 1) Borderline MAP values (65-70 mmHg): 34% of false positives occurred when MAP was 65-70 mmHg, close to but not meeting the hypotension threshold.
- 2) Rapid hemodynamic changes: 28% of false negatives involved sudden hypotension development (<2 minutes) that prediction windows missed.
- 3) Surgical manipulation: 22% of errors coincided with major surgical events (aortic cross-clamping, rapid blood loss) not captured in features.
- 4) Medication effects: 16% of errors occurred shortly after vasoactive drug administration with unpredictable responses.
4.5. Computational Performance
- XGBoost inference time: 4.2 ± 0.8 ms per prediction
- LSTM inference time: 28.5 ± 3.2 ms per prediction
- Feature engineering: 12.3 ± 1.5 ms per window
- Total pipeline: <50 ms, suitable for 1-minute prediction cycles
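The latency budget can be checked with simple arithmetic. The sketch below assumes that feature engineering and inference run sequentially and uses only the reported mean times, ignoring the ± spread; the variable names are illustrative.

```python
FEATURES_MS = 12.3    # feature engineering per window (reported mean)
XGBOOST_MS = 4.2      # XGBoost inference per prediction (reported mean)
LSTM_MS = 28.5        # LSTM inference per prediction (reported mean)
CYCLE_MS = 60_000     # one prediction per minute

pipeline_ms = {
    "xgboost": FEATURES_MS + XGBOOST_MS,  # about 16.5 ms
    "lstm": FEATURES_MS + LSTM_MS,        # about 40.8 ms
}
share_of_cycle = {name: total / CYCLE_MS for name, total in pipeline_ms.items()}
within_budget = all(total < 50 for total in pipeline_ms.values())
```

Under these assumptions either pipeline uses well under 0.1% of a 1-minute cycle.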
5. Discussion
5.1. Principal Findings
5.2. Comparison with Prior Work
5.3. Clinical Implications
- 1) Early Warning Systems: Real-time alerts could notify anesthesiologists of impending hypotension, enabling proactive interventions such as fluid boluses, vasopressor administration, or adjustment of anesthetic depth.
- 2) Risk Stratification: Preoperative risk assessment could identify high-risk patients for enhanced monitoring and preventive strategies.
- 3) Treatment Guidance: Models could help optimize vasopressor timing and dosing by predicting response to interventions.
- 4) Quality Improvement: Aggregate analysis of predicted vs. actual events could identify systematic issues in hemodynamic management.
5.4. Interpretability and Clinical Trust
5.5. Limitations
- 1) Single hypotension definition: Using MAP < 65 mmHg may not capture all clinically relevant hypotensive events. Alternative definitions (e.g., relative decreases from baseline) might yield different results.
- 2) Data quality variability: Despite preprocessing, some measurement artifacts may persist, potentially affecting model performance.
- 3) Generalizability: While MOVER includes multiple centers, all are academic medical centers; performance in community hospitals requires validation.
- 4) Intervention confounding: The dataset includes real-world clinical interventions (vasopressors, fluids) that alter the natural history of hypotension, potentially introducing confounding.
- 5) Binary classification: My approach treats hypotension prediction as binary classification, while continuous risk scoring might provide more nuanced information.
5.6. Future Directions
- 1) Multi-center prospective validation: Deploy models in prospective studies across diverse clinical settings to assess real-world performance.
- 2) Integration with electronic health records: Develop pipelines for real-time data extraction and model deployment within existing clinical workflows.
- 3) Personalized prediction: Explore patient-specific model adaptation to account for individual hemodynamic responses.
- 4) Causal modeling: Investigate causal relationships between features and hypotension to guide interventions.
- 5) Multi-task learning: Simultaneously predict multiple outcomes (hypotension, hypertension, desaturation) for comprehensive risk assessment.
- 6) Explainable AI: Develop enhanced interpretability methods to provide actionable insights for clinicians.
6. Conclusions
Acknowledgments
References
- Bijker, J. B.; van Klei, W. A.; Kappen, T. H.; van Wolfswinkel, L.; Moons, K.; Kalkman, C. J. Incidence of intraoperative hypotension as a function of the chosen definition: literature definitions applied to a retrospective cohort using automated data collection. Anesthesiology 2007, 107, 213–220.
- Walsh, M.; Devereaux, P. J.; Garg, A. X.; Kurz, A.; Turan, A.; Rodseth, R. K.; Cywinski, J.; Thabane, L.; Sessler, D. I. Relationship between intraoperative mean arterial pressure and clinical outcomes after noncardiac surgery: toward an empirical definition of hypotension. Anesthesiology 2013, 119, 507–515.
- Salmasi, V.; Maheshwari, K.; Yang, D.; Mascha, E. J.; Singh, A.; Sessler, D. I.; Kurz, A. Relationship between intraoperative hypotension, defined by either reduction from baseline or absolute thresholds, and acute kidney and myocardial injury after noncardiac surgery: a retrospective cohort analysis. Anesthesiology 2017, 126, 47–65.
- Sessler, D. I.; Bloomstone, J. A.; Aronson, S.; Berry, C.; Gan, T. J.; Lumb, A. B.; Mythen, A. M.; Pearse, R. M.; Mythen, M. G. Perioperative Quality Initiative consensus statement on intraoperative blood pressure, risk and outcomes for elective surgery. British Journal of Anaesthesia 2019, 122, 563–574.
- MOVER: Medical Informatics Operating Room Vitals and Events Repository. UCI Machine Learning Repository. 2023. Available online: https://archive.ics.uci.edu/dataset/877/mover.
- Hatib, F.; Jian, Z.; Buddi, S.; Lee, C.; Settels, J.; Sibert, K.; Rinehart, J.; Cannesson, M. Machine-learning algorithm to predict hypotension based on high-fidelity arterial pressure waveform analysis. Anesthesiology 2018, 129, 663–674.
- Lee, S.; Lee, H. C.; Chu, Y. S.; Song, S. B.; Ahn, G. J.; Lee, H.; Yang, S.; Koh, S. B. Prediction of hypotension during cesarean section with machine learning. Journal of Clinical Medicine 2021, 10, 1704.
- Kendale, S.; Kulkarni, P.; Rosenberg, A. D.; Wang, J. Supervised machine-learning predictive analytics for prediction of postinduction hypotension. Anesthesiology 2018, 129, 675–688.
- Chen, X.; Xu, D.; Zhang, G.; Mukkamala, S. Forecasting hypotension in intensive care units using deep learning models. IEEE Journal of Biomedical and Health Informatics 2020, 24, 2345–2354.
- Sadeghi, R.; Banerjee, T.; Romine, W. Early prediction of hypotension during intraoperative cases using attention-based models. Computers in Biology and Medicine 2022, 145, 105452.
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016; pp. 785–794.
- Chawla, N. V.; Bowyer, K. W.; Hall, L. O.; Kegelmeyer, W. P. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research 2002, 16, 321–357.
- Lundberg, S. M.; Lee, S. I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 2017, 4765–4774.
- Bishop, C. M. Pattern Recognition and Machine Learning; Springer, 2006.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Computation 1997, 9, 1735–1780.
| Characteristic | Value |
|---|---|
| Number of procedures | 5,342 |
| Total recording hours | 42,736 |
| Patients with hypotension events | 2,847 (53.3%) |
| Total hypotension events | 18,942 |
| Mean procedure duration | 8.0 ± 3.2 hours |
| Patient age (years) | 52.3 ± 18.7 |
| Female patients | 2,831 (53.0%) |
| ASA Class I-II | 3,314 (62.0%) |
| ASA Class III-IV | 2,028 (38.0%) |
| Rank | Feature | Information Gain |
|---|---|---|
| 1 | MAP trend (last 5 min) | 0.342 |
| 2 | Heart rate variability | 0.298 |
| 3 | Shock index trend | 0.276 |
| 4 | MAP - baseline difference | 0.265 |
| 5 | Pulse pressure variation | 0.251 |
| 6 | Rate-pressure product | 0.243 |
| 7 | SBP trend (last 10 min) | 0.238 |
| 8 | Time since last vasopressor | 0.229 |
| 9 | ASA physical status | 0.221 |
| 10 | Age | 0.218 |
| 11 | MAP standard deviation | 0.215 |
| 12 | Respiratory rate variability | 0.209 |
| 13 | EtCO2 trend | 0.203 |
| 14 | Oxygen delivery index | 0.198 |
| 15 | Procedure duration | 0.192 |
| 16 | SpO2 variability | 0.187 |
| 17 | Fluid administration rate | 0.181 |
| 18 | Heart rate trend | 0.176 |
| 19 | Temperature change | 0.169 |
| 20 | Anesthetic depth index | 0.162 |
| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC-ROC | AUC-PR |
|---|---|---|---|---|---|---|
| XGBoost | 94.2 ± 0.8 | 93.8 ± 1.1 | 94.5 ± 0.9 | 94.1 ± 0.9 | 0.973 ± 0.005 | 0.951 ± 0.007 |
| Random Forest | 92.7 ± 1.0 | 92.1 ± 1.3 | 93.2 ± 1.1 | 92.6 ± 1.1 | 0.961 ± 0.007 | 0.938 ± 0.009 |
| HGB | 93.1 ± 0.9 | 92.8 ± 1.2 | 93.5 ± 1.0 | 93.1 ± 1.0 | 0.965 ± 0.006 | 0.943 ± 0.008 |
| SVM (RBF) | 88.4 ± 1.4 | 87.6 ± 1.8 | 89.1 ± 1.5 | 88.3 ± 1.5 | 0.924 ± 0.010 | 0.892 ± 0.012 |
| LSTM | 93.5 ± 1.1 | 92.9 ± 1.4 | 94.0 ± 1.2 | 93.4 ± 1.2 | 0.968 ± 0.007 | 0.947 ± 0.009 |
| MLP | 90.2 ± 1.3 | 89.5 ± 1.6 | 90.8 ± 1.4 | 90.1 ± 1.4 | 0.941 ± 0.009 | 0.915 ± 0.011 |
| KNN (k=15) | 82.6 ± 1.9 | 81.3 ± 2.2 | 83.5 ± 2.0 | 82.4 ± 2.0 | 0.872 ± 0.015 | 0.831 ± 0.018 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).