Submitted: 23 March 2025
Posted: 26 March 2025
Abstract
Keywords:
1. Introduction
2. Methods
2.1. Accident Data Collection and Preprocessing
2.1.1. Accident Data Collection
2.1.2. Dataset Preprocessing
2.2. Feature Selection
2.2.1. Mutual Information (MI) Definition
2.2.2. Maximum Relevance Criterion
2.2.3. Minimum Redundancy Criterion
2.2.4. mRMR Optimization Objective
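As a rough illustration of the criteria named in Sections 2.2.1 to 2.2.4, the sketch below estimates mutual information from counts and runs a greedy mRMR selection in its difference form (relevance minus mean redundancy). It is a minimal sketch under stated assumptions, not the study's implementation: the feature names, the toy data, and the helper names `mutual_information` and `mrmr_select` are all hypothetical, and every variable is assumed to be discrete.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def mrmr_select(features, target, k):
    """Greedy mRMR: at each step pick the feature that maximizes
    relevance to the target minus mean redundancy with selected features."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: injury severity collapsed to two classes.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "speed_band": [0, 0, 0, 1, 1, 1, 1, 1],       # relevant
    "speed_band_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # relevant but fully redundant
    "lighting": [0, 1, 0, 1, 0, 1, 0, 1],         # irrelevant, nearly non-redundant
}
chosen = mrmr_select(features, target, k=2)
print(chosen)
```

In this toy case the exact copy of an already selected feature is penalized by its redundancy term, so the second pick goes to a feature that adds less overlap, which is the behavior the minimum redundancy criterion is meant to produce.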
2.3. Prediction Model
2.3.1. Decision Tree
2.3.2. Random Forest
2.3.3. LightGBM
2.3.4. XGBoost
2.3.5. CatBoost
2.3.6. AdaBoost
2.4. Model Interpretation and Feature Analysis
2.4.1. Shapley Values
2.4.2. Additivity Property
2.4.3. Consistency
2.4.4. Efficiency for Tree Models (Tree SHAP)
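The Shapley value and additivity ideas named in Sections 2.4.1 and 2.4.2 can be sketched with a brute-force computation over all feature orderings. This is only an illustration under assumptions: the `risk` function, its coefficients, and the feature names are hypothetical and do not come from the fitted models. Tree SHAP is designed to obtain the same kind of attribution for tree ensembles without enumerating orderings, which this sketch does not attempt.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every possible ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


def risk(present):
    """Hypothetical severity score when only the features in `present`
    take the instance's values and the rest stay at a baseline."""
    score = 0.1  # baseline output
    if "high_speed" in present:
        score += 0.3
    if "no_seat_belt" in present:
        score += 0.2
    if "high_speed" in present and "no_seat_belt" in present:
        score += 0.2  # interaction, split evenly between the two features
    if "dark" in present:
        score += 0.1
    return score


players = ["high_speed", "no_seat_belt", "dark"]
phi = shapley_values(players, risk)

# Additivity (local accuracy): attributions sum to output minus baseline.
gap = risk(frozenset(players)) - risk(frozenset())
print(phi, sum(phi.values()), gap)
```

The brute-force version scales factorially with the number of features, so it is only practical for a handful of them; that cost is the motivation for tree-specific algorithms.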

3. Data Description
3.1. Descriptive Statistics of Accident Data
3.2. Feature Statistical Description
4. Experimental Results
4.1. Accident Feature Analysis
4.1.1. Feature Importance Analysis
4.1.2. Feature Dependency Analysis
4.1.3. Bivariate SHAP Contribution Analysis
4.1.4. Instance-Level SHAP Interpretation
5. Implications and Limitations
5.1. Implications
5.2. Limitations
5.3. Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Appendix A.1

| Database | Participants | Cases, total | Cases, PV-PV | Drivers, total | Drivers, PV | Drivers, motorcycle | Drivers, truck | Drivers, bus | Drivers, others |
|---|---|---|---|---|---|---|---|---|---|
| CRSS | 1 | 121288 | - | 121288 | 107169 | 8585 | 3768 | 418 | 1348 |
| CRSS | 2 | 204916 | 175910 | 409832 | 379053 | 9341 | 14218 | 1590 | 5630 |
| CRSS | >2 | 26938 | - | 87755 | 83664 | 883 | 2226 | 122 | 860 |
| FARS | 1 | 134982 | - | 134982 | 111851 | 14360 | 5462 | 370 | 2939 |
| FARS | 2 | 86718 | 49551 | 173436 | 132888 | 18678 | 17512 | 580 | 3778 |
| FARS | >2 | 19931 | - | 69080 | 57565 | 3685 | 6361 | 177 | 1292 |
| CRSS+FARS | - | 594773 | 225461 | 996373 | 872190 | 55532 | 49547 | 3257 | 15847 |

| Variable / category | Share (%) | Injury severity 0 (%) | Injury severity 1 (%) | Injury severity 2 (%) | Injury severity 3 (%) |
|---|---|---|---|---|---|
| Database | | | | | |
| CRSS | 63.3 | 68.7 | 27.7 | 3.2 | 0.4 |
| FARS | 36.7 | 5 | 25.5 | 18.4 | 51.1 |
| CRSS+FARS | 100 | 45.3 | 26.9 | 8.8 | 19 |
| Location at rural or urban (RUR_URBN) | | | | | |
| Rural (1) | 67.9 | 53 | 25.6 | 7.7 | 13.7 |
| Urban (2) | 32.1 | 29.1 | 29.5 | 11.3 | 30.2 |
| Light condition (LGT_COND) | | | | | |
| Daylight (1) | 69.8 | 50.6 | 26.8 | 7.2 | 15.4 |
| Dawn (2) | 1.6 | 34 | 25 | 12 | 29.1 |
| Dusk (3) | 2.6 | 47.6 | 24.8 | 9.3 | 18.3 |
| Dark-Lighted (4) | 15 | 41.1 | 30.5 | 9.4 | 19 |
| Dark-Not Lighted (5) | 11 | 19.2 | 22.9 | 17.5 | 40.4 |
| Atmospheric Conditions (ATM_COND) | | | | | |
| Clear (1) | 74.4 | 46.3 | 26.6 | 8.6 | 18.5 |
| Cloudy (2) | 14.8 | 43.1 | 27.6 | 9.3 | 20.1 |
| Rain (3) | 8.6 | 44.5 | 27.6 | 8.6 | 19.2 |
| Snow (4) | 1.2 | 41.3 | 26.8 | 11.1 | 20.9 |
| Fog (5) | 0.6 | 17.7 | 24.6 | 17.6 | 40.1 |
| Other (6) | 0.3 | 22 | 26.4 | 17.3 | 34.4 |
| Trafficway Description (TWY_TYPE) | | | | | |
| Not Divided (1) | 51.5 | 35.6 | 26.6 | 11.6 | 26.2 |
| Divided, Unprotected (2) | 27.7 | 50.2 | 31.2 | 6.6 | 12 |
| Divided, Protected (3) | 13.4 | 58 | 22.8 | 5.6 | 13.6 |
| One-Way (4) | 7.4 | 72.5 | 19.9 | 3.1 | 4.6 |
| Roadway Profile (TWY_PRFL) | | | | | |
| Level (1) | 85.9 | 47.5 | 26.9 | 8.1 | 17.5 |
| Grade (2) | 7.2 | 36.1 | 27.7 | 11.4 | 24.8 |
| Uphill (3) | 2.4 | 28.8 | 25.7 | 15 | 30.5 |
| Downhill (4) | 2.7 | 31.2 | 24.5 | 13.6 | 30.7 |
| Hillcrest (5) | 1.5 | 23.2 | 27.3 | 15.2 | 34.3 |
| Sag (6) | 0.2 | 26.6 | 31.2 | 11.6 | 30.7 |
| Roadway Alignment (TWY_ALGN) | | | | | |
| Straight (1) | 91.6 | 47.2 | 27.4 | 8.1 | 17.4 |
| Curve Right (2) | 4.6 | 31.1 | 19.1 | 14.9 | 35 |
| Curve Left (3) | 3.8 | 18 | 24 | 19.5 | 38.5 |
| Type of Intersection (TWY_INTT) | | | | | |
| Four-Way Intersection (1) | 33.7 | 45.9 | 33.3 | 6.8 | 14 |
| T-Intersection (2) | 14.4 | 46.2 | 34.1 | 6.2 | 13.5 |
| Roundabout (3) | 0.2 | 84.4 | 13.4 | 0.6 | 1.6 |
| Five-Point, or more (4) | 0.2 | 55.5 | 29.7 | 5.9 | 8.9 |
| Y-Intersection (5) | 0 | 47.2 | 33.3 | 2.8 | 16.7 |
| Other Type (6) | 0.4 | 29.8 | 29.8 | 11.3 | 29.1 |
| Not an Intersection (7) | 51.1 | 44.7 | 20.6 | 10.9 | 23.9 |

| Variable / category | Share (%) | Injury severity 0 (%) | Injury severity 1 (%) | Injury severity 2 (%) | Injury severity 3 (%) |
|---|---|---|---|---|---|
| Attempted Avoidance Maneuver (DRV_MANU) | | | | | |
| No Avoidance Maneuver (1) | 23.7 | 62.6 | 17.6 | 5.7 | 14.2 |
| Braking (2) | 3.5 | 51.6 | 32.8 | 7.5 | 8.2 |
| Unknown (3) | 68 | 40.7 | 29.5 | 9.3 | 20.5 |
| Accelerating (4) | 0.1 | 39.5 | 29.3 | 10.9 | 20.4 |
| Braking and Steering (5) | 1 | 24.7 | 35.9 | 18.3 | 21 |
| Releasing Brakes (6) | 0 | 22.9 | 41.7 | 2.1 | 33.3 |
| Braking and Unknown Steering Direction (7) | 0.2 | 13.4 | 42.9 | 18.8 | 25 |
| Steering (8) | 3.5 | 20.9 | 28.6 | 19 | 31.5 |
| Accelerating and Steering (9) | 0 | 26.1 | 19.5 | 16.3 | 38.2 |
| Pre-Impact Stability (PIM_STAB) | | | | | |
| Tracking (1) | 92.6 | 48 | 27.1 | 8.2 | 16.7 |
| Skidding Longitudinally (2) | 1.4 | 26.5 | 31.1 | 14.5 | 27.9 |
| Skidding Laterally (3) | 1.1 | 5.6 | 8.5 | 14.9 | 70.9 |
| Other (4) | 4.9 | 10.2 | 24.9 | 17.2 | 47.7 |
| Crash Type (ACC_TYPE) | | | | | |
| Rear End (1) | 32.1 | 74.9 | 18.6 | 2.1 | 4.4 |
| Same Direction (Angle, Sideswipe) (2) | 7 | 79 | 11 | 1.8 | 8.2 |
| Miss Control (3) | 1.2 | 90.1 | 6.5 | 1.1 | 2.4 |
| Turn Into Path (4) | 12.2 | 42.1 | 37.2 | 5.9 | 14.8 |
| Turn Across Path (5) | 13.5 | 36 | 43.8 | 7.9 | 12.2 |
| Opposite Direction (Forward Impact) (6) | 0.4 | 1.7 | 34.7 | 21.9 | 41.7 |
| Intersect Paths (Straight Path) (7) | 14.3 | 26.1 | 40.6 | 9.9 | 23.5 |
| Opposite Direction (Angle Sideswipe) (8) | 4.2 | 12.4 | 28.4 | 16.5 | 42.8 |
| Head-On (9) | 14.7 | 2.2 | 16.1 | 27.1 | 54.6 |
| Other Types (10) | 0.3 | 8.2 | 23.4 | 10.9 | 57.5 |
| Travel Speed of This Vehicle (TRV_SPD1) | | | | | |
| 0 mph | 16.2 | 84.6 | 11.3 | 1.4 | 2.7 |
| 0-20 mph | 19 | 82.2 | 8.8 | 2.2 | 6.8 |
| 21-40 mph | 27.1 | 38.1 | 41.5 | 7.2 | 13.2 |
| 41-60 mph | 28.2 | 15.5 | 34.2 | 16.2 | 34.1 |
| >60 mph | 9.4 | 13.7 | 25.9 | 17.4 | 43.1 |
| Travel Speed of Other Vehicle (TRV_SPD2) | | | | | |
| 0 mph | 10 | 73 | 21.2 | 2.4 | 3.4 |
| 0-20 mph | 18.9 | 79.3 | 15.9 | 2.4 | 2.3 |
| 21-40 mph | 30 | 46.8 | 35.8 | 6.4 | 11 |
| 41-60 mph | 30.9 | 23.1 | 28.4 | 14.7 | 33.8 |
| >60 mph | 10.1 | 18.2 | 21.4 | 16.2 | 44.2 |
| Damage Area Count (DMG_ARCT) | | | | | |
| 1 | 55.9 | 52.9 | 26.1 | 6.9 | 14 |
| 2-5 | 35.2 | 44.1 | 30.9 | 8.4 | 16.6 |
| 6-9 | 4.7 | 4.7 | 20.1 | 22.2 | 53 |
| 10-12 | 4.2 | 1.1 | 10.4 | 21.9 | 66.6 |

| Variable / category | Share (%) | Injury severity 0 (%) | Injury severity 1 (%) | Injury severity 2 (%) | Injury severity 3 (%) |
|---|---|---|---|---|---|
| The Gender of This Driver (DRV_SEX1) | | | | | |
| Male (1) | 55.9 | 44.7 | 23.7 | 9.5 | 22 |
| Female (2) | 44.1 | 46.2 | 30.8 | 7.9 | 15.2 |
| Drug Involvement of This Driver (DRV_DUG1) | | | | | |
| No (1) | 96.6 | 46.8 | 27.2 | 8.3 | 17.7 |
| Yes (2) | 3.4 | 3.9 | 17.1 | 22.5 | 56.6 |
| Alcohol Test Result of This Driver (DRV_ALC1) | | | | | |
| 0 mg/dL | 93.7 | 48 | 27.4 | 8.2 | 16.5 |
| 1-80 mg/dL | 1 | 2.7 | 14.8 | 15.9 | 66.5 |
| >80 mg/dL | 5.4 | 7.7 | 18.7 | 19 | 54.6 |
| The Other Driver’s Gender (DRV_SEX2) | | | | | |
| Male (1) | 58 | 41 | 26.6 | 9.8 | 22.7 |
| Female (2) | 42 | 51.4 | 27.2 | 7.5 | 13.9 |
| Drug Involvement of Other Driver (DRV_DUG2) | | | | | |
| No (1) | 96.6 | 46.7 | 26.9 | 8.3 | 18.2 |
| Yes (2) | 3.4 | 7.8 | 25.9 | 22.9 | 43.5 |
| Alcohol Test Result of Other Driver (DRV_ALC2) | | | | | |
| 0 mg/dL | 93.7 | 47.7 | 26.8 | 8 | 17.6 |
| 1-80 mg/dL | 0.9 | 6.6 | 29 | 23 | 42 |
| >80 mg/dL | 5.4 | 11.6 | 27.2 | 21.2 | 40.1 |
| Seat Belt Type and Usage Status (SAF_REST) | | | | | |
| Not Used (1) | 12.7 | 9.9 | 14.8 | 16.9 | 58.4 |
| Two-Point (2) | 0.9 | 41.3 | 35.2 | 8.4 | 15.1 |
| Three-Point (3) | 84.7 | 50.9 | 28.4 | 7.5 | 13.2 |
| Others (4) | 1.7 | 34 | 33 | 16 | 17 |
| Air Bag Deployment (SAF_ARBG) | | | | | |
| Not Deployed (1) | 58 | 71.2 | 17.8 | 3 | 8.1 |
| Curtain (2) | 0.1 | 10.2 | 30.5 | 15.3 | 44.1 |
| Side (3) | 1.2 | 19 | 39 | 8.2 | 34 |
| Front (4) | 1.2 | 18.8 | 38.7 | 8.2 | 34.4 |
| Combined (5) | 16 | 10.1 | 39 | 16.7 | 34.2 |
| Other (6) | 24.4 | 8.7 | 39.8 | 17.4 | 34 |
| The Age of This Vehicle (VEH_AGE1) | | | | | |
| <1 | 5 | 53.7 | 30.3 | 6.9 | 9.1 |
| 1-10 | 56.8 | 49.7 | 29.3 | 7.8 | 13.1 |
| 11-20 | 32.7 | 39.5 | 23.6 | 10.4 | 26.6 |
| >20 | 5.4 | 26.7 | 17.3 | 12.1 | 43.9 |
| The Age of Other Vehicle (VEH_AGE2) | | | | | |
| <1 | 4.7 | 54.8 | 21.5 | 6.5 | 17.2 |
| 1-10 | 55.9 | 49.3 | 24.7 | 7.6 | 18.4 |
| 11-20 | 34.1 | 39.6 | 29.8 | 10.5 | 20.1 |
| >20 | 5.3 | 32.4 | 35.3 | 12.7 | 19.7 |
| The Weight of This Vehicle (VEH_WGT1) | | | | | |
| 1700-2700 lbs. | 8.5 | 38.2 | 21.3 | 8.2 | 32.5 |
| 2701-3700 lbs. | 51.9 | 44.8 | 25.9 | 8.4 | 21 |
| 3701-4700 lbs. | 26.3 | 48.3 | 28.3 | 9.2 | 14.2 |
| 4701-5700 lbs. | 11.2 | 47.4 | 30.6 | 10.1 | 11.9 |
| >5700 lbs. | 2.1 | 40.7 | 34.6 | 10.9 | 13.9 |
| The Weight of Other Vehicle (VEH_WGT2) | | | | | |
| 1700-2700 lbs. | 8 | 47.77 | 35.4 | 9.45 | 7.38 |
| 2701-3700 lbs. | 50.1 | 48.77 | 28.8 | 8.91 | 13.52 |
| 3701-4700 lbs. | 27 | 44.53 | 24.96 | 8.47 | 22.04 |
| 4701-5700 lbs. | 12.4 | 35.65 | 19.91 | 8.51 | 35.93 |
| >5700 lbs. | 2.5 | 25.54 | 14.95 | 9.9 | 49.61 |
| Vehicle Body Type of This Vehicle (VEH_BDY1) | | | | | |
| Pickup (1) | 15.7 | 45 | 27.7 | 10.4 | 16.9 |
| SUV (2) | 30 | 50.2 | 28.7 | 8.4 | 12.7 |
| Minivan (3) | 2.1 | 69.1 | 27.8 | 2.8 | 0.3 |
| Cargo Van (4) | 0.3 | 80.8 | 17.5 | 1.5 | 0.2 |
| Van (5) | 1.9 | 11 | 30.2 | 18.8 | 40 |
| Sedan (6) | 40.7 | 43.2 | 25.8 | 8.5 | 22.5 |
| Coupe (7) | 3.7 | 38.6 | 22.9 | 9.4 | 29.1 |
| Wagon (8) | 0.7 | 41.8 | 23 | 6.6 | 28.7 |
| Hatchback (9) | 4.1 | 41.1 | 24.4 | 7.9 | 26.6 |
| Convertible (10) | 0.8 | 39.1 | 21.8 | 8.5 | 30.6 |
| Vehicle Body Type of Other Vehicle (VEH_BDY2) | | | | | |
| Pickup (1) | 17.3 | 34.9 | 20.8 | 9.2 | 35.1 |
| SUV (2) | 29.8 | 48.2 | 25 | 7.8 | 19 |
| Minivan (3) | 2.1 | 67.1 | 28.9 | 3.6 | 0.4 |
| Cargo Van (4) | 0.4 | 68.3 | 27.6 | 3.9 | 0.2 |
| Van (5) | 2 | 9.2 | 23 | 18 | 49.8 |
| Sedan (6) | 39.5 | 48.3 | 29.9 | 9.1 | 12.8 |
| Coupe (7) | 3.6 | 42.1 | 31.4 | 10.5 | 16 |
| Wagon (8) | 0.7 | 46.1 | 31.7 | 10 | 12.2 |
| Hatchback (9) | 3.8 | 48.8 | 30.9 | 8.6 | 11.7 |
| Convertible (10) | 0.8 | 41.9 | 34.3 | 10.3 | 13.6 |

| Model | Injury severity | Imbalanced: Precision (%) | Imbalanced: Recall (%) | Imbalanced: F1-score (%) | Imbalanced: Accuracy (%) | Balanced: Precision (%) | Balanced: Recall (%) | Balanced: F1-score (%) | Balanced: Accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| XGBoost | Level0 | 87.93 | 90.81 | 89.34 | | 85.19 | 87.19 | 86.17 | |
| XGBoost | Level1 | 72.17 | 74.02 | 73.08 | | 79.12 | 76.99 | 78.04 | |
| XGBoost | Level2 | 51.55 | 20.95 | 29.79 | | 86.66 | 85.88 | 86.27 | |
| XGBoost | Level3 | 75.57 | 88.07 | 81.34 | | 88.42 | 89.55 | 88.98 | |
| XGBoost | Average | 71.80 | 68.46 | 68.39 | 79.52 | 84.85 | 84.90 | 84.87 | 84.90 |
| Random Forest | Level0 | 87.85 | 90.07 | 88.95 | | 84.69 | 86.85 | 85.76 | |
| Random Forest | Level1 | 71.01 | 74.28 | 72.61 | | 79.74 | 74.47 | 77.01 | |
| Random Forest | Level2 | 54.96 | 15.33 | 23.97 | | 86.00 | 87.88 | 86.93 | |
| Random Forest | Level3 | 72.74 | 88.33 | 79.78 | | 88.11 | 89.75 | 88.92 | |
| Random Forest | Average | 71.64 | 67.01 | 66.33 | 78.81 | 84.63 | 84.74 | 84.66 | 83.74 |
| CatBoost | Level0 | 88.16 | 90.20 | 89.17 | | 84.38 | 86.43 | 85.39 | |
| CatBoost | Level1 | 71.60 | 74.19 | 72.87 | | 75.83 | 75.05 | 75.44 | |
| CatBoost | Level2 | 49.39 | 23.59 | 31.93 | | 81.07 | 79.31 | 80.18 | |
| CatBoost | Level3 | 75.82 | 86.42 | 80.77 | | 84.68 | 85.38 | 85.03 | |
| CatBoost | Average | 71.24 | 68.60 | 68.69 | 79.22 | 81.49 | 81.54 | 81.51 | 81.52 |
| LightGBM | Level0 | 87.88 | 90.28 | 89.06 | | 83.87 | 86.03 | 84.93 | |
| LightGBM | Level1 | 71.18 | 72.30 | 71.74 | | 73.84 | 74.27 | 74.06 | |
| LightGBM | Level2 | 42.18 | 24.21 | 30.76 | | 77.72 | 73.80 | 75.71 | |
| LightGBM | Level3 | 75.32 | 83.86 | 79.36 | | 81.04 | 82.67 | 81.85 | |
| LightGBM | Average | 69.14 | 67.66 | 67.73 | 78.31 | 79.12 | 79.19 | 79.14 | 79.15 |
| AdaBoost | Level0 | 88.81 | 84.61 | 86.66 | | 85.03 | 77.86 | 81.29 | |
| AdaBoost | Level1 | 63.38 | 77.87 | 69.88 | | 69.39 | 78.04 | 73.46 | |
| AdaBoost | Level2 | 51.16 | 20.92 | 29.70 | | 80.52 | 74.09 | 77.17 | |
| AdaBoost | Level3 | 77.01 | 82.07 | 79.46 | | 80.65 | 83.89 | 82.24 | |
| AdaBoost | Average | 70.09 | 66.37 | 66.43 | 76.62 | 78.90 | 78.47 | 78.54 | 78.45 |
| Decision Tree | Level0 | 84.50 | 86.85 | 85.66 | | 80.26 | 81.56 | 80.91 | |
| Decision Tree | Level1 | 63.95 | 63.95 | 63.95 | | 64.50 | 65.64 | 65.07 | |
| Decision Tree | Level2 | 37.56 | 24.15 | 29.40 | | 65.36 | 66.99 | 66.17 | |
| Decision Tree | Level3 | 69.71 | 70.59 | 70.15 | | 75.53 | 71.07 | 73.23 | |
| Decision Tree | Average | 63.93 | 61.39 | 62.29 | 72.00 | 71.42 | 71.32 | 71.34 | 71.27 |

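
The "Average" rows in the model comparison table above appear to be unweighted (macro) means of the per-level values. The short sketch below checks this reading against the XGBoost figures on imbalanced data, and also recomputes each F1-score as the harmonic mean of precision and recall. The helper names are illustrative, and the macro-averaging interpretation is an assumption inferred from the numbers rather than something stated in the table.

```python
def f1_score(precision, recall):
    """F1 as the harmonic mean of precision and recall (same units as inputs)."""
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean across classes."""
    values = list(values)
    return sum(values) / len(values)


# XGBoost, imbalanced data: (precision %, recall %) per severity level,
# copied from the table.
xgb_imbalanced = {
    "Level0": (87.93, 90.81),
    "Level1": (72.17, 74.02),
    "Level2": (51.55, 20.95),
    "Level3": (75.57, 88.07),
}

per_class_f1 = {lvl: f1_score(p, r) for lvl, (p, r) in xgb_imbalanced.items()}
macro_precision = macro_average(p for p, _ in xgb_imbalanced.values())
macro_recall = macro_average(r for _, r in xgb_imbalanced.values())
macro_f1 = macro_average(per_class_f1.values())

# Reported "Average" row for comparison: 71.80, 68.46, 68.39.
print(round(macro_precision, 2), round(macro_recall, 2), round(macro_f1, 2))
```

The recomputed values agree with the reported averages to within rounding, and the Level2 recall of 20.95% shows how a single weak class pulls the macro averages well below the overall accuracy on imbalanced data.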
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).