Preprint
Article

This version is not peer-reviewed.

Calibrated and Explainable Gradient Boosting for Road Traffic Crash Severity Prediction: SHAP Audit and Cross-Jurisdiction Transfer Evaluation

Submitted: 11 May 2026

Posted: 12 May 2026


Abstract
Forecasting crash severity is critical for emergency response, infrastructure spending, and risk communication. Although machine learning has been widely applied to this problem, three shortcomings prevent its practical application: poorly calibrated probability scores, SHAP-based explanations whose faithfulness has not been verified, and models that have never been tested in other regions. The proposed framework, termed SAE-XCrash (Safety-Aware and Explainable Crash Severity Prediction), addresses all three using two public datasets: US-Accidents (7.0 million records, 2016-2023) and UK STATS19 (approximately 1,010,000 records, 2016-2022). Notably, the US-Accidents severity label refers to traffic disruption duration, not injury outcome, and results should be interpreted accordingly. Previously unknown label-schema drift led to a revised binary target with Severity 4 as the only positive class; strict temporal splits are used throughout. Five classifiers are compared. Post hoc isotonic calibration reduces expected calibration error by 97.3 percent with negligible loss of discrimination. A four-step SHAP audit confirms that explanations genuinely reflect model behavior: deletion-based per-budget faithfulness gaps exceed the 0.05 threshold at every feature budget (min gap=0.066, p<0.0001), though the aggregate trapezoidal AUC is borderline due to scale compression at AUPRC≈0.13, and insertion gaps are statistically significant at more than ten percent of features. Explanation stability holds under conservative noise levels but degrades at realistic perturbation magnitudes, mainly in spatially sparse geohash cells. In a three-tier cross-dataset transfer experiment (zero-shot, recalibration, and full retraining), spatial memorization is the major generalization barrier, while temporal features transfer smoothly between jurisdictions.
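To make the calibration claim concrete, the following is a minimal sketch of how expected calibration error (ECE) can be measured before and after post hoc isotonic calibration. It is not the SAE-XCrash pipeline: the synthetic data, the `expected_calibration_error` helper, the equal-width binning with `n_bins=10`, and the simple first-half/second-half split (a stand-in for a temporal split) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Equal-width-bin ECE: bin-weighted mean of |observed rate - mean predicted prob|.

    Hypothetical helper; the binning scheme is an assumption, not taken from the paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the interior edges yields bin ids in 0..n_bins-1.
    bin_ids = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_true[mask].mean() - y_prob[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return ece


# Synthetic stand-in for an imbalanced severity target (assumption, not real data).
rng = np.random.default_rng(0)
n = 20_000
true_prob = rng.beta(1, 8, size=n)   # rare positive class
y = rng.binomial(1, true_prob)
raw_score = np.sqrt(true_prob)       # monotone but overconfident model score

# Earlier half fits the calibrator, later half evaluates it.
cal, test = slice(0, n // 2), slice(n // 2, n)

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(raw_score[cal], y[cal])
calibrated = iso.predict(raw_score[test])

ece_raw = expected_calibration_error(y[test], raw_score[test])
ece_cal = expected_calibration_error(y[test], calibrated)
print(f"ECE before: {ece_raw:.4f}  after: {ece_cal:.4f}")
```

Because isotonic regression fits a monotone map, it preserves the ranking of scores except where it introduces ties, which is consistent with the abstract's observation that calibration costs little discrimination.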
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
