Forecasting crash severity is critical for emergency response, infrastructure spending, and risk communication. Although machine learning has been widely applied to this problem, three shortcomings prevent its practical application: poorly calibrated probability scores, SHAP-based explanations whose faithfulness has not been verified, and models never tested in different regions. The proposed framework, termed SAE-XCrash (Safety-Aware and Explainable Crash Severity Prediction), addresses all three using two public datasets: US-Accidents (7.0 million records, 2016-2023) and UK STATS19 (approximately 1,010,000 records, 2016-2022). Notably, the US-Accidents severity label reflects traffic disruption duration, not injury outcome, and results should be interpreted accordingly. Previously undocumented label-schema drift motivated a revised binary target with Severity 4 as the sole positive class; strict temporal splits are used throughout. Five classifiers are compared. Post hoc isotonic calibration reduces expected calibration error by 97.3% with negligible loss of discrimination. A four-step SHAP audit confirms that explanations genuinely reflect model behavior: deletion-based per-budget faithfulness gaps exceed the 0.05 threshold at every feature budget (minimum gap = 0.066, p < 0.0001), although the aggregate trapezoidal AUC is borderline because of scale compression at AUPRC ≈ 0.13, and insertion gaps are statistically significant at more than ten percent of features. Explanation stability holds under conservative noise levels but degrades at realistic perturbation magnitudes, mainly in spatially sparse geohash cells. In a three-tier cross-dataset transfer experiment (zero-shot, recalibration, and full retraining), spatial memorization emerges as the main generalization barrier, while temporal features transfer smoothly between jurisdictions.
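The post hoc isotonic calibration step can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the gradient-boosted classifier, the synthetic imbalanced data (standing in for the rare Severity-4 positive class), the chronological-style splits, and the 10-bin ECE estimator are all assumptions made for the sketch.

```python
# Sketch of post hoc isotonic calibration with a binned ECE measurement.
# All data, models, and split sizes here are illustrative stand-ins.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split

def expected_calibration_error(y_true, p_pred, n_bins=10):
    """Binned ECE: bin-weight-averaged |observed positive rate - mean predicted prob|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p_pred > lo) & (p_pred <= hi)
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - p_pred[mask].mean())
    return ece

# Synthetic imbalanced binary task (~13% positives, mimicking a rare severe class).
X, y = make_classification(n_samples=6000, n_features=20, weights=[0.87], random_state=0)
# shuffle=False mimics strict temporal (chronological) splitting.
X_tr, X_hold, y_tr, y_hold = train_test_split(X, y, test_size=0.5, shuffle=False)
X_cal, X_te, y_cal, y_te = train_test_split(X_hold, y_hold, test_size=0.5, shuffle=False)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Fit the isotonic map on a held-out calibration fold, never on training data.
p_cal = clf.predict_proba(X_cal)[:, 1]
iso = IsotonicRegression(out_of_bounds="clip").fit(p_cal, y_cal)

p_raw = clf.predict_proba(X_te)[:, 1]
p_iso = iso.predict(p_raw)
ece_raw = expected_calibration_error(y_te, p_raw)
ece_iso = expected_calibration_error(y_te, p_iso)
print(f"raw ECE={ece_raw:.4f}  isotonic ECE={ece_iso:.4f}")
```

Because isotonic regression is a monotone remapping of scores, the ranking of predictions is preserved up to ties, which is why calibration of this kind can sharply reduce ECE while leaving discrimination metrics such as AUC essentially unchanged.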