Remote-sensing-based change detection for infrastructure monitoring demands methods that are accurate, robust to severe class imbalance, and transparent in their decision logic. This study proposes MS-HySAN, a hybrid change-detection framework that addresses these requirements through three coordinated design decisions: (i) a truncated, attention-augmented Siamese encoder that serves as a frozen feature extractor rather than an end-to-end pixel classifier, (ii) a latent–physical fusion strategy that concatenates multi-scale CNN difference features with physically interpretable spectral-index differences, and (iii) a LightGBM classifier that performs internal sparse feature selection and exposes tree-based SHAP attributions for post-hoc analysis. The framework is evaluated on high-resolution PlanetScope imagery (4-band and 8-band) over a national highway construction corridor in Indore, India, using 21 acquisitions from 2022–2025 and geographic k-fold cross-validation to enforce spatial independence between training and test data. In bi-temporal multi-class change-detection tasks, the proposed hybrid model consistently outperforms the conventional deep learning baselines U-Net and Siamese U-Net, and is competitive with dedicated bi-temporal architectures (ChangeFormer, SNUNet, BIT) trained under the same conditions. A SHAP interpretability analysis shows that the learned deep features and the handcrafted spectral indices make complementary, physically meaningful contributions, which supports the fusion strategy. In the best-case setting, MS-HySAN (bi-temporal, indices + reflectance) achieves an overall mean F1-score of 0.95 (Kappa: 0.90), exceeding the corresponding deep baseline by 6 F1 points while maintaining stable cross-fold performance.
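As an illustration of the geographic k-fold idea mentioned above, the following is a minimal sketch, not the study's actual protocol. It assumes that image tiles are identified by hypothetical (x, y) origins, that tiles are grouped into square spatial blocks of an assumed size, and that whole blocks are assigned to folds so that no block contributes to both training and testing. The function name `geographic_kfold` and its parameters are illustrative only.

```python
from collections import defaultdict


def geographic_kfold(tile_coords, k, block_size):
    """Yield (train_idx, test_idx) pairs that hold out whole spatial blocks.

    tile_coords: list of (x, y) tile origins (hypothetical layout).
    k:           number of folds.
    block_size:  side length of the square grouping blocks (an assumption).
    """
    # Group tile indices by the spatial block that contains them.
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(tile_coords):
        blocks[(int(x // block_size), int(y // block_size))].append(idx)

    block_ids = sorted(blocks)
    if k > len(block_ids):
        raise ValueError("k cannot exceed the number of spatial blocks")

    # Assign blocks to folds round-robin in a deterministic order.
    folds = [[] for _ in range(k)]
    for i, block_id in enumerate(block_ids):
        folds[i % k].extend(blocks[block_id])

    # Each fold serves once as the test set; the rest form the training set.
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


if __name__ == "__main__":
    # Eight tiles on a 4 x 2 grid, grouped into two 200-unit blocks.
    coords = [(x, y) for x in range(0, 400, 100) for y in range(0, 200, 100)]
    for train_idx, test_idx in geographic_kfold(coords, k=2, block_size=200):
        print("train:", train_idx, "test:", test_idx)
```

Splitting by block rather than by individual tile or pixel is what keeps spatially adjacent, highly correlated samples from landing on both sides of a split. The block size would need to be chosen relative to the spatial autocorrelation range of the scene.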