Preprint
Article

This version is not peer-reviewed.

Hierarchical Reconciliation of Fifty-One Years of Highway–Rail Grade Crossing Data with Fuzzy and AI Methods

Submitted: 09 March 2026

Posted: 11 March 2026

Abstract
Highway–rail grade crossing (HRGC) safety research relies on federal incident and inventory datasets that span multiple decades. However, inconsistencies in geographic identifiers and incomplete reconstruction of crossing denominators can distort exposure-based rate metrics. This study develops, documents, and validates a reproducible nine-stage reconciliation pipeline applied to 51 years (1975–2025) of national HRGC incident data from the Federal Railroad Administration Form 57 and Form 71 datasets. The hierarchical pipeline integrates deterministic alignment and AI-assisted inference to produce an audited, geographically consistent dataset. The study formalizes four longitudinal county-level exposure metrics that quantify spatiotemporal risk: accumulated incidents per million population (AIPM), accumulated incidents per crossing (AIPC), crossings per million population (CPM), and crossings per 100 square miles (CPHSM). All four metrics exhibited pronounced right-skewness: AIPM, CPM, and CPHSM approximated exponential forms, and AIPC approximated a log-normal form. Anderson–Darling tests detected statistically significant tail deviations in three of the metrics; for CPM, the test did not reject the exponential fit at conventional significance levels. Spatial analysis shows coherent regional concentration of incident rates in the Central Plains and lower Mississippi corridors. The national time series exhibits a late-1970s plateau, a sustained exponential decline beginning around 1980, and stabilized but persistent incident rates after 2001. Population-normalized AIPM remained statistically indistinguishable between the reconciled and record-dropped datasets; however, crossing-based metrics changed materially when denominators were reconstructed from the reconciled crossing universe. Median ratio comparisons confirmed that incident-only denominators introduced substantial measurement bias in local risk assessment.
State-level rank reversals persisted even when omnibus distributional tests failed to reject equality. By formalizing multistage data cleaning and quantifying its analytical impact over an unprecedented longitudinal horizon, this study establishes denominator integrity and geographic reconciliation as prerequisites for valid HRGC exposure assessment and provides a replicable platform for future predictive modeling.
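The four exposure metrics named in the abstract are simple ratios, and a minimal sketch may make their scaling concrete. The record layout, field names, and example numbers below are hypothetical and are not taken from the study; the abstract also does not state which population year or area definition is used, so those are left as plain inputs. The second call illustrates, under the same assumptions, how AIPC shifts when the crossing count is limited to crossings that appear in incident records rather than the full crossing universe.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CountyRecord:
    """Hypothetical per-county inputs; field names are illustrative only."""
    incidents: int      # incidents accumulated over the study period
    crossings: int      # number of crossings used as the denominator
    population: float   # county population (reference year is an assumption)
    area_sq_mi: float   # county area in square miles


def exposure_metrics(c: CountyRecord) -> dict:
    """Return AIPM, AIPC, CPM, and CPHSM for one county.

    A metric is None when its denominator is zero.
    """
    def ratio(num, den, scale=1.0):
        return scale * num / den if den else None

    return {
        "AIPM": ratio(c.incidents, c.population, 1_000_000),  # per million people
        "AIPC": ratio(c.incidents, c.crossings),              # per crossing
        "CPM": ratio(c.crossings, c.population, 1_000_000),   # per million people
        "CPHSM": ratio(c.crossings, c.area_sq_mi, 100),       # per 100 square miles
    }


# Made-up county: 120 crossings in the full crossing universe.
full = CountyRecord(incidents=30, crossings=120, population=50_000, area_sq_mi=600)
metrics = exposure_metrics(full)

# Same county, but counting only the 40 crossings that ever had an incident.
incident_only = replace(full, crossings=40)
biased = exposure_metrics(incident_only)

print(metrics)
print(biased["AIPC"])
```

With these made-up inputs, AIPM is unaffected by the choice of crossing denominator, while AIPC triples when only incident-bearing crossings are counted, which mirrors the direction of the denominator effect described in the abstract.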
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated