Submitted:
10 May 2025
Posted:
12 May 2025
Abstract
Keywords:
1. Introduction
Contributions
- Benchmark: Release of the Copenhagen-Flow dataset with rich exogenous covariates.
- Model: A lightweight architecture that couples graph attention with frequency-domain transformers, reducing temporal-attention cost from $\mathcal{O}(L^{2})$ to $\mathcal{O}(L\log L)$.
- Protocol: A communication-frugal FL scheme—encrypted FedAvg plus Top-K sparsification—requiring ≤5 MB per round (see the payload sketch after this list).
- Adaptation: An ADWIN-triggered meta-learning loop that restores accuracy within 15 min of drift onset.
- Empirics: Comprehensive comparison against eight baselines; FedST-GNN yields 5% lower MAE and 7% lower RMSE than the strongest deep baseline, with a median latency of 38 ms on commodity GPUs.
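A back-of-the-envelope check on the ≤5 MB budget (our arithmetic, assuming 32-bit values and 32-bit indices; the paper's exact wire format and sparsity level are not stated): each retained gradient coordinate costs 8 bytes, so 5 MB covers about 655,000 coordinates, roughly 20% of the 3.2 M parameters reported in the ablation table.

```python
import torch

def topk_sparsify(grad: torch.Tensor, k: int):
    """Keep only the k largest-magnitude coordinates (Top-K sparsification [18])."""
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)
    return flat[idx], idx  # transmit (value, index) pairs; receiver scatters back

# Hypothetical budget check: 4 B value + 4 B index per retained coordinate.
budget_bytes = 5 * 1024 ** 2
max_coords = budget_bytes // 8                              # ≈ 655,000 coordinates
print(f"Top-K fraction under 5 MB: {max_coords / 3_200_000:.0%}")  # ≈ 20%
```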
Research Questions
- RQ1: Can a federated ST-GNN match or exceed centralised baselines while respecting data-sovereignty constraints?
- RQ2: How quickly and by how much can lightweight drift detection plus meta-adaptation recover accuracy after abrupt demand shifts?
- RQ3: What communication and compute budgets are required to deliver real-time inference on commodity hardware?
1.1. Regulatory motivation and privacy design
2. Literature Review
2.1. Classical and Deep Approaches to Passenger-Flow Forecasting
Statistical Foundations
Sequence Models
Graph Neural Networks
Spectral and Frequency-Domain Innovations
2.2. Federated Learning in Intelligent Transportation
2.3. Concept Drift Detection and Mitigation
Signal-Based Detectors
Model-Based Adaptation
Open Questions Under Federation
2.4. Synthesis and Research Gap
- Graph neural architectures outperform sequence-only models when spatial correlations are strong.
- FL can match centralised accuracy if communication and heterogeneity are properly handled.
- Drift adaptation can halve post-drift error, but has rarely been integrated with federated ST-GNNs.
| Study | Model | Privacy | Drift | MAE (pax) | Key Outcome |
|---|---|---|---|---|---|
| Li 2021 [10] | SARIMA | central | no | 4.71 | MAPE spikes to 25% during events |
| Yu 2018 [3] | STGCN | central | no | 2.96 | 11% gain over LSTM baseline |
| Zhou 2021 [14] | Informer | central | no | 2.62 | FFT attention lowers RMSE by 7 pp |
| Li 2020 [7] | FedProx-GRU | FL | no | 3.05 | FedProx stabilises heterogeneous clients |
| Gupta 2024 [16] | DP-Transf. | FL + DP | no | 3.28 | DP noise adds 3–4 pp MAE |
3. Materials and Methods
3.1. Dataset Acquisition and Pre-Processing
Descriptive Statistics
Feature Engineering
3.2. Graph Construction
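One plausible construction, sketched under the assumption that directed edges link consecutive stops along each of the 42 bus routes and 3 metro lines in the direction of travel (the paper's exact rule may differ):

```python
import numpy as np

def build_directed_adjacency(route_sequences, num_stops=312):
    """Directed adjacency from consecutive stops along each route or line.

    route_sequences: list of stop-index sequences (hypothetical input format),
    one per bus route or metro line, ordered in the direction of travel.
    """
    A = np.zeros((num_stops, num_stops), dtype=np.float32)
    for seq in route_sequences:
        for u, v in zip(seq[:-1], seq[1:]):
            A[u, v] = 1.0  # edge points in the direction of travel
    return A
```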
3.3. Problem Definition
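A standard formulation consistent with the quantities used elsewhere in the paper (window length $L$, $N$ stops, quantile set $\mathcal{Q}$); the notation here is illustrative. Let $G=(V,E)$ with $|V|=N$ stops and $X_\tau \in \mathbb{R}^{N \times F}$ the feature matrix at minute $\tau$. Given the past window, the model predicts quantiles of boardings over horizon $H$:

$$
\hat{X}^{(q)}_{t+1:t+H} \;=\; f_\theta\big(X_{t-L+1:t},\, G\big), \qquad q \in \mathcal{Q},
$$

trained with the pinball (quantile) loss

$$
\mathcal{L}(\theta) \;=\; \frac{1}{|\mathcal{Q}|}\sum_{q\in\mathcal{Q}} \operatorname{mean}\Big(\max\big(q\,(x-\hat{x}^{(q)}),\; (q-1)\,(x-\hat{x}^{(q)})\big)\Big).
$$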
3.4. FedST-GNN Architecture
Spatial Encoder (GATv2)
Frequency-Domain Transformer
Output Head
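A minimal PyTorch sketch of how these three components can be wired together, assuming `torch_geometric` for the GATv2 layer and FNet-style Fourier mixing [22] for the temporal module. The hidden width of 64 and the 0.1/0.5/0.9 quantiles follow the hyper-parameter table; the horizon, head count, and layer depth are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATv2Conv  # spatial encoder named above

class FourierMixer(nn.Module):
    """FNet-style token mixing [22]: FFT along time, keep the real part.
    Costs O(L log L) instead of O(L^2) self-attention."""
    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):                        # x: (N, L, d)
        x = x + torch.fft.fft(x, dim=1).real     # temporal mixing
        x = self.norm(x)
        return x + self.ff(x)

class FedSTGNNSketch(nn.Module):
    """Sketch only: GATv2 spatial encoder -> frequency-domain mixer
    -> quantile output head. Shapes and depths are assumptions."""
    def __init__(self, in_dim, d=64, horizon=12, quantiles=(0.1, 0.5, 0.9)):
        super().__init__()
        self.gat = GATv2Conv(in_dim, d, heads=4, concat=False)
        self.mixer = FourierMixer(d)
        self.head = nn.Linear(d, horizon * len(quantiles))  # one value per (h, q)
        self.horizon, self.nq = horizon, len(quantiles)

    def forward(self, x, edge_index):
        # x: (L, N, in_dim) -- a window of L minutes over N stops
        L, N, _ = x.shape
        h = torch.stack([self.gat(x[t], edge_index) for t in range(L)])  # (L, N, d)
        h = self.mixer(h.permute(1, 0, 2))       # (N, L, d)
        out = self.head(h[:, -1])                # summary at the last time step
        return out.view(N, self.horizon, self.nq)
```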
3.5. Federated Optimisation Protocol
Listing 1. Drift-Aware Federated Training of FedST-GNN
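A minimal sketch of the drift-aware protocol summarised in the Contributions (FedAvg with Top-K uploads, ADWIN-triggered meta-adaptation [8,9]). `pinball_loss`, `validate`, and the client objects are hypothetical stand-ins; encrypted/secure aggregation and the Top-K upload path (see `topk_sparsify` in Section 1) are elided for brevity.

```python
import copy
import torch

def fedavg_round(global_model, clients, lr=1e-3, local_epochs=1):
    """One FedAvg round: local training per client, sample-weighted averaging."""
    states, weights = [], []
    for client in clients:                        # raw data never leaves the client
        local = copy.deepcopy(global_model)
        opt = torch.optim.AdamW(local.parameters(), lr=lr)
        for _ in range(local_epochs):
            for x, edge_index, y in client.loader:            # hypothetical loader
                opt.zero_grad()
                loss = pinball_loss(local(x, edge_index), y)  # quantile loss, §3.3
                loss.backward()
                opt.step()
        states.append(local.state_dict())
        weights.append(client.num_samples)
    total = sum(weights)
    avg = {k: sum((w / total) * s[k] for w, s in zip(weights, states))
           for k in states[0]}                    # assumes all-float parameters
    global_model.load_state_dict(avg)
    return global_model

def drift_aware_training(global_model, clients, detector, rounds=1000, meta_lr=0.5):
    """On an ADWIN alarm, run extra local epochs and take a Reptile-style
    interpolation step toward the adapted weights [8, 9]."""
    for _ in range(rounds):
        global_model = fedavg_round(global_model, clients)
        detector.update(validate(global_model, clients))  # e.g. rolling MAE signal
        if detector.drift_detected:               # river.drift.ADWIN-style flag
            anchor = copy.deepcopy(global_model.state_dict())
            global_model = fedavg_round(global_model, clients, local_epochs=3)
            adapted = global_model.state_dict()
            global_model.load_state_dict(
                {k: anchor[k] + meta_lr * (adapted[k] - anchor[k]) for k in anchor})
    return global_model
```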
3.6. Computational Complexity
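With $N$ stops, window length $L$, hidden width $d$, and edge set $E$, the per-layer costs implied by the Contributions are roughly (our reconstruction, constants omitted):

$$
\underbrace{\mathcal{O}(|E|\,d)}_{\text{GATv2 spatial encoder}}
\;+\;
\underbrace{\mathcal{O}(N\,d\,L\log L)}_{\text{FFT temporal mixing}}
\qquad\text{vs.}\qquad
\mathcal{O}(N\,d\,L^{2})\ \text{for full temporal self-attention},
$$

matching the $\mathcal{O}(L^{2})\!\to\!\mathcal{O}(L\log L)$ reduction stated in the Contributions.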
3.7. Baselines and Evaluation Metrics
3.8. Implementation Details
4. Results
4.1. Point and Probabilistic Accuracy
4.2. Horizon-Specific Performance
4.3. Calibration and Sharpness
4.4. Concept-Drift Resilience
4.5. Ablation and Parameter Efficiency
4.6. Latency, Throughput, and Resource Footprint
4.7. Statistical Robustness
5. Discussion
5.1. Interpretation of Core Results
Synergy of Spatial and Spectral Mechanisms
Effectiveness of Drift Adaptation
Federated Parity and Communication Frugality
Operational Viability
5.2. Comparison with Prior Work
5.3. Practical Implications
5.4. Limitations
- Single-city evaluation. Although Copenhagen exhibits multimodal complexity, transferability to radial or hub-and-spoke networks remains unverified.
- Absence of differential privacy. Secure aggregation precludes gradient inspection but does not provide formal guarantees.
- Homogeneous client models. All operators share identical architecture; accommodating heterogeneous capacities would necessitate split or personalised FL variants.
- External feature sparsity. Social-media signals, traffic incidents, and fare policy changes are not yet incorporated; such exogenous covariates could further mitigate drift.
5.5. Future Work
- Multi-city generalisation. Deploying the pipeline in Gothenburg and Singapore will test robustness across divergent network topologies and cultural behaviours.
- Differentially private gradients. Integrating Rényi DP noise and privacy accounting to bound attacker inference risk.
- Heterogeneous and hierarchical FL. Exploring FedBN and FedMSA to accommodate client-specific batch-norm or sub-architectures.
- Closed-loop optimisation. Coupling forecasts with reinforcement-learning agents for dynamic vehicle re-routing and headway adaptation.
6. Conclusions
- FedST-GNN reduced mean absolute error by 5–7% and root-mean-square error by 7% relative to the strongest deep baseline (TFT) while maintaining tight probabilistic calibration.
- Drift adaptation lowered peak error during a half-marathon disruption by 41%, achieving recovery within seven minutes of alarm.
- Median inference latency remained below 40 ms at 1 600 queries/s on a single GTX 1660 SUPER (see the Little's-law check after this list), and communication overhead was held to 5 MB per 15-minute federation round.
- Public release of data, code, and reproducibility scripts establishes an open benchmark for privacy-preserving transit analytics.
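As a quick consistency check on the throughput figures via Little's law [20] (our arithmetic, using the reported numbers):

$$
L_{\text{concurrent}} \;=\; \lambda W \;=\; 1600\ \text{s}^{-1} \times 0.038\ \text{s} \;\approx\; 61,
$$

i.e. about 61 requests in flight at the median latency, a modest concurrency level for a single GPU.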
Funding
Author Contributions
References
- Jin, G.; Liang, Y.; Fang, Y.; Shao, Z.; Huang, J.; Zhang, J.; Zheng, Y. Spatio-Temporal Graph Neural Networks for Predictive Learning in Urban Computing: A Survey. IEEE Trans. Knowl. Data Eng. 2023, Early Access, 1–23. [Google Scholar] [CrossRef]
- Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A. A Survey on Concept Drift Adaptation. ACM Comput. Surv. 2014, 46, 44:1–44:37. [Google Scholar] [CrossRef]
- Yu, B.; Yin, H.; Zhu, Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of AAAI 2018; AAAI Press, 2018; pp. 3634–3641. [Google Scholar]
- Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention Based Spatial–Temporal Graph Convolutional Networks for Traffic Flow Forecasting. In Proceedings of AAAI 2019; AAAI Press, 2019; pp. 922–929. [Google Scholar]
- Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020); Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., Lin, H., Eds.; Curran Associates: Red Hook, NY, USA, 2020; pp. 2693–2704. [Google Scholar]
- Kairouz, P.; McMahan, H. B.; Avent, B.; et al. Advances and Open Problems in Federated Learning. arXiv 2019, arXiv:1912.04977. [Google Scholar]
- Li, T.; Sahu, A. K.; Talwalkar, A.; Smith, V. Federated Optimisation in Heterogeneous Networks. Proc. MLSys 2020, 2, 429–450. [Google Scholar]
- Bifet, A.; Gavaldà, R. Learning from Time-Changing Data with Adaptive Windowing. In Proceedings of SIAM SDM 2007; SIAM, 2007; pp. 443–448. [Google Scholar]
- Nichol, A.; Achiam, J.; Schulman, J. On First-Order Meta-Learning Algorithms. arXiv 2018, arXiv:1803.02999. [Google Scholar]
- Li, W.; Sui, L.; Zhou, M.; Dong, H. Short-Term Passenger-Flow Forecast for Urban Rail Transit Based on Multi-Source Data. EURASIP J. Wirel. Commun. Netw. 2021, 9. [Google Scholar] [CrossRef]
- Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long Short-Term Memory Neural Network for Traffic Speed Prediction Using Remote Microwave Sensor Data. Transp. Res. Part C 2015, 54, 187–197. [Google Scholar] [CrossRef]
- Lim, B.; Arik, S. O.; Pfister, T. Temporal Fusion Transformers for Interpretable Multi-Horizon Time-Series Forecasting. Int. J. Forecast. 2021, 37, 1748–1764. [Google Scholar] [CrossRef]
- Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In Proceedings of ICLR 2018; OpenReview, 2018. [Google Scholar]
- Zhou, H.; Zhang, S.; Peng, J.; et al. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of AAAI 2021; AAAI Press, 2021; pp. 11106–11115. [Google Scholar]
- Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for Deep Spatial–Temporal Graph Modelling. In Proceedings of IJCAI-19; IJCAI, 2019; pp. 1907–1913. [Google Scholar]
- Gupta, S.; Torra, V. Differentially Private Traffic Flow Prediction Using Transformers: A Federated Approach. In Computer Security—ESORICS 2023 Workshops; LNCS 14398; Springer, 2024; pp. 260–271. [Google Scholar]
- Xavier, B. M.; Martinello, M.; Trois, C.; et al. Fast Learning Enabled by In-Network Drift Detection. In Proceedings of APNet 2024; ACM, 2024; pp. 129–134. [Google Scholar]
- Aji, A. F.; Heafield, K. Sparse Communication for Distributed Gradient Descent. arXiv 2017, arXiv:1704.05021. [Google Scholar]
- Beutel, D.; Topal, T.; Qian, H.; et al. Flower: A Friendly Federated Learning Research Framework. arXiv 2022, arXiv:2007.14390. [Google Scholar]
- Little, J. D. C. A Proof for the Queuing Formula L=λW. Oper. Res. 1961, 9, 383–387. [Google Scholar] [CrossRef]
- Diebold, F. X.; Mariano, R. S. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 1995, 13, 253–263. [Google Scholar] [CrossRef]
- Lee-Thorp, J.; Ainslie, J.; Eckstein, I.; Ontañón, S. FNet: Mixing Tokens with Fourier Transforms. In Findings of ACL–IJCNLP 2021; ACL, 2021; pp. 2622–2632. [Google Scholar]
| Attribute | Value | Attribute | Value |
|---|---|---|---|
| Observation period | Jan 2022 – Dec 2024 | Minute-level observations | |
| Stops ($N$) | 312 | Directed edges ($E$) | |
| Bus routes | 42 | Metro lines | 3 |
| Mean boardings min⁻¹ | | Std. boardings min⁻¹ | |
| Missing GPS ratio | | Anomalous fare taps | |
| Model | Tuned hyper-parameters (search range) |
|---|---|
| ARIMA | ; seasonal on weekly cycle |
| Prophet | Changepoint prior ; holidays prior |
| XGBoost | Trees ; learning rate ; max depth |
| LSTM/GRU | Layers ; hidden units ; dropout |
| STGCN | Chebyshev order ; channels ; kernel size |
| DCRNN | Diffusion steps ; units ; scheduled sampling |
| TFT | Hidden size ; attention heads ; dropout |
| Parameter | Value | Justification |
|---|---|---|
| Window length L | 12 min | Captures two bus headways (median 6 min) |
| Hidden dimension d | 64 | Empirical elbow in validation loss |
| Learning rate | | Stable AdamW convergence |
| Batch size B | 512 windows | GPU utilisation ≈72% |
| Dropout | | Mitigates over-fitting |
| Quantiles | | 80% predictive interval |
| FedAvg round | 15 min | Network backhaul capacity |
| Top-K sparsity | | <5 MB per round on 100 Mbps link |
| Model | MAE | RMSE | sMAPE (%) | CRPS |
|---|---|---|---|---|
| ARIMA | 3.90 (0.05) | 4.88 (0.07) | 13.07 | 2.31 |
| Prophet | 3.75 (0.04) | 4.63 (0.06) | 12.52 | 2.24 |
| XGBoost | 2.95 (0.03) | 3.68 (0.04) | 10.01 | 1.72 |
| LSTM | 2.88 (0.03) | 3.59 (0.04) | 9.86 | 1.69 |
| GRU | 2.85 (0.03) | 3.55 (0.04) | 9.73 | 1.66 |
| STGCN | 2.69 (0.03) | 3.39 (0.04) | 9.12 | 1.58 |
| DCRNN | 2.61 (0.03) | 3.26 (0.04) | 8.94 | 1.54 |
| TFT | 2.57 (0.03) | 3.24 (0.04) | 8.98 | 1.55 |
| FedST-GNN | 2.43 (0.02) | 3.12 (0.04) | 8.46 | 1.47 |
| Comparator | DM | p-value |
|---|---|---|
| TFT | | 0.0046 |
| DCRNN | | 0.0016 |
| STGCN | | |
| XGBoost | | |
| Variant | MAE | ΔMAE | Parameters (M) |
|---|---|---|---|
| Full FedST-GNN | 2.43 | – | 3.2 |
| No-Drift | 2.57 | +0.14 | 3.2 |
| No-Freq | 2.55 | +0.12 | 3.0 |
| No-Graph | 2.60 | +0.17 | 2.8 |
| TFT | 2.57 | +0.14 | 3.1 |
| Metric | FedST-GNN | TFT | Relative (%) |
|---|---|---|---|
| Median latency (ms) | 38 | 44 | –13.6 |
| P99 latency (ms) | 82 | 104 | –21.2 |
| GPU util. (%) | 71 | 78 | –9.0 |
| GPU memory (GiB) | 3.9 | 4.7 | –17.0 |
| Energy (J req⁻¹) | 0.78 | 1.03 | –24.3 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).