Submitted:
10 May 2025
Posted:
12 May 2025
Abstract
Keywords:
1. Introduction
Contributions
- Benchmark: Release of the Copenhagen-Flow dataset with rich exogenous covariates.
- Model: A lightweight architecture that couples graph attention with frequency-domain transformers, reducing temporal-attention cost from $\mathcal{O}(L^{2})$ to $\mathcal{O}(L\log L)$.
- Protocol: A communication-frugal FL scheme—encrypted FedAvg plus Top-K sparsification—requiring ≤5 MB per round (see the payload sketch after this list).
- Adaptation: An ADWIN-triggered meta-learning loop that restores accuracy within 15 min of drift onset.
- Empirics: Comprehensive comparison against eight baselines; FedST-GNN yields 5% lower MAE and 7% lower RMSE than the strongest deep baseline, with a median latency of 38 ms on commodity GPUs.
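A back-of-the-envelope check on the ≤5 MB budget (our arithmetic, assuming 32-bit values and 32-bit indices; the paper's exact wire format and sparsity level are not stated): each retained gradient coordinate costs 8 bytes, so 5 MB covers about 655,000 coordinates, roughly 20% of the 3.2 M parameters reported in the ablation table.

```python
import torch

def topk_sparsify(grad: torch.Tensor, k: int):
    """Keep only the k largest-magnitude coordinates (Top-K sparsification [18])."""
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)
    return flat[idx], idx  # transmit (value, index) pairs; receiver scatters back

# Hypothetical budget check: 4 B value + 4 B index per retained coordinate.
budget_bytes = 5 * 1024 ** 2
max_coords = budget_bytes // 8                              # ≈ 655,000 coordinates
print(f"Top-K fraction under 5 MB: {max_coords / 3_200_000:.0%}")  # ≈ 20%
```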
Research Questions
- RQ1: Can a federated ST-GNN match or exceed centralised baselines while respecting data-sovereignty constraints?
- RQ2: How quickly and by how much can lightweight drift detection plus meta-adaptation recover accuracy after abrupt demand shifts?
- RQ3: What communication and compute budgets are required to deliver real-time inference on commodity hardware?
1.1. Regulatory motivation and privacy design
2. Literature Review
2.1. Classical and Deep Approaches to Passenger-Flow Forecasting
Statistical Foundations
Sequence Models
Graph Neural Networks
Spectral and Frequency-Domain Innovations
2.2. Federated Learning in Intelligent Transportation
2.3. Concept Drift Detection and Mitigation
Signal-Based Detectors
Model-Based Adaptation
Open Questions Under Federation
2.4. Synthesis and Research Gap
- Graph neural architectures outperform sequence-only models when spatial correlations are strong.
- FL can match centralised accuracy if communication and heterogeneity are properly handled.
- Drift adaptation can halve post-drift error, but has rarely been integrated with federated ST-GNNs.
| Study | Model | Privacy | Drift | MAE (pax) | Key Outcome |
|---|---|---|---|---|---|
| Li 2021 [10] | SARIMA | central | no | 4.71 | MAPE spikes to 25% during events |
| Yu 2018 [3] | STGCN | central | no | 2.96 | 11% gain over LSTM baseline |
| Zhou 2021 [14] | Informer | central | no | 2.62 | FFT attention lowers RMSE by 7 pp |
| Li 2020 [7] | FedProx-GRU | FL | no | 3.05 | FedProx stabilises heterogeneous clients |
| Gupta 2024 [16] | DP-Transf. | FL + DP | no | 3.28 | DP noise adds 3–4 pp MAE |
3. Materials and Methods
3.1. Dataset Acquisition and Pre-Processing
Descriptive Statistics
Feature Engineering
3.2. Graph Construction
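One plausible construction, sketched under the assumption that directed edges link consecutive stops along each of the 42 bus routes and 3 metro lines in the direction of travel (the paper's exact rule may differ):

```python
import numpy as np

def build_directed_adjacency(route_sequences, num_stops=312):
    """Directed adjacency from consecutive stops along each route or line.

    route_sequences: list of stop-index sequences (hypothetical input format),
    one per bus route or metro line, ordered in the direction of travel.
    """
    A = np.zeros((num_stops, num_stops), dtype=np.float32)
    for seq in route_sequences:
        for u, v in zip(seq[:-1], seq[1:]):
            A[u, v] = 1.0  # edge points in the direction of travel
    return A
```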
3.3. Problem Definition
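A standard formulation consistent with the quantities used elsewhere in the paper (window length $L$, $N$ stops, quantile set $\mathcal{Q}$); the notation here is illustrative. Let $G=(V,E)$ with $|V|=N$ stops and $X_\tau \in \mathbb{R}^{N \times F}$ the feature matrix at minute $\tau$. Given the past window, the model predicts quantiles of boardings over horizon $H$:

$$
\hat{X}^{(q)}_{t+1:t+H} \;=\; f_\theta\big(X_{t-L+1:t},\, G\big), \qquad q \in \mathcal{Q},
$$

trained with the pinball (quantile) loss

$$
\mathcal{L}(\theta) \;=\; \frac{1}{|\mathcal{Q}|}\sum_{q\in\mathcal{Q}} \operatorname{mean}\Big(\max\big(q\,(x-\hat{x}^{(q)}),\; (q-1)\,(x-\hat{x}^{(q)})\big)\Big).
$$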
3.4. FedST-GNN Architecture
Spatial Encoder (GATv2)
Frequency-Domain Transformer
Output Head
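A minimal PyTorch sketch of how these three components can be wired together, assuming `torch_geometric` for the GATv2 layer and FNet-style Fourier mixing [22] for the temporal module. The hidden width of 64 and the 0.1/0.5/0.9 quantiles follow the hyper-parameter table; the horizon, head count, and layer depth are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATv2Conv  # spatial encoder named above

class FourierMixer(nn.Module):
    """FNet-style token mixing [22]: FFT along time, keep the real part.
    Costs O(L log L) instead of O(L^2) self-attention."""
    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):                        # x: (N, L, d)
        x = x + torch.fft.fft(x, dim=1).real     # temporal mixing
        x = self.norm(x)
        return x + self.ff(x)

class FedSTGNNSketch(nn.Module):
    """Sketch only: GATv2 spatial encoder -> frequency-domain mixer
    -> quantile output head. Shapes and depths are assumptions."""
    def __init__(self, in_dim, d=64, horizon=12, quantiles=(0.1, 0.5, 0.9)):
        super().__init__()
        self.gat = GATv2Conv(in_dim, d, heads=4, concat=False)
        self.mixer = FourierMixer(d)
        self.head = nn.Linear(d, horizon * len(quantiles))  # one value per (h, q)
        self.horizon, self.nq = horizon, len(quantiles)

    def forward(self, x, edge_index):
        # x: (L, N, in_dim) -- a window of L minutes over N stops
        L, N, _ = x.shape
        h = torch.stack([self.gat(x[t], edge_index) for t in range(L)])  # (L, N, d)
        h = self.mixer(h.permute(1, 0, 2))       # (N, L, d)
        out = self.head(h[:, -1])                # summary at the last time step
        return out.view(N, self.horizon, self.nq)
```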
3.5. Federated Optimisation Protocol
Listing 1. Drift-Aware Federated Training of FedST-GNN
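A minimal sketch of the drift-aware protocol summarised in the Contributions (FedAvg with Top-K uploads, ADWIN-triggered meta-adaptation [8,9]). `pinball_loss`, `validate`, and the client objects are hypothetical stand-ins; encrypted/secure aggregation and the Top-K upload path (see `topk_sparsify` in Section 1) are elided for brevity.

```python
import copy
import torch

def fedavg_round(global_model, clients, lr=1e-3, local_epochs=1):
    """One FedAvg round: local training per client, sample-weighted averaging."""
    states, weights = [], []
    for client in clients:                        # raw data never leaves the client
        local = copy.deepcopy(global_model)
        opt = torch.optim.AdamW(local.parameters(), lr=lr)
        for _ in range(local_epochs):
            for x, edge_index, y in client.loader:            # hypothetical loader
                opt.zero_grad()
                loss = pinball_loss(local(x, edge_index), y)  # quantile loss, §3.3
                loss.backward()
                opt.step()
        states.append(local.state_dict())
        weights.append(client.num_samples)
    total = sum(weights)
    avg = {k: sum((w / total) * s[k] for w, s in zip(weights, states))
           for k in states[0]}                    # assumes all-float parameters
    global_model.load_state_dict(avg)
    return global_model

def drift_aware_training(global_model, clients, detector, rounds=1000, meta_lr=0.5):
    """On an ADWIN alarm, run extra local epochs and take a Reptile-style
    interpolation step toward the adapted weights [8, 9]."""
    for _ in range(rounds):
        global_model = fedavg_round(global_model, clients)
        detector.update(validate(global_model, clients))  # e.g. rolling MAE signal
        if detector.drift_detected:               # river.drift.ADWIN-style flag
            anchor = copy.deepcopy(global_model.state_dict())
            global_model = fedavg_round(global_model, clients, local_epochs=3)
            adapted = global_model.state_dict()
            global_model.load_state_dict(
                {k: anchor[k] + meta_lr * (adapted[k] - anchor[k]) for k in anchor})
    return global_model
```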
3.6. Computational Complexity
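With $N$ stops, window length $L$, hidden width $d$, and edge set $E$, the per-layer costs implied by the Contributions are roughly (our reconstruction, constants omitted):

$$
\underbrace{\mathcal{O}(|E|\,d)}_{\text{GATv2 spatial encoder}}
\;+\;
\underbrace{\mathcal{O}(N\,d\,L\log L)}_{\text{FFT temporal mixing}}
\qquad\text{vs.}\qquad
\mathcal{O}(N\,d\,L^{2})\ \text{for full temporal self-attention},
$$

matching the $\mathcal{O}(L^{2})\!\to\!\mathcal{O}(L\log L)$ reduction stated in the Contributions.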
3.7. Baselines and Evaluation Metrics
3.8. Implementation Details
4. Results
4.1. Point and Probabilistic Accuracy
4.2. Horizon-Specific Performance
4.3. Calibration and Sharpness
4.4. Concept-Drift Resilience
4.5. Ablation and Parameter Efficiency
4.6. Latency, Throughput, and Resource Footprint
4.7. Statistical Robustness
5. Discussion
5.1. Interpretation of Core Results
Synergy of Spatial and Spectral Mechanisms
Effectiveness of Drift Adaptation
Federated Parity and Communication Frugality
Operational Viability
5.2. Comparison with Prior Work
5.3. Practical Implications
5.4. Limitations
- Single-city evaluation. Although Copenhagen exhibits multimodal complexity, transferability to radial or hub-and-spoke networks remains unverified.
- Absence of differential privacy. Secure aggregation precludes gradient inspection but does not provide formal guarantees.
- Homogeneous client models. All operators share identical architecture; accommodating heterogeneous capacities would necessitate split or personalised FL variants.
- External feature sparsity. Social-media signals, traffic incidents, and fare policy changes are not yet incorporated; such exogenous covariates could further mitigate drift.
5.5. Future Work
- Multi-city generalisation. Deploying the pipeline in Gothenburg and Singapore will test robustness across divergent network topologies and cultural behaviours.
- Differentially private gradients. Integrating Rényi DP noise and privacy accounting to bound attacker inference risk.
- Heterogeneous and hierarchical FL. Exploring FedBN and FedMSA to accommodate client-specific batch-norm or sub-architectures.
- Closed-loop optimisation. Coupling forecasts with reinforcement-learning agents for dynamic vehicle re-routing and headway adaptation.
6. Conclusions
- FedST-GNN reduced mean absolute error by 5–7% and root-mean-square error by 7% relative to the strongest deep baseline (TFT) while maintaining tight probabilistic calibration.
- Drift adaptation lowered peak error during a half-marathon disruption by 41%, achieving recovery within seven minutes of alarm.
- Median inference latency remained below 40 ms at 1 600 queries/s on a single GTX 1660 SUPER (see the Little's-law check after this list), and communication overhead was held to 5 MB per 15-minute federation round.
- Public release of data, code, and reproducibility scripts establishes an open benchmark for privacy-preserving transit analytics.
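As a quick consistency check on the throughput figures via Little's law [20] (our arithmetic, using the reported numbers):

$$
L_{\text{concurrent}} \;=\; \lambda W \;=\; 1600\ \text{s}^{-1} \times 0.038\ \text{s} \;\approx\; 61,
$$

i.e. about 61 requests in flight at the median latency, a modest concurrency level for a single GPU.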
Funding
Author Contributions
References
- Jin, G.; Liang, Y.; Fang, Y.; Shao, Z.; Huang, J.; Zhang, J.; Zheng, Y. Spatio-Temporal Graph Neural Networks for Predictive Learning in Urban Computing: A Survey. IEEE Trans. Knowl. Data Eng. 2023, Early Access, 1–23. [Google Scholar] [CrossRef]
- Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A. A Survey on Concept Drift Adaptation. ACM Comput. Surv. 2014, 46, 44:1–44:37. [Google Scholar] [CrossRef]
- Yu, B.; Yin, H.; Zhu, Z. Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of AAAI 2018; AAAI Press, 2018; pp. 3634–3641. [Google Scholar]
- Guo, S.; Lin, Y.; Feng, N.; Song, C.; Wan, H. Attention Based Spatial–Temporal Graph Convolutional Networks for Traffic Flow Forecasting. In Proceedings of AAAI 2019; AAAI Press, 2019; pp. 922–929. [Google Scholar]
- Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. In Advances in Neural Information Processing Systems 33 (NeurIPS 2020); Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., Lin, H., Eds.; Curran Associates: Red Hook, NY, USA, 2020; pp. 2693–2704. [Google Scholar]
- Kairouz, P.; McMahan, H. B.; Avent, B.; et al. Advances and Open Problems in Federated Learning. arXiv 2019, arXiv:1912.04977. [Google Scholar]
- Li, T.; Sahu, A. K.; Talwalkar, A.; Smith, V. Federated Optimisation in Heterogeneous Networks. Proc. MLSys 2020, 2, 429–450. [Google Scholar]
- Bifet, A.; Gavaldà, R. Learning from Time-Changing Data with Adaptive Windowing. In Proceedings of SIAM SDM 2007; SIAM, 2007; pp. 443–448. [Google Scholar]
- Nichol, A.; Achiam, J.; Schulman, J. On First-Order Meta-Learning Algorithms. arXiv 2018, arXiv:1803.02999. [Google Scholar]
- Li, W.; Sui, L.; Zhou, M.; Dong, H. Short-Term Passenger-Flow Forecast for Urban Rail Transit Based on Multi-Source Data. EURASIP J. Wirel. Commun. Netw. 2021, 9. [Google Scholar] [CrossRef]
- Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long Short-Term Memory Neural Network for Traffic Speed Prediction Using Remote Microwave Sensor Data. Transp. Res. Part C 2015, 54, 187–197. [Google Scholar] [CrossRef]
- Lim, B.; Arik, S. O.; Pfister, T. Temporal Fusion Transformers for Interpretable Multi-Horizon Time-Series Forecasting. Int. J. Forecast. 2021, 37, 1748–1764. [Google Scholar] [CrossRef]
- Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In Proceedings of ICLR 2018; OpenReview, 2018. [Google Scholar]
- Zhou, H.; Zhang, S.; Peng, J.; et al. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of AAAI 2021; AAAI Press, 2021; pp. 11106–11115. [Google Scholar]
- Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for Deep Spatial–Temporal Graph Modelling. In Proceedings of IJCAI-19; IJCAI, 2019; pp. 1907–1913. [Google Scholar]
- Gupta, S.; Torra, V. Differentially Private Traffic Flow Prediction Using Transformers: A Federated Approach. In Computer Security—ESORICS 2023 Workshops; LNCS 14398; Springer, 2024; pp. 260–271. [Google Scholar]
- Xavier, B. M.; Martinello, M.; Trois, C.; et al. Fast Learning Enabled by In-Network Drift Detection. In Proceedings of APNet 2024; ACM, 2024; pp. 129–134. [Google Scholar]
- Aji, A. F.; Heafield, K. Sparse Communication for Distributed Gradient Descent. arXiv 2017, arXiv:1704.05021. [Google Scholar]
- Beutel, D.; Topal, T.; Qian, H.; et al. Flower: A Friendly Federated Learning Research Framework. arXiv 2022, arXiv:2007.14390. [Google Scholar]
- Little, J. D. C. A Proof for the Queuing Formula L=λW. Oper. Res. 1961, 9, 383–387. [Google Scholar] [CrossRef]
- Diebold, F. X.; Mariano, R. S. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 1995, 13, 253–263. [Google Scholar] [CrossRef]
- Lee-Thorp, J.; Ainslie, J.; Eckstein, I.; Ontañón, S. FNet: Mixing Tokens with Fourier Transforms. In Findings of ACL–IJCNLP 2021; ACL, 2021; pp. 2622–2632. [Google Scholar]
| Attribute | Value | Attribute | Value |
|---|---|---|---|
| Observation period | Jan 2022 – Dec 2024 | Minute-level observations | |
| Stops ($N$) | 312 | Directed edges ($E$) | |
| Bus routes | 42 | Metro lines | 3 |
| Mean boardings min⁻¹ | | Std. boardings min⁻¹ | |
| Missing GPS ratio | | Anomalous fare taps | |
| Model | Tuned hyper-parameters (search range) |
|---|---|
| ARIMA | ; seasonal on weekly cycle |
| Prophet | Changepoint prior ; holidays prior |
| XGBoost | Trees ; learning rate ; max depth |
| LSTM/GRU | Layers ; hidden units ; dropout |
| STGCN | Chebyshev order ; channels ; kernel size |
| DCRNN | Diffusion steps ; units ; scheduled sampling |
| TFT | Hidden size ; attention heads ; dropout |
| Parameter | Value | Justification |
|---|---|---|
| Window length L | 12 min | Captures two bus headways (median 6 min) |
| Hidden dimension d | 64 | Empirical elbow in validation loss |
| Learning rate | | Stable AdamW convergence |
| Batch size B | 512 windows | GPU utilisation ≈72% |
| Dropout | | Mitigates over-fitting |
| Quantiles | | 80% predictive interval |
| FedAvg round | 15 min | Network backhaul capacity |
| Top-K sparsity | | <5 MB per round on 100 Mbps link |
| Model | MAE | RMSE | sMAPE (%) | CRPS |
|---|---|---|---|---|
| ARIMA | 3.90 (0.05) | 4.88 (0.07) | 13.07 | 2.31 |
| Prophet | 3.75 (0.04) | 4.63 (0.06) | 12.52 | 2.24 |
| XGBoost | 2.95 (0.03) | 3.68 (0.04) | 10.01 | 1.72 |
| LSTM | 2.88 (0.03) | 3.59 (0.04) | 9.86 | 1.69 |
| GRU | 2.85 (0.03) | 3.55 (0.04) | 9.73 | 1.66 |
| STGCN | 2.69 (0.03) | 3.39 (0.04) | 9.12 | 1.58 |
| DCRNN | 2.61 (0.03) | 3.26 (0.04) | 8.94 | 1.54 |
| TFT | 2.57 (0.03) | 3.24 (0.04) | 8.98 | 1.55 |
| FedST-GNN | 2.43 (0.02) | 3.12 (0.04) | 8.46 | 1.47 |
| Comparator | DM | p-value |
|---|---|---|
| TFT | | 0.0046 |
| DCRNN | | 0.0016 |
| STGCN | | |
| XGBoost | | |
| Variant | MAE | ΔMAE | Parameters (M) |
|---|---|---|---|
| Full FedST-GNN | 2.43 | – | 3.2 |
| No-Drift | 2.57 | +0.14 | 3.2 |
| No-Freq | 2.55 | +0.12 | 3.0 |
| No-Graph | 2.60 | +0.17 | 2.8 |
| TFT | 2.57 | +0.14 | 3.1 |
| Metric | FedST-GNN | TFT | Relative (%) |
|---|---|---|---|
| Median latency (ms) | 38 | 44 | –13.6 |
| P99 latency (ms) | 82 | 104 | –21.2 |
| GPU util. (%) | 71 | 78 | –9.0 |
| GPU memory (GiB) | 3.9 | 4.7 | –17.0 |
| Energy (J req⁻¹) | 0.78 | 1.03 | –24.3 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).