Preprint
Article

This version is not peer-reviewed.

GeoAI-Based Few-Shot Transfer Learning for Cross-City Pedestrian Level-of-Service Mapping Using Spatio-Temporal Graph Models

Submitted:

15 April 2026

Posted:

17 April 2026


Abstract
Urban planners need continuous, scalable methods to evaluate pedestrian Level of Service (LOS). Static and locally calibrated approaches fail to capture the dynamic, network-wide, and context-dependent nature of pedestrian activity. While traditional LOS uses fixed density thresholds and data-driven models predict continuous flows, neither supports cross-city analysis due to context-specific assumptions. This study introduces a transferable analytical framework for predicting pedestrian LOS using large-scale urban sensor data that captures both recurrent temporal demand patterns and spatial dependencies within street networks. The framework is evaluated using pedestrian sensor data from three cities (Melbourne, Dublin, and Zurich) that represent diverse geometries, demand profiles, and sensing infrastructures. Results show strong in-domain Melbourne performance (accuracy 79.7%; Acc±1 99.1%) and effective cross-city generalization. Few-shot fine-tuning with only 5% labeled target-city data recovers 95–99% of in-domain performance, demonstrating practical scalability. KernelSHAP explainability reveals that short-term temporal lag features universally dominate predictions, while spatial and contextual factors exhibit city-specific influence tied to local morphology. These findings demonstrate that transferable GeoAI methods can support real-time pedestrian congestion monitoring and evidence-based public-space management, offering planners a scalable decision-support tool to enhance walkability, safety, and equitable access to high-quality public spaces in contemporary cities.

1. Introduction

Public spaces and street environments are fundamental components of contemporary urban life, shaping how people move, interact, and experience cities. Pedestrian mobility plays a central role in sustainable urban systems by influencing accessibility, safety, environmental exposure, and public health outcomes. As cities increasingly promote walkability and active transportation, understanding how pedestrian conditions vary across space and time is essential for planning and managing high-quality public spaces.
Recent advances in Geospatial Artificial Intelligence (GeoAI) and spatiotemporal modelling have enabled data-driven analysis of urban mobility using high-resolution sensors and geospatial data. Graph neural networks and Transformer-based architectures have demonstrated strong performance in capturing spatial dependencies and temporal dynamics in urban systems [1,2,3]. These approaches have been widely applied to traffic forecasting and pedestrian trajectory prediction, including hybrid architectures designed to address sparse sensing and heterogeneous urban environments [4,5,6]. However, while continuous flow prediction provides detailed demand estimates, it does not align with the ordinal service categories used in planning practice, limiting interpretability and cross-city comparability.
Pedestrian Level-of-Service (LOS) indicators are widely used to evaluate crowding, comfort, and operational performance in street environments. These metrics provide a compact representation of pedestrian experience and are particularly important for vulnerable users such as people with disabilities [7]. A wide range of studies have explored LOS assessment methods by incorporating environmental, behavioral, and perceptual factors [8,9,10]. More recent work has examined LOS under specific urban conditions, such as social distancing and varying trip purposes, further highlighting the context-dependent nature of pedestrian service quality [11,12]. Collectively, these studies establish LOS as a central metric for evaluating walkability and public space performance in urban planning.
However, conventional LOS approaches are typically based on fixed thresholds or context-specific formulations, which limit their transferability across cities with different urban structures and activity patterns. Variations in environmental density, land-use configuration, and pedestrian behavior can significantly influence LOS perception and measurement [13]. As a result, LOS models calibrated in one context often fail to generalize to others, reducing their effectiveness for comparative evaluation and cross-city planning.
Transfer learning has emerged as a promising direction for addressing data scarcity and improving model scalability across cities. Early work demonstrated the feasibility of cross-city knowledge transfer for urban flow prediction [14], while more recent studies have explored spatiotemporal adaptation, domain alignment, and few-shot learning strategies [15,16,17]. However, these approaches primarily focus on continuous flow estimation and do not explicitly account for the ordinal structure of planning-relevant indicators such as pedestrian LOS. Moreover, the “geospatial generalization problem” remains a key challenge, as models often degrade when applied to cities with different spatial and behavioral characteristics [18].
Explainable AI (XAI) supports planning applications through methods like SHAP [19], but explanation stability across urban contexts remains uncertain [20,21]. Preliminary analyses suggest that temporal lag features dominate LOS prediction, but how well they transfer alongside spatial factors remains unclear. Taken together, four key gaps persist: (1) ordinal LOS prediction beyond continuous flows, (2) cross-city transfer that preserves LOS semantics, (3) explanation stability across domains, and (4) lag feature dominance versus spatial transferability.
This study addresses these gaps through four contributions across three heterogeneous cities (Melbourne, Dublin, Zurich) with distinct urban morphologies, demand profiles, and sensing infrastructures. First, we introduce a city-adaptive ordinal LOS formulation using percentile thresholds for relative congestion mapping. Second, we develop a spatiotemporal graph transformer integrating temporal dynamics, spatial dependencies, and ordinal classification for LOS prediction. Third, we evaluate cross-city transfer under zero-shot, few-shot, and domain-adaptive settings using multi-city sensor data. Fourth, we apply KernelSHAP to assess the stability of predictive drivers across cities and transfer scenarios, with particular emphasis on lag feature dominance. Recognizing that direct zero-shot deployment is unreliable across heterogeneous contexts, this work prioritizes few-shot transferability for practical urban planning applications. Beyond predictive accuracy, we further examine whether explanatory structures themselves are transferable across cities, a question that remains largely unexplored in GeoAI-based urban mobility modeling.

2. Materials and Methods

2.1. Study Framework

This study’s methodology consists of five components: (i) assembling and harmonizing multi-city pedestrian sensor datasets, (ii) defining a city-adaptive ordinal LOS measure, (iii) constructing a unified feature space and spatiotemporal sequences, (iv) developing a spatiotemporal Graph Transformer for LOS prediction, and (v) evaluating model performance through both in-domain and cross-city transfer experiments.
Figure 1. Spatial distribution of pedestrian sensors in three study cities: (a) Melbourne, (b) Dublin, and (c) Zurich. Points indicate sensor locations overlaid on the urban street network, while black outlines represent administrative boundaries. Differences in spatial coverage reflect variations in sensor density and monitoring infrastructure across cities. Basemap data are sourced from OpenStreetMap contributors.

2.2. Data Sources and Preprocessing

Pedestrian count data were obtained from fixed sensor networks in Melbourne (100 sensors), Dublin (99 sensors), and Zurich (15 sensors). The analysis focuses on hourly observations for the year 2023, selected as the first full year without disruptions related to the COVID-19 pandemic.
All datasets were harmonized through a standardized preprocessing workflow. Sensor observations were aligned to a common hourly timestamp and missing or erroneous values were removed through data cleaning procedures. Pedestrian counts were log-transformed to reduce skewness and stabilize variance. Additional contextual variables, including temporal indicators and environmental attributes, were standardized across cities to ensure comparability. The resulting dataset provides a consistent and structured foundation for spatiotemporal modelling of pedestrian activity across heterogeneous urban environments.

2.3. Pedestrian Level-of-Service Formulation

To support cross-city comparability, pedestrian LOS is formulated as a six-level ordinal variable representing increasing levels of pedestrian congestion. Rather than relying on fixed density thresholds, LOS categories are derived using city-specific percentiles of log-transformed hourly pedestrian counts. Percentile boundaries between the 20th and 90th percentiles are used to generate approximately balanced class distributions while reducing the influence of extreme values [11,13].
This approach preserves the ordinal progression from free-flow conditions to severe crowding while maintaining sensitivity to local variations in pedestrian demand. Consequently, the percentile-based formulation enables consistent LOS mapping across cities with heterogeneous urban morphologies and activity patterns, while avoiding the limitations associated with universal density-based thresholds.
The resulting six LOS categories represent progressively increasing pedestrian densities, ranging from free-flow conditions (LOS A) to severe congestion (LOS F).
Let $c$ denote the raw hourly pedestrian count at a sensor. A logarithmic transformation is applied:
$$y = \log(c + 1)$$
This transformation mitigates the heavy-tailed distributions typical of pedestrian flow data and stabilizes the percentile-based threshold estimation.
Let $Q_k$ denote the empirical $k$-th quantile computed on the training subset only. City-specific thresholds are defined as:
$$\tau_k = Q_k(y_{\text{train}}), \quad k \in \{0.2, 0.4, 0.6, 0.8, 0.9\}$$
The ordinal LOS classes are then assigned as:
$$\mathrm{LOS}(y) = \begin{cases} 0, & y \le \tau_{0.2} \\ 1, & \tau_{0.2} < y \le \tau_{0.4} \\ 2, & \tau_{0.4} < y \le \tau_{0.6} \\ 3, & \tau_{0.6} < y \le \tau_{0.8} \\ 4, & \tau_{0.8} < y \le \tau_{0.9} \\ 5, & y > \tau_{0.9} \end{cases}$$
This formulation yields approximately balanced class distributions, with roughly 20% of observations in each of the four lower classes and approximately 10% in each of the two highest congestion classes, as implied by the thresholds at the 20th, 40th, 60th, 80th, and 90th percentiles. Such a distribution facilitates stable ordinal learning and enables cross-city transfer by preserving relative service conditions rather than relying on absolute density thresholds.
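The percentile-based assignment can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the linear-interpolation quantile convention are assumptions, and the example thresholds are the Melbourne values listed in Table 1.

```python
import math
from bisect import bisect_left

QUANTILE_LEVELS = (0.2, 0.4, 0.6, 0.8, 0.9)

def log_transform(count):
    """y = log(c + 1) for a raw hourly count c."""
    return math.log(count + 1.0)

def empirical_quantile(sorted_values, q):
    """Quantile of an ascending list using linear interpolation (assumed convention)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def fit_thresholds(train_counts):
    """Estimate the thresholds tau_k on the training subset only."""
    y_train = sorted(log_transform(c) for c in train_counts)
    return [empirical_quantile(y_train, q) for q in QUANTILE_LEVELS]

def assign_los(count, thresholds):
    """Map a raw count to an ordinal class 0..5 (LOS A..F)."""
    # bisect_left counts the thresholds strictly below y, which matches
    # the "tau_lower < y <= tau_upper" intervals of the piecewise rule.
    return bisect_left(thresholds, log_transform(count))

# Melbourne thresholds from Table 1 (log scale).
melbourne_tau = [2.9957, 4.2627, 5.1059, 5.7930, 6.5737]
example_classes = [assign_los(c, melbourne_tau) for c in (0, 100, 2000)]
```

Because the thresholds are fitted per city, the same function applies unchanged to Dublin or Zurich once `fit_thresholds` has been run on that city's training counts.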
The qualitative interpretation of LOS categories follows commonly used pedestrian service-level frameworks in transportation research, where lower service levels correspond to higher pedestrian congestion and reduced walking comfort [22,23]. However, while the percentile-based formulation improves transferability by representing relative congestion states within each city, it does not imply that identical LOS classes correspond to equivalent absolute density or comfort conditions across different urban contexts. Accordingly, the resulting LOS should be interpreted as a city-adaptive, relative indicator, rather than a direct substitute for fixed regulatory thresholds.
Table 1. City Adaptive LOS Thresholds for Melbourne.
| LOS | Threshold (ŷ) | Ped/hr | Percentile | Interpretation |
|---|---|---|---|---|
| 0 (A) | ≤ 2.9957 | ≤ ~35 | 0–20% | Very low pedestrian activity. Free-flow, high comfort. |
| 1 (B) | 2.9957–4.2627 | ~35–120 | 20–40% | Low activity. Comfortable walking, occasional interactions. |
| 2 (C) | 4.2627–5.1059 | ~120–300 | 40–60% | Moderate activity. Slight speed reductions. |
| 3 (D) | 5.1059–5.7930 | ~300–600 | 60–80% | High activity. Frequent interactions, operational stress. |
| 4 (E) | 5.7930–6.5737 | ~600–1,200 | 80–90% | Very high activity. Congested, constrained movement. |
| 5 (F+) | > 6.5737 | > ~1,200 | 90–100% | Extreme crowding. Saturation, safety and comfort risks. |

2.4. Feature Engineering and Sequence Construction

Feature engineering integrates temporal, contextual, and spatial attributes to capture multi-scale dynamics of pedestrian activity. Temporal encodings, including hour of day, day of week, and seasonal indicators, capture recurrent urban rhythms, while lagged features at 1, 24, and 168 hours represent short-term memory, diurnal cycles, and weekly patterns. Spatial and environmental features describe local urban context, including network structure and surrounding land-use characteristics. All features ($F = 28$) are harmonized into a unified representation ensuring identical feature dimensions, scales, and encoding across Melbourne, Dublin, and Zurich. The complete list of features, including definitions, units, preprocessing steps, and city availability, is provided in Appendix A. All normalization and transformation parameters were fitted on the training data only and applied consistently to validation and test sets.
Input tensors are constructed using rolling temporal windows of length $T = 168$ hours (one week), generating input–target pairs of the form
$$X_i \in \mathbb{R}^{B \times T \times N \times F}$$
where $B$ denotes the batch size, $T$ the temporal window length, $N$ the number of sensor locations, and $F$ the number of features. The prediction targets are defined as ordinal LOS classes:
$$y \in \{0, 1, \dots, 5\}^{N}$$
Chronological dataset splits (80/10/10 for training, validation, and testing) are applied independently to each city to prevent temporal leakage and ensure realistic evaluation conditions. This standardized tensor construction enables consistent cross-city comparison and supports stable transfer of learned spatiotemporal representations.
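The lag features, chronological split, and rolling windows described in this section can be sketched as follows. This is an illustrative outline under stated assumptions: the helper names are hypothetical, a single sensor series stands in for the full $N \times F$ feature tensor, and the choice of the step immediately after each window as the prediction target is assumed, since the text only specifies the window length.

```python
LAGS = (1, 24, 168)   # hours: short-term, diurnal, weekly
WINDOW = 168          # T = 168 hours (one week)

def lag_features(series, t, lags=LAGS):
    """Lagged values of an hourly series at time index t (requires t >= max lag)."""
    if t < max(lags):
        raise ValueError("not enough history for the longest lag")
    return [series[t - lag] for lag in lags]

def chronological_split(n_steps):
    """80/10/10 split in time order; integer arithmetic avoids rounding surprises."""
    train_end = n_steps * 80 // 100
    val_end = n_steps * 90 // 100
    return range(0, train_end), range(train_end, val_end), range(val_end, n_steps)

def make_windows(features, labels, window=WINDOW):
    """Pair each block of `window` consecutive rows with the label that follows it.

    Using the step right after the window as the target is an assumption.
    """
    return [(features[end - window:end], labels[end])
            for end in range(window, len(features))]

hourly = list(range(400))   # stand-in series, not real pedestrian counts
train_idx, val_idx, test_idx = chronological_split(len(hourly))
pairs = make_windows(hourly, hourly)
```

Splitting by time index before fitting any normalization keeps validation and test hours strictly later than the training hours, which is what prevents temporal leakage.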
To further contextualize cross-city transfer, we derive a set of city-level morphological and structural indicators aligned with urban movement theory. These indicators provide interpretive context for differences in transfer performance and are not used as model inputs: (i) Center-periphery gradient: the mean Euclidean distance from each sensor to the city administrative center, together with the Gini coefficient of the distance distribution (coordinates derived with OSMnx). (ii) Network hierarchy: the mean betweenness centrality within 100 m street-network buffers around sensors (OSMnx, k = 5 nearest road segments). (iii) Temporal regularity: the lag-24 autocorrelation coefficient of hourly pedestrian counts, aggregated across all sensors. Together, these indicators provide a structural context for interpreting differences in pedestrian dynamics and model transferability across cities.
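Two of these indicators, the center-periphery gradient and temporal regularity, reduce to short formulas that can be sketched without any geospatial library. The snippet below is a minimal sketch under assumptions: sensor coordinates are taken to be already projected to metres, the function names are hypothetical, and the network-hierarchy indicator is omitted because it depends on the street graph.

```python
import math

def mean_distance_to_center(sensor_xy, center_xy):
    """Mean Euclidean distance, assuming projected coordinates in metres."""
    return sum(math.dist(p, center_xy) for p in sensor_xy) / len(sensor_xy)

def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

def autocorrelation(series, lag=24):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom

# A perfectly repeating daily pattern over ten days (synthetic values).
daily_pattern = [hour % 24 for hour in range(240)]
regularity = autocorrelation(daily_pattern, lag=24)
```

For a perfectly periodic series the estimate approaches 1 as the series grows; with ten days of data it equals 0.9 because only nine of the ten days contribute to the numerator.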

2.5. Model Architecture and Training Protocol

Building on the feature engineering and sequence construction described in Section 2.4, this section presents the proposed ST-Graph Transformer for predicting pedestrian LOS. The model is designed to capture both temporal dynamics in pedestrian activity and spatial dependencies across sensor locations, while explicitly preserving the ordinal structure of LOS categories. As illustrated in Figure 2, the framework integrates three main components: (i) a temporal Transformer encoder, (ii) a spatial attention module operating across sensor nodes, and (iii) an ordinal classification head for LOS prediction. In addition, a domain-adversarial adaptation mechanism is incorporated to support cross-city transfer.
The model operates on rolling temporal windows of the form:
$$X_t \in \mathbb{R}^{N \times T \times F}$$
Each sensor-specific sequence is first projected into a latent feature space of dimension $d_{\text{model}} = 128$ and processed using a temporal Transformer encoder with $L = 2$ layers and $H = 4$ attention heads. Sinusoidal positional encodings are used to preserve temporal ordering within the input sequence. This stage produces a latent embedding for each sensor:
$$Z_t^{i} \in \mathbb{R}^{d_{\text{model}}}$$
where $Z_t^{i}$ summarizes the recent temporal dynamics associated with sensor $i$, including short-term fluctuations, diurnal cycles, and weekly recurring patterns.
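The sinusoidal positional encoding mentioned above can be illustrated with a short sketch. The source does not state the exact variant, so the snippet assumes the standard formulation from the original Transformer paper (sine on even dimensions, cosine on odd dimensions, base 10000); the function name is hypothetical.

```python
import math

def sinusoidal_encoding(length, d_model):
    """Return a length x d_model table of positional encodings."""
    table = []
    for pos in range(length):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            # i is the even dimension index, so the exponent is i / d_model.
            angle = pos / (10000.0 ** (i / d_model))
            row[i] = math.sin(angle)
            if i + 1 < d_model:
                row[i + 1] = math.cos(angle)
        table.append(row)
    return table

# One encoding row per hour of the 168-hour window, at d_model = 128.
encoding = sinusoidal_encoding(168, 128)
```

Each row would be added to the projected features of the corresponding hour so that the encoder can distinguish, for example, the most recent hour from the same hour one week earlier.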
Figure 2. ST-Graph Transformer architecture used during training and inference. Input time-series tensors are processed by a temporal Transformer, followed by spatial attention across sensor nodes. The ordinal head produces cumulative LOS probabilities. The DANN module is active during training only.
These temporally encoded sensor representations are then passed to a spatial Transformer encoder that applies self-attention across sensor locations. This enables the model to capture spatial interactions and correlated pedestrian behavior between nearby or structurally related parts of the urban network. To further constrain spatial message passing, a city-specific adjacency mask derived from a $k$-nearest-neighbor graph ($k = 5$) is optionally incorporated. The resulting latent representations are then passed to the final prediction layer.
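A $k$-nearest-neighbor adjacency mask of this kind can be sketched as below. The details are assumptions made for illustration: distances are Euclidean on projected sensor coordinates, each sensor may always attend to itself, and the mask is left directed rather than symmetrized. The function name is hypothetical.

```python
import math

def knn_mask(coords, k=5):
    """Boolean n x n mask: mask[i][j] is True if sensor i may attend to sensor j."""
    n = len(coords)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        mask[i][i] = True              # self-attention kept (assumption)
        for j in neighbors[:k]:
            mask[i][j] = True          # k closest sensors
    return mask

# Five synthetic sensors on a line; k = 2 keeps the example readable.
toy_coords = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
toy_mask = knn_mask(toy_coords, k=2)
```

In an attention layer, positions where the mask is False would be excluded before the softmax, so that each sensor aggregates information only from its spatial neighborhood.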
To explicitly account for the ordered structure of pedestrian LOS categories, the final output layer adopts a cumulative ordinal classification formulation rather than a standard multi-class softmax. Let $h_i \in \mathbb{R}^{d}$ denote the final latent representation for sensor $i$ after temporal and spatial encoding. This representation is projected into a scalar latent score $z_i$, which reflects the model’s estimated congestion intensity. For each ordinal threshold $\theta_k$, the model estimates the cumulative probability that the target belongs to LOS class $k$ or higher as:
$$P(y_i \ge k) = \sigma(z_i - \theta_k), \quad k = 1, \dots, K - 1$$
where $\sigma(\cdot)$ denotes the sigmoid activation function, $K$ is the total number of ordinal LOS classes, and $\theta_k$ are learned cut-points separating adjacent service levels. This formulation preserves the ordinal structure of the prediction task by modeling progressively increasing congestion states along a latent continuum, rather than treating LOS categories as independent nominal classes.
Final class probabilities are recovered from the cumulative outputs by differencing adjacent cumulative probabilities. Specifically, the probability of belonging to the lowest LOS class is defined as $P(y_i = 0) = 1 - P(y_i \ge 1)$, intermediate classes are computed as $P(y_i = k) = P(y_i \ge k) - P(y_i \ge k + 1)$ for $1 \le k \le K - 2$, and the highest LOS class is given by $P(y_i = K - 1) = P(y_i \ge K - 1)$. This cumulative ordinal design is particularly appropriate for pedestrian LOS estimation, where misclassifications between adjacent service levels are less severe than errors across distant classes.
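The recovery of class probabilities from the cumulative outputs can be checked with a small sketch. The cut-point values used here are arbitrary placeholders rather than learned parameters, and the function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def class_probabilities(z, cutpoints):
    """Convert a latent score z into K class probabilities.

    `cutpoints` holds theta_1..theta_{K-1} in ascending order, so the
    cumulative probabilities P(y >= k) are non-increasing in k.
    """
    cumulative = [sigmoid(z - theta) for theta in cutpoints]   # P(y >= k)
    probs = [1.0 - cumulative[0]]                              # P(y = 0)
    for k in range(len(cumulative) - 1):
        probs.append(cumulative[k] - cumulative[k + 1])        # P(y = k)
    probs.append(cumulative[-1])                               # P(y = K - 1)
    return probs

placeholder_cutpoints = [-2.0, -1.0, 0.0, 1.0, 2.0]   # K = 6 classes
low = class_probabilities(-10.0, placeholder_cutpoints)
high = class_probabilities(10.0, placeholder_cutpoints)
```

A strongly negative score concentrates mass on class 0 and a strongly positive score on class 5, while the differencing guarantees that the six probabilities sum to one.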
Beyond the core prediction architecture, an auxiliary domain-adversarial neural network (DANN) mechanism is incorporated to improve cross-city generalization. This adaptation strategy encourages the model to learn more domain-invariant latent representations, thereby supporting transfer from the source city (Melbourne) to target cities (Dublin and Zurich). Importantly, DANN is used as an auxiliary training mechanism and does not alter the main prediction architecture.
To assess the contribution of different architectural components, three ablation baselines are also evaluated. A tabular XGBoost baseline is trained on window-based summary statistics. A temporal-only Transformer removes the spatial attention stage, while a spatial-only Transformer operates on the most recent time step without temporal encoding. These comparisons help isolate the value of temporal memory, spatial interaction modeling, and the full spatiotemporal design.
Training follows chronological splits (80/10/10 for training, validation, and testing) to prevent temporal leakage and ensure realistic forecasting conditions. Models are optimized using AdamW with a learning rate of $10^{-4}$ and a batch size of 32. The ordinal prediction head is trained using a hybrid loss that combines a CORN-based ordinal classification objective with an Earth Mover’s Distance (EMD) regularization term:
$$\mathcal{L} = \mathcal{L}_{\text{CORN}} + 0.1\,\mathcal{L}_{\text{EMD}}$$
The loss formulation encourages both correct ordinal ranking and smooth probabilistic transitions between neighboring LOS categories. Early stopping is applied based on validation MAE with a patience of 10 epochs, and training is conducted for up to 100 epochs. All experiments were implemented in PyTorch 2.1 and executed on an NVIDIA RTX A2000 GPU.
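The role of the EMD term can be illustrated for a single prediction. The sketch below assumes the first-order EMD on unit-spaced ordinal classes with a one-hot target, which equals the sum of absolute differences between the two cumulative distributions; the source does not specify the exact variant (a squared form is also common), and the CORN term is passed in as a precomputed number rather than implemented here.

```python
from itertools import accumulate

def emd_ordinal(probs, target):
    """EMD between a predicted class distribution and a one-hot target."""
    cdf_pred = list(accumulate(probs))
    cdf_true = [1.0 if k >= target else 0.0 for k in range(len(probs))]
    return sum(abs(p - t) for p, t in zip(cdf_pred, cdf_true))

def hybrid_loss(corn_term, emd_term, weight=0.1):
    """L = L_CORN + 0.1 * L_EMD, with the CORN term supplied by the caller."""
    return corn_term + weight * emd_term

# Predicted distribution centred on class 2, true class 2 (made-up numbers).
predicted = [0.1, 0.2, 0.4, 0.2, 0.1, 0.0]
emd_value = emd_ordinal(predicted, target=2)
```

For a one-hot target this quantity equals the expected absolute class error under the predicted distribution, so probability mass placed far from the true class is penalized more than mass on adjacent classes.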

2.6. Experimental Design and Model Evaluation

2.6.1. Experimental Design

The proposed ST-Graph Transformer is evaluated under both in-domain and cross-city settings. Cross-city experiments simulate varying levels of data availability, including zero-shot transfer (no target data), few-shot learning (limited labeled data), and extensive fine-tuning using up to 80% of target city data. In addition, semi-supervised domain adaptation is explored using a DANN, where 5% of the target data are labeled and 95% are unlabeled.

2.6.2. Evaluation Metrics

Model performance is evaluated using standard metrics, including mean absolute error (MAE), classification accuracy (Acc), and accuracy within one adjacent class (Acc±1), which accounts for minor ordinal deviations. In addition, transfer efficiency is used to quantify the effectiveness of cross-city knowledge transfer by comparing performance between transferred and in-domain models. Confusion matrix analysis shows that most misclassifications occur between neighboring LOS levels, consistent with the ordinal structure of the task.
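The three headline metrics can be computed directly from class indices, as in the sketch below. It assumes that MAE is taken over the integer LOS class indices (0 to 5); the function name and the example labels are illustrative only.

```python
def ordinal_metrics(y_true, y_pred):
    """MAE, exact accuracy, and accuracy within one class for ordinal labels."""
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(abs_err)
    return {
        "mae": sum(abs_err) / n,
        "acc": sum(e == 0 for e in abs_err) / n,
        "acc_pm1": sum(e <= 1 for e in abs_err) / n,
    }

# Six made-up predictions: one off by a single class, one off by two.
scores = ordinal_metrics([0, 1, 2, 3, 4, 5], [0, 1, 3, 3, 2, 5])
```

Here four of six predictions are exact and five of six fall within one class, which shows how Acc±1 credits near misses that exact accuracy counts as errors.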

2.7. Explainability Analysis

To complement quantitative performance evaluation, an explainability analysis is conducted to examine how the ST-Graph Transformer generates pedestrian LOS predictions and how these mechanisms behave under cross-city transfer. Model interpretability is particularly important in urban analytics, where transparent predictive models support trust, diagnostic insight, and the policy relevance of data-driven tools used in mobility management and public-space operations.
The analysis aims to investigate three key aspects of model behavior:
  • Relative importance of input features influencing ordinal LOS classification;
  • Interaction between temporal, spatial, and contextual signals within the model; and
  • Stability of these explanatory patterns when models trained in Melbourne are transferred to Dublin and Zurich.
To quantify feature contributions, we employ KernelSHAP, a model-agnostic Shapley value approximation method that estimates the marginal contribution of each input feature to the model’s predictions. KernelSHAP enables both global explanations, which describe the overall importance of features across the dataset, and local explanations, which reveal how individual predictions are formed [21].
Given the spatiotemporal structure of the ST-Graph Transformer and the heterogeneous nature of pedestrian environments, explainability serves two purposes. First, it provides a mechanism to verify that the model relies on meaningful temporal and spatial patterns rather than spurious correlations. Second, it enables the evaluation of whether transfer learning alters the model’s internal decision logic when applied to different urban contexts.
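The quantity that KernelSHAP approximates is the Shapley value of each feature. For a handful of features it can be computed exactly by enumerating feature subsets, which makes the idea concrete. The sketch below is not the KernelSHAP algorithm itself: it uses a toy linear model with hypothetical feature names and weights, and it replaces absent features with fixed baseline values.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features outside the subset are replaced by their baseline value.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy model over three hypothetical features: lag1, lag24, hour of day.
def toy_model(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[2]

attributions = shapley_values(toy_model, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
```

For a linear model each attribution equals the weight times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction and the baseline prediction. Exact enumeration grows exponentially with the number of features, which is why a sampling-based approximation is needed for a 28-feature model.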

3. Experiments and Results

Results are reported as mean values across multiple runs to ensure robustness of the performance trends observed.

3.1. In-Domain Performance

The proposed model achieves the strongest performance across all evaluation metrics in the Melbourne dataset, outperforming both traditional machine learning and simplified deep learning baselines.
Table 2 summarizes the ablation results, showing that the full ST-Graph Transformer outperforms all baselines across all evaluation metrics. It achieves the lowest validation and test MAE (0.1932 and 0.2150) and the highest validation and test accuracy (0.8140 and 0.7969), indicating strong predictive performance within the source domain and robust spatiotemporal modeling capabilities. It also reduces test MAE by 0.1130 relative to XGBoost and attains the highest Acc±1 (0.9909), meaning that nearly all predictions fall within one LOS level of the ground truth. The temporal-only Transformer surpasses the XGBoost baseline in both test MAE and Acc±1, indicating that explicitly modeling diurnal and weekly temporal patterns provides richer predictive signals than relying on hand-crafted window statistics. In contrast, the spatial-only Transformer underperforms all other variants, suggesting that spatial correlations alone are insufficient for recovering fine-grained temporal variations in pedestrian activity.

3.2. Ablation Without Lag Features

To assess reliance on autoregressive signals, all lag-based features were removed. In Melbourne, performance decreases moderately, with accuracy dropping from 0.7969 to 0.7028 and MAE increasing from 0.2150 to 0.3356. Despite this, Acc±1 remains high (0.9704), indicating that the model continues to capture meaningful patterns without historical counts. This result demonstrates that the framework is not solely dependent on lag features, but also leverages contextual, morphological, and temporal representations.

3.3. Cross-City Transfer

Cross-city transfer experiments evaluate the generalization capability of the ST-Graph Transformer when deployed in urban environments with differing pedestrian demand distributions, sensor densities, and dataset sizes. Transfer efficiency is computed using the definitions introduced in Section 2.6. Results show that fine-tuning with as little as 5% labeled target data is sufficient to recover near in-domain performance.
Table 3. Transfer Learning Results: Melbourne→Dublin/Zurich (In-domain, Zero-shot, Fine-tune 5/80% and DANN 5%).
| Setting | City | MAE ↓ | Accuracy ↑ | Acc±1 ↑ | Transfer eff. (MAE) | Transfer eff. (Acc) |
|---|---|---|---|---|---|---|
| In-domain | Melbourne | 0.2208 | 0.7908 | 0.9911 | - | - |
| In-domain | Dublin | 0.2576 | 0.7675 | 0.9828 | 1.00 | 1.00 |
| E1 Zero-shot | Dublin | 0.8059 | 0.3880 | 0.8447 | 0.32 | 0.51 |
| E2 Fine-tune 5% | Dublin | 0.2948 | 0.7354 | 0.9806 | 0.8 | 0.96 |
| E3 Fine-tune 80% | Dublin | 0.2383 | 0.7813 | 0.9865 | 1.08 | 1.02 |
| E4 DANN 5% | Dublin | 0.343 | 0.819 | 0.9174 | 0.75 | 1.07 |
| In-domain | Zurich | 0.0852 | 0.9441 | 0.9903 | 1.00 | 1.00 |
| E1 Zero-shot | Zurich | 2.7729 | 0.0128 | 0.0201 | 0.03 | 0.01 |
| E2 Fine-tune 5% | Zurich | 0.0985 | 0.9368 | 0.9883 | 0.87 | 0.99 |
| E3 Fine-tune 80% | Zurich | 0.0929 | 0.9378 | 0.9900 | 0.92 | 0.99 |
| E4 DANN 5% | Zurich | 0.9549 | 0.4237 | 0.8322 | 0.09 | 0.45 |
For Dublin, zero-shot transfer leads to substantial degradation (Acc: 0.7675 → 0.3880), reflecting temporal and distributional mismatch. However, 5% fine-tuning restores 95.8% of in-domain performance, indicating effective adaptation under limited supervision in dense urban environments.
In contrast, Zurich exhibits extreme zero-shot failure due to sparse sensor coverage but achieves near-complete recovery (99.2%) with minimal fine-tuning. This result highlights the sensitivity of transfer performance to sensor density and urban network structure, particularly under sparse monitoring conditions.
DANN-based adaptation improves performance in Dublin but degrades it in Zurich, suggesting that adversarial domain adaptation is unstable under sparse sensing conditions. This indicates a practical limitation of such methods in low-data urban settings.
Overall, transferability follows a morphology–density gradient, with performance improving as structural similarity between cities increases.
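The recovery percentages quoted above can be reproduced from the reported metrics with a simple ratio. The orientation of the ratios is an assumption inferred from the numbers: dividing transferred accuracy by in-domain accuracy yields the 95.8% and 99.2% figures, and dividing in-domain MAE by transferred MAE keeps 1.0 as parity for an error metric. The function name is hypothetical, and the inputs are the Dublin and Zurich rows of the transfer results table.

```python
def transfer_efficiency(in_domain, transferred):
    """Ratios oriented so that 1.0 means parity with the in-domain model."""
    return {
        "te_mae": in_domain["mae"] / transferred["mae"],   # lower MAE is better
        "te_acc": transferred["acc"] / in_domain["acc"],   # higher accuracy is better
    }

dublin_in = {"mae": 0.2576, "acc": 0.7675}
dublin_zero_shot = {"mae": 0.8059, "acc": 0.3880}
dublin_ft5 = {"mae": 0.2948, "acc": 0.7354}
zurich_in = {"mae": 0.0852, "acc": 0.9441}
zurich_ft5 = {"mae": 0.0985, "acc": 0.9368}

dublin_te = transfer_efficiency(dublin_in, dublin_ft5)
zurich_te = transfer_efficiency(zurich_in, zurich_ft5)
dublin_zero_te = transfer_efficiency(dublin_in, dublin_zero_shot)
```

Values above 1.0, as in the 80% fine-tuning rows, indicate that the transferred model slightly outperforms a model trained on the target city alone.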
Figure 3. Row-normalized confusion matrices for Melbourne (in-domain) and Dublin (5% fine-tuning).
Both panels show diagonal dominance with adjacent-only spillover (e.g., B↔C, D↔E). Across all cities, the confusion matrices display this consistent “urban ordinal staircase”: diagonal dominance plus narrow, adjacent off-diagonals. This structure indicates that the ST-Graph Transformer produces physically plausible LOS transitions, respects the ordered nature of congestion levels, and retains ordinal coherence even under substantial cross-city transfer, supporting its use for operational urban planning and real-time mobility management.

3.4. Comparative Summary of Experiments

Table 4 summarizes the results across all experimental settings, including the full model, no-lag ablation, and cross-city transfer scenarios.
The results demonstrate a clear performance hierarchy, where the inclusion of lag features consistently yields the highest accuracy. However, the no-lag experiments provide critical insight into the intrinsic capabilities of the model. In particular, the relatively strong performance observed in Melbourne and Zurich without lag features indicates that the framework captures transferable spatial and contextual representations. Conversely, the reduced performance in Dublin highlights the influence of inter-city heterogeneity on transferability.

3.5. Explainability Analysis

We employ KernelSHAP to quantify feature contributions to the ST-Graph Transformer’s LOS predictions. Explanations are computed for more than 17,000 Melbourne test samples using a reduced 10-sensor neighborhood representation to ensure computational tractability while retaining meaningful local spatial context. Feature attribution patterns are illustrated in Figure 4 and Figure 5, which show global feature importance rankings and local explanation patterns, respectively. The analysis confirms the dominance of temporal autoregressive signals and reveals systematic changes in explanatory structure under cross-city transfer, as summarized in Table 5.
Figure 4 shows the global SHAP top-10 feature importance rankings for Melbourne (in-domain) and Dublin after fine-tuning. Lagged-demand features dominate in both cities, with lag1 (short-term), lag24 (diurnal), and lag168 (weekly) consistently contributing most to LOS prediction. Melbourne additionally shows stronger spatial influence from distance-to-center and bearing-to-center, while Dublin’s predictors reflect greater sensitivity to contextual and seasonal variation.
Figure 5 presents SHAP beeswarm plots illustrating the global feature importance for pedestrian LOS prediction in (a) Melbourne (in-domain) and (b) Dublin after 5% fine-tuning. Each point represents the contribution of a feature to an individual prediction, with the horizontal axis indicating the magnitude and direction of impact on the model output. Colors represent the feature value (low to high), allowing simultaneous interpretation of feature influence and directionality. Across both cities, the temporal lag features, particularly lag1, lag24, and lag168, consistently exhibit the largest contributions to model predictions, indicating that recent and cyclical pedestrian demand patterns are the primary drivers of LOS.
In Melbourne, spatial features such as distance to city center and bearing to center appear among the top contributors, suggesting that urban structure and centrality influence pedestrian congestion patterns. In contrast, the Dublin model places relatively greater emphasis on temporal and environmental variables (e.g., temperature, seasonal indicators), reflecting differences in local urban context and activity patterns. Overall, the similarity in dominant temporal features across cities, combined with variation in secondary predictors, indicates that while core demand dynamics are transferable, contextual factors remain city-specific.
To assess the robustness of explanatory patterns under cross-city transfer, we compute Spearman rank correlations between SHAP feature rankings obtained in Melbourne and those derived after transfer learning. The resulting correlations and feature overlap statistics are summarized in Table 5.
Table 5 indicates that explanation stability varies across cities and transfer settings. For Dublin, zero-shot transfer shows relatively low agreement with Melbourne’s feature ranking structure (ρ ≈ 0.23), suggesting that the model initially relies on a somewhat different combination of predictors. After fine-tuning, rank correlations increase substantially (ρ ≈ 0.35–0.38), indicating that even limited local supervision helps realign the model’s explanatory structure with the source-domain patterns. The Top 10 feature overlap remains relatively high (60–70%), suggesting that many of the most influential predictors remain consistent despite differences in ranking order. For Zurich, rank correlations remain lower overall (ρ ≈ 0.15–0.23), but Top 10 feature overlap remains relatively stable (40–50%) across transfer settings. This pattern suggests that while the precise ordering of feature importance differs from Melbourne, the set of influential predictors remains broadly similar.
The DANN-based semi-supervised adaptation (E4) yields lower rank correlations than fine-tuning in both cities (ρ ≈ 0.13–0.18), while maintaining moderate Top 10 feature overlap (40–50%). This indicates that adversarial alignment tends to emphasize a smaller set of domain-invariant predictors, primarily temporal lag features, while reducing sensitivity to more context-specific spatial or environmental signals.
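The two stability statistics reported in Table 5 can be illustrated with a minimal sketch. It assumes that each model's global importance is summarized as one non-negative score per feature (for example, mean absolute SHAP value), and the function names `spearman_rho` and `top_k_overlap` are hypothetical helpers introduced here for illustration, not the authors' code.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Rank values in descending order (1 = largest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant ranking")
    return cov / (vx * vy) ** 0.5


def spearman_rho(source: Dict[str, float], target: Dict[str, float]) -> float:
    """Spearman correlation between two importance rankings over shared features."""
    features = sorted(set(source) & set(target))
    return pearson(
        average_ranks([source[f] for f in features]),
        average_ranks([target[f] for f in features]),
    )


def top_k_overlap(source: Dict[str, float], target: Dict[str, float], k: int = 10) -> float:
    """Percentage of the top-k features of `source` that are also top-k in `target`."""
    k = min(k, len(source), len(target))
    top_s = set(sorted(source, key=source.get, reverse=True)[:k])
    top_t = set(sorted(target, key=target.get, reverse=True)[:k])
    return 100.0 * len(top_s & top_t) / k
```

Computing Spearman's ρ as the Pearson correlation of tie-averaged ranks keeps the sketch dependency-free; a statistics library routine would normally be used in practice.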

4. Discussion

The experimental results demonstrate that pedestrian LOS mapping across heterogeneous urban environments is most effective when supported by limited local adaptation, rather than relying solely on zero-shot transfer. Although zero-shot performance exhibits substantial variation between cities, few-shot fine-tuning reliably recovers near in-domain accuracy levels, underscoring the practical value of minimal target-domain supervision.
These observations are consistent with established findings in urban mobility modeling, where temporal dynamics consistently emerge as the primary drivers of short-term pedestrian flow predictions [1,2,3]. The dominance of temporal lag features observed here, which persists even under cross-domain conditions, reinforces this pattern and highlights their role as a robust, transferable signal for LOS estimation.

4.1. Transferability of Ordinal Urban Indicators

A key insight from this study concerns the differential transferability of temporal and spatial features in ordinal LOS prediction. Short-term demand patterns, captured through lag features, demonstrate high stability across cities, whereas spatial and contextual attributes are more sensitive to local urban morphology. This distinction explains both the effectiveness of few-shot adaptation and the pronounced degradation observed under zero-shot transfer, particularly in Zurich, where sparse sensor coverage and structural differences in the monitoring network limit direct generalization. These findings align with the well-documented geospatial generalization challenge, in which domain mismatch in spatial configuration and activity patterns reduces model performance [14,18].
Domain-adversarial approaches such as DANN, while conceptually promising, exhibit instability in low-data regimes in this study, suggesting that they should be interpreted as exploratory rather than universally reliable solutions. Overall, the results indicate that transferability is conditional, depending on both data availability and structural similarity between cities.

4.2. Insights from Ablation and Explainability Analyses

Ablation experiments demonstrate predictive capability beyond purely autoregressive dependence. Removal of lag features results in moderate in-domain performance degradation in Melbourne (accuracy falls from 79.7% to 70.3%) while preserving ordinal coherence (Acc±1 > 0.97). With 5% fine-tuning, cross-city transfer without lag features remains viable in structurally similar contexts (Zurich: 87.8% accuracy, versus 93.7% for the full model) but degrades more substantially in heterogeneous environments (Dublin: 36.6%, versus 73.5% for the full model), as reported in Table 4.
KernelSHAP analysis further supports these findings. Temporal lag features consistently dominate feature importance across all cities, confirming their central role in prediction. At the same time, secondary features exhibit city-specific variation. In Melbourne, spatial indicators such as centrality reflect more structured, monocentric dynamics, whereas Dublin shows greater reliance on contextual variables, consistent with a more heterogeneous urban environment.
Spearman rank correlations in feature importance (ρ ≈ 0.35–0.38 for Dublin after fine-tuning; Table 5) indicate partial stability of explanatory structures. This suggests that while core predictive mechanisms are transferable, local adaptation remains necessary, supporting cautious interpretation of cross-city explainability. Confusion matrices confirm ordinal plausibility (Figure 3): diagonal dominance, with spillover concentrated in adjacent classes (B↔C, D↔E), is consistent with physically realistic LOS transitions under transfer.
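The ordinal metrics used throughout (MAE, accuracy, and Acc±1) and the adjacent-spillover check on the confusion matrix can be sketched as follows. This is an illustrative sketch that assumes LOS classes A–F are encoded as integers 0–5 and that MAE is measured in class-index units; the helper names are hypothetical.

```python
from typing import Dict, List, Sequence

N_CLASSES = 6  # assumed encoding: LOS A..F -> 0..5


def ordinal_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """MAE in class-index units, exact accuracy, and within-one-class accuracy."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return {
        "mae": sum(errors) / n,
        "accuracy": sum(e == 0 for e in errors) / n,
        "acc_pm1": sum(e <= 1 for e in errors) / n,  # Acc±1
    }


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """Rows index the true class, columns the predicted class."""
    cm = [[0] * N_CLASSES for _ in range(N_CLASSES)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def non_adjacent_share(cm: List[List[int]]) -> float:
    """Fraction of all predictions that miss the true class by more than one level."""
    total = sum(sum(row) for row in cm)
    far = sum(
        count
        for i, row in enumerate(cm)
        for j, count in enumerate(row)
        if abs(i - j) > 1
    )
    return far / total
```

By construction, `non_adjacent_share` equals one minus Acc±1, so a value close to zero corresponds to the diagonal-plus-neighbors pattern described for Figure 3.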

4.3. Implications for Urban Planning and Public Space Management

The framework advances data-driven public space management through three concrete applications:
1. Real-time congestion monitoring: Hourly LOS forecasts across 214 sensors enable municipalities to identify emerging bottlenecks and plan sidewalk capacity proactively.
2. Adaptive signal timing: Predicted LOS-F conditions (extreme crowding, above the 90th percentile) can trigger dynamic pedestrian crossing adjustments, enhancing safety in peak-demand corridors.
3. Equity analysis: Persistent LOS-E/F locations reveal spatial walkability disparities, helping to prioritize interventions in underserved neighborhoods.
For resource-constrained municipalities, few-shot adaptation requires only 5% local labeling to achieve operational accuracy, which substantially lowers deployment barriers. Continuous street-level LOS forecasting in turn enables timely interventions, such as capacity enhancements, in high-demand corridors.
The percentile-based LOS formulation demonstrates transferability advantages over density-based approaches [22,23], enabling 95–99% cross-city performance recovery (Table 3). Unlike existing models that require full recalibration, few-shot adaptation (5% data) supports scalable equity analysis by systematically identifying persistent LOS-E/F hotspots for prioritized interventions in walkability-disadvantaged areas, directly addressing the special issue’s focus on equitable public space management. The proposed formulation also enables relative comparison across cities while preserving sensitivity to local demand conditions, aligning with recent efforts to incorporate contextual factors into LOS assessment [8,9].
From a planning perspective, the framework also supports equity-oriented analysis. Persistent high-congestion locations can be systematically identified, allowing planners to prioritize interventions in areas experiencing disproportionate pedestrian pressure. In key public-space contexts such as central districts, transit nodes, and event areas, the model provides a basis for adaptive and evidence-based decision-making.
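The percentile-based LOS formulation referred to above can be illustrated with a minimal sketch. It assumes a `log1p` transform of the raw count (Appendix B only says "log-transformed") and five quantile thresholds τ_k separating classes A–F. Only the 90th-percentile boundary for LOS F is stated in the text; the other cut points below are placeholders chosen purely for illustration.

```python
import bisect
import math
from typing import Iterable, List, Sequence

# Hypothetical cut points for the A/B, B/C, C/D, D/E and E/F boundaries.
# Only the final (90th percentile) boundary comes from the text.
ASSUMED_PERCENTILES = (20.0, 40.0, 60.0, 80.0, 90.0)
LOS_CLASSES = "ABCDEF"


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    if len(sorted_values) == 0:
        raise ValueError("cannot take a percentile of an empty sequence")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def fit_thresholds(
    counts: Iterable[float],
    percentiles: Sequence[float] = ASSUMED_PERCENTILES,
) -> List[float]:
    """Estimate the thresholds tau_k from one city's log-transformed counts."""
    y = sorted(math.log1p(c) for c in counts)
    return [percentile(y, q) for q in percentiles]


def to_los(count: float, thresholds: Sequence[float]) -> str:
    """Map a raw count to an LOS letter; a class starts strictly above its threshold."""
    k = bisect.bisect_left(thresholds, math.log1p(count))
    return LOS_CLASSES[k]
```

Because the thresholds are re-estimated from each city's own count distribution, the same letter denotes the same relative crowding level in every city, which is the property that makes the labels comparable under transfer.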

4.4. Limitations and Future Work

This study has some limitations. Analysis across three cities constrains broader generalizability, while heterogeneous sensor densities (particularly Zurich’s sparsity) influence robustness. The relative LOS scale, while transfer-friendly, diverges from absolute density-based benchmarks, necessitating perceptual validation. Temporal lag reliance, though interpretable, signals opportunities for exogenous feature enrichment.
Future work should expand to diverse urban morphologies, integrate complementary modalities (e.g., trajectory or perception data), and calibrate relative LOS against human comfort metrics. Investigations into long-horizon forecasting and real-time integration will further bridge methodological advances with operational public-space governance.

5. Conclusions

This study demonstrates that pedestrian LOS can be reliably mapped and transferred across heterogeneous urban environments using an ordinal, percentile-based formulation and a unified ST-Graph Transformer. Across Melbourne, Dublin, and Zurich, the results establish three key findings: (i) temporal primacy, in that short-term, diurnal, and weekly lags (lag1/24/168) dominate prediction and remain stable under domain shift; (ii) the value of spatial message-passing, which improves robustness in sparse sensing regimes and accelerates few-shot adaptation; and (iii) production-ready few-shot transfer, where 5% labeled target data rapidly recovers near in-domain performance and 80% fine-tuning exceeds locally trained baselines. Explainability analysis confirms that the model leverages physically meaningful predictors, with temporal memory dominating across cities. Transfer success correlates with urban morphology and temporal regularity, positioning transferable GeoAI as operational decision support for real-time congestion monitoring, adaptive signal timing, and equity-focused public space interventions.

Supplementary Materials

The following supporting information can be downloaded at the website of this paper posted on Preprints.org. Supplementary_Data.zip: Input tensors and feature data (adj_feat_knn6, X.npy, y.npy, features.csv), SHAP values, and model performance outputs, including evaluation metrics and confusion matrices.

Author Contributions

Conceptualization, A.C., J.D., and N.T.; methodology, A.K. and A.C.; software, A.K.; validation, A.K. and A.C.; formal analysis, A.K.; investigation, A.K. and A.C.; data curation, N.T.; writing (original draft preparation), A.K.; writing (review and editing), A.C. and J.D.; visualization, A.K.; supervision, A.C. and J.D.; project administration, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

Partial support was provided by the Ariel University Data Science and Artificial Intelligence Research Center.

Appendix A

| Group | Feature Name | Description | Type |
|---|---|---|---|
| Lagged Demand (Short-Term) | lag1 | Pedestrian count lagged by 1 hour | Continuous |
| Lagged Demand (Diurnal) | lag_24 | Pedestrian count lagged by 24 hours | Continuous |
| Lagged Demand (Weekly) | lag_168 | Pedestrian count lagged by 168 hours (1 week) | Continuous |
| Temporal Encoding | hour_cos | Cosine encoding of hour of day (24-hour cycle) | Continuous |
| | hour_sin | Sine encoding of hour of day | Continuous |
| Time-of-Day Context | time_of_day | Categorical time-of-day period (e.g., night, peak) | Categorical |
| Traffic Interaction | peak_traffic_proxy | Proxy indicator for peak traffic conditions | Binary / Continuous |
| | lag1_x_peak | Interaction between lagged demand and peak traffic | Continuous |
| Calendar Indicators | is_weekend | Weekend indicator | Binary |
| | day_of_week | Day of week (1–7) | Integer |
| | holiday_flag | Public holiday indicator | Binary |
| | month_num | Month of year (1–12) | Integer |
| Weather Context | temp | Air temperature | Continuous |
| | precip | Precipitation intensity | Continuous |
| | wind | Wind speed | Continuous |
| Seasonality | Season | Meteorological season | Categorical |
| Vegetation Context | sensor_ndvi_mean | Mean NDVI around sensor | Continuous |
| | sensor_canopy_pct | Tree canopy cover (%) within buffer | Continuous |
| | sensor_canopy_valid_frac | Fraction of valid canopy pixels | Continuous |
| | canopy_is_valid | Canopy data validity indicator | Binary |
| | ndvi_x_canopy | Interaction between NDVI and canopy cover | Continuous |
| Topographic Context | topographic_position | Relative topographic position index | Continuous |
| | terrain_complexity | Local terrain variability index | Continuous |
| Network Structure | n_edges | Number of connected street edges | Integer |
| Network Centrality | betweenness | Betweenness centrality of sensor node | Continuous |
| | closeness | Closeness centrality of sensor node | Continuous |
| Urban Geometry & Context | dist_to_city_center | Distance from sensor to city center | Continuous |
| | bearing_to_center | Bearing from sensor to city center | Continuous |
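As an illustration of how the lagged-demand and temporal-encoding features listed above could be derived from an hourly count series, the following is a minimal sketch. It assumes a gap-free hourly series indexed by position; the helper names are hypothetical and the treatment of missing history (returning `None`) is an assumption.

```python
import math
from typing import Dict, List, Optional

# Lag offsets in hours, keyed by the feature names used in the table above.
LAGS = {"lag1": 1, "lag_24": 24, "lag_168": 168}


def lag_features(counts: List[float], t: int) -> Dict[str, Optional[float]]:
    """Lagged counts for hour index t; None where the history is too short."""
    return {
        name: (counts[t - h] if t - h >= 0 else None)
        for name, h in LAGS.items()
    }


def hour_encoding(hour: int) -> Dict[str, float]:
    """Cyclic encoding of hour of day, so that 23:00 and 00:00 are neighbors."""
    angle = 2.0 * math.pi * (hour % 24) / 24.0
    return {"hour_sin": math.sin(angle), "hour_cos": math.cos(angle)}
```

The sine and cosine pair places the 24 hours on a unit circle, which avoids the artificial jump between hour 23 and hour 0 that a raw integer hour would introduce.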

Appendix B: Notation Table

| Symbol | Description |
|---|---|
| c | raw pedestrian count |
| y | log-transformed count |
| τ_k | quantile thresholds |
| X | input tensor |
| T | temporal window (168 hours) |
| N | number of sensors |
| F | number of features |
| d_model | embedding dimension |
| θ_k | ordinal thresholds |
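To make the roles of T, N, and F concrete, the sketch below slices an hourly feature array into input windows X of shape T × N × F. The axis order, the one-hour-ahead target, and the function name `make_windows` are assumptions made for illustration; the notation table does not specify them.

```python
from typing import List, Tuple

Hourly = List[List[List[float]]]  # indexed as [hour][sensor][feature]


def make_windows(
    features: Hourly,
    labels: List[List[int]],
    T: int = 168,
) -> List[Tuple[Hourly, List[int]]]:
    """Pair each T-hour window X (T x N x F) with the per-sensor LOS labels
    of the hour that immediately follows the window (assumed forecast target)."""
    if len(features) != len(labels):
        raise ValueError("features and labels must cover the same hours")
    samples = []
    for start in range(len(features) - T):
        X = features[start:start + T]  # hours start .. start + T - 1
        y = labels[start + T]          # labels for hour start + T
        samples.append((X, y))
    return samples
```

With H hours of data this yields H − T samples, so one week of history (T = 168) must be available before the first prediction can be made.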

References

  1. Jiang, W.; Luo, J.; He, M.; Gu, W. Graph Neural Network for Traffic Forecasting: The Research Progress. ISPRS Int. J. Geoinf. 2023, 12.
  2. Zhou, H.; Ren, D.; Xia, H.; Fan, M.; Yang, X.; Huang, H. AST-GNN: An Attention-Based Spatio-Temporal Graph Neural Network for Interaction-Aware Pedestrian Trajectory Prediction. Neurocomputing 2021, 445, 298–308.
  3. Li, R.; Qiao, T.; Katsigiannis, S.; Zhu, Z.; Shum, H.P.H. Unified Spatial–Temporal Edge-Enhanced Graph Networks for Pedestrian Trajectory Prediction. IEEE Transactions on Circuits and Systems for Video Technology 2025, 35, 7047–7060.
  4. Lee, J.; Kang, Y. PGTFT: A Lightweight Graph-Attention Temporal Fusion Transformer for Predicting Pedestrian Congestion in Shadow Areas. ISPRS Int. J. Geoinf. 2025, 14.
  5. Xie, Y.; Niu, J.; Zhang, Y.; Ren, F. Multisize Patched Spatial-Temporal Transformer Network for Short- and Long-Term Crowd Flow Prediction. IEEE Transactions on Intelligent Transportation Systems 2022, 23, 21548–21568.
  6. Yuan, Y.; Ding, J.; Han, C.; Sheng, Z.; Jin, D.; Li, Y. UniFlow: A Foundation Model for Unified Urban Spatio-Temporal Flow Prediction. 2025.
  7. Asadi-Shekari, Z.; Moeinaddini, M.; Shah, M.Z. Disabled Pedestrian Level of Service Method for Evaluating and Promoting Inclusive Walking Facilities on Urban Streets. J. Transp. Eng. 2013, 139, 181–192.
  8. Raad, N.; Burke, M.I. What Are the Most Important Factors for Pedestrian Level-of-Service Estimation? A Systematic Review of the Literature. Transp. Res. Rec. 2018, 2672, 101–117.
  9. Nag, D.; Goswami, A.K.; Gupta, A.; Sen, J. Assessing Urban Sidewalk Networks Based on Three Constructs: A Synthesis of Pedestrian Level of Service Literature. Transp. Rev. 2020, 40, 204–240.
  10. Marisamynathan, S.; Vedagiri, P. Pedestrian Perception-Based Level-of-Service Model at Signalized Intersection Crosswalks. Journal of Modern Transportation 2019, 27, 266–281.
  11. Talavera-Garcia, R.; Pérez-Campaña, R. Applying a Pedestrian Level of Service in the Context of Social Distancing: The Case of the City of Madrid. Int. J. Environ. Res. Public Health 2021, 18.
  12. Paul, D.; Moridpour, S.; Venkatesan, S.; Withanagamage, N. Evaluating the Pedestrian Level of Service for Varying Trip Purposes Using Machine Learning Algorithms. Sci. Rep. 2024, 14.
  13. Sharifi, M.S.; Christensen, K.; Chen, A.; Song, Z. Exploring Effects of Environment Density on Heterogeneous Populations’ Level of Service Perceptions. Transp. Res. Part A Policy Pract. 2019, 124, 115–127.
  14. Wang, L.; Geng, X.; Ma, X.; Liu, F.; Yang, Q. Cross-City Transfer Learning for Deep Spatio-Temporal Prediction. 2019.
  15. Fang, Z.; Wu, D.; Pan, L.; Chen, L.; Gao, Y. When Transfer Learning Meets Cross-City Urban Flow Prediction: Spatio-Temporal Adaptation Matters. 2022.
  16. Zhang, K.; Shang, W.L.; De Vos, J.; Zhang, Y.; Cao, M. Illuminating the Path to More Equitable Access to Urban Parks. Sci. Rep. 2025, 15.
  17. Lu, B.; Gan, X.; Zhang, W.; Yao, H.; Fu, L.; Wang, X. Spatio-Temporal Graph Few-Shot Learning with Cross-City Knowledge Transfer. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery, August 14 2022; pp. 1162–1172.
  18. Tenzer, M.; Rasheed, Z.; Shafique, K. The Geospatial Generalization Problem: When Mobility Isn’t Mobile. In Proceedings of the ACM International Symposium on Advances in Geographic Information Systems; Association for Computing Machinery, November 13 2023.
  19. Koushik, A.; Manoj, M.; Nezamuddin, N. SHapley Additive ExPlanations for Explaining Artificial Neural Network Based Mode Choice Models. Transportation in Developing Economies 2024, 10.
  20. Agarwal, C.; Johnson, N.; Pawelczyk, M.; Krishna, S.; Saxena, E.; Zitnik, M.; Lakkaraju, H. Rethinking Stability for Attribution-Based Explanations. 2022.
  21. Alamelh, A. ExplAIn-City: Explainable Spatio-Temporal Deep Learning for Urban Resilience and Risk Governance. IEEE Access 2025, 13, 181989–182004.
  22. Landis, B.W.; Vattikuti, V.R.; Ottenberg, R.M.; McLeod, D.S.; Guttenplan, M. Modeling the Roadside Walking Environment: Pedestrian Level of Service. Transp. Res. Rec. 2001, 1773, 82–88.
  23. Dixon, L.B. Bicycle and Pedestrian Level-of-Service Performance Measures and Standards for Congestion Management Systems. Transp. Res. Rec. 1996, 1538, 1–9.
Figure 2. ST-Graph Transformer with DANN Architecture.
Figure 4. Top-10 Normalized SHAP Feature Importance for Melbourne (in-domain) and Dublin after 5% fine-tuning (5% FT).
Figure 5. SHAP Beeswarm Plots for Melbourne (in-domain) and Dublin after 5% fine-tuning (5% FT).
Table 2. In-domain performance in Melbourne.
| Model | Val MAE ↓ | Val Acc ↑ | Val Acc±1 ↑ | Test MAE ↓ | Test Acc ↑ | Test Acc±1 ↑ |
|---|---|---|---|---|---|---|
| XGBoost | 0.2393 | 0.7854 | 0.9839 | 0.3280 | 0.7128 | 0.9718 |
| Temporal-only Transformer | 0.2363 | 0.7251 | 0.9839 | 0.2569 | 0.7627 | 0.9859 |
| Spatial-only Transformer | 0.3165 | 0.7125 | 0.9779 | 0.3498 | 0.6906 | 0.9697 |
| ST-Graph Transformer (Full) | 0.1932 | 0.8140 | 0.9943 | 0.2150 | 0.7969 | 0.9909 |
Table 4. Performance comparison: full model, no-lag ablation, and few-shot transfer settings.
| Experiment | MAE ↓ | Accuracy ↑ | Acc±1 ↑ |
|---|---|---|---|
| Melbourne (Full Model) | 0.2150 | 0.7969 | 0.9909 |
| Melbourne (No-Lag) | 0.3356 | 0.7028 | 0.9704 |
| Melbourne → Dublin (Fine-tune 5%) | 0.2948 | 0.7354 | 0.9806 |
| Melbourne → Dublin (No-Lag, Fine-tune 5%) | 1.0151 | 0.3664 | 0.7576 |
| Melbourne → Zurich (Fine-tune 5%) | 0.0985 | 0.9368 | 0.9883 |
| Melbourne → Zurich (No-Lag, Fine-tune 5%) | 0.1746 | 0.8776 | 0.9584 |
Table 5. SHAP Rank Correlation: Melbourne vs. transfer models.
| City | Model | Spearman ρ (vs Melbourne) | Top-10 Overlap (%) |
|---|---|---|---|
| Dublin | E0 (In-domain) | 0.23 | 70% |
| Dublin | E2 (5% FT) | 0.38 | 70% |
| Dublin | E3 (80% FT) | 0.35 | 60% |
| Dublin | E4 (DANN) | 0.13 | 40% |
| Zurich | E0 (In-domain) | 0.15 | 40% |
| Zurich | E2 (5% FT) | 0.23 | 50% |
| Zurich | E3 (80% FT) | 0.21 | 40% |
| Zurich | E4 (DANN) | 0.18 | 50% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.