Figure 1.
The proposed adaptive cascade clustering architecture. The process begins with data acquisition from the urban transport network, followed by feature extraction within time windows. A weighted voting mechanism then selects the optimal clustering strategy (HDBSCAN or k-means) based on data characteristics, leading to the final identification of traffic patterns.
Figure 2.
Logical scheme of the weighted voting process. Input data characteristics, such as noise ratio and density variation, are evaluated to inform the selection between HDBSCAN and k-means. The quality of both models is assessed using internal and external metrics, and the final clustering result is chosen based on a comparative analysis, ensuring the most suitable model is applied.
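The comparative step described in this caption can be sketched as a weighted sum over quality metrics. This is only an illustration under stated assumptions: the function names, the choice of two metrics, and the equal weights are hypothetical and are not taken from the paper; the metric values are those reported for the aggregated average data.

```python
# Hypothetical sketch of a weighted vote between two clustering candidates.
# The weights and the choice of metrics are assumptions for illustration only.

def weighted_score(metrics, weights):
    """Weighted sum of the quality metrics listed in `weights`."""
    return sum(weights[name] * metrics[name] for name in weights)

def select_model(candidates, weights):
    """Return the name of the best-scoring candidate and all scores."""
    scores = {name: weighted_score(m, weights) for name, m in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Assumed equal weighting of one external and one internal metric.
weights = {"v_measure": 0.5, "silhouette": 0.5}

# Metric values as reported for the aggregated average data.
candidates = {
    "HDBSCAN": {"v_measure": 0.79, "silhouette": 0.52},
    "k-means (K=5)": {"v_measure": 0.73, "silhouette": 0.57},
}

selected, scores = select_model(candidates, weights)
print(selected, scores)
```

With these assumed weights the two candidates score almost identically, which shows how strongly the outcome of such a vote depends on the weighting of external versus internal metrics.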
Figure 3.
Clustering of aggregated average traffic data using HDBSCAN. The algorithm automatically identified eight distinct clusters, effectively separating different simulated traffic modes and demonstrating a strong alignment with the ground-truth data structure.
Figure 4.
Clustering of aggregated average traffic data using k-means with K=5. This approach produced compact, well-defined spherical clusters, leading to high internal validation scores, but merged some distinct traffic scenarios into single groups.
Figure 5.
Clustering of aggregated average traffic data using k-means with K=7. Increasing the cluster count resulted in over-detailing, where minor variations in traffic flow were incorrectly classified as separate patterns, reducing semantic clarity.
Figure 6.
Comparison of quality metrics for clustering approaches on aggregated average data. This chart highlights the trade-off between algorithms: HDBSCAN excels in external validation (V-measure, ARI), while k-means (K=5) performs better on internal metrics like the Silhouette Score.
Figure 7.
Comparative analysis of clustering quality using V-measure and ARI metrics. The chart displays the performance of HDBSCAN, k-means (K=5), and k-means (K=7), illustrating that HDBSCAN is superior for lower-dimensional average data, while the performance gap narrows for high-dimensional data.
Figure 8.
Temporal distribution of transport scenarios and their corresponding cluster assignments by the HDBSCAN algorithm on aggregated average data. Each colored block represents a specific cluster, showing a clear, non-overlapping temporal sequence that aligns with distinct traffic patterns throughout the simulated day.
Figure 9.
Distribution of experimental scenarios within clusters identified by HDBSCAN on aggregated average data. (a) A bar chart detailing the number of scenarios per cluster, showing a balanced distribution. (b) A pie chart illustrating the proportion of each scenario type in the experiment, highlighting the five primary traffic modes: Hrechany, Evening, Random, Morning, and Mixed.
Table 1.
Clustering performance on aggregated average traffic data. External and internal validation metrics are presented for HDBSCAN, k-means (K=5), and k-means (K=7). Higher values are better for V-measure, Rand Index, ARI, NMI, Fowlkes–Mallows, Silhouette, and Calinski–Harabasz scores; lower is better for the Davies–Bouldin Index.
| Approach | V-measure | Rand Index | ARI | NMI | Fowlkes–Mallows | Silhouette | Calinski–Harabasz | Davies–Bouldin |
|---|---|---|---|---|---|---|---|---|
| HDBSCAN | 0.79 | 0.93 | 0.73 | 0.79 | 0.78 | 0.52 | 124.95 | 0.92 |
| k-means (K=5) | 0.73 | 0.90 | 0.70 | 0.73 | 0.76 | 0.57 | 292.23 | 0.65 |
| k-means (K=7) | 0.70 | 0.89 | 0.63 | 0.70 | 0.70 | 0.53 | 265.10 | 0.84 |
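The V-measure and NMI columns above hold identical values. A plausible reason is that the V-measure (harmonic mean of homogeneity and completeness) is algebraically equal to NMI when NMI is normalized by the arithmetic mean of the two label entropies. The source does not state which normalization was used, so that is an assumption. The pure-Python sketch below uses hypothetical toy labels to show the equivalence.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information between two labelings of the same samples."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (nij / n) * log(nij * n / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )

def v_measure(truth, pred):
    """Harmonic mean of homogeneity and completeness."""
    h_truth, h_pred = entropy(truth), entropy(pred)
    mi = mutual_information(truth, pred)
    homogeneity = mi / h_truth if h_truth > 0 else 1.0
    completeness = mi / h_pred if h_pred > 0 else 1.0
    total = homogeneity + completeness
    return 2 * homogeneity * completeness / total if total > 0 else 0.0

def nmi_arithmetic(truth, pred):
    """NMI normalized by the arithmetic mean of the two entropies."""
    denom = (entropy(truth) + entropy(pred)) / 2
    return mutual_information(truth, pred) / denom if denom > 0 else 1.0

# Hypothetical toy labels, not data from the experiment.
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 1, 1, 1, 2]
print(v_measure(truth, pred), nmi_arithmetic(truth, pred))
```

Both functions reduce to 2·MI / (H(truth) + H(pred)), so the two printed values agree up to floating-point error.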
Table 2.
Clustering performance on high-dimensional combined traffic data. The table presents validation metrics for HDBSCAN and k-means, illustrating the impact of increased data dimensionality on algorithm performance.
| Approach | V-measure | Rand Index | ARI | NMI | Fowlkes–Mallows | Silhouette | Calinski–Harabasz | Davies–Bouldin |
|---|---|---|---|---|---|---|---|---|
| HDBSCAN | 0.64 | 0.88 | 0.61 | 0.64 | 0.68 | 0.26 | 42.83 | 1.49 |
| k-means (K=5) | 0.67 | 0.87 | 0.62 | 0.67 | 0.71 | 0.23 | 34.79 | 1.59 |
| k-means (K=7) | 0.66 | 0.88 | 0.59 | 0.66 | 0.67 | 0.19 | 26.84 | 2.14 |
Table 3.
Comprehensive performance metrics for clustering algorithms on aggregated average traffic data. A full suite of external and internal validation metrics for HDBSCAN and k-means.
| Metric | HDBSCAN | k-means (K=5) | k-means (K=7) |
|---|---|---|---|
| V-measure | 0.79 | 0.73 | 0.70 |
| Rand Index | 0.93 | 0.90 | 0.89 |
| Adjusted Rand Index (ARI) | 0.73 | 0.70 | 0.63 |
| Mutual Information | 1.53 | 1.21 | 1.26 |
| Normalized Mutual Information (NMI) | 0.79 | 0.73 | 0.70 |
| Adjusted Mutual Information (AMI) | 0.75 | 0.70 | 0.67 |
| Fowlkes–Mallows scores | 0.78 | 0.76 | 0.70 |
| Silhouette Score | 0.52 | 0.57 | 0.53 |
| Davies–Bouldin Index | 0.92 | 0.65 | 0.84 |
| Calinski–Harabasz Index | 124.95 | 292.23 | 265.10 |
Table 4.
Comprehensive performance metrics for clustering algorithms on high-dimensional combined traffic data. The results illustrate the performance shift caused by the ‘curse of dimensionality’.
| Metric | HDBSCAN | k-means (K=5) | k-means (K=7) |
|---|---|---|---|
| V-measure | 0.64 | 0.67 | 0.66 |
| Rand Index | 0.88 | 0.87 | 0.88 |
| Adjusted Rand Index (ARI) | 0.61 | 0.62 | 0.59 |
| Normalized Mutual Information (NMI) | 0.64 | 0.67 | 0.66 |
| Adjusted Mutual Information (AMI) | 0.61 | 0.64 | 0.62 |
| Fowlkes–Mallows scores | 0.68 | 0.71 | 0.67 |
| Silhouette Score | 0.26 | 0.23 | 0.19 |
| Davies–Bouldin Index | 1.49 | 1.59 | 2.14 |
| Calinski–Harabasz Index | 42.83 | 34.79 | 26.84 |
Table 5.
Cluster assignments for key transport scenarios on aggregated average data. The table shows the categorization of different time periods and scenarios by the HDBSCAN, k-means (K=5), and k-means (K=7) algorithms.
| Time Period | Scenario Type | HDBSCAN | k-means (K=5) | k-means (K=7) |
|---|---|---|---|---|
| 00:00–01:30 | Morning | Cluster 2 | Cluster 3 | Cluster 4 |
| 01:30–02:30 | Random No. 1 | Cluster 3 | Cluster 4 | Cluster 7 |
| 02:30–04:00 | Evening | Cluster 4 | Cluster 2 | Cluster 3 |
| 04:20–05:20 | Hrechany | Cluster 1 | Cluster 1 | Cluster 1 |
| 05:20–06:20 | Random No. 2 | Cluster 6 | Cluster 4 | Cluster 5 |
| 06:30–07:30 | Evening (variation) | Cluster 4 | Cluster 2 | Cluster 6/3 |
| 07:30–08:30 | Hrechany (variation) | Cluster 1 | Cluster 1 | Cluster 1 |
Table 6.
Performance comparison between standalone algorithms and the cascade approach. The table showcases the improvements in clustering structure quality (V-measure) and compactness (Silhouette Score) achieved by the adaptive approach.
| Criterion | HDBSCAN (Standalone) | k-means (Standalone) | Cascade Approach |
|---|---|---|---|
| Structure Quality (V-measure) | 0.79 | 0.73 | 0.79–0.82 (+0–4%) |
| Cluster Compactness | 0.52 | 0.57 | 0.57–0.59 (+4–13%) |
Table 7.
Scenario identification accuracy rates for different clustering approaches. The table shows the percentage accuracy for identifying five distinct transport scenarios and the average accuracy for each algorithm.
| Scenario Type | HDBSCAN (%) | k-means (K=5) (%) | k-means (K=7) (%) | Cascade Approach 1 (%) |
|---|---|---|---|---|
| Morning Peaks | 95 | 92 | 88 | 95–97 |
| Evening Peaks | 93 | 90 | 85 | 93–96 |
| Hrechany Scenario | 98 | 98 | 98 | 98 |
| Mixed Modes | 91 | 85 | 82 | 91–94 |
| Low-Active Periods | 87 | 83 | 79 | 87–90 |
| Average Accuracy | 92.8 | 89.6 | 86.4 | 92.8–95.0 |
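The Average Accuracy row is consistent with an unweighted mean over the five scenario types. The short check below assumes equal weighting per scenario, which the caption does not state explicitly; the variable names are illustrative.

```python
from statistics import mean

# Per-scenario accuracy (%) in the order: Morning Peaks, Evening Peaks,
# Hrechany Scenario, Mixed Modes, Low-Active Periods.
accuracy = {
    "HDBSCAN": [95, 93, 98, 91, 87],
    "k-means (K=5)": [92, 90, 98, 85, 83],
    "k-means (K=7)": [88, 85, 98, 82, 79],
    "Cascade (lower bound)": [95, 93, 98, 91, 87],
    "Cascade (upper bound)": [97, 96, 98, 94, 90],
}

# Unweighted mean per approach, rounded to one decimal as in the table.
averages = {name: round(mean(values), 1) for name, values in accuracy.items()}
for name, value in averages.items():
    print(f"{name}: {value}")
```

The lower bound of the cascade range coincides with the standalone HDBSCAN column, so the reported gain comes entirely from the upper bound of each range.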
Table 8.
Robustness of clustering algorithms to noise, measured by the ARI. The table shows quality degradation for each approach as Gaussian noise levels increase from 0% to 35%.
| Noise Level | HDBSCAN | k-means (K=5) | k-means (K=7) | Cascade Approach 1 |
|---|---|---|---|---|
| 0% (basic) | 0.73 | 0.70 | 0.63 | 0.73 |
| 15% | 0.71 (–3%) | 0.64 (–8%) | 0.58 (–8%) | 0.71 (–3%) |
| 25% | 0.68 (–7%) | 0.60 (–15%) | 0.53 (–16%) | 0.68 (–7%) |
| 35% | 0.65 (–11%) | 0.55 (–21%) | 0.48 (–24%) | 0.65 (–11%) |
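The percentages in parentheses read as the relative change in ARI with respect to the 0% noise baseline. The sketch below recomputes them from the rounded ARI values in the table, assuming that definition. Because the inputs are rounded to two decimals, a recomputed figure can differ from the printed one by about one percentage point (for example, k-means (K=5) at 15% noise gives roughly –8.6%).

```python
def relative_change(baseline, value):
    """Relative change in percent with respect to the baseline."""
    return 100.0 * (value - baseline) / baseline

# ARI at 0% noise (baseline) and at increasing Gaussian noise levels.
baseline = {"HDBSCAN": 0.73, "k-means (K=5)": 0.70, "k-means (K=7)": 0.63}
noisy = {
    15: {"HDBSCAN": 0.71, "k-means (K=5)": 0.64, "k-means (K=7)": 0.58},
    25: {"HDBSCAN": 0.68, "k-means (K=5)": 0.60, "k-means (K=7)": 0.53},
    35: {"HDBSCAN": 0.65, "k-means (K=5)": 0.55, "k-means (K=7)": 0.48},
}

# Degradation per noise level and approach, in percent.
degradation = {
    level: {name: relative_change(baseline[name], ari) for name, ari in row.items()}
    for level, row in noisy.items()
}

for level, row in degradation.items():
    print(level, {name: f"{change:.1f}%" for name, change in row.items()})
```

At every noise level the relative drop for HDBSCAN is roughly half that of either k-means configuration, which is the robustness pattern the table summarizes.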
Table 9.
Temporal coherence analysis of clustering results. The table compares the temporal consistency of clusters generated by each approach, measured by a coherence coefficient and the number of temporal intersections.
| Approach | Coherence Coefficient | Intersections in Time |
|---|---|---|
| HDBSCAN | 0.94 | 0 |
| k-means (K=5) | 0.89 | 2 |
| k-means (K=7) | 0.85 | 5 |
| Cascade Approach | 0.94 | 0 |
Table 10.
Statistical significance of performance differences (Wilcoxon signed-rank test). The table shows the W-statistic and p-value for key comparisons, confirming the significance of observed advantages.
| Comparison | W-Statistic | p-Value |
|---|---|---|
| HDBSCAN vs. k-means (K=5) | 78 | 0.008 |
| HDBSCAN vs. k-means (K=7) | 85 | 0.003 |
| Averages vs. Combined Values | 92 | 0.002 |
Table 11.
Computational performance of the cascade approach on the experimental dataset (K=132 time windows, aggregated average data).
| Processing Stage | Execution Time (seconds) |
|---|---|
| Feature Extraction | 1.48 |
| HDBSCAN Clustering | 4.72 |
| k-means Refinement (K=8) | 0.19 |
| Quality Evaluation and Voting | 0.45 |
| Total Runtime | 6.84 |