4.3. Experiment
We evaluate the performance of all methods using standard metrics, namely Recall@K and NDCG@K, computed on the top-K recommended POIs for each user [19], where K ∈ {1, 3, 5}. The definitions of these evaluation metrics are given in Equations 18-20.
Where:
IDCG is the maximum possible DCG for a given recommendation list.
DCG (Discounted Cumulative Gain) measures the quality of ranked results.
rel_i denotes the relevance score of the item at position i.
N is the number of correctly recommended POIs.
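To make the metric definitions concrete, the sketch below computes Recall@K and NDCG@K for next-POI prediction. It assumes, as is common in this setting but not stated explicitly here, that each test instance has exactly one ground-truth next POI with binary relevance (rel_i = 1 at the position of the true POI and 0 elsewhere). Under that assumption IDCG = 1, so NDCG@K reduces to 1/log2(rank + 1) when the true POI appears at a position rank ≤ K. The function names and the input format (one ranked list of POI IDs plus one true POI ID per instance) are illustrative choices, not the paper's implementation.

```python
import math
from typing import Hashable, Sequence


def recall_at_k(ranked: Sequence[Sequence[Hashable]],
                targets: Sequence[Hashable], k: int) -> float:
    """Fraction of test instances whose true next POI appears in the top-k list."""
    hits = sum(1 for recs, true_poi in zip(ranked, targets) if true_poi in recs[:k])
    return hits / len(targets)


def ndcg_at_k(ranked: Sequence[Sequence[Hashable]],
              targets: Sequence[Hashable], k: int) -> float:
    """Mean NDCG@k assuming one relevant POI per instance (binary relevance).

    DCG@k = sum over positions i = 1..k of rel_i / log2(i + 1). With a single
    relevant POI, the ideal ranking places it first, so IDCG@k = 1 / log2(2) = 1.
    """
    total = 0.0
    for recs, true_poi in zip(ranked, targets):
        top_k = list(recs[:k])
        if true_poi in top_k:
            rank = top_k.index(true_poi) + 1  # 1-based position of the hit
            total += 1.0 / math.log2(rank + 1)
    return total / len(targets)
```

For example, with ranked lists [["p3", "p1", "p7"], ["p2", "p5", "p9"]] and true next POIs ["p1", "p2"], Recall@1 is 0.5, Recall@3 is 1.0, and NDCG@3 is (1/log2(3) + 1)/2 ≈ 0.8155.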
We compare variants of our model under different feature configurations. Specifically, four model variants are evaluated: MTF-c, which incorporates only category features; MTF-a, which integrates area features; MTF-pt, which emphasizes user check-in pattern features; and the full MTF-POI model, which combines all features (category, area, and pattern) into a unified framework.
The experimental results are presented in Table 3. Integrating multiple features yields better overall performance: the combined model (MTF-POI) consistently outperforms the individual feature-based variants across all evaluation metrics.
Feature Effectiveness and Synergy in Multi-Feature Transitions
As shown in Table 3, the area-based variant (MTF-a) achieves the best performance among the single-feature models, outperforming both the category-based (MTF-c) and pattern-based (MTF-pt) configurations across all evaluation metrics. This finding suggests that spatial correlation is a dominant factor in user mobility and that clustering POIs into representative areas helps the model capture users’ movement tendencies within geographically coherent regions.
More importantly, the combined model (MTF-POI), which integrates category, area, and pattern features, yields the highest accuracy overall. Specifically, Recall@1 increases from 0.2084 (MTF-a) to 0.2480 on NYC and from 0.2300 to 0.2430 on TKY, an additional gain of +3.96 and +1.30 percentage points, respectively. This improvement points to a synergistic interaction among the three features: the area feature provides spatial stability, the category feature contributes semantic context, and the pattern feature refines temporal and behavioral consistency. Together, they improve the model’s ability to place the most probable next POI at the top-1 rank while maintaining steady gains on higher-K metrics (Recall@3, Recall@5, and NDCG@K).
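The percentage-point gains follow directly from the Recall@1 values quoted above. The short sketch below reproduces that arithmetic and, for context only, also computes the relative improvement over MTF-a, which the text does not report. The dictionary layout and the `gains` helper are illustrative.

```python
# Recall@1 values quoted in the text: best single-feature variant (MTF-a) vs. MTF-POI.
recall_at_1 = {
    "NYC": {"MTF-a": 0.2084, "MTF-POI": 0.2480},
    "TKY": {"MTF-a": 0.2300, "MTF-POI": 0.2430},
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_pp = (after - before) * 100
    relative_pct = (after - before) / before * 100
    return absolute_pp, relative_pct


for city, vals in recall_at_1.items():
    pp, rel = gains(vals["MTF-a"], vals["MTF-POI"])
    print(f"{city}: +{pp:.2f} pp absolute, +{rel:.1f}% relative")
```

The absolute figures match the +3.96 and +1.30 points stated in the text; the relative gains (about 19.0% on NYC and 5.7% on TKY) are derived here from the same numbers.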
Total Performance Score (TPS), defined in Equation 21, was introduced to provide an integrated evaluation of model performance across multiple metrics and datasets. The TPS summary in Table 4 reveals consistent performance trends across feature configurations. Among the evaluated variants, MTF-a achieves the highest TPS on both the NYC and TKY datasets, and therefore the largest overall TPS. This indicates that the area-based feature configuration contributes the most comprehensive performance improvements across all ranking levels and metrics. In addition, MTF-pt consistently outperforms MTF-c, suggesting that personalized temporal patterns capture user mobility behavior more effectively than category semantics alone. Notably, the ordering of TPS values (MTF-a > MTF-pt > MTF-c) remains stable across both datasets, which supports the robustness of the feature importance ranking and the generalizability of the proposed feature design. Overall, these results support incorporating richer spatial, contextual, and temporal signals to improve next-POI prediction.
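Equation 21 is not reproduced in this section, so the sketch below rests on an assumption: that TPS sums a variant's six ranking metrics (Recall@1/3/5 and NDCG@1/3/5) over both datasets, i.e. the same 12 values that the NPW aggregate averages. The names `total_performance_score`, `rank_variants`, `DATASETS`, and `METRICS` are hypothetical.

```python
from typing import Mapping

# Assumed evaluation grid: 2 datasets x 6 metrics = 12 values per variant.
DATASETS = ("NYC", "TKY")
METRICS = ("Recall@1", "Recall@3", "Recall@5", "NDCG@1", "NDCG@3", "NDCG@5")


def total_performance_score(results: Mapping[str, Mapping[str, float]]) -> float:
    """Hypothetical TPS: sum of every metric value over every dataset.

    results[dataset][metric] holds one variant's score. A missing entry raises
    KeyError, so an incomplete results table is not silently under-counted.
    """
    return sum(results[d][m] for d in DATASETS for m in METRICS)


def rank_variants(tps: Mapping[str, float]) -> list[str]:
    """Return variant names ordered from highest to lowest TPS."""
    return sorted(tps, key=tps.get, reverse=True)
```

Applied to the TPS values reported in this section (MTF-a 3.9265, MTF-pt 3.5978, MTF-c 3.1453), `rank_variants` returns the ordering MTF-a, MTF-pt, MTF-c.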
Normalized Performance Weight (NPW), defined in Equation 22, quantifies the relative contribution of each feature by integrating its predictive performance across both datasets and all evaluation metrics. First, the aggregate score of each feature f is computed by averaging 12 values, namely Recall@1, Recall@3, Recall@5, NDCG@1, NDCG@3, and NDCG@5 on the NYC and TKY datasets, through a double summation over the datasets d and metrics k. The result is a unified performance score that reflects the overall effectiveness of feature f (Equation 22).
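Because the symbols of Equation 22 are not reproduced here, the following is one way to write the described double summation, in our own notation: S_f is the aggregate score of feature f, and M^f_{d,k} is the value of metric k on dataset d for the variant that uses feature f. The 1/(|D||K|) factor equals 1/12 for the two datasets and six metrics listed above.

```latex
S_f = \frac{1}{|\mathcal{D}|\,|\mathcal{K}|} \sum_{d \in \mathcal{D}} \sum_{k \in \mathcal{K}} M^{f}_{d,k},
\qquad
\mathcal{D} = \{\mathrm{NYC}, \mathrm{TKY}\},
\quad
\mathcal{K} = \{\mathrm{Recall@1}, \mathrm{Recall@3}, \mathrm{Recall@5}, \mathrm{NDCG@1}, \mathrm{NDCG@3}, \mathrm{NDCG@5}\}
```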
Based on the Total Performance Score (TPS) aggregated across both datasets, the area-based variant (MTF-a) demonstrates the strongest overall effectiveness, achieving the highest TPS of 3.9265. This result suggests that the Area Feature is the most stable and informative signal for characterizing user mobility, as it consistently captures spatial correlations across diverse urban environments. The pattern-based model (MTF-pt) attains an intermediate TPS of 3.5978, suggesting that the Pattern Feature contributes meaningful behavioral cues but is less dominant than spatial information. Meanwhile, the category-based configuration (MTF-c) yields the lowest TPS, 3.1453, implying that the Category Feature alone is insufficient for modeling users’ movement behavior and lacks the representational strength required for accurate POI transition learning. Although the category feature contributes to the overall performance, its effectiveness is generally weaker and less stable than that of the area and pattern features. To reflect this observation and reduce the influence of noisy or weak category signals, we introduce a penalty factor that downscales the raw performance score of the category feature before weight normalization. The weight of each feature is then obtained by normalizing its adjusted score (the penalized score, for the category feature) with respect to the total score of all features, which makes the weights comparable and ensures that they sum to one. Finally, this sum-to-one constraint guarantees that the category, area, and pattern features together account for the full performance contribution of the model, as given in Equations 23-24.
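The penalty-then-normalize procedure can be sketched as follows. The penalty symbol and its value are not given in this section, so `penalty` is a hypothetical parameter assumed to lie in (0, 1], and the function name `normalized_performance_weights` is illustrative. The aggregate scores passed in would be the per-feature averages described for Equation 22.

```python
from typing import Mapping


def normalized_performance_weights(
    scores: Mapping[str, float],
    penalty: float,
    penalized: str = "category",
) -> dict[str, float]:
    """Hypothetical NPW: downscale one feature's aggregate score, then normalize.

    scores    -- aggregate score per feature (e.g. the 12-value mean)
    penalty   -- assumed factor in (0, 1], applied to the `penalized` feature only
    Returns non-negative weights that sum to one.
    """
    if not 0.0 < penalty <= 1.0:
        raise ValueError("penalty is assumed to lie in (0, 1]")
    # Apply the penalty only to the designated (category) feature.
    adjusted = {f: s * penalty if f == penalized else s for f, s in scores.items()}
    total = sum(adjusted.values())
    # Normalizing by the total enforces the sum-to-one constraint.
    return {f: s / total for f, s in adjusted.items()}
```

With `penalty=1.0` this reduces to plain proportional normalization; smaller values shift weight from the category feature to the area and pattern features while keeping the weights summing to one.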
We compared our proposed framework, MTF-POI, with several baseline methods commonly used for Next Point-of-Interest (Next-POI) recommendation. To support reproducibility and avoid implementation bias, we directly adopted the performance results reported on the two benchmark datasets in the following studies:
Results of the proposed method and the baseline comparisons are presented in Table 5. Compared with AFNextPOI, MTF-POI shows slightly lower values on certain metrics, such as Recall@3, Recall@5, and NDCG@5. However, MTF-POI achieves a higher Recall@1, which we regard as the most important evaluation metric for Next-POI recommendation.
4.3.1. Influence of Multi-Feature Transition
The consistent superiority of MTF-POI over its single-feature variants on both datasets suggests that multi-feature transition awareness generalizes well across different spatial densities and urban structures. Hence, integrating all three correlated features not only improves overall recall and ranking quality but also strengthens the model’s adaptability to both routine and non-routine user behaviors.
The bar chart in Figure 7 compares Recall@1 across four model versions (MTF-c, MTF-a, MTF-pt, and MTF-POI) on two datasets (NYC and TKY). For each dataset, a group of four bars shows the performance of the different versions.
The results show that MTF-POI consistently achieves the highest Recall@1 on both datasets. Among the three single-feature versions (MTF-c, MTF-a, and MTF-pt), MTF-a achieves the highest Recall@1 on both the NYC and TKY datasets, indicating that the area-based version (MTF-a) performs better than the category-based (MTF-c) and pattern-based (MTF-pt) versions. By combining the three features and applying a weighted average for scoring, the full MTF-POI model further improves performance over the single-feature versions.
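The weighted-average scoring mentioned above can be illustrated with a small sketch. It assumes, hypothetically, that each single-feature model assigns a score to every candidate POI and that the feature weights sum to one (for example, weights obtained with NPW), so the weighted sum is a weighted average. `fuse_and_rank` and its input layout are illustrative and do not reproduce the paper's exact scoring function.

```python
from typing import Mapping


def fuse_and_rank(
    feature_scores: Mapping[str, Mapping[str, float]],
    weights: Mapping[str, float],
    k: int,
) -> list[str]:
    """Hypothetical MTF-POI-style scoring: weighted average of per-feature scores.

    feature_scores[f][poi] -- score that the model for feature f gives candidate poi
    weights[f]             -- weight of feature f (assumed to sum to one)
    A candidate missing from one feature's scores contributes 0 for that feature.
    """
    # Union of all candidates proposed by any single-feature model.
    candidates = set().union(*(s.keys() for s in feature_scores.values()))
    fused = {
        poi: sum(weights[f] * feature_scores[f].get(poi, 0.0) for f in feature_scores)
        for poi in candidates
    }
    # Highest fused score first; ties broken by POI ID so the ranking is deterministic.
    return sorted(fused, key=lambda poi: (-fused[poi], poi))[:k]
```

A weighted average lets a strong area signal dominate while the category and pattern scores still reorder close candidates, which matches the complementary roles the section attributes to the three features.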
The contrasting results of MTF-pt between NYC and TKY likely reflect differences in mobility regularity and city structure. In the NYC dataset, MTF-pt performs slightly worse than MTF-c, likely because the routine/non-routine pattern signal is only weakly correlated with the actual next-location transitions. NYC exhibits high mobility diversity: users frequently visit new places, change routes dynamically, and follow irregular schedules. As a result, binary pattern-type labels (routine vs. non-routine) provide little predictive power and may even introduce noise, causing MTF-pt to underperform the simpler category-based feature.
In contrast, TKY displays much more structured and repeatable mobility behavior, driven by predictable commuting flows, concentrated activity zones, and higher regularity in daily routines. In such an environment, pattern-type information becomes highly valuable, because whether a movement is routine or non-routine strongly influences the likelihood of the next POI. This likely explains why MTF-pt substantially outperforms MTF-c on the TKY dataset.
4.3.2. Influence of Hidden Location Attraction
The MTF-c-AHLTP variant follows the same modeling process as MTF-c, where next-POI prediction is based on category transitions. The key difference is that MTF-c-AHLTP augments user trajectories with hidden locations inferred from category associations using the AHLTP [13] approach prior to model training.
The comparison between MTF-c and MTF-c-AHLTP in Figure 8 shows that incorporating hidden locations does not improve top-rank accuracy or ranking quality. While MTF-c-AHLTP slightly increases candidate coverage at higher K values, it introduces noise into category transitions, leading to lower Recall@1 and NDCG@K. These results suggest that, for large-scale next-POI prediction, using only real check-in transitions is more effective than augmenting trajectories with inferred locations.
More importantly, this result suggests that addressing uncertain check-ins through feature transfer is more effective than explicitly inserting inferred locations into user trajectories. Rather than modifying the original trajectory structure, feature transfer enables the model to adaptively reweight or transform existing features based on contextual and behavioral information, thereby mitigating uncertainty without amplifying error propagation. In contrast, hidden location insertion alters transition sequences by introducing inferred check-ins that may not reflect users’ true short-term intent, leading to degraded ranking performance.
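To illustrate why inserting an inferred check-in alters transition sequences, the sketch below counts first-order category transitions before and after a single insertion. The trajectory, the inserted "Cafe" check-in, and both helper names are hypothetical; the code does not implement AHLTP itself, only the effect that any insertion has on the observed transition pairs.

```python
from collections import Counter
from typing import Sequence


def category_transitions(trajectory: Sequence[str]) -> Counter:
    """Count consecutive (from_category, to_category) pairs in one trajectory."""
    return Counter(zip(trajectory, trajectory[1:]))


def insert_inferred(trajectory: Sequence[str], position: int, category: str) -> list[str]:
    """Return a new list with a hypothetical inferred check-in inserted before `position`."""
    return list(trajectory[:position]) + [category] + list(trajectory[position:])


observed = ["Home", "Office", "Restaurant", "Office"]
augmented = insert_inferred(observed, 2, "Cafe")
print(category_transitions(observed))
print(category_transitions(augmented))
```

In the augmented sequence the observed Office→Restaurant transition disappears and is replaced by two inferred ones (Office→Cafe and Cafe→Restaurant). A feature-level approach leaves the observed pairs untouched, which is the contrast the paragraph draws.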
These findings suggest that, for large-scale next-POI prediction, modeling uncertainty at the feature level provides a more robust solution than augmenting trajectories with inferred locations.