B. Experimental Results
This paper first conducts a comparative experiment, and the experimental results are shown in Table 1.
Overall, the comparative experiments in Table 1 clearly demonstrate the significant advantages of the proposed adaptive spatiotemporal graph contrastive learning model in cloud service anomaly detection. Compared with traditional sequence modeling methods such as LSTM and BiLSTM, the proposed model achieves substantial improvements in both accuracy and recall, indicating that one-dimensional time-series modeling alone cannot fully capture the complex multidimensional dependencies in cloud environments. By introducing a spatiotemporal graph structure, the model identifies correlations among service nodes while tracking temporal dynamics, effectively alleviating the performance degradation caused by distribution shifts and workload fluctuations.
From the perspective of spatial dependency modeling, graph-based methods such as GNN and GAT can capture certain topological information but remain limited by the assumption of static graphs. They cannot effectively represent the dynamic evolution of service dependencies. In contrast, the proposed model adopts a dynamic updating mechanism for adaptive adjacency matrices, enabling the graph structure to self-adjust over time. This allows the model to maintain robustness in multi-service interaction and resource competition scenarios. In the experimental results, the model achieves a recall rate of 0.951, indicating stronger sensitivity and generalization in identifying anomalous samples and effectively reducing missed detections in complex cloud environments.
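As a concrete illustration of this mechanism, the sketch below (in PyTorch, with hypothetical class and parameter names) shows one common way an adaptive adjacency matrix can be parameterized through learnable node embeddings whose pairwise affinities are sparsified and row-normalized so the graph can self-adjust during training; it is an assumption-based sketch rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAdjacency(nn.Module):
    """Hypothetical sketch of a data-driven adjacency matrix.

    Two learnable embedding tables are multiplied to produce pairwise
    affinities between service nodes; ReLU prunes weak or negative links and
    softmax row-normalizes the result, so the graph structure drifts as the
    embeddings are updated by backpropagation.
    """
    def __init__(self, num_nodes: int, embed_dim: int = 16):
        super().__init__()
        self.src_embed = nn.Parameter(torch.randn(num_nodes, embed_dim))
        self.dst_embed = nn.Parameter(torch.randn(num_nodes, embed_dim))

    def forward(self) -> torch.Tensor:
        scores = self.src_embed @ self.dst_embed.t()   # (N, N) pairwise affinities
        return F.softmax(F.relu(scores), dim=-1)       # sparsified, row-normalized graph

adj = AdaptiveAdjacency(num_nodes=32)()  # (32, 32) adjacency, refined at every training step
```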
In addition, compared with general deep representation models such as Transformer and BERT, the proposed adaptive spatiotemporal graph contrastive learning method achieves a balance between global dependency modeling and feature consistency. Although Transformer-based models have strong capabilities in capturing temporal dependencies, they lack structural adaptability in highly dynamic topologies. The proposed design enhances the aggregation of normal states and the separability of abnormal states through spatiotemporal contrastive constraints and unsupervised representation optimization. As a result, the model achieves the highest F1-Score of 0.953. These results verify the effectiveness and stability of the fusion mechanism combining spatiotemporal graph structures and contrastive learning in cloud service anomaly detection tasks.
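The spatiotemporal contrastive constraint can be understood through a standard NT-Xent-style objective, sketched below under the assumption that two augmented views of the same node window form a positive pair while all other embeddings in the batch serve as negatives; the function name, batch construction, and temperature are illustrative rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent contrastive loss over two views of the same batch.

    z1, z2: (B, D) embeddings of two augmented views of the same
    spatiotemporal windows. Positive pairs are (z1[i], z2[i]); every other
    embedding in the concatenated 2B batch acts as a negative.
    """
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D), unit-norm
    sim = z @ z.t() / temperature                        # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))                    # a sample is never its own pair
    batch = z1.size(0)
    # The positive for row i is row i+B (and vice versa).
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets.to(z.device))
```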
This paper further presents an experiment on the sensitivity of the F1-Score to the number of attention heads, aiming to explore how variations in the multi-head attention configuration influence the model's overall representational ability and detection stability. By adjusting the number of attention heads within the temporal–spatial feature fusion module, the experiment investigates the relationship between attention diversity and the model's capacity to capture critical dependencies across service nodes. The purpose of this analysis is to evaluate the robustness and adaptability of the proposed framework when balancing feature expressiveness and structural consistency under different attention configurations, as shown in Figure 2.
The experimental results show that the number of attention heads has a significant impact on overall model performance. When the number of heads is small, such as one or two, the model's F1-Score is relatively low. The main reason is that the multi-head attention mechanism then provides too few feature subspaces, making it difficult to capture the global dependency structure among cloud service monitoring indicators. Since cloud systems contain complex nonlinear coupling relationships between service nodes, a small number of attention heads limits the model's representational capacity during feature aggregation, concentrating feature information too narrowly and causing information loss and reduced discriminative ability.
When the number of attention heads increases to a moderate level, such as four to six, the model performance improves significantly, and the F1-Score reaches its peak value of 0.953. This indicates that under this configuration, the multi-head attention mechanism achieves a better balance between feature decomposition and cross-dimensional information aggregation. It effectively captures spatiotemporal dependencies and enhances the model's expressive flexibility. At this stage, the model can capture both local abnormal fluctuation patterns and global correlations among nodes, improving the precision and recall of anomaly detection.
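A minimal sketch of how such a head-count sweep can be run is shown below, assuming a standard nn.MultiheadAttention layer over the fused node features; the feature dimension, head counts, and tensor shapes are hypothetical placeholders for the paper's actual fusion module, chosen only so that every probed head count divides the embedding size.

```python
import torch
import torch.nn as nn

d_model, seq_len, num_nodes = 120, 16, 32
x = torch.randn(seq_len, num_nodes, d_model)      # (time, nodes-as-batch, features)

for num_heads in [1, 2, 4, 6, 8, 10]:             # head counts probed in the sensitivity study
    attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads)
    out, weights = attn(x, x, x)                   # self-attention over the temporal axis
    # `out` would feed the downstream fusion module; F1 is then measured per setting.
    print(num_heads, out.shape)
```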
However, when the number of attention heads continues to increase, such as eight or ten, the model's performance slightly declines. This phenomenon suggests that too many attention heads may introduce feature redundancy and computational noise, leading to overly dispersed representations during information fusion and disrupting the inherent spatiotemporal consistency. In addition, redundant attention heads increase the parameter complexity of the model and cause a dilution effect in feature distribution, which weakens the separability of anomalous samples and reduces detection stability.
Overall, the experimental results indicate that setting an appropriate number of attention heads is crucial for maintaining model stability and generalization performance. In this framework, a moderate number of attention heads achieves a balance between sufficient feature representation and structural constraint. This allows the model to adaptively capture spatiotemporal dependencies among service nodes in dynamic cloud environments and maintain high detection accuracy and robustness under complex workload fluctuations. These findings further verify the effectiveness of the adaptive spatiotemporal graph contrastive learning mechanism in multi-scale feature modeling and anomaly pattern representation.
This paper also presents the effect of the temporal window size on the experimental results, as shown in Figure 3.
From the overall trend, it can be observed that the length of the time window has a clear influence on model performance. Smaller time windows, such as 4 or 8, can quickly capture short-term dynamic features, but their limited coverage makes it difficult for the model to integrate long-term dependencies effectively. As a result, the overall detection performance remains low. In this case, the model responds sensitively to local fluctuations but cannot model long-term trends, which leads to misjudgments in complex cloud environments. As the time window increases, the model gradually captures the evolution of abnormal behaviors over longer time spans and identifies potential temporal dependencies. Accordingly, accuracy and recall show a continuous upward trend.
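For reference, the window sweep can be reproduced with a simple sliding-window slicer over the monitoring series, as in the sketch below; the metric count, series length, and stride are illustrative assumptions rather than the paper's exact preprocessing.

```python
import numpy as np

def make_windows(series: np.ndarray, window: int, stride: int = 1) -> np.ndarray:
    """Slice a (T, num_metrics) monitoring series into overlapping windows.

    Returns an array of shape (num_windows, window, num_metrics); each window
    covers `window` consecutive sampling steps and forms one model input.
    """
    T = series.shape[0]
    chunks = [series[s:s + window] for s in range(0, T - window + 1, stride)]
    return np.stack(chunks)

metrics = np.random.rand(1000, 8)          # e.g. 1000 sampling steps of 8 KPIs for one node
for window in [4, 8, 12, 16, 20, 24]:      # window lengths probed in Figure 3
    print(window, make_windows(metrics, window).shape)
```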
When the time window increases to a moderate range, such as 12 to 16, the model achieves its best performance, with all four metrics reaching peak values. This result indicates that within this range, the model maintains a balance between short-term local fluctuations and long-term trend patterns. It can capture transient anomalies while understanding stable evolution under temporal dependencies. The improvement in both F1-Score and Recall is particularly evident, showing that the model achieves a good balance between the completeness and precision of anomaly detection. This characteristic aligns with the multi-scale dynamics of real-world cloud service systems, where anomalies may result from both short-term load bursts and long-term state drifts.
However, when the time window further expands to larger values, such as 20 or 24, the model performance slightly decreases. Excessively long time windows may introduce too much historical information, causing redundant or outdated dependencies in feature representations. This dilutes the critical features of the current state. Longer sequences also increase input noise and computational complexity, making it difficult for the model to distinguish between short-term anomalies and background changes. This phenomenon indicates that an overly long time window reduces the model's temporal sensitivity and weakens its ability to respond quickly in highly dynamic environments.
Overall, the analysis shows that selecting an appropriate time window length is crucial for cloud service anomaly detection. A moderate-length window provides the best balance of spatiotemporal features, enabling the adaptive spatiotemporal graph structure to more efficiently capture service dependencies and temporal evolution patterns. These experimental results further verify the adaptive advantages of the proposed model in multi-scale temporal modeling. They also provide theoretical support for introducing dynamic window optimization and self-adjustment mechanisms, offering practical guidance for performance transfer and deployment across different cloud system scenarios.
This paper also presents an experiment on the sensitivity of the F1-Score to the node failure ratio, and the experimental results are shown in Figure 4.
The experimental results show that as the node failure ratio increases, the model performance in all metrics gradually declines. This indicates that node availability has a direct impact on anomaly detection performance. When the node failure ratio is low, between 0% and 5%, the model still maintains high accuracy and recall. This demonstrates that the adaptive spatiotemporal graph structure has a certain level of robustness and can preserve the continuity of feature propagation even when some nodes fail. The stability observed at this stage reflects the model's effectiveness in capturing redundant structural information and adaptively adjusting the graph topology.
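One plausible way to emulate the failure ratios used in this experiment is to zero out the features and graph connections of a randomly chosen fraction of nodes, as sketched below; the function name and tensor layout are hypothetical rather than the paper's exact protocol.

```python
import torch

def simulate_node_failures(features: torch.Tensor, adj: torch.Tensor, failure_ratio: float):
    """Zero out a random fraction of service nodes to emulate failures.

    features: (num_nodes, window, num_metrics) node time series
    adj:      (num_nodes, num_nodes) adjacency matrix
    Failed nodes stop reporting metrics and are detached from the graph.
    """
    num_nodes = features.size(0)
    num_failed = int(round(num_nodes * failure_ratio))
    failed = torch.randperm(num_nodes)[:num_failed]
    features, adj = features.clone(), adj.clone()
    features[failed] = 0.0      # missing telemetry from failed nodes
    adj[failed, :] = 0.0        # cut outgoing edges
    adj[:, failed] = 0.0        # cut incoming edges
    return features, adj, failed
```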
When the node failure ratio increases to a moderate level, between 10% and 15%, the model performance begins to drop more noticeably, particularly in F1-Score and Precision. The main reason is that the failure of several key nodes weakens the connectivity of the service dependency graph, which obstructs information transmission paths. This reduces the model's feature aggregation capability during graph convolution and temporal modeling. At the same time, the loss of node features disrupts the integrity of spatiotemporal dependencies, resulting in less effective feature representations in anomalous regions. This shows that at higher failure rates, the model becomes more sensitive to topological disturbances. The balance of feature propagation is broken, which leads to a significant decline in anomaly detection effectiveness.
When the node failure ratio exceeds 20%, the downward trend becomes more severe, and the F1-Score reaches its lowest point. At this stage, the system's graph structure is severely degraded. Spatiotemporal contrastive learning can no longer maintain effective feature alignment or semantic aggregation. The model's representation space drifts, and the boundary between anomalous and normal samples becomes blurred. These results confirm that the proposed adaptive mechanism can maintain stable performance within a limited failure range but remains constrained by topological fragmentation and information loss under large-scale node failures. Therefore, future work may incorporate graph repair or redundancy-aware modules to enhance the model's self-healing ability and spatiotemporal robustness under extreme distribution disturbances.
This paper also presents the impact of the proportion of anomalous samples on the experimental results, as shown in Figure 5.
The experimental results show that the increase in the proportion of anomalous samples has a significant impact on model performance. All evaluation metrics show a continuous decline as the anomaly ratio increases. When the anomaly ratio is low, between 1% and 3%, the model maintains high accuracy, precision, and recall. This indicates that the proposed adaptive spatiotemporal graph contrastive learning model demonstrates good stability under small distribution perturbations. At this stage, the model can effectively learn the spatiotemporal dependency features of normal samples. Through self-supervised contrastive constraints, it can clearly distinguish between normal and abnormal behaviors, reflecting strong discriminative ability in low-noise environments.
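A simple way to construct evaluation sets with a controlled anomaly ratio is sketched below; the pool sizes, sampling strategy, and function name are assumptions made for illustration, not the paper's exact data-preparation procedure.

```python
import numpy as np

def build_eval_set(normal: np.ndarray, anomalous: np.ndarray,
                   anomaly_ratio: float, size: int, seed: int = 0):
    """Assemble an evaluation set with a fixed proportion of anomalous windows.

    normal, anomalous: pre-extracted windows of shape (N, window, num_metrics).
    Returns (X, y) where y == 1 marks an anomalous window. Assumes the normal
    pool is large enough to sample without replacement.
    """
    rng = np.random.default_rng(seed)
    n_anom = int(round(size * anomaly_ratio))
    n_norm = size - n_anom
    X = np.concatenate([
        normal[rng.choice(len(normal), n_norm, replace=False)],
        anomalous[rng.choice(len(anomalous), n_anom, replace=True)],
    ])
    y = np.concatenate([np.zeros(n_norm), np.ones(n_anom)])
    perm = rng.permutation(size)
    return X[perm], y[perm]

# Ratios probed in Figure 5, e.g. 1%, 3%, 5%, 10%, 15%; the trained detector
# is then evaluated on each build_eval_set(..., anomaly_ratio=r, ...) split.
```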
As the anomaly ratio increases to a moderate level, between 5% and 10%, the model performance begins to decline noticeably, especially in F1-Score and Recall. This suggests that as the proportion of anomalous data grows, the structure of the feature space becomes disturbed, and the boundary between normal and abnormal samples becomes less distinct. The excessive presence of anomalous samples weakens the compactness of normal patterns, causing an imbalance in positive and negative sample pairs during contrastive learning and reducing the consistency of feature aggregation. Nevertheless, the model still maintains a relatively high level of precision, which shows that it retains a certain degree of resistance to interference through effective spatiotemporal feature fusion and structural adaptation.
When the anomaly ratio further increases beyond 15%, the overall model performance declines rapidly. Both accuracy and F1-Score drop sharply. At this point, anomalous data dominate the distribution, and the semantic features of normal samples are heavily diluted. This damages the topological consistency of the spatiotemporal graph structure and introduces noise into the information propagation paths. The model's performance degradation under this extreme imbalance indicates that, although the adaptive spatiotemporal mechanism can dynamically adjust dependency relationships, feature drift still occurs when anomalies dominate the distribution. The non-stationary distribution of anomalous samples may also disrupt the positive and negative sampling strategies in contrastive learning, significantly reducing the separability of the embedding space.
Overall, the experimental results demonstrate that the anomaly ratio plays a critical role in determining the robustness of cloud service anomaly detection models. A moderate proportion of anomalous samples can enhance the model's ability to learn anomaly patterns, while an excessively high ratio undermines the stability of the feature space. The proposed adaptive spatiotemporal graph contrastive learning mechanism can self-regulate within a certain range of anomaly ratios, maintaining good performance by strengthening spatiotemporal consistency in feature aggregation. These findings verify the model's practicality in handling complex distribution shifts and high-noise conditions and provide a theoretical foundation for future optimization strategies under imbalanced data scenarios.