B. Experimental Results
This paper first conducts a comparative experiment, and the experimental results are shown in Table 1.
As shown in Table 1, different models exhibit significant variations in performance on risk assessment tasks based on electronic health records. Traditional temporal modeling methods such as LSTM and GRU have certain advantages in capturing changes in patients' health states. However, due to their limited ability to model long-term dependencies, these models face performance bottlenecks in representing complex disease progression. Although both models maintain high Recall values, indicating good sensitivity to potential risk events, their Precision is relatively low. This suggests that false positives remain frequent and that the models cannot fully capture the latent semantic associations among multidimensional medical features.
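For reference, the Accuracy, Precision, Recall, F1, and AUC values discussed here follow the standard binary-classification definitions. The snippet below is a minimal sketch of such an evaluation step using scikit-learn; the 0.5 decision threshold and the variable names are illustrative assumptions rather than details taken from the actual experimental code.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate_risk_predictions(y_true, y_score, threshold=0.5):
    """Compute the metrics reported in Table 1 for one model.

    y_true  : binary ground-truth risk labels, shape (n_samples,)
    y_score : predicted risk probabilities in [0, 1], shape (n_samples,)
    """
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "Accuracy":  accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred, zero_division=0),
        "Recall":    recall_score(y_true, y_pred, zero_division=0),
        "F1":        f1_score(y_true, y_pred, zero_division=0),
        "AUC":       roc_auc_score(y_true, y_score),  # uses raw scores, not thresholded labels
    }
```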
In contrast, the Transformer architecture performs better in modeling global dependencies through its self-attention mechanism. Its Accuracy and AUC are notably higher than those of recurrent neural network models. This indicates that attention-based sequence encoding can better capture potential causal relationships across different time intervals in EHR data, enabling stronger discriminative ability in risk prediction. Moreover, the representation power of the Transformer provides a better feature space foundation for downstream self-supervised tasks. However, its purely attention-based structure remains sensitive to noise and has limited generalization ability when handling high-dimensional heterogeneous medical data, making it difficult to adapt fully to inter-patient variability.
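A generic Transformer encoder baseline for EHR sequences of the kind compared in Table 1 can be sketched as follows; the model width, number of heads, layer count, and mean-pooling readout are illustrative assumptions, not the configuration used in the reported experiments.

```python
import torch
import torch.nn as nn

class TransformerRiskBaseline(nn.Module):
    """Self-attention baseline over per-visit EHR feature sequences (illustrative)."""

    def __init__(self, feature_dim, d_model=128, nhead=4, num_layers=2, max_len=256):
        super().__init__()
        self.input_proj = nn.Linear(feature_dim, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.risk_head = nn.Linear(d_model, 1)

    def forward(self, x):
        # x: (batch, seq_len, feature_dim) EHR feature sequences
        h = self.input_proj(x) + self.pos_embed[:, : x.size(1)]
        h = self.encoder(h)              # self-attention across all time steps
        pooled = h.mean(dim=1)           # simple mean pooling over time
        return torch.sigmoid(self.risk_head(pooled)).squeeze(-1)  # risk probability per patient
```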
The GAT model introduces graph-structured relational modeling, which captures topological associations among medical events in the feature space. Experimental results show that the F1 and Precision values of GAT exceed those of the Transformer. This demonstrates that the graph-based feature propagation mechanism enhances the model's ability to represent interactions among patients' multidimensional features. Such a structure helps mitigate the information loss that traditional sequential models incur when fusing heterogeneous features, making the model more robust in identifying complex pathological patterns and multisource risk factors. However, GAT still faces limitations in capturing global temporal dependencies and needs further enhancement in modeling long time-span health records.
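A graph attention baseline of this kind can be sketched with PyTorch Geometric, treating one patient's medical events as nodes of a graph; the edge construction, layer widths, and head count below are illustrative assumptions rather than the setup used in the reported experiments.

```python
import torch
from torch_geometric.nn import GATConv  # assumes PyTorch Geometric is installed

class GATRiskBaseline(torch.nn.Module):
    """Graph attention baseline over a single patient's event graph (illustrative)."""

    def __init__(self, feature_dim, hidden_dim=64, heads=4):
        super().__init__()
        self.gat1 = GATConv(feature_dim, hidden_dim, heads=heads)      # output: hidden_dim * heads
        self.gat2 = GATConv(hidden_dim * heads, hidden_dim, heads=1)
        self.risk_head = torch.nn.Linear(hidden_dim, 1)

    def forward(self, x, edge_index):
        # x: (num_events, feature_dim) node features; edge_index: (2, num_edges) event relations
        h = torch.relu(self.gat1(x, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        graph_repr = h.mean(dim=0)                   # mean-pool node embeddings for this patient
        return torch.sigmoid(self.risk_head(graph_repr))
```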
Overall, the proposed self-supervised representation learning model achieves the best performance across all metrics, with Accuracy, F1, and AUC reaching 0.911, 0.903, and 0.937, respectively, and demonstrates stronger stability and generalization capability. This advantage is mainly attributed to the model's ability to extract deep structural features of EHR data during pretraining through masked reconstruction and context prediction tasks. As a result, the model forms robust latent representations even with limited labeled data. These results confirm the effectiveness of the self-supervised mechanism in medical time-series data, showing that label-free representation learning can significantly enhance the model's ability to identify complex health risk patterns and provide new insights for intelligent and generalizable risk prediction.
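To make the masked-reconstruction pretext task concrete, the following is a minimal sketch of such an objective over EHR feature sequences. The GRU encoder, masking ratio, and tensor shapes are placeholder assumptions used only to illustrate reconstructing masked time steps; they are not the exact components of the proposed model, and the companion context-prediction task is omitted.

```python
import torch
import torch.nn as nn

class MaskedReconstructionPretrainer(nn.Module):
    """Sketch of a masked-reconstruction pretext task on EHR sequences (illustrative)."""

    def __init__(self, feature_dim, hidden_dim=256, mask_ratio=0.15):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.GRU(feature_dim, hidden_dim, batch_first=True)  # placeholder encoder
        self.decoder = nn.Linear(hidden_dim, feature_dim)                 # reconstructs masked features

    def forward(self, x):
        # x: (batch, seq_len, feature_dim) continuous EHR feature sequences
        mask = torch.rand(x.shape[:2], device=x.device) < self.mask_ratio  # (batch, seq_len)
        x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)                  # zero out masked time steps
        hidden, _ = self.encoder(x_masked)
        recon = self.decoder(hidden)
        # Reconstruction loss is computed only on the masked positions.
        return ((recon - x) ** 2)[mask].mean()

# Usage: pretrain on unlabeled EHR sequences, then reuse the encoder for the risk task.
model = MaskedReconstructionPretrainer(feature_dim=48)
dummy_batch = torch.randn(32, 24, 48)   # 32 patients, 24 time steps, 48 features
loss = model(dummy_batch)
loss.backward()
```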
This paper also presents a learning rate sensitivity experiment, the experimental results of which are shown in Figure 2. To further examine the stability and robustness of the proposed model, this experiment systematically explores how variations in the learning rate influence convergence dynamics, feature representation quality, and prediction consistency during training. By adjusting the learning rate across multiple magnitudes, the analysis aims to reveal the model's adaptability to optimization conditions and its ability to maintain performance under different parameter update scales. This setting provides an important perspective for understanding how the self-supervised representation learning framework responds to gradient variations in the context of EHR data modeling.
From the figure, it can be seen that the model's performance varies significantly under different learning rates, indicating that the learning rate plays an important role in the stability and convergence speed of the self-supervised representation learning process. When the learning rate is low (such as 1e-5), the model converges slowly but remains stable during training. The final Accuracy and F1-Score still stay at a relatively high level. As the learning rate increases to 1e-4, the model reaches its peak performance across all four metrics, suggesting that this value provides a good balance between weight update efficiency and parameter stability.
When the learning rate continues to increase to 5e-4 and above, the model's performance declines slightly, with noticeable decreases in Precision and AUC. This trend indicates that an excessively high learning rate causes rapid parameter updates, making it difficult for the model to capture the complex temporal dependencies and structural features in EHR data. Such behavior can lead to underfitting or gradient oscillations. In particular, for self-supervised reconstruction tasks, a large learning rate disrupts the consistency of the feature space, reducing the transferability and robustness of the learned representations.
The trends of different metrics show that Accuracy and F1-Score exhibit similar variation curves, while Precision is more sensitive to changes in the learning rate. This suggests that in risk assessment tasks, the learning rate not only affects overall classification performance but also directly determines the accuracy of identifying high-risk samples. The fluctuation of the AUC metric further supports this observation. When the learning rate is too high, the model's ability to distinguish risk samples drops significantly, indicating that its decision boundaries in the feature space become unstable, making it difficult to effectively separate risk and normal samples.
Overall, the experimental results demonstrate that the self-supervised representation learning framework is highly sensitive to the learning rate setting. A properly chosen learning rate can significantly improve model performance in EHR-based risk prediction. The optimal learning rate, around 1e-4, enables the model to achieve the best balance between reconstruction and risk encoding, forming a more robust health representation structure. This finding suggests that in self-supervised EHR modeling, optimizing hyperparameters not only affects convergence speed but also directly determines the effectiveness of latent feature extraction and the precision of subsequent risk identification.
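The sensitivity protocol amounts to re-training the model once per candidate learning rate and recording the four metrics. The loop below expresses that procedure; `build_model` and `train_and_evaluate` are hypothetical helpers standing in for the actual training pipeline, and the candidate rates are chosen to bracket the values discussed above.

```python
import torch

def learning_rate_sweep(build_model, train_and_evaluate,
                        learning_rates=(1e-5, 5e-5, 1e-4, 5e-4, 1e-3)):
    """Re-train from scratch at each learning rate and collect the reported metrics.

    build_model        : hypothetical helper returning a fresh nn.Module
    train_and_evaluate : hypothetical helper, (model, optimizer) -> dict of metrics
    """
    results = {}
    for lr in learning_rates:
        model = build_model()                                    # fresh weights for each run
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        results[lr] = train_and_evaluate(model, optimizer)
    # Figure 2 peaks around 1e-4; selecting the rate with the best AUC reproduces that choice.
    best_lr = max(results, key=lambda r: results[r]["AUC"])
    return best_lr, results
```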
This paper further presents a hidden layer dimension sensitivity experiment, the experimental results of which are shown in Figure 3.
From the figure, it can be observed that the model shows a clear nonlinear trend under different hidden layer dimensions, indicating that the hidden layer dimension plays a critical role in the capability of self-supervised representation learning. When the dimension is small (such as 64 or 128), both Accuracy and F1-Score remain low, suggesting that the feature space capacity is insufficient to effectively capture the complex temporal dependencies and multimodal semantic information in electronic health records. As the dimension increases to 256, the model reaches its peak performance. This shows that the latent space at this scale achieves an optimal balance between representational richness and parameter stability, allowing the model to achieve higher discriminative power in risk assessment tasks.
When the hidden layer dimension further increases to 512 or above, the performance metrics begin to decline gradually, with noticeable decreases in Precision and AUC. This trend indicates that excessively high dimensions may lead to feature redundancy and noise amplification. As a result, the model tends to overfit the reconstruction task during self-supervised pretraining, weakening the discriminative ability of its learned features. The feature diffusion effect in high-dimensional space reduces the model's focus on key clinical variables, which negatively impacts the accuracy of downstream risk assessments. Moreover, higher dimensions may increase optimization difficulty, causing gradient instability and unnecessary computational overhead during training.
Overall, the experimental results demonstrate that a proper hidden layer dimension is essential for constructing high-quality self-supervised health representations. The optimal dimension, around 256, enables the model to efficiently learn dynamic features of patient health states in the latent space while maintaining good generalization and stability. This finding further confirms the rationality of the proposed structural design. By balancing feature capacity and parameter complexity, the model achieves robust modeling and efficient risk identification for EHR data, providing a strong foundation for future transfer and extension across various healthcare tasks.
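The capacity trade-off behind Figure 3 can be made tangible by comparing encoder parameter counts at different hidden widths, since the count grows roughly quadratically with the hidden dimension. The stand-in two-layer GRU encoder and feature dimension below are illustrative assumptions, not the actual architecture.

```python
import torch.nn as nn

def count_parameters(module: nn.Module) -> int:
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

FEATURE_DIM = 48  # illustrative number of per-time-step EHR features

for hidden_dim in (64, 128, 256, 512, 1024):
    # Stand-in encoder: two-layer GRU plus a reconstruction head for the pretext task.
    encoder = nn.GRU(FEATURE_DIM, hidden_dim, num_layers=2, batch_first=True)
    head = nn.Linear(hidden_dim, FEATURE_DIM)
    total = count_parameters(encoder) + count_parameters(head)
    print(f"hidden_dim={hidden_dim:4d}  trainable parameters={total:,}")
```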
This paper also presents a noise interference level sensitivity experiment, the experimental results of which are shown in Figure 4.
From the figure, it can be seen that as the noise level increases, all performance metrics show a slight downward trend but remain within a high range. This indicates that the proposed self-supervised representation learning framework demonstrates strong robustness to input noise. When the noise level is low (between 0.0 and 0.01), Accuracy and F1-Score remain almost stable. This suggests that the model can effectively capture the main semantic information in electronic health records and suppress minor random disturbances during feature extraction. It also reflects that through masked reconstruction and context prediction during pretraining, the model has learned to adaptively repair local anomalies and missing information.
When the noise level rises to 0.1, Precision and AUC begin to decline more noticeably. This shows that noise disturbances gradually disrupt the consistency of the latent feature space, causing some samples to drift near the risk boundaries. This phenomenon is common in medical data, as noise in EHRs often originates from abnormal entries, measurement errors, or recording biases. The proposed model still maintains a relatively high AUC level, indicating that it retains strong discriminative power in risk identification. This can be attributed to the global feature alignment mechanism introduced during self-supervised training, which helps the model preserve stable discriminative features even under high-noise conditions.
As the noise ratio further increases to 0.2, the model's performance declines more significantly, especially in F1-Score and Precision. This trend indicates that under high-noise conditions, balancing recall and precision becomes more challenging for the model. Some low-confidence samples may be misclassified, affecting overall predictive performance. Nevertheless, the model does not experience catastrophic degradation, which shows that its adaptive mechanism under multi-noise environments remains effective. Comparing performance across different noise levels suggests that the robustness of the model mainly comes from the compression of redundant features and the reinforcement of key variables during multitask self-supervised training.
Overall, the experimental results confirm the noise-resistant stability of the proposed method in EHR-based risk assessment tasks. Proper self-supervised feature modeling not only improves learning efficiency under low-noise conditions but also maintains robust representational capacity under strong noise interference. This ability to suppress noise disturbance makes the model more reliable and generalizable in real-world medical applications, providing solid technical support for analyzing complex, multisource, and unstructured EHR data.
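The noise-sensitivity protocol can be reproduced by evaluating a trained model on progressively corrupted copies of the test set. The sketch below assumes additive zero-mean Gaussian noise on standardized input features, with the reported noise level interpreted as the noise standard deviation; the exact corruption scheme and the `evaluate` helper are assumptions made only for illustration.

```python
import torch

def add_input_noise(x: torch.Tensor, noise_level: float) -> torch.Tensor:
    """Perturb standardized EHR features with additive Gaussian noise of the given scale."""
    return x + noise_level * torch.randn_like(x)

@torch.no_grad()
def robustness_curve(model, clean_inputs, labels, evaluate,
                     noise_levels=(0.0, 0.01, 0.05, 0.1, 0.2)):
    """Evaluate a trained model on increasingly noisy copies of the test set.

    model        : trained risk-prediction model (frozen during evaluation)
    clean_inputs : (n_samples, seq_len, feature_dim) test features
    labels       : binary risk labels
    evaluate     : hypothetical helper, (y_true, y_score) -> dict of metrics
    """
    model.eval()
    curve = {}
    for level in noise_levels:
        noisy = add_input_noise(clean_inputs, level)
        scores = model(noisy).squeeze(-1)    # assumes the model outputs one risk score per sample
        curve[level] = evaluate(labels, scores)
    return curve
```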