3.1. Dataset
This study adopts the open-source IEEE-CIS Fraud Detection dataset as a unified benchmark for enterprise financial anomaly monitoring. The dataset is designed around real-world transaction risk control scenarios and provides structured records at the transaction level, so anomalies can be defined directly as high-risk or fraudulent transactions, a typical form of financial abnormal event. The task setting closely matches the objective of identifying a small number of abnormal behaviors within massive transaction streams, making the dataset well suited to evaluating the effectiveness and practical feasibility of representation-space modeling for anomaly detection.
The dataset consists of two main components. The transaction table contains business-related features such as transaction time, transaction amount, product category, payment card attributes, address information, and distance-related variables, along with a large number of anonymized statistical and behavioral features. The identity table provides device type, device information, and multiple anonymized identity-related variables. The two tables can be linked through transaction identifiers, forming a typical multi-source input structure that supports the modeling setting of mapping multi-field, multi-source data into a unified latent representation space.
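As a minimal sketch of this linkage (assuming the dataset's public `TransactionID` key column; the function name is illustrative, not the paper's implementation), the two tables can be joined with a left join so that every transaction is retained even when no identity record exists:

```python
import pandas as pd

def join_sources(transactions: pd.DataFrame, identity: pd.DataFrame) -> pd.DataFrame:
    """Left-join identity features onto transaction records.

    A left join keeps every transaction; rows without identity
    information receive NaN for the identity columns, mirroring the
    partial coverage of the identity table in the dataset.
    """
    return transactions.merge(identity, on="TransactionID", how="left")
```

In practice the two inputs would be loaded from the published CSV files before joining; the resulting frame is the transaction-level, multi-source record that downstream representation learning consumes.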
From the perspective of anomaly monitoring requirements, the dataset exhibits multidimensional heterogeneity, relational constructability, and temporal aggregatability. Multiple feature types can be jointly encoded into transaction-level representation vectors. Fields such as card identifiers, email domains, and device information naturally induce implicit links across transactions, which facilitates the construction of representation structures that better reflect real enterprise risk propagation patterns. In addition, transaction timestamps enable window-based aggregation into stable behavioral segment representations, which aligns with the anomaly scoring strategy based on representation consistency and structural shift.
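The window-based aggregation mentioned above can be sketched as follows (the function name, the `(timestamp, amount)` event format, and the fixed-width window convention are illustrative assumptions, not the paper's exact pipeline):

```python
from collections import defaultdict
from statistics import mean

def window_aggregate(events, window_size):
    """Group (timestamp, amount) events into fixed-width time windows
    and summarize each window as (count, mean amount).

    Window index = timestamp // window_size, i.e. windows cover
    [0, w), [w, 2w), ...  Returns {window_index: (count, mean_amount)}.
    """
    buckets = defaultdict(list)
    for ts, amount in events:
        buckets[ts // window_size].append(amount)
    return {w: (len(v), mean(v)) for w, v in buckets.items()}
```

Richer per-window statistics (variance, category counts, distinct devices) would be computed the same way; the point is only that timestamps let transaction streams be collapsed into stable segment-level summaries.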
Overall, the dataset provides a reproducible and publicly verifiable open benchmark for enterprise financial anomaly monitoring. It enables systematic evaluation without reliance on proprietary accounting systems. This makes it suitable for validating representation-based monitoring frameworks in a transparent and repeatable manner.
3.3. Experimental Results
This paper first presents the results of the comparative experiments, as shown in Table 1.
A global comparison reveals a clear and stable performance hierarchy among the baseline methods, indicating that the task is highly sensitive to representation quality and to the definition of anomaly decision boundaries. The proposed approach achieves the best results on Acc, Precision, Recall, and F1; in particular, the F1 score reaches 0.91, reflecting a better balance between false alarm control and missed anomaly reduction. Such a balance aligns with the core requirements of enterprise financial anomaly monitoring: in high-noise settings with multi-source features and low anomaly ratios, it is critical to reduce unnecessary disruption caused by false alarms while avoiding losses from undetected high-risk behaviors.
The improvement in Precision suggests that the method provides clearer separation between normal and abnormal samples in the anomaly scoring space. Anomaly decisions are concentrated in more reliable risk regions. For enterprise financial monitoring, higher Precision implies more actionable alerts. Audit and investigation resources are less likely to be wasted on low-value warnings. The proposed method reaches a Precision of 0.92. This further reduces false positives compared to other approaches. The result indicates that the representation space captures normal financial behavior in a more compact manner. As a consequence, deviations can be identified more consistently.
At the same time, the gain in Recall indicates more comprehensive anomaly coverage. The model is able to capture hidden or weak signal anomalies. Financial anomalies often emerge as gradual structural shifts rather than abrupt extreme values. Models that rely only on local features or short-term patterns tend to miss such cases. The Recall of the proposed method reaches 0.89. This shows that anomaly detection is not limited to obvious outliers. It remains sensitive to marginal anomalies along potential risk chains. This behavior is consistent with the objective of a framework based on representation consistency and structural shift. It helps expand risk coverage under complex business relations and multidimensional behavior signals.
Overall, the improvement in F1 demonstrates a better trade-off in enterprise financial anomaly monitoring. Alert credibility is preserved while anomaly capture ability is enhanced. Compared with the incremental gains of baseline methods, the advantage of the proposed approach stems from a shift in modeling paradigm. Anomaly detection moves from raw feature thresholds and shallow pattern matching to measuring stability in latent representation structures. This leads to more robust decision boundaries under multi-source, multi-dimensional, and dynamic financial data distributions. Such characteristics are better suited for deployment in enterprise monitoring processes that are constrained by risk control objectives and limited resources. They provide more consistent signals for continuous monitoring and audit decision-making.
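The precision/recall trade-off discussed above follows directly from the confusion counts, which a minimal sketch makes concrete (the counts below are illustrative examples, not the reported experimental results):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion counts.

    precision = tp / (tp + fp): of all alerts raised, the fraction
        that were true anomalies (alert credibility).
    recall    = tp / (tp + fn): of all true anomalies, the fraction
        that were caught (anomaly coverage).
    F1 is their harmonic mean, which penalizes imbalance between the two.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Because F1 is a harmonic mean, raising precision at the expense of recall (or vice versa) lowers it; a high F1 therefore certifies exactly the joint property the monitoring task requires.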
The learning rate determines the step size of parameter updates and is a key factor affecting the stability of representation learning and the convergence behavior of anomaly scoring boundaries. Different learning rates alter the model's trade-off between noise suppression and pattern fitting, and thus its sensitivity in identifying corporate financial anomalies. To verify the robustness and controllability of the proposed framework under changes in the learning rate, multiple learning rates were tested and the response trends of the core metrics were observed. The experimental results are shown in Figure 2.
Across the four subplots, the overall shapes reveal a clear interval effect of the learning rate on representation learning effectiveness. When the learning rate is too low, parameter updates are limited: the model remains close to its initial representation structure and struggles to absorb critical differences from multi-source financial behaviors, so overall scores stay low and improve slowly. As the learning rate increases to a moderate range, all metrics rise together, indicating that the latent representation space begins to form clearer normal structures and anomaly shift boundaries, and the anomaly scoring function becomes more confident in separating risk samples.
From the perspective of accuracy, the bars associated with a moderate learning rate are clearly higher. This suggests a more stable global decision boundary and a more complete separation between normal and abnormal samples. For enterprise financial anomaly monitoring, this means the system is less likely to suffer from widespread misclassification during continuous operation. This is especially important when transaction distributions exhibit intraday fluctuations or business structure changes. An appropriate update step allows the representation space to remain adaptive while avoiding frequent boundary drift that would trigger unstable alerts. When the learning rate increases further, accuracy declines. This implies that overly aggressive updates disrupt the previously compact normal structure and reduce overall consistency.
For precision and recall, their optimal learning rate ranges do not fully coincide. Precision reaches a higher level around the moderate learning rate. This indicates that alerts are concentrated in more reliable risk regions. It reduces cases where normal transactions are incorrectly flagged as anomalies. This directly affects whether audit resources are consumed by low-value alerts. Recall becomes more prominent at a slightly higher but still moderate learning rate. This reflects a stronger tendency to capture marginal and weak signal anomalies, which expands risk coverage. When the learning rate continues to increase, recall drops sharply. This suggests instability in the representation space, where true structural shifts are masked by noisy updates.
The trend of the F1 score further confirms the typical trade-off in enterprise financial anomaly monitoring. False alarms must be controlled, while missed detections must be avoided. F1 reaches its peak in the moderate learning rate interval. This indicates that reconstruction consistency and structural shift measurement produce a more coordinated anomaly score. A better balance is achieved between alert reliability and anomaly coverage. For a framework centered on representation-based data modeling, this observation highlights the learning rate as a key control factor. It strongly influences representation geometry and threshold stability. Selecting a moderate learning rate is more conducive to producing controlled, reproducible, and stable outputs for real-world financial monitoring processes.
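The interval effect described above can be illustrated on a toy objective (a one-dimensional quadratic; all step sizes and constants are illustrative, not the framework's actual settings): too small a step barely moves the parameter, a moderate step converges quickly, and too large a step makes the error grow at every update.

```python
def gd_final_loss(lr, steps=50, w0=0.0, target=3.0):
    """Run gradient descent on f(w) = (w - target)^2 and return the
    final loss.  The gradient is 2 * (w - target), so each step
    multiplies the error by (1 - 2 * lr): a tiny lr shrinks it slowly,
    a moderate lr shrinks it fast, and lr > 1 makes it grow (divergence).
    """
    w = w0
    for _ in range(steps):
        w -= lr * 2 * (w - target)
    return (w - target) ** 2

small, moderate, large = gd_final_loss(0.01), gd_final_loss(0.4), gd_final_loss(1.5)
```

The same qualitative U-shape (in loss, hence an inverted U in the metrics) is what the four subplots exhibit, with the additional complication that in representation learning "divergence" manifests as boundary drift and unstable alerts rather than literal blow-up.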
This paper also presents the impact of the optimizer on the experimental results, as shown in Table 2.
A comparison of different optimizers indicates that optimization strategies strongly influence representation learning and the stability of anomaly decision boundaries, with AdamW achieving the best performance across all metrics. Its adaptive learning rates and explicit regularization help handle heterogeneous feature scales and suppress overfitting, leading to more reliable risk representations and stable alerts. Compared with Adam, AdamW provides balanced improvements in Precision and Recall, reducing both false alarms and missed anomalies, which is critical for enterprise financial monitoring. In contrast, AdaGrad performs less effectively, likely due to overly rapid learning rate decay that limits adaptation under dynamic data distributions, while SGD shows moderate performance because of its lack of adaptivity and sensitivity to manual tuning in heterogeneous feature spaces. The F1-score ranking further confirms that AdamW offers the best trade-off between reliability and anomaly coverage, making it suitable as a default optimization strategy. In addition, the number of network layers directly affects representation capacity and structural abstraction, influencing the separation between normal patterns and abnormal shifts, and the impact of model depth on performance is analyzed in Figure 3.
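The AdamW advantage noted above rests on decoupled weight decay: the decay is applied directly to the parameter rather than folded into the gradient as an L2 penalty. A single-step scalar sketch of the standard AdamW update (hyperparameter values are illustrative defaults, not the experiment's configuration):

```python
import math

def adamw_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter.

    First and second moment estimates are updated exactly as in Adam,
    but weight decay acts directly on the parameter (decoupled) rather
    than being added to the gradient.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```

With a zero gradient the adaptive term vanishes and the parameter still shrinks by `lr * weight_decay * w`, which is precisely the regularization pressure that keeps representations compact regardless of gradient scale.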
The curve patterns indicate that network depth has a clear impact on the geometry of the latent representation space, with an observable optimal complexity range. With a small number of layers, the expressive capacity of the mapping is limited: the model cannot fully absorb relational information across multi-source financial features, normal structures are not compact, and anomaly shifts are more easily obscured by mixed signals. As depth increases to a moderate level, the model learns more stable behavior representations, structural shift signals used for anomaly scoring become clearer, and overall identification performance improves.
From the accuracy curve, a peak appears around a moderate depth and then gradually declines. This shows that deeper networks do not necessarily yield better decision boundaries. In enterprise financial anomaly monitoring, overly deep models are more likely to fit short-term noise and incidental patterns as if they were generalizable structures. Normal clusters in the representation space become more dispersed. Stable separation of anomaly shifts is weakened. This behavior is consistent with real business data, where feature dimensionality is high but effective signals are sparse. Excessive model capacity tends to introduce unnecessary complexity in decision boundaries.
The precision curve rises first, then reaches a plateau, and finally declines at larger depths. This implies that an appropriate depth allows alerts to concentrate in more reliable risk regions, reducing false alarms and audit resource waste. When depth increases further, precision decreases: boundary constraints for normal samples become weaker, normal structures are over-partitioned, and more normal transactions fall into high-risk regions. For a framework centered on representation consistency and structural shift, this suggests that encoder depth must match the noise level of the data; otherwise, representation consistency is damaged and alert credibility declines.
The trends of recall and F1 more clearly reflect the trade-off between anomaly coverage and alert reliability. Recall reaches a high point at moderate depth and then drops sharply. This indicates that overly deep networks may oversmooth anomaly features or be disrupted by noise fitting. Marginal and weak signal anomalies are missed. The F1 curve forms a peak near the moderate depth and declines when the network is too shallow or too deep. This shows that this depth range maintains both false alarm control and anomaly capture ability. It better satisfies enterprise requirements for stability and controllability. It also supports the conclusion that the proposed method more easily forms separable and interpretable representation structures under moderate model complexity.
The intensity of input noise injection can perturb the statistical structure and representation-space consistency of financial features, thereby affecting the stability and environmental robustness of the anomaly scoring mechanism in this paper. The experimental results are shown in Figure 4.
The bar chart shows that as the intensity of input noise injection increases, the F1 score exhibits a consistent downward trend. This indicates that environmental disturbance directly weakens the overall anomaly detection capability. In enterprise financial anomaly monitoring, such noise can correspond to data collection errors, inconsistent field reporting, cross-system synchronization bias, and statistical perturbations introduced by anonymization. Increased noise blurs the previously clear boundaries of normal structures in the representation space. As a result, the responsiveness of anomaly scoring to structural shifts is reduced. In the low noise range, the F1 score remains at a relatively high level. This suggests that the representation learning mechanism has a certain degree of robustness to mild perturbations. Stable representations of normal behavior can still be maintained. From a business perspective, this means that when data quality fluctuates slightly, alert outputs do not immediately become unreliable and risk signals remain usable. Under a framework centered on representation consistency and structural shift, mild noise is not sufficient to disrupt reconstruction consistency or the central structure of representations. The relative geometry between normal and abnormal samples is largely preserved.
When noise intensity reaches a moderate level, the decline in F1 accelerates. This reflects a significant loss in separability between normal and anomalous samples in the representation space. Noise at this level not only perturbs features but also alters similarity structures among samples. The model struggles to maintain compact normal clusters. As a consequence, the score distributions of anomalies and normal samples begin to overlap. Once overlap increases, the system faces simultaneous risks of rising false positives and rising false negatives. The former consumes audit resources, while the latter conceals true abnormal behaviors. This directly targets the most sensitive cost and risk trade-off in enterprise financial monitoring. In the high noise range, the F1 score continues to decline and approaches a low level. This indicates that the tolerance limit of the model to strong disturbances has been exceeded. Structural stability required for representation modeling can no longer be preserved. For practical deployment, this result highlights the need to strengthen data pipeline quality control and consistency checks. Measures include missing value handling, outlier truncation, field standardization, and cross-source alignment. These steps reduce structural drift caused by noise injection. The results also show that representation-based anomaly monitoring frameworks have quantifiable robustness boundaries under environmental changes. The sensitivity curve can serve as an operational reference for alert credibility. It can guide decisions on when to trigger data remediation or strategy updates.
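The loss of separability under increasing noise can be probed with a simple synthetic sketch (score distributions, the separability measure, and all constants are illustrative assumptions, not the paper's data): Gaussian noise of growing intensity is injected into two score populations and the distance between their means, in units of pooled standard deviation, is tracked.

```python
import numpy as np

def separability(noise_sigma, n=2000, gap=5.0, seed=0):
    """Distance between normal and anomalous score means, in units of
    pooled standard deviation, after injecting Gaussian noise of the
    given intensity into both groups.  Larger noise inflates the pooled
    spread, so the ratio falls and the score distributions overlap.
    """
    rng = np.random.default_rng(seed)
    normal = rng.normal(0.0, 1.0, n) + rng.normal(0.0, noise_sigma, n)
    anomalous = rng.normal(gap, 1.0, n) + rng.normal(0.0, noise_sigma, n)
    pooled_std = np.sqrt((normal.var() + anomalous.var()) / 2)
    return abs(anomalous.mean() - normal.mean()) / pooled_std

scores = [separability(s) for s in (0.0, 1.0, 3.0)]
```

Tracking such a ratio on live data would give exactly the kind of quantifiable robustness boundary and alert-credibility reference discussed above: once it falls below an agreed threshold, data remediation or strategy updates can be triggered.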