3.1. Dataset
The dataset used in this study is the Corporate Financial Risk Assessment Dataset. It was obtained from an open financial data platform and covers corporate financial and operational indicators from multiple industries. The dataset includes annual financial data and performance labels for approximately 5,000 enterprises, spanning the years 2013 to 2022. The main variables include total assets, debt ratio, current ratio, operating income growth rate, net profit margin, cash flow status, proportion of R&D investment, and market valuation. It also contains information on each enterprise’s industry, region, and credit rating. This dataset provides a solid foundation for building multidimensional models of enterprise performance evaluation and growth potential analysis.
During data preprocessing, the original data were cleaned and standardized. Samples with a high proportion of missing values or extreme anomalies were removed. All continuous variables were normalized using the Z-score method to eliminate differences in measurement units, while categorical variables were converted into numerical form through one-hot encoding. To ensure model comparability across enterprises of different sizes and industries, industry median correction and scale adjustment mechanisms were applied. This balanced the differences in financial structure and maintained statistical consistency in cross-industry prediction tasks.
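A minimal sketch of this preprocessing pipeline is given below. The column names are hypothetical (the dataset's actual schema is not published with this description), and the missing-value threshold and median-based imputation are likewise illustrative assumptions rather than the authors' exact procedure.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical column names; the dataset's actual schema is not given in the text.
NUMERIC_COLS = ["total_assets", "debt_ratio", "current_ratio", "revenue_growth",
                "net_profit_margin", "cash_flow", "rnd_ratio", "market_value"]
CATEGORICAL_COLS = ["industry", "region", "credit_rating"]

def preprocess(df: pd.DataFrame, missing_threshold: float = 0.5) -> pd.DataFrame:
    # Remove samples with a high proportion of missing values (threshold assumed).
    df = df[df[NUMERIC_COLS].isna().mean(axis=1) < missing_threshold].copy()

    # Impute remaining gaps with column medians (one plausible choice).
    df[NUMERIC_COLS] = df[NUMERIC_COLS].fillna(df[NUMERIC_COLS].median())

    # Industry median correction: express each indicator relative to its industry median.
    df[NUMERIC_COLS] -= df.groupby("industry")[NUMERIC_COLS].transform("median")

    # Z-score normalization to eliminate differences in measurement units.
    df[NUMERIC_COLS] = StandardScaler().fit_transform(df[NUMERIC_COLS])

    # One-hot encode categorical variables.
    return pd.get_dummies(df, columns=CATEGORICAL_COLS)
```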
In addition, since enterprise performance exhibits strong temporal dependency, the dataset was organized into time-series slices based on fiscal years. This allows the model to capture enterprise development trends and dynamic changes over time. Finally, the dataset was divided into training, validation, and test sets in a 70%/15%/15% ratio. The resulting structured dataset contains comprehensive financial and non-financial features, with strong temporal relevance and wide industry coverage, and provides high-quality support for the subsequent construction of the knowledge graph and causal inference modeling.
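The fiscal-year slicing and the 70/15/15 split could be implemented as in the following sketch. The window length, the identifier columns (`enterprise_id`, `year`), and the random shuffle are all assumptions, since the text does not specify how slices are formed or whether the split is chronological.

```python
import numpy as np
import pandas as pd

def temporal_slices(df: pd.DataFrame, window: int = 3) -> np.ndarray:
    """Build fixed-length fiscal-year windows per enterprise (window length assumed)."""
    slices = []
    for _, grp in df.sort_values("year").groupby("enterprise_id"):
        values = grp.drop(columns=["enterprise_id", "year"]).to_numpy()
        for start in range(len(values) - window + 1):
            slices.append(values[start:start + window])
    return np.stack(slices)

def split_70_15_15(samples: np.ndarray, seed: int = 42):
    """Shuffle and split into 70% train / 15% validation / 15% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_train, n_val = int(0.70 * len(samples)), int(0.15 * len(samples))
    return (samples[idx[:n_train]],
            samples[idx[n_train:n_train + n_val]],
            samples[idx[n_train + n_val:]])
```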
3.2. Experimental Results
This paper first conducts a comparative experiment, and the experimental results are shown in Table 1.
From the comparative results presented in Table 1, it is evident that the proposed model achieves the best overall performance across all evaluation metrics. Traditional sequence models such as LSTM and BiLSTM exhibit relatively high error levels due to their limited ability to capture long-term dependencies and complex multi-dimensional relationships within enterprise data. While these models can model sequential correlations, they fail to effectively extract the structural and semantic dependencies underlying multi-source financial indicators. Consequently, their predictions are more sensitive to noise and temporal fluctuations, resulting in higher MSE and MAE values compared with transformer-based methods.
The transformer-family models, including Informer, FEDformer, and BERT, demonstrate notable improvements owing to their attention mechanisms and global context modeling capabilities. These architectures effectively capture cross-temporal dependencies and heterogeneous financial signals, leading to lower prediction errors. However, despite their strength in learning abstract representations, these models remain limited in causal interpretability and lack an explicit mechanism to reason over the latent inter-relationships among corporate indicators. This limitation constrains their capacity to distinguish between correlation and true causal influence, which is critical in performance optimization scenarios involving dynamic enterprise decision-making.
By contrast, the proposed method integrates knowledge graph representation with causal reasoning, enabling the model to understand not only the associations but also the causal pathways influencing enterprise performance. The incorporation of structured semantic knowledge enhances interpretability, while the causal inference layer allows the model to simulate intervention effects and identify key performance drivers. As a result, the proposed model achieves the lowest MSE (0.5827), MAE (0.4678), and RMSE (0.7632), representing a substantial improvement over both transformer-based and language-model baselines. These findings confirm that the joint modeling of knowledge and causality provides a more robust, explainable, and data-driven approach to enterprise performance optimization.
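For reference, the three error metrics reported in Table 1 can be computed as in the following minimal sketch, with RMSE being the square root of MSE:

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MSE, MAE, and RMSE for a vector of predictions."""
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    return {"MSE": mse,
            "MAE": float(np.mean(np.abs(err))),
            "RMSE": float(np.sqrt(mse))}
```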
This paper also presents a sensitivity analysis of the MSE metric for different embedding dimensions, and the experimental results are shown in Figure 2.
As shown in Figure 2, the MSE metric exhibits a clear downward trend as the embedding dimension increases, indicating that larger representation spaces enable the model to capture richer and more discriminative semantic features. When the embedding dimension is small (e.g., Dim-32 and Dim-64), the model’s expressive capacity is constrained, leading to higher prediction errors. As the dimension expands to 128 and beyond, the performance stabilizes and the MSE value decreases significantly, reflecting that the model achieves a balance between feature compactness and representation sufficiency.
However, the improvement becomes marginal once the embedding dimension exceeds 256, implying that excessively large latent spaces contribute limited additional information and may introduce redundancy or overfitting risk. The lowest MSE observed at Dim-512 demonstrates that the proposed framework benefits from a moderately high-dimensional embedding, where the integration of knowledge-graph semantics and causal dependencies reaches optimal synergy. This result confirms that embedding dimensionality plays a crucial role in model generalization and performance optimization within the enterprise performance prediction framework.
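A toy sweep over the same candidate dimensions is sketched below, assuming a simple PyTorch regressor and synthetic data in place of the paper's unreleased model and dataset; it illustrates only the shape of the experiment, not its actual results.

```python
import torch
import torch.nn as nn

# Synthetic stand-in data: 16 indicators, a linear target with mild noise.
torch.manual_seed(0)
X = torch.randn(1000, 16)
y = X[:, :4].sum(dim=1, keepdim=True) + 0.1 * torch.randn(1000, 1)
X_tr, y_tr, X_val, y_val = X[:700], y[:700], X[700:], y[700:]

def val_mse(dim: int, epochs: int = 200) -> float:
    """Train a small regressor whose hidden (embedding) width is `dim`."""
    torch.manual_seed(0)  # identical initialization across dimensions
    model = nn.Sequential(nn.Linear(16, dim), nn.ReLU(), nn.Linear(dim, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(X_tr), y_tr).backward()
        opt.step()
    with torch.no_grad():
        return nn.functional.mse_loss(model(X_val), y_val).item()

for dim in [32, 64, 128, 256, 512]:
    print(f"Dim-{dim}: validation MSE = {val_mse(dim):.4f}")
```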
This paper further assesses the sensitivity of MAPE to the intensity of noise injection, and the experimental results are shown in Figure 3.
As shown in Figure 3, model performance degrades steadily as the intensity of noise injection increases, indicating that perturbations in the input features directly affect the overall stability of the predictions. When the noise level is low, the model can effectively capture and represent key features, and the output remains stable. As the noise becomes stronger, however, the extraction of structured information and the identification of causal relationships are disturbed, leading to noticeable fluctuation and instability.
This phenomenon shows that although the proposed framework has some resistance to disturbances within a limited range, it still depends on the purity of the input signals. When the noise level is too high, the collaborative mechanism between knowledge representation and causal inference is disrupted, causing a shift in the decision space. The overall trend confirms the importance of data quality control and feature robustness modeling in enterprise performance optimization. It also demonstrates that the model’s behavior under noisy conditions reflects its essential characteristics of generalization and stability.
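The metric and perturbation scheme can be sketched as follows. Additive Gaussian noise scaled by a factor sigma is one plausible injection scheme, and the epsilon guard in the MAPE definition is an implementation detail assumed here, as neither is specified in the text.

```python
import numpy as np

def mape(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-8) -> float:
    """Mean absolute percentage error; eps guards against division by zero."""
    return float(np.mean(np.abs((y_true - y_pred) / (np.abs(y_true) + eps))) * 100)

def noise_sensitivity(predict, X_val, y_val, sigmas=(0.0, 0.1, 0.2, 0.5)):
    """Evaluate MAPE under additive Gaussian input noise of increasing intensity."""
    rng = np.random.default_rng(0)
    return {sigma: mape(y_val, predict(X_val + sigma * rng.standard_normal(X_val.shape)))
            for sigma in sigmas}
```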
This paper also examines the impact of different optimizers on MAE, and the experimental results are shown in Figure 4.
As shown in Figure 4, different optimizers exhibit clear performance differences during model training, indicating that the optimization algorithm plays a crucial regulatory role in the framework that integrates knowledge graphs and causal inference. Overall, optimizers with adaptive learning rate mechanisms perform better in terms of stable convergence and error control. Among them, Adam achieves a good balance between update direction and learning step adjustment, enabling the model to optimize quickly and stably in a multidimensional feature space and obtain lower error values. In contrast, optimizers based on traditional gradient descent are more likely to fall into local minima when modeling complex nonlinear features, resulting in slightly inferior overall performance.
These results highlight the importance of optimizer selection in learning multi-layer semantic features and causal dependencies within the integrated model. In this research framework, the knowledge graph component requires capturing semantic associations among multiple entities, while causal structure learning relies on the stability of gradient propagation. If the optimizer cannot maintain a proper adaptive learning rate between global and local features, the model may experience feature drift or unstable convergence, leading to a decline in overall performance. Therefore, optimizers with dynamic learning rate adjustment and gradient normalization capabilities are better suited to handle the model’s complex feature distributions.
Furthermore, the results show that the model’s sensitivity to the optimizer is closely related to the characteristics of the task. In the process of integrating knowledge and causal reasoning, the parameter space exhibits a high degree of nonlinear coupling. Choosing an appropriate optimization strategy affects not only the error convergence speed but also the efficiency of information coordination across different feature layers. The best-performing optimizer can ensure stable convergence while enhancing the model’s ability to capture potential structural patterns, providing more reliable support for enterprise performance prediction and optimization.
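A toy comparison in the same spirit is sketched below; the candidate optimizers, learning rates, and synthetic data are illustrative assumptions, since the paper does not list the configurations it tested. Re-seeding before each run keeps the initialization identical, so any MAE gap is attributable to the optimizer alone.

```python
import torch
import torch.nn as nn

# Synthetic stand-in data for illustration only.
torch.manual_seed(0)
X = torch.randn(800, 16)
y = X[:, :4].sum(dim=1, keepdim=True) + 0.1 * torch.randn(800, 1)

def final_mae(make_opt, epochs: int = 300) -> float:
    """Train the same small regressor with a given optimizer and return its MAE."""
    torch.manual_seed(0)  # identical initialization for a fair comparison
    model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = make_opt(model.parameters())
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.mse_loss(model(X), y).backward()
        opt.step()
    with torch.no_grad():
        return nn.functional.l1_loss(model(X), y).item()

optimizers = {
    "Adam":    lambda p: torch.optim.Adam(p, lr=1e-3),
    "RMSprop": lambda p: torch.optim.RMSprop(p, lr=1e-3),
    "SGD":     lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9),
}
for name, make_opt in optimizers.items():
    print(f"{name}: MAE = {final_mae(make_opt):.4f}")
```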