3. Model Research
3.1. Sample Source
The sample data for this study were derived from the 2020 financial indicator data of 4,254 A-share listed companies, sourced from TongHuaShun Finance [15]. The sample design comprises a sample group and a matched group.
For the sample group, companies given special treatment (ST and *ST) because of "abnormal financial conditions" were taken as markers of financial distress, i.e., as the research subjects. In 2020, 200 A-share listed companies carried ST or *ST status. After companies with missing indicator data were excluded, 182 companies with valid data remained, and after outliers were handled, 160 listed companies were retained as the modeling sample group: 80 manufacturing and 80 non-manufacturing enterprises. Of these 160 ST and *ST companies, 80% are privately owned and 20% are state-owned.
For the matched group, 160 financially healthy companies were selected by pairing each ST or *ST company with the healthy company whose total assets at the end of the period were closest to its own. The proportions of enterprise attributes in the matched group were kept identical to those in the sample group.
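The matching rule can be sketched as a greedy nearest-neighbour search on year-end total assets. This is a minimal sketch under assumptions the paper does not state: matching is one-to-one without replacement, it is performed within strata (a hypothetical key such as ownership type combined with industry) so that the attribute proportions carry over, and ST companies are processed in input order. The function name `match_by_total_assets` is hypothetical.

```python
import numpy as np

def match_by_total_assets(st_assets, st_strata, pool_assets, pool_strata):
    """Greedy 1:1 nearest-neighbour matching on year-end total assets.

    Each ST/*ST company is paired, in input order, with the not-yet-used
    healthy company in the same stratum whose total assets are closest.
    Returns the pool indices of the matched healthy companies.
    """
    st_assets = np.asarray(st_assets, dtype=float)
    pool_assets = np.asarray(pool_assets, dtype=float)
    pool_strata = np.asarray(pool_strata)
    used = np.zeros(len(pool_assets), dtype=bool)  # matching without replacement
    matches = []
    for assets, stratum in zip(st_assets, st_strata):
        # Candidates: same (hypothetical) ownership/industry stratum and still unused
        cand = np.flatnonzero((pool_strata == stratum) & ~used)
        if cand.size == 0:
            raise ValueError(f"no unused healthy company left in stratum {stratum!r}")
        # Closest total assets in absolute difference
        best = cand[np.argmin(np.abs(pool_assets[cand] - assets))]
        used[best] = True
        matches.append(int(best))
    return matches
```

For example, `match_by_total_assets([100, 50], ["private", "state"], [55, 98, 200, 49], ["state", "private", "private", "state"])` returns `[1, 3]`. Because the search is greedy, a different processing order can yield different pairs; an optimal assignment method would avoid this at extra computational cost.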
In total, the combined matched sample comprises financial data from 320 companies.
3.2. Indicator Selection
There is currently no unified standard for building an indicator system in the financial early-warning literature, and different scholars have chosen different indicators. This paper used the Delphi method to select eight key financial indicators from a set of financial indicator knowledge graphs as research variables [16]. After repeated deliberation by experts, these eight indicators were judged to cover the core aspects of a company's operations, management, and finance, and they form the financial feature dimensions of this study's financial risk early warning model.
In addition, enterprise nature (private or state-owned) and industry classification (manufacturing or non-manufacturing) were included as non-financial feature dimensions of the model, in order to test whether they improve its performance.
The selected indicators are listed in Table 1, and a portion of the financial indicator data is shown in Table 2 (variable labels replace indicator names in the remainder of the text).
3.3. Indicator Analysis
Table 3 presents the descriptive statistics of the data; the maximum value of the solvency indicator is notably large. In a multi-indicator evaluation system, indicators of different natures often have different units and scales. When their magnitudes differ substantially, analyzing the raw values directly overstates the influence of indicators with large values and understates that of indicators with small values. To ensure reliable results, the original indicator data are therefore standardized.
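The paper does not give the standardization formula; a minimal sketch, assuming the usual z-score transformation (subtract the column mean, divide by the sample standard deviation), is shown below. The function name `standardize` is hypothetical.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score standardization using the sample standard deviation.

    Returns the standardized matrix together with the column means and
    standard deviations, so the same transformation can be reused later.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # n - 1 denominator
    return (X - mean) / std, mean, std
```

For instance, `standardize([[1, 100], [2, 200], [3, 300]])` maps both columns to `[-1, 0, 1]`, removing the hundredfold difference in scale. Keeping `mean` and `std` allows later data, such as a validation sample, to be transformed on the same scale.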
The correlation coefficient measures the strength of the relationship between two variables; the larger its absolute value, the stronger the correlation.
Table 4 shows that the financial indicators are pairwise correlated, mostly at low to moderate levels, whereas the non-financial indicators are essentially uncorrelated with every other indicator.
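The low/moderate reading of Table 4 can be reproduced mechanically from the data. This sketch assumes Pearson correlation and rule-of-thumb cut-offs of 0.3 and 0.7 on |r|; the paper states neither, and `correlation_bands` is a hypothetical name.

```python
import numpy as np

def correlation_bands(X, names, low=0.3, high=0.7):
    """Pearson correlation matrix plus a strength label for every pair.

    The |r| cut-offs `low` and `high` are assumed rule-of-thumb values.
    """
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = float(R[i, j])
            a = abs(r)
            band = "low" if a < low else ("moderate" if a < high else "high")
            pairs.append((names[i], names[j], round(r, 3), band))
    return R, pairs
```

Classifying by the absolute value means that a strong negative correlation is labeled as strongly as a positive one.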
The significance analysis in Table 5 shows that, among the ten indicators, only the profitability, leverage, turnover, and cash flow indicators are significant at the 0.05 level. Without data transformation, therefore, only these four indicators play a substantive role, while the remaining indicators contribute little to the model. There are two primary reasons for this outcome:
First, some indicators may not be informative for the model. Our analysis found that the non-financial indicators, enterprise nature and industry classification, do not contribute positively to the model; their correlations with the financial indicators are weak and, where present, negative. This suggests that state-owned and private enterprises do not differ significantly in whether a listed company is ST or non-ST, and that there is no need to distinguish manufacturing from non-manufacturing enterprises when modeling. This indirectly supports the feasibility of using a general financial risk warning model to predict enterprise risk.
Second, the financial indicators are collinear with one another, so the multicollinearity among the indicator variables must be addressed. Because removing financial indicators would leave the model's interpretation incomplete, factor analysis is used instead to eliminate multicollinearity among them.
3.4. Factor Analysis
Based on the conclusions of the previous analysis, this paper drops the two non-financial indicators (enterprise nature and industry classification) and builds the model solely from the continuous financial indicators. Factor analysis is an extension of principal component analysis (PCA); compared with PCA, it focuses more on describing the correlations among the original variables [17,18]. The calculations are performed with the factor analysis procedure in SPSS.
First, the KMO measure and Bartlett's test of sphericity are computed for the eight financial indicators. The results are shown in Table 6:
The Bartlett statistic is 486.096 with a significance probability of 0.000 (i.e., p < 0.001), below the 0.05 significance level. The hypothesis that the correlation matrix is an identity matrix is therefore rejected, and the data are suitable for factor analysis. The KMO value exceeds 0.6, indicating that sampling adequacy is acceptable for factor analysis.
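Both statistics can be computed from the correlation matrix R of an n × p data matrix. Bartlett's statistic is chi2 = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and KMO compares the sum of squared off-diagonal correlations with the sum of squared partial (anti-image) correlations. The sketch below is our illustration with NumPy and SciPy, not SPSS's implementation; the function names are hypothetical. For n = 320 and p = 8, the test has 28 degrees of freedom.

```python
import numpy as np
from scipy import stats

def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)          # ln|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) / 2
    return chi2, df, stats.chi2.sf(chi2, df)  # statistic, df, p-value

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    A = -Rinv / np.outer(d, d)                # partial (anti-image) correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(A[off] ** 2)
    return r2 / (r2 + a2)
```

A KMO close to 1 means partial correlations are small relative to ordinary correlations, i.e., the variables share common factors.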
Next, SPSS was used to compute the eigenvalue and variance contribution rate of each principal component, as detailed in Table 7:
Considering the amount of information carried by the indicators and the need for comprehensive coverage, we nevertheless specify that all eight factors be retained, on the view that together they reflect the full information of the original variables. Factor analysis in this paper therefore serves only to eliminate collinearity, not to reduce dimensionality.
Additionally, the scree plot of eigenvalues (Figure 1) shows that the eigenvalue of Factor 8 is not particularly small, and the drops between successive eigenvalues from Factor 2 to Factor 8 are similar, so there is no clear point at which any factor could be discarded. Retaining all eight factors therefore entails no loss of information.
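The eigenvalues and contribution rates in Table 7, and the scree plot in Figure 1, come from the eigendecomposition of the correlation matrix: each eigenvalue divided by the number of variables is that component's share of total variance. SPSS's default extraction rule keeps only components with eigenvalues greater than 1, which is why retaining all eight factors has to be specified explicitly. A minimal NumPy sketch (function name hypothetical):

```python
import numpy as np

def eigen_contributions(X):
    """Eigenvalues of the correlation matrix with variance contribution rates."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]   # eigvalsh returns ascending order
    contrib = eigvals / eigvals.sum()        # the sum equals the number of variables
    return eigvals, contrib, np.cumsum(contrib)
```

Plotting `eigvals` against the component number gives the scree plot; when all components are retained, the cumulative contribution reaches 100%, which is the sense in which no information is lost.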
To show the relationship between the principal component factors and the original variables clearly, the rotated factor loadings are reported in Table 8:
Table 8 shows that each indicator loads heavily on a single factor, and each factor is named after that indicator: asset growth loads on Factor 1 (Asset Growth Factor, F1), solvency on Factor 2 (Solvency Factor, F2), profitability on Factor 3 (Profitability Factor, F3), turnover on Factor 4 (Turnover Factor, F4), earnings on Factor 5 (Earnings Factor, F5), cash flow on Factor 6 (Cash Flow Factor, F6), liquidity on Factor 7 (Liquidity Factor, F7), and leverage on Factor 8 (Leverage Factor, F8). These results support the decision to retain eight factors.
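The paper does not name its rotation method. Assuming the commonly used varimax rotation of principal-component loadings, the computation can be sketched as below. This version omits the Kaiser normalization that SPSS applies by default, so its loadings may differ slightly from Table 8; the function names are hypothetical.

```python
import numpy as np

def pca_loadings(R, k):
    """Principal-component loadings: eigenvectors scaled by sqrt(eigenvalue)."""
    w, V = np.linalg.eigh(R)
    idx = np.argsort(w)[::-1][:k]            # k largest eigenvalues
    return V[:, idx] * np.sqrt(w[idx])

def varimax(L, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation; returns rotated loadings and rotation matrix."""
    p, k = L.shape
    T = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        LR = L @ T
        # Gradient of the varimax criterion with respect to the rotation
        G = L.T @ (LR ** 3 - LR @ np.diag(np.sum(LR ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                            # nearest orthogonal matrix
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break                             # criterion no longer improving
    return L @ T, T
```

Because the rotation matrix `T` is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; with all eight factors kept, every communality equals 1, and rotation only redistributes loadings so that each indicator concentrates on one factor.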
To express the relationship between the common factors and the indicators exactly, each common factor is written as a linear combination of the original variables. The regression method in the SPSS factor analysis procedure produces the factor score coefficient matrix shown in Table 9. Multiplying the standardized values of the original variables by these coefficients yields the factor scores, which are used in the subsequent analysis of the financial indicators.
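The regression method referred to here is the standard Thurstone estimator, whose coefficient matrix is B = R^(-1) A, where R is the correlation matrix of the original variables and A the (rotated) loading matrix; the scores are F = Z B for the standardized data Z. We assume this matches SPSS's computation up to rounding. A minimal sketch (function names hypothetical):

```python
import numpy as np

def factor_score_coefficients(R, loadings):
    """Regression (Thurstone) method: B = R^{-1} A, solved without an explicit inverse."""
    return np.linalg.solve(R, loadings)

def factor_scores(Z, B):
    """Factor scores: standardized data (n x p) times coefficients (p x k)."""
    return np.asarray(Z, dtype=float) @ B
```

When all factors are retained from a principal-component solution, the resulting scores are uncorrelated with unit variance, which is why they remove the collinearity among the original indicators before the regression step.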
3.5. Logistic Regression
In this paper, ST listed companies are coded as 0 and non-ST listed companies as 1, and this coding serves as the dependent variable. With the eight factors obtained from the factor analysis as independent variables, a logistic regression is fitted in SPSS. The regression results are presented in Table 10:
The analysis above suggests that every factor in the model matters, each contributing roughly the same share of variance, so removing or replacing any factor would cause a substantial loss of information. This paper therefore establishes the model at a significance level of 0.1 and retains all eight factors.
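The fitted model is the binary logit P = 1 / (1 + e^(-Z)) with Z = b0 + b1 F1 + ... + b8 F8, where P is the predicted probability of the category coded 1 (non-ST). Its coefficients are maximum-likelihood estimates; the Newton-Raphson loop below is our own unpenalized sketch of that estimation, not SPSS's code, and the function names are hypothetical. It will not converge if the two groups are perfectly separable.

```python
import numpy as np

def sigmoid(z):
    """Logistic (Sigmoid) function mapping the linear predictor Z to P."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(F, y, max_iter=50, tol=1e-8):
    """Unpenalized maximum-likelihood logistic regression via Newton-Raphson.

    F: (n, k) factor scores; y: (n,) with 0 = ST, 1 = non-ST.
    Returns [b0, b1, ..., bk].
    """
    X = np.column_stack([np.ones(len(F)), F])  # intercept + factors
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)                   # score vector
        W = p * (1 - p)
        H = X.T @ (X * W[:, None])             # Fisher information
        step = np.linalg.solve(H, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

At the solution the score vector is zero, which is the maximum-likelihood condition the estimates in Table 10 satisfy.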
Using a cut-off of 0.5, the model's classification of the sample data is presented in Table 11:
Table 11 shows that the logistic regression model achieves an overall prediction accuracy of 89.7% on the sample data. Because the model draws on a broad set of financial dimensions, this in-sample result suggests good explanatory power; its performance on new data is examined in Section 3.7.
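A classification table such as Table 11 can be tabulated from predicted probabilities as sketched below, assuming (consistent with Section 3.7) that P at or below the cut-off is predicted as 0 (ST) and P above it as 1 (non-ST). The function name is hypothetical.

```python
import numpy as np

def classification_table(y_true, p_hat, cutoff=0.5):
    """2x2 classification table (rows: observed, columns: predicted; 0 = ST,
    1 = non-ST) with per-class and overall percentage correct."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = (np.asarray(p_hat, dtype=float) > cutoff).astype(int)  # P <= cutoff -> ST
    table = np.zeros((2, 2), dtype=int)
    np.add.at(table, (y_true, y_pred), 1)     # count each (observed, predicted) pair
    pct_correct = table.diagonal() / table.sum(axis=1)
    overall = table.diagonal().sum() / table.sum()
    return table, pct_correct, overall
```

For example, observed classes `[0, 0, 0, 1]` with probabilities `[0.1, 0.2, 0.9, 0.8]` give the table `[[2, 1], [0, 1]]` and an overall accuracy of 0.75.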
3.6. Risk Level Classification
Based on the Delphi method, this paper divides enterprise financial risk into four levels: A (significant risk), B (moderate risk), C (minor risk), and D (no risk). Drawing on years of industry experience and qualitative analysis, practitioners in enterprises and institutions regard this classification as practical, and it is widely accepted.
To assign companies to these four levels, we propose the following approach. Here the P-value denotes the model's predicted probability of the non-ST category, not a statistical p-value. Following Fisher's view of significance testing, we set 90% (general significance level) and 95% (high significance level) as confidence thresholds, where confidence refers to the proportion of ST (category 0) companies correctly classified as category 0; the task is to find the Sigmoid output thresholds at which this proportion reaches 95% and 90%. In addition, based on the earlier experimental results and the characteristics of the Sigmoid function, the threshold P = 0.5, at which 89.4% of ST companies are classified as category 0, is also used as a confidence threshold. Finally, 0% and 100% are set as the lower and upper bounds of confidence.
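The search can be sketched as a downward scan over candidate thresholds t. For each t, the share of ST companies with P at or below t (classified as category 0) is computed; as t falls from 1 toward 0 this share can only decrease, so the scan stops at the last t where it still meets the target confidence. The 0.001 grid step and the function names are assumptions; the paper states only that repeated experiments were used.

```python
import numpy as np

def st_recall(p_st, t):
    """Share of ST companies whose predicted P is at most t (classified as 0)."""
    return float(np.mean(np.asarray(p_st, dtype=float) <= t))

def find_threshold(p_st, target, step=0.001):
    """Scan t from 1.0 down toward 0.0 and return the smallest grid value of t
    whose ST recall is still >= target (None if even t = 1.0 falls short)."""
    best = None
    for t in np.round(np.arange(1.0, -step / 2, -step), 6):
        if st_recall(p_st, t) < target:
            break                     # recall can only fall as t decreases
        best = float(t)
    return best
```

Applied to the predicted probabilities of the ST sample, `find_threshold(p_st, 0.95)` and `find_threshold(p_st, 0.90)` would return the thresholds for the two confidence levels; a higher confidence always yields a threshold at least as large.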
Through repeated experiments, we searched from 100% toward 0% for the Sigmoid thresholds corresponding to the 95% and 90% confidence levels, obtaining P-values of 0.887 and 0.754, respectively. The results are presented in Table 12:
The P-values corresponding to each confidence level are therefore summarized in Table 13 below:
Based on the analysis above, this paper classifies enterprise financial risk levels according to the P-values and the corresponding linearly weighted Z-scores. The classification is presented in Table 14:
Enterprises with sufficient resources can turn the model into a dynamic monitoring product that captures data in real time and continuously tracks their financial risk status, strengthening their resilience and adaptability to macro- and micro-environmental risks.
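A monitoring tool of this kind needs the P-value-to-level mapping of Table 14. Table 14 is not reproduced here, so the sketch below infers the intervals from the thresholds stated in this section and in Section 3.7 (P at or below 0.5 is A-level, P of at least 0.887 is D-level, and P values of 0.62 and 0.64 fall in B-level). The exact boundary handling, and the reading of the Z-score as the logistic linear predictor Z = ln(P / (1 - P)), are assumptions; the names are hypothetical.

```python
import math

# Cut-points on P, the predicted probability of being non-ST.
# Boundary handling (<= vs <) is an assumption; Table 14 is authoritative.
CUTS = (0.5, 0.754, 0.887)

def risk_level(p):
    """Map a predicted probability P to risk level A-D."""
    if p <= CUTS[0]:
        return "A"   # significant risk
    if p < CUTS[1]:
        return "B"   # moderate risk
    if p < CUTS[2]:
        return "C"   # minor risk
    return "D"       # no risk

def logit(p):
    """Linear predictor Z corresponding to P (inverse of the Sigmoid)."""
    return math.log(p / (1 - p))
```

Under this reading, the P cut-off of 0.5 corresponds to Z = 0, and the higher cut-offs correspond to positive Z values, so the levels could equivalently be expressed as intervals on Z.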
3.7. Testing and Validation
To verify the performance and generalization ability of the proposed model on new data, we randomly selected 30 ST (including *ST) listed companies and 30 healthy non-ST listed companies from the 2019 financial data, giving 60 validation samples. Under the proposed risk classification criteria, the ideal validation outcome is that every ST enterprise has a predicted P-value no greater than 0.5 (predicted value 0, A-level significant risk) and every non-ST enterprise has a predicted P-value of at least 0.887 (predicted value 1, D-level no risk).
Following the calculation steps and parameters of the general model described in this paper, the prediction accuracy on the 60 validation samples under the specified cut-off thresholds is presented in Table 15:
Table 15 shows that only two ST companies, ST BuSen (002569) and ST SenYuan (002358), were not correctly predicted. Their P-values are 0.64 and 0.62, respectively, which places them in the B-level (moderate risk) category under our classification criteria.
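Scoring the 2019 validation companies means reusing the 2020 sample's parameters at every step rather than refitting them. A minimal sketch, assuming the new data are standardized with the 2020 sample's means and standard deviations (the paper does not say how validation data were standardized), and that `B` and `beta` hold the factor score coefficients (Table 9) and the logistic coefficients with intercept first (Table 10); `score_new` is a hypothetical name.

```python
import numpy as np

def score_new(X_new, mean, std, B, beta):
    """Apply a model fitted on the training sample to new companies.

    X_new: (m, p) raw indicators; mean, std: training column statistics;
    B: (p, k) factor score coefficients; beta: (k + 1,) logit coefficients.
    Returns P, the predicted probability of the non-ST category.
    """
    Z = (np.asarray(X_new, dtype=float) - mean) / std  # reuse training scale
    F = Z @ B                                           # reuse factor-score coefficients
    z = beta[0] + F @ np.asarray(beta[1:])              # linear predictor
    return 1.0 / (1.0 + np.exp(-z))
```

Standardizing the validation data with their own statistics instead would shift every company's score relative to the 2020 thresholds, so the fixed cut-offs of 0.5, 0.754, and 0.887 would no longer mean the same thing.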