Application of Artificial Intelligence Model to Identify the Distorted Financial Statements

Distortion of financial statements is one of the most important and most common issues in accounting and auditing today. In this research, stock exchange information was used to investigate, predict, and model accounting distortions. For this purpose, financial performance, non-financial metrics, market-based metrics, and accrual (discretionary) items were reviewed over a 6-year period. Data on distorting companies were collected from the database of the Society of Certified Public Accountants in Iran, and the information was analyzed using data mining methods (decision tree, neural networks, and Bayesian method). The results showed that analysis of financial statement information identifies distorted financial statements with high accuracy. Using this information, it is possible to become better acquainted with the methods of document distortion and to take the necessary measures to control and prevent administrative violations at national and international levels. Given the frequent occurrence of these violations, artificial intelligence models can be used to identify distorted financial statements. The results of the study are consistent with the results of the research conducted by Mashayekhi and Hassanpour (2016) and Hestan et al. (2013) on the use of accruals by companies to distort financial statements. The results also indicated that the neural networks and Bayesian methods are more accurate than the decision tree. In other words, the Tehran Stock Exchange Organization or the Society of Certified Public Accountants can produce useful software to predict companies that may intend to distort financial statements, and can gain the trust of shareholders by increasing control over such companies.


Introduction
The detrimental effects of financial scandals in recent years have increased attention to fraud and distortion in financial statements. Accounting distortion refers to any discrepancy between the amount, classification, presentation, or disclosure of an item reported in financial statements and the amount, classification, presentation, or disclosure required for that item by the accepted financial reporting framework (International Federation of Accountants (IFAC), December, 2009: 370). Distortions result from mistakes or fraud. A mistake is an inadvertent act in financial statements, including omission of an amount or a disclosure, whereas fraud is any intentional or fraudulent act by one or more directors or third parties to obtain an unlawful advantage, according to the 2008 report of the Association of Certified Fraud Examiners (ACFE). It is estimated that fraud schemes cause losses of $1 billion a year. In addition to economic losses, such as harm to creditors, investors, and shareholders, financial scandals have caused political, judicial, and social costs. By reducing the reliability of companies' financial statements, they have also increased transfer costs and reduced the efficiency of the capital market (Prolls, 2011). Rezaei and Riley (2010) argued that the extent of fraudulent financial statements has raised public concerns and undermined public trust in the financial reporting process and in auditing performance. This phenomenon threatens the quality, integrity, and reliability of the financial reporting process and causes significant economic losses to investors and creditors (Sajjadi and Kazemi, 2016). According to the 2016 report by the ACFE, up to 23% of the surveyed subjects had lost more than $1 million of their capital.
Misappropriation of assets is the most common type of fraud, accounting for 84% of all fraud cases on average and causing an average loss of $125,000 per economic unit. Financial statement fraud, on the other hand, accounts for only 10% of all fraud cases but causes an average loss of $975,000 per unit. Because of the importance of this issue, extensive research has been conducted to understand the causes, motives, and consequences of financial distortion and profit manipulation (Kim et al., 2016). In particular, determining how distortion of financial statements and income can be detected is a major concern of accounting research. The results of this study, in addition to expanding the research literature on distorted financial statements, can therefore increase the confidence of users of financial statements and thus the operational efficiency of the capital market. In other words, the research is important because it helps protect the interests of business units' stakeholders, improve the internal mechanisms of corporate governance, and assist securities market regulators in identifying and predicting the possibility of distortion in financial statements before it occurs. To improve the prediction of distortion, accounting research has focused on the application of various statistical methods. The distinguishing features of this research compared to previous research are as follows. First, this study attempts to investigate simultaneously a large number of the financial and non-financial variables studied in previous research. Second, because the names of companies distorting financial statements are generally not published, previous research has identified distorting companies using other criteria, whereas this research uses the database of the Society of Certified Public Accountants.
Third, most studies have used regression to predict distortion, and other data mining methods, such as decision tree, neural networks, and Bayesian method, have not been used frequently. Therefore, this study investigates the possibility of predicting accounting distortions using decision tree, neural networks, and Bayesian method. To this end, companies with accounting distortions are predicted and identified by analyzing the characteristics of companies that have been distorted in the past (according to the four dimensions of accruals, financial performance, non-financial criteria, and market-based criteria) and by reviewing the research conducted by Decho et al. (2011). Variables are classified and tested in three different models. The first model includes variables of accruals and company performance. The second model adds non-financial variables to accruals and company performance. The third model adds market-related variables to the previous variables. The following question is addressed: Do the decision tree, neural networks, and Bayesian methods allow predicting distorted financial statements? Many authors use the term "distortion" for manipulation of financial statements that results in an unfair presentation of the financial position of the business. The official definition of accounting distortion has been provided by IFAC as follows: any discrepancy between the amount, method of classification, presentation, or disclosure of an item reported in financial statements and the amount, classification, presentation, or disclosure required for that item by the accepted financial reporting framework (December, 2009: 370). This definition is also in accordance with Iran's Auditing Standard No. 240. Distortions are divided into intentional and unintentional distortions.
In other words, distortions result from either a mistake or fraud: a mistake is an inadvertent act in financial statements, including omission of an amount or a disclosure, whereas fraud includes any intentional or fraudulent attempt by one or more directors or third parties to obtain an unlawful advantage. Most models for detecting financial distortions face a dilemma. As can be seen in Fig. 1, a model can compare companies with fraudulent distortion (subset A) against companies that have not been distorted (A vs. D), or it can classify companies into the two categories of distorted and non-distorted companies (C vs. D, where C includes A and B) (Kim et al., 2016).
A review of the research literature shows that many studies have used both descriptive and predictive techniques to detect accounting distortions. In this research, predictive techniques were used to detect distortion.

Materials and Methods
This research is applied in terms of purpose and is of the experimental, post-event (ex post facto) type. It is based on actual financial statement information of companies listed on the Tehran Stock Exchange, and data mining methods including decision tree, neural networks, and Bayesian learning were selected to analyze the data.

Decision Tree
Decision tree learning is one of the most common inductive inference algorithms and has been used successfully in a wide range of applications, from medical diagnosis to credit risk assessment. The decision tree, whose main purpose is to categorize data, is a data mining model that provides a tree-like structure, similar to a flowchart, for making decisions and determining the class of a particular data item. As its name suggests, the tree is composed of a number of nodes and branches (Qaderzadeh et al., 2017). The leaves represent categories, and the internal nodes are used to make decisions about one or more specific attributes. An important advantage of the decision tree algorithm is that it is easy to interpret and understand. If the tree is allowed to grow indefinitely, training takes a long time and the tree overfits the training data, so the resulting decision tree does not generalize. The size of the tree can be controlled by stopping rules, or the tree can be pruned after construction. REPTree is one of the well-known decision tree learning algorithms, and its results for modeling the data of this problem are presented here. REPTree is a fast classification algorithm based on information gain and the calculation of entropy.
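The entropy and information gain criterion mentioned above can be illustrated with a short Python sketch. It is not the REPTree implementation; the function names, the toy labels, and the "high accruals" split are hypothetical and serve only to show how a candidate split is scored.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return sum(-(n / total) * math.log2(n / total) for n in counts)


def information_gain(labels, groups):
    """Entropy reduction obtained by splitting `labels` into `groups`."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder


# Hypothetical toy data: 1 = distorting company, 0 = control company.
parent = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical split on an illustrative attribute ("high accruals" or not).
high_accruals = [1, 1, 1, 0]
low_accruals = [0, 0, 0, 0]

gain = information_gain(parent, [high_accruals, low_accruals])
print(f"information gain of the split: {gain:.4f} bits")
```

A tree learner of this kind would evaluate such a score for every candidate attribute and split on the one with the largest gain.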

Neural Networks
Neural networks are one of the most widely used methods for modeling large and complex problems. A neural network is a data processing system that takes its ideas from the human brain and delegates data processing to many small processors that act as interconnected, parallel networks to solve a problem. In these networks, a data structure called a node is designed with the help of programming knowledge to act like a neuron. The neural network method removes the assumptions of linearity and independence of the explanatory variables: hidden relationships between explanatory variables enter the function as additional variables. Neural networks have different architectures. One of the most popular is the multilayer perceptron, because it can account for both non-linearity and interaction of variables.

Bayesian Learning
In general, the purpose of Bayesian learning, as of inductive learning, is to find the best hypothesis h in the hypothesis space H using the training set D. The best hypothesis is the one that is most probable given the dataset D. Under this definition, machine learning becomes the problem of determining the probabilities of the different hypotheses in H and selecting the most probable one based on D. The conditional probability P(h | D) is the probability of hypothesis h after observing the dataset D. Bayes' theorem, P(h | D) = P(D | h) P(h) / P(D), calculates this conditional probability and is the basis of Bayesian learning. The main advantage of this method is that it allows the probability of a particular event to be determined from a set of observations, which gives a clear view of the relationship.
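As a worked illustration of the conditional probability P(h | D), the following Python sketch applies Bayes' theorem to a two-hypothesis case. The prior uses the share of distorting companies in the sample of this paper (21 of 189); the two likelihood values are hypothetical and chosen only for illustration.

```python
def posterior(prior_h, likelihood_given_h, likelihood_given_not_h):
    """P(h | D) for a binary hypothesis, by Bayes' theorem."""
    # P(D) expanded over the two hypotheses h and not-h.
    evidence = (likelihood_given_h * prior_h
                + likelihood_given_not_h * (1.0 - prior_h))
    return likelihood_given_h * prior_h / evidence


# Prior: share of distorting companies in the sample (21 of 189).
prior_distorting = 21 / 189

# Hypothetical likelihoods of observing unusually high accruals (the data D)
# in a distorting company and in a control company.
p_high_given_distorting = 0.6
p_high_given_control = 0.2

p = posterior(prior_distorting, p_high_given_distorting, p_high_given_control)
print(f"P(distorting | high accruals) = {p:.3f}")
```

Under these assumed likelihoods, observing high accruals raises the probability of distortion from the prior of about 0.11 to about 0.27.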

Unbalanced Data
Unbalanced datasets are datasets in which the samples are unevenly distributed between the classes. In other words, in this type of problem, the number of samples in certain classes is much smaller than in the others. For example, in the problem studied in this paper, the distortion class contains far fewer samples than the normal class. Machine learning algorithms usually work poorly on unbalanced data because they can easily produce a model that assigns all samples to the larger class. In other words, because data on distorting companies are scarce, the model predicts non-distorting companies well but has little ability to predict distorting companies, which is the main purpose of building the model. Therefore, modifying either the data or the learning method seems necessary to solve this type of problem.
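The effect described above can be shown with the class sizes of this paper's sample (21 distorting and 168 control companies). The Python sketch below evaluates a trivial classifier that labels every company as normal; the helper name is hypothetical.

```python
def accuracy_and_detection_rate(y_true, y_pred):
    """Return overall accuracy and the share of distorting companies found."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    distorting = sum(1 for t in y_true if t == 1)
    detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return correct / len(y_true), detected / distorting


# 1 = distorting company, 0 = control company (sample sizes from the paper).
y_true = [1] * 21 + [0] * 168
# A degenerate model that always predicts the larger (normal) class.
y_pred = [0] * len(y_true)

accuracy, detection_rate = accuracy_and_detection_rate(y_true, y_pred)
print(f"accuracy = {accuracy:.3f}, detection rate = {detection_rate:.3f}")
```

Such a model reaches an overall accuracy of about 89% while detecting no distorting company at all, which is why overall accuracy alone is a weak criterion on this kind of data.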

The Dependent Variable
Distortion in financial statements is the dependent variable; it is qualitative and measured on a nominal scale. To measure this variable, the value 1 is assigned to distorting companies and the value 0 to other companies. In this research, the database of the Society of Certified Public Accountants was used to identify and classify distorting economic units.

Independent Variables
Independent variables of the research are as follows:

Accruals
Heli (1985) was the first to state that profits are distorted through accruals. Therefore, this study investigates whether the years in which profit is distorted are associated with unusually high accruals. Based on the previous research, the following measures are used:

1-2) RSST Accruals
The next measure, abbreviated as RSST, was introduced in 2005 by Richardson et al. It extends working capital accruals to include changes in long-term operating assets and long-term operating liabilities. This measure corresponds to the change in net non-cash assets.

1-3) Decho and Digo Model
ΔWCit is the change in working capital in year t compared to the previous year, CFOit-1 is operating cash flow in the previous year, CFOit is operating cash flow in year t, and CFOit+1 is operating cash flow in the next year.
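The regression equation to which these definitions belong is not reproduced in the text. Assuming the model follows the commonly cited specification that regresses the change in working capital on past, current, and future operating cash flows, it would take the form below; the coefficient symbols and the residual term are notational assumptions, and the residual is what serves as the accrual measure.

```latex
\Delta WC_{it} = \beta_0 + \beta_1\, CFO_{it-1} + \beta_2\, CFO_{it} + \beta_3\, CFO_{it+1} + \varepsilon_{it}
```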

1-5) Change in Receivables
This variable is the change in accounts and notes receivable, which may be distorted to show sales growth to investors. Another set of variables concerns the financial performance of the company, evaluated from different dimensions; here, the question is whether managers distort financial statements to cover up poor financial performance.

2-1) Change in Cash Sales
Ch-cs = S - ΔAR, where S is sales and ΔAR is the change in receivables. In this regard, E is net profit and MV is the stock's market value.

Population and Statistical Sample
The statistical population of this study consisted of all companies listed on the Tehran Stock Exchange whose information was available during the period 2009-2015 and that were not banks, financial institutions, investment companies, financial intermediation holding companies, or leasing companies. According to these criteria, 189 companies, including 21 distorting companies and 168 control companies, were selected.
All companies: 412
Insurance, investment, banking, and financial intermediation companies: 28
Companies whose fiscal year does not end at the end of March: 52
Companies whose information was not available: 142
Number of the surveyed companies: 189

Results
Table 1 presents descriptive statistics of the research variables, including mean, maximum, minimum, and standard deviation, for distorting and control companies. SPSS 23 software was used to compare the two groups. The Kolmogorov-Smirnov test was used to check the normality of the data distribution. Because the research data were not normally distributed, the Mann-Whitney U test was used to compare the distorting and control companies. The results showed a statistically significant difference in average financial leverage between the two groups at the 5% significance level. On average, distorting companies reported higher financial leverage. In contrast, Jones et al. (2008) showed that distorting firms reported lower financial leverage; this difference is probably due to the way foreign companies are financed compared to Iranian companies. The difference in average residual accruals in all accrual models, scaled by beginning-of-period assets, was significant at the 5% significance level, which is consistent with the research by Jones et al. (2008).
For the other variables, no significant difference was observed between the two groups. The discretionary accrual measures calculated in all models were able to predict distortion, and the largest difference between the distorting and control companies was in the measures of the Decho and Digo and McNichols models (about 68%). It therefore seems that, among the accrual models, these two measures best differentiate distorting companies.
The software was used for modeling, together with cost-sensitive learning to increase the sensitivity of the data mining methods to the distortion class. Pruning was also used in the REPTree algorithm to prevent overfitting. For the neural network, a multilayer perceptron with one hidden layer and 4 hidden nodes was used. The learning rate and momentum were set to 0.3 and 0.2, respectively, and the number of iterations in the network learning process was set to 500. The simple (naive) Bayesian method was used to model the data under study.
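To show what the learning rate and momentum settings control, the Python sketch below applies a gradient descent update with momentum to a one-dimensional toy loss, using the reported values (0.3, 0.2, and 500 iterations) as defaults. The function name and the loss are hypothetical; this is a sketch of the general update rule under those assumptions, not the training procedure of the software used in the study.

```python
def minimize_with_momentum(gradient, start, learning_rate=0.3,
                           momentum=0.2, epochs=500):
    """Gradient descent in which each step reuses part of the previous step."""
    weight = start
    previous_step = 0.0
    for _ in range(epochs):
        # New step = gradient step plus a fraction of the previous step.
        step = -learning_rate * gradient(weight) + momentum * previous_step
        weight += step
        previous_step = step
    return weight


# Hypothetical toy loss (w - 3)^2, whose gradient is 2 * (w - 3).
final_weight = minimize_with_momentum(lambda w: 2.0 * (w - 3.0), start=0.0)
print(f"weight after training: {final_weight:.6f}")
```

The learning rate scales how far each update moves along the gradient, and the momentum term carries over part of the previous update, which smooths the path toward the minimum.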

Results of Testing the First Model
The cost of classifying distorting companies as normal in the first model was set to 65, 50, and 70 for the neural network, Bayesian method, and decision tree, respectively; these values were obtained by trial and error. As shown in Table 2, the average overall accuracy of the decision tree, neural networks, and Bayesian method was 54, 86, and 87%, respectively. The average correct detection rates were 48, 62, and 56%, respectively, and the area under the ROC curve was 52, 62, and 55%, respectively. Therefore, in the first model, the neural network method had the highest ability to predict distortion (with an overall accuracy of 86%, a correct detection rate of 62%, and an area under the curve of 62%).
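One common way to apply such a cost is to choose the class with the lower expected misclassification cost. The Python sketch below assumes that the reported value (65 for the neural network) is the cost of labeling a distorting company as normal and that a false alarm costs 1; both the interpretation and the unit cost are assumptions, and the function names are hypothetical.

```python
def cost_sensitive_threshold(cost_missed_distortion, cost_false_alarm=1.0):
    """Probability above which predicting 'distorting' has lower expected cost."""
    return cost_false_alarm / (cost_false_alarm + cost_missed_distortion)


def classify(prob_distorting, cost_missed_distortion, cost_false_alarm=1.0):
    """Return 1 (distorting) or 0 (normal) by minimum expected cost."""
    expected_cost_if_normal = prob_distorting * cost_missed_distortion
    expected_cost_if_distorting = (1.0 - prob_distorting) * cost_false_alarm
    return 1 if expected_cost_if_normal > expected_cost_if_distorting else 0


# Assumed cost of 65 for a missed distortion and 1 for a false alarm.
threshold = cost_sensitive_threshold(65)
print(f"decision threshold: {threshold:.4f}")
print(classify(0.05, 65))  # flagged as distorting under the high cost
print(classify(0.05, 1))   # labeled normal when both errors cost the same
```

Raising the cost of a missed distortion lowers the decision threshold, so more companies are flagged; this increases the detection rate at the expense of overall accuracy.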

Results of Testing the Second Model
The cost of classifying distorting companies as normal in the second model was set to 60, 50, and 70 for the decision tree, neural networks, and Bayesian method, respectively. As can be seen in Table 3, the overall accuracy of the decision tree, neural networks, and Bayesian method is 62, 80, and 87%, respectively, and the area under the ROC curve is equal to, respectively:

Results of Testing the Third Model
The cost of classifying distorting companies as normal in the third model was set to 60, 45, and 70 for the neural network, Bayesian method, and decision tree, respectively. As can be seen in Table 4, the overall accuracy of the decision tree, neural networks, and Bayesian method is 73, 76, and 86%, respectively, and the area under the ROC curve is 50, 50, and 57%, respectively. Therefore, in the third model, the Bayesian method had the highest ability to predict distortion (with an overall accuracy of 86%, an area under the curve of 57%, and a detection accuracy of 57%).
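The area under the ROC curve can be read as the probability that a randomly chosen distorting company receives a higher score than a randomly chosen control company, so a value of 50% corresponds to random ranking. The Python sketch below computes it by pairwise comparison; the function name and the scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve by pairwise comparison (ties count as half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: 1 = distorting company, 0 = control company.
labels = [1, 1, 0, 0]
informative_scores = [0.9, 0.4, 0.5, 0.1]
constant_scores = [0.5, 0.5, 0.5, 0.5]

print(roc_auc(labels, informative_scores))
print(roc_auc(labels, constant_scores))
```

A model that assigns the same score to every company obtains exactly 0.5, which is the baseline against which the reported values should be read.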

Discussion and Conclusion
Predicting and identifying companies that distort financial statements is one of the most important issues in the field of accounting, and anticipating distortion makes it possible to prevent it. In this research, the decision tree, neural networks, and Bayesian method were used as data mining methods to predict distortion. The results showed that financial statement information has high predictive power, and therefore it is possible to predict distortion using financial statement information alone. In other words, distorted financial statements can be identified using accruals and financial performance, the data for which can be extracted from financial statements. Our results showed a relationship between all accrual models and distorted financial statements. The results of the study are consistent with the results of the research conducted by Mashayekhi and Hassanpour (2016) and Hestan et al. (2013) on the use of accruals by companies to distort financial statements. The results also indicated that the neural networks and Bayesian methods are more accurate than the decision tree. In other words, the Tehran Stock Exchange Organization or the Society of Certified Public Accountants can produce useful software to predict companies that may intend to distort financial statements, and can gain the trust of shareholders by increasing control over such companies. Shareholders and creditors, who are subject to irreparable damage from distortion, can also reduce their losses in this way. Notably, financial intermediary companies were removed from the selected sample because of their nature, so the research findings cannot be generalized to all companies, and the issue should be considered separately in such companies.