ARTICLE | doi:10.20944/preprints202012.0273.v1
Subject: Mathematics & Computer Science, Algebra & Number Theory Keywords: Bootstrap; Bayesian nonparametric learning; Ensemble Models
Online: 11 December 2020 (10:22:05 CET)
Bootstrap resampling techniques, introduced by Efron and Rubin, can be presented in a general Bayesian framework, approximating the statistical distribution of a statistical functional φ(F), where F is a random distribution function. Efron’s and Rubin’s bootstrap procedures can be extended by introducing an informative prior through the Proper Bayesian bootstrap. In this paper, different bootstrap techniques are used and compared in predictive classification and regression models based on ensemble approaches, i.e. bagging models involving decision trees. The Proper Bayesian bootstrap, proposed by Muliere and Secchi, is used to sample the posterior distribution over trees, introducing prior distributions on the covariates and the target variable. The results obtained are compared with those of other competing procedures employing different bootstrap techniques. The empirical analysis reports the results obtained on simulated and real data.
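To make the resampling schemes concrete, the following Python sketch (standard library only) draws bootstrap replicates of the simple functional φ(F) = mean of F. Efron’s bootstrap gives each observation a weight equal to its resampling count divided by n; Rubin’s Bayesian bootstrap draws the weights from a flat Dirichlet(1, ..., 1). For the informative-prior case, the sketch does not reproduce Muliere and Secchi’s exact algorithm: as an assumption, it approximates the Dirichlet-process posterior DP(k + n, (k·F0 + n·Fn)/(k + n)) that a DP(k, F0) prior yields, using m atoms drawn from the posterior base measure with symmetric Dirichlet((k + n)/m) weights. The prior guess F0, the prior strength k, the atom count m and all function names are hypothetical choices for illustration.

```python
import random
from statistics import fmean, stdev


def dirichlet(alphas, rng):
    """Dirichlet(alphas) draw obtained by normalising independent Gamma(alpha_i, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [v / total for v in g]


def efron_weights(n, rng):
    """Efron's bootstrap: resample n indices with replacement; weight = count / n."""
    counts = [0] * n
    for i in rng.choices(range(n), k=n):
        counts[i] += 1
    return [c / n for c in counts]


def dp_posterior_draw(x, k, prior_sampler, m, rng):
    """Finite-atom approximation (an assumption) of one draw from the DP posterior
    DP(k + n, (k*F0 + n*Fn) / (k + n)) under a DP(k, F0) prior.

    Each of the m atoms comes from the prior guess F0 with probability k / (k + n),
    otherwise it is an observed point; weights are Dirichlet((k + n) / m, ...)."""
    n = len(x)
    atoms = [prior_sampler(rng) if rng.random() < k / (k + n) else rng.choice(x)
             for _ in range(m)]
    weights = dirichlet([(k + n) / m] * m, rng)
    return atoms, weights


def weighted_mean(weights, values):
    """phi(F) = mean of F, evaluated on a discrete (weighted) distribution."""
    return sum(w * v for w, v in zip(weights, values))


def bootstrap_means(x, method, B=1000, k=10.0, prior_sampler=None, m=300, seed=0):
    """Return B bootstrap replicates of the mean under the chosen resampling scheme."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(B):
        if method == "efron":
            draws.append(weighted_mean(efron_weights(n, rng), x))
        elif method == "rubin":  # Bayesian bootstrap: flat Dirichlet weights
            draws.append(weighted_mean(dirichlet([1.0] * n, rng), x))
        elif method == "dp_prior":  # informative prior via the DP posterior
            atoms, weights = dp_posterior_draw(x, k, prior_sampler, m, rng)
            draws.append(weighted_mean(weights, atoms))
        else:
            raise ValueError(f"unknown method: {method}")
    return draws


if __name__ == "__main__":
    data_rng = random.Random(42)
    x = [data_rng.gauss(1.0, 1.0) for _ in range(50)]  # simulated data, n = 50

    def prior(r):
        """Hypothetical prior guess F0 = N(0, 1)."""
        return r.gauss(0.0, 1.0)

    for method in ("efron", "rubin", "dp_prior"):
        reps = bootstrap_means(x, method, prior_sampler=prior)
        print(f"{method:9s} mean={fmean(reps):.3f} sd={stdev(reps):.3f}")
```

Because the symmetric Dirichlet weights are independent of the atoms, the expected prior-informed replicate is (k·E_F0[X] + n·x̄)/(k + n), so with k = 10 and n = 50 the replicates are pulled toward the prior mean. For bagging, weight vectors of this kind could in principle be passed to a tree learner that accepts per-sample weights (for example, the `sample_weight` argument of scikit-learn’s `DecisionTreeRegressor.fit`); in the prior-informed case the atoms drawn from F0 would then be synthetic rows sampled from priors on the covariates and the target.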
ARTICLE | doi:10.20944/preprints202010.0057.v1
Subject: Social Sciences, Accounting Keywords: multiclass classification; text mining; accounting control system
Online: 5 October 2020 (09:05:53 CEST)
Electronic invoicing has been mandatory for Italian companies since January 2019. Invoices are structured in a predefined XML template from which the reported information can be easily extracted and analyzed. The main aim of this paper is to exploit the information structured in electronic invoices to build an intelligent system that can facilitate accountants’ work. More precisely, this contribution shows how it is possible to automate part of the accounting process: all invoices sent or received by a company are classified into specific codes that represent the economic nature of the financial transactions. To classify the data contained in the invoices, a machine learning multiclass classification problem is proposed, using the information in the invoices as input variables to predict two different target variables, the account codes and the VAT codes, which together compose a general ledger entry. Different approaches are compared in terms of prediction accuracy. The best performance is achieved by considering the hierarchical structure of the account codes.
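One common way to exploit a code hierarchy is top-down classification: first predict the parent group of an account code, then choose the full code only among that group’s children. The Python sketch below illustrates this idea under explicit assumptions; it is not the method evaluated in the paper. The code format `GG.SS.AA`, the toy invoice descriptions and codes, and the token-vote classifier standing in for a real model are all hypothetical.

```python
from collections import Counter, defaultdict


class TokenVoteClassifier:
    """Hypothetical stand-in model: scores each label by how often the input's
    tokens co-occurred with that label in the training texts."""

    def fit(self, texts, labels):
        self.token_label = defaultdict(Counter)  # token -> Counter(label -> count)
        self.label_freq = Counter(labels)        # used to break ties
        for text, label in zip(texts, labels):
            for tok in set(text.lower().split()):
                self.token_label[tok][label] += 1
        return self

    def predict_one(self, text, allowed=None):
        scores = Counter()
        for tok in set(text.lower().split()):
            for label, count in self.token_label.get(tok, {}).items():
                scores[label] += count
        candidates = allowed if allowed is not None else sorted(self.label_freq)
        # Highest token score wins; ties fall back to label frequency, then label text.
        return max(candidates, key=lambda lab: (scores[lab], self.label_freq[lab], lab))


def parent(code):
    """Assumed code format 'GG.SS.AA': the parent group is the leading 'GG'."""
    return code.split(".")[0]


class TopDownClassifier:
    """Two-stage classifier: predict the parent group, then a code within it."""

    def fit(self, texts, codes):
        self.group_model = TokenVoteClassifier().fit(texts, [parent(c) for c in codes])
        self.code_model = TokenVoteClassifier().fit(texts, codes)
        self.children = defaultdict(set)
        for c in codes:
            self.children[parent(c)].add(c)
        return self

    def predict_one(self, text):
        group = self.group_model.predict_one(text)
        # Second stage only considers codes that belong to the predicted group.
        return self.code_model.predict_one(text, allowed=sorted(self.children[group]))


# Hypothetical toy training data: invoice line descriptions and account codes.
TEXTS = [
    "electricity supply march",
    "electricity bill april",
    "gas supply winter",
    "consulting fee legal",
    "legal advice contract",
    "office paper toner",
    "toner cartridges printer",
]
CODES = ["60.10.01", "60.10.01", "60.10.02", "61.20.01", "61.20.01", "62.30.05", "62.30.05"]

if __name__ == "__main__":
    model = TopDownClassifier().fit(TEXTS, CODES)
    for query in ("electricity supply may", "gas supply", "toner for printer"):
        print(query, "->", model.predict_one(query))
```

Restricting the second stage to the predicted group means a wrong code can only be chosen among siblings of the right group; the trade-off is that a first-stage error cannot be corrected later.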