Preprint
Article

This version is not peer-reviewed.

Comparative Performance of Deep Learning Models for Financial Statement Fraud Detection in an Imbalanced Classification Setting

Submitted: 06 January 2026

Posted: 07 January 2026

Abstract
Financial statement fraud continues to pose a significant challenge to audit effectiveness, investor confidence, and the integrity of financial markets. Detection is particularly difficult because financial reporting data are highly imbalanced: fraudulent observations constitute only a small fraction of the total sample. In such settings, accuracy-based evaluation often produces misleading conclusions and fails to reflect practical audit value. This study compares four deep learning models, namely LSTM, GRU, CNN1D, and Transformer, for financial statement fraud detection under class-imbalanced conditions. The analysis is based on a dataset of 805 firm-year observations and adopts the area under the Precision–Recall curve (PR-AUC) as the primary performance metric, complemented by ROC-AUC, Precision, Recall, F1 score, and Specificity. To assess practical usability, Decision Curve Analysis is used to evaluate the decision-level net benefit of each model across threshold probabilities, and bootstrap resampling is used to assess performance stability under random data partitioning. The empirical results show that the Transformer model consistently outperforms the other architectures in discriminative ability, robustness, and decision-level utility. Its attention-based structure enables effective modeling of global relationships among financial indicators, leading to stable performance across varying thresholds and data splits. The CNN1D model demonstrates relatively high specificity and a balanced error structure, suggesting its suitability for audit environments where minimizing false positives and controlling verification costs are critical. In contrast, although the LSTM and GRU models exhibit higher sensitivity to fraudulent cases, their lower precision and stability limit their effectiveness as standalone solutions.
Overall, the findings emphasize the importance of imbalance-aware, decision-oriented evaluation frameworks for detecting financial statement fraud. The study offers practical insights for auditors and regulators by identifying deep learning models that combine statistical reliability with operational relevance in real-world auditing contexts.
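As an illustration of the evaluation quantities named in the abstract, the following minimal Python sketch computes a step-wise PR-AUC estimate (average precision), the Decision Curve Analysis net benefit at a threshold probability p_t (the standard form TP/N − (FP/N) · p_t/(1 − p_t)), and a percentile bootstrap interval for PR-AUC. It is not the authors' implementation: the function names, the 95% percentile interval, the number of resamples, and the handling of tied scores are assumptions made for illustration only.

```python
import random


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging cases with y_prob >= threshold (0 < threshold < 1)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def average_precision(y_true, y_prob):
    """Step-wise PR-AUC estimate: mean precision at the rank of each positive.

    Assumption: tied scores keep their input order (no tie averaging).
    """
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else float("nan")


def bootstrap_ap_interval(y_true, y_prob, n_boot=1000, seed=0):
    """Percentile (2.5%, 97.5%) bootstrap interval for average precision."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        labels = [y_true[i] for i in idx]
        if sum(labels) == 0:
            continue  # skip resamples that contain no fraud case
        scores.append(average_precision(labels, [y_prob[i] for i in idx]))
    scores.sort()
    lower = scores[int(0.025 * (len(scores) - 1))]
    upper = scores[int(0.975 * (len(scores) - 1))]
    return lower, upper
```

Evaluating net_benefit over a grid of thresholds for each model, alongside the treat-all strategy (every score set to 1.0) and the treat-none strategy (net benefit 0), gives the curves that Decision Curve Analysis compares. In a rare-event sample, resamples without any positive case carry no PR-AUC information, which is why the sketch skips them.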
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated