Preprint
Article

This version is not peer-reviewed.

Beyond Generic Phishing Detection: Explainable AI for Finance-Adapted Models in Banking and Fintech

Submitted: 13 May 2026

Posted: 14 May 2026

Abstract
Purpose: This study examines whether finance-adapted (FA) phishing detection models improve detection of finance-themed (FT) attacks, whether improvements differ across email and webpage modalities, and whether finance adaptation creates a specialisation–generalisation trade-off.

Design/Methodology/Approach: A domain-aware framework is developed using an email dataset (164,972 instances) and a webpage dataset (11,430 instances). FT and non-finance-themed (NFT) instances are identified using weighted lexicon-based labelling. Generic models are compared with FA models across Logistic Regression, Linear SVC, and Random Forest using F1-score, MCC, balanced accuracy, ROC-AUC, and PR-AUC. Statistical validation employs bootstrap confidence intervals and McNemar's test, while SHAP and permutation importance are used to interpret webpage model behaviour.

Findings: FA models outperform generic models in FT email classification, indicating that finance-specific semantic cues improve detection. However, gains are weaker and less consistent in webpage classification, where models rely mainly on structural indicators (page rank, Google index, hyperlinks). Results reveal a specialisation–generalisation trade-off: FA models improve in-domain detection but do not consistently outperform generic models on NFT instances, with F1-score changes of -0.0057 to -0.0151 on non-finance subsets.

Practical Implications: Financial institutions and fintech platforms should deploy domain-adapted detection for email-based threats, where finance-specific linguistic cues yield measurable gains, while maintaining generic or ensemble models for broader webpage phishing coverage.

Originality/Value: This study introduces a finance-themed, multi-modal, explainable AI framework for phishing detection, demonstrating that the benefit of domain adaptation depends critically on data modality and feature representation. It provides a systematic comparison of generic versus FA phishing detection across both modalities, with statistical validation and explainability analysis.
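The statistical validation named in the abstract (McNemar's test and bootstrap confidence intervals) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which McNemar variant or bootstrap scheme was used, so the exact binomial form and a paired percentile bootstrap are assumptions here, and every function name and the toy data are hypothetical.

```python
import math
import random


def f1(y_true, y_pred):
    """F1-score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions (assumed variant).

    b counts cases where model A is right and model B is wrong;
    c counts cases where model A is wrong and model B is right.
    """
    b = sum(1 for t, a, g in zip(y_true, pred_a, pred_b) if a == t and g != t)
    c = sum(1 for t, a, g in zip(y_true, pred_a, pred_b) if a != t and g == t)
    n = b + c
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided p-value.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


def bootstrap_f1_diff(y_true, pred_a, pred_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for F1(B) - F1(A), resampling paired instances."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        fa = f1(yt, [pred_a[i] for i in idx])
        fb = f1(yt, [pred_b[i] for i in idx])
        diffs.append(fb - fa)
    diffs.sort()
    lo_i = max(0, int(round(alpha / 2 * n_boot)))
    hi_i = min(n_boot - 1, int(round((1 - alpha / 2) * n_boot)) - 1)
    return diffs[lo_i], diffs[hi_i]


# Toy data (hypothetical): 10 phishing (1) and 10 legitimate (0) instances.
y_true = [1] * 10 + [0] * 10
generic_pred = [0] * 6 + [1] * 4 + [0] * 10   # misses six phishing instances
fa_pred = [1] * 10 + [1] + [0] * 9            # one false positive

b, c, p_value = mcnemar_exact(y_true, generic_pred, fa_pred)
ci_low, ci_high = bootstrap_f1_diff(y_true, generic_pred, fa_pred)
print("discordant pairs:", b, c, "p-value:", p_value)
print("95% CI for F1 difference (FA minus generic):", ci_low, ci_high)
```

Resampling instances in pairs keeps the two models' predictions aligned on the same test items, which is the same pairing that McNemar's test relies on through its discordant counts.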
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated