Preprint Article

This version is not peer-reviewed.

The Fragility of Phishing Detection Models: Evidence from Cross-Corpus Transfer, Prevalence Shift, Artifact Learning, and Evasion Risk

Submitted: 18 May 2026
Posted: 20 May 2026

Abstract
Phishing detection models often report strong benchmark performance, yet their reliability under realistic deployment conditions remains uncertain. This study investigates three failure modes of cross-dataset phishing email detection: corpus generalisation failure, asymmetric prevalence-shift failure, and artifact-driven spurious learning. The study uses six public email corpora: CEAS_08, Enron, Ling, Nazario, Nigerian Fraud, and SpamAssassin. On these corpora it evaluates Logistic Regression and Linear Support Vector Classifier (SVC) models built on Term Frequency-Inverse Document Frequency (TF-IDF) features. The evaluation covers pooled baseline testing, single-corpus cross-dataset transfer, leave-one-corpus-out pooled training, prevalence-shift simulation, training-prevalence manipulation, dataset-identification analysis, top-feature inspection, artifact-removal ablation, and targeted artifact masking. The findings show that single-corpus models are unstable under cross-dataset transfer, with F1-scores varying substantially across source–target combinations. In contrast, leave-one-corpus-out pooled training improves robustness. Across unseen corpora, Logistic Regression achieves F1-scores between 0.8201 and 0.8994, and Linear SVC achieves F1-scores between 0.7607 and 0.8910. Prevalence-shift experiments reveal that failure is asymmetric and threshold-dependent. Models trained at high prevalence maintain high recall under fixed thresholds but suffer sharp recall degradation when operational alert-budget constraints are imposed. Conversely, models trained at low prevalence become overly conservative in high-threat environments, producing high precision but substantially lower recall and poorer calibration. Artifact analyses further show that source-corpus identity is highly learnable, with dataset-identification accuracy reaching 0.9722 for Logistic Regression and 0.9806 for Linear SVC.
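The threshold dependence described above can be illustrated with a toy sketch. The scores are synthetic and hypothetical, and they do not come from the study's models or corpora. The helper names (`synthetic_stream`, `fixed_threshold_flags`, `alert_budget_flags`, `precision_recall`) are illustrative only. The sketch holds the phishing and legitimate score distributions fixed and changes only the phishing prevalence of the test stream. It then compares two alerting rules:

- a fixed 0.5 threshold, which keeps recall constant while precision moves with prevalence;
- a fixed alert budget of the top 100 emails, which caps the number of alerts and therefore caps recall once phishing volume rises.

```python
"""Toy sketch: fixed threshold vs. alert budget under prevalence shift.

All scores are synthetic stand-ins for classifier probabilities; they are
hypothetical and do not reproduce the study's models, corpora, or results.
"""
from typing import List, Tuple


def synthetic_stream(n_phish: int, n_legit: int) -> Tuple[List[float], List[int]]:
    """Hypothetical scores: phishing in {0.3, ..., 0.9}, legitimate in {0.0, ..., 0.5}."""
    scores = [(i % 7 + 3) / 10 for i in range(n_phish)]
    scores += [(i % 6) / 10 for i in range(n_legit)]
    labels = [1] * n_phish + [0] * n_legit
    return scores, labels


def fixed_threshold_flags(scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Flag every email whose score reaches the threshold."""
    return [s >= threshold for s in scores]


def alert_budget_flags(scores: List[float], budget: int) -> List[bool]:
    """Flag exactly the `budget` highest-scoring emails (ties broken by position)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    flagged = set(order[:budget])
    return [i in flagged for i in range(len(scores))]


def precision_recall(flags: List[bool], labels: List[int]) -> Tuple[float, float]:
    """Precision and recall of the flagged set with respect to phishing labels (1)."""
    tp = sum(1 for f, y in zip(flags, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flags, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flags, labels) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Same score distributions, different phishing prevalence in the test stream.
    for n_phish, n_legit in [(70, 930), (700, 300)]:
        scores, labels = synthetic_stream(n_phish, n_legit)
        fixed = precision_recall(fixed_threshold_flags(scores), labels)
        budget = precision_recall(alert_budget_flags(scores, budget=100), labels)
        prevalence = n_phish / (n_phish + n_legit)
        print(f"prevalence={prevalence:.2f} fixed P/R={fixed[0]:.3f}/{fixed[1]:.3f} "
              f"budget P/R={budget[0]:.3f}/{budget[1]:.3f}")
```

In this toy stream, 70 phishing emails in 1,000 leave recall unchanged under the budget. With 700 phishing emails in 1,000, however, the same budget of 100 alerts can recover at most 100 of them. Recall therefore falls to about 0.14 even though every alert is correct.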
Top-feature and masking analyses indicate that the models rely partly on corpus markers, date tokens, URL/domain terms, headers, and other artifact-like features rather than solely on general phishing indicators. The study contributes a deployment-aware and adversary-aware evaluation framework for phishing detection. The results show that benchmark accuracy alone is insufficient for assessing real-world robustness. Reliable phishing detection requires cross-corpus validation, prevalence-aware thresholding, and systematic testing for artifact-driven spurious learning.
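The sketch below is a hedged illustration of targeted artifact masking. It replaces four kinds of spans with placeholder tokens before any TF-IDF features would be extracted:

- transport header lines;
- URLs and e-mail addresses;
- date tokens;
- caller-supplied corpus marker words.

The regular expressions, the placeholder names, and the `mask_artifacts` helper are assumptions made for illustration. They are not the study's exact masking rules.

```python
"""Sketch of regex-based artifact masking before feature extraction.

The rules, placeholder tokens, and the `mask_artifacts` helper are
hypothetical illustrations; they are not the study's masking procedure.
"""
import re
from typing import Iterable

# Applied in order: header lines first, then URLs (which may contain "@"),
# then e-mail addresses, then date tokens.
MASK_RULES = [
    # Transport/metadata header lines; content headers such as Subject are kept.
    ("HEADER", re.compile(
        r"^(?:received|message-id|return-path|date|x-[\w-]+):.*$",
        re.IGNORECASE | re.MULTILINE)),
    ("URL", re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("DATE", re.compile(
        r"\b\d{4}-\d{2}-\d{2}\b"
        r"|\b\d{1,2}\s+(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\s+\d{4}\b",
        re.IGNORECASE)),
]


def mask_artifacts(text: str, corpus_terms: Iterable[str] = ()) -> str:
    """Replace artifact-like spans with placeholder tokens such as <URL>."""
    for name, pattern in MASK_RULES:
        text = pattern.sub(f"<{name}>", text)
    # Corpus-specific marker words (a hypothetical list supplied by the caller).
    for term in corpus_terms:
        text = re.sub(rf"\b{re.escape(term)}\b", "<CORPUS>", text, flags=re.IGNORECASE)
    return text


SAMPLE = (
    "Received: from mx1.example.org\n"
    "Subject: Verify your account\n"
    "Your mailbox expires on 2008-08-05.\n"
    "Visit http://secure-login.example.com/verify or mail help@example.com.\n"
    "Sent via Enron mail"
)

if __name__ == "__main__":
    print(mask_artifacts(SAMPLE, corpus_terms=["Enron"]))
```

One way to gauge how much a model depended on these artifacts is to retrain the same classifier on masked text and compare its cross-corpus F1 against the unmasked baseline. Masking URLs wholesale also discards genuine phishing signal, so the rule set is a trade-off rather than a neutral cleanup.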
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated