Submitted: 27 April 2026
Posted: 29 April 2026
Abstract
Keywords:
1. Introduction
1.1. Background and Motivation
1.2. Problem Statement
1.3. Research Objectives
- To engineer a binary classifier capable of differentiating between authentic FFHQ faces and StyleGAN-generated images.
- To implement a two-phase, layer-wise fine-tuning strategy that optimizes AUC and generalization while minimizing computational load.
- To assess performance using standard forensic metrics, including AUC, accuracy, precision, recall, F1-score, and confusion matrix analysis.
- To contextualize the results within current literature and analyze specific failure modes.
1.4. Scope and Limitations
1.5. Paper Organisation
2. Literature Review
2.1. Early and CNN-Based Deepfake Detection
2.2. EfficientNet in Deepfake Detection
2.3. Transfer Learning and Fine-Tuning Strategies
2.4. Research Gap
3. Dataset Description
3.1. Source and Composition
- 70,000 real faces: Sourced from NVIDIA’s Flickr-Faces-HQ (FFHQ) dataset [5]. These high-resolution images encompass a diverse range of demographics and lighting conditions.
- 70,000 fake faces: Drawn from the "1 Million Fake Faces" collection, which was generated with StyleGAN [6]. While visually convincing, they contain subtle generative artifacts.

3.2. Dataset Characteristics and Forensic Properties
3.3. Data Splits
- Training Set: Used for model parameter updates.
- Validation Set: Used for hyperparameter tuning and overfitting checks.
- Test Set: A held-out collection of 10,905 images (5,492 real, 5,413 fake). This set remained untouched during training to ensure unbiased final evaluation.
4. Data Preprocessing
4.1. Cleaning and Validation
4.2. Resizing and Tensor Conversion
4.3. Data Augmentation (Training Only)
- Random Horizontal Flip (p=0.5): Mirrors each face with probability 0.5, so the model sees both orientations over the course of training.
- Random Rotation: Helps the model tolerate slight pose variations.
- Colour Jitter (brightness/contrast=0.2): Prevents the model from relying solely on colour distribution for classification.
5. Proposed Methodology
5.1. Architecture: EfficientNet-B2 with Binary Head

5.2. Two-Phase Layer-wise Fine-Tuning Strategy
- Phase 1 – Classification Head Training (Epochs 1–15): The entire backbone was frozen (requires_grad = False). Only the new classification head was trained. This allowed the head to calibrate to the "real vs. fake" distribution without disrupting the backbone’s pre-trained features. We used SGD with a learning rate of 0.01.
- Phase 2 – Selective Deep Block Fine-Tuning (Epochs 16–55): At epoch 16, we unfroze Blocks 5 and 6, which encode high-level textural and structural features. These layers are best suited for identifying StyleGAN artifacts. Blocks 0 through 4 remained frozen to preserve low-level feature extraction. We reduced the learning rate to 0.001 to prevent catastrophic overwriting of learned weights.

5.3. Theoretical Justification for Block 5 and 6 Selection
5.4. Loss Function and Optimisation
5.5. Phase 2 Implementation
6. Experimental Setup
6.1. Hardware Configuration
- GPU: NVIDIA RTX 3050 (4 GB VRAM), CUDA 11.x
- Optimizations: Enabled CuDNN benchmark mode and high-precision matrix multiplication flags.
- Data Loading: Utilized pinned memory and parallel workers for faster data transfer.
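A hedged sketch of how these runtime settings are commonly expressed in PyTorch is given below. Reading "high-precision matrix multiplication flags" as `torch.set_float32_matmul_precision("high")` is an assumption, as are the worker count and the helper names `configure_backend` and `make_loader`.

```python
import torch
from torch.utils.data import DataLoader, Dataset, TensorDataset


def configure_backend() -> None:
    """Enable the cuDNN autotuner and the 'high' float32 matmul mode."""
    torch.backends.cudnn.benchmark = True        # autotune conv kernels for fixed input shapes
    torch.set_float32_matmul_precision("high")   # assumption: the flag the text refers to


def make_loader(dataset: Dataset, batch_size: int = 128,
                num_workers: int = 2, shuffle: bool = True) -> DataLoader:
    """DataLoader with parallel workers and pinned host memory when a GPU is present."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        num_workers=num_workers,                  # assumption: worker count is not given in the text
        pin_memory=torch.cuda.is_available(),     # pinning only helps host-to-GPU copies
        persistent_workers=num_workers > 0,       # keep workers alive between epochs
    )


configure_backend()
# Tiny random dataset standing in for the face images.
toy = TensorDataset(torch.zeros(256, 3, 8, 8), torch.zeros(256))
loader = make_loader(toy, num_workers=0, shuffle=False)
```

Benchmark mode pays off here because every batch has the same shape, so the kernel choice made on the first iterations can be reused for the rest of training.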
6.2. Software Stack
6.3. Hyperparameter Configuration
| Parameter | Phase 1 | Phase 2 |
|---|---|---|
| Optimiser | SGD (momentum = 0.9) | SGD (momentum = 0.9) |
| Learning Rate | 0.01 | 0.001 |
| Max Epochs | 15 | 40 |
| Early Stop Patience | 10 (val AUC) | 10 (val AUC) |
| Loss Function | BCEWithLogitsLoss | BCEWithLogitsLoss |
| Batch Size | 128 | 128 |
| Mixed Precision | Yes | Yes |
| Head Dropout | 0.5 | 0.5 |
| Trainable Blocks | Head only | Blocks 5, 6 + Head |
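The table can be captured as a small configuration object, together with the early-stopping rule it implies. The sketch below is illustrative: the class and field names (`PhaseConfig`, `EarlyStopping`) are hypothetical, and it assumes that "patience 10 (val AUC)" means stopping after 10 consecutive epochs without a new best validation AUC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    """One column of the hyperparameter table."""
    name: str
    lr: float
    max_epochs: int
    trainable: str
    momentum: float = 0.9
    patience: int = 10
    batch_size: int = 128
    head_dropout: float = 0.5
    mixed_precision: bool = True
    loss: str = "BCEWithLogitsLoss"


PHASE_1 = PhaseConfig("phase-1", lr=0.01, max_epochs=15, trainable="head only")
PHASE_2 = PhaseConfig("phase-2", lr=0.001, max_epochs=40, trainable="blocks 5, 6 + head")


class EarlyStopping:
    """Stop when validation AUC has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.best_auc = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch: int, val_auc: float) -> bool:
        """Record one epoch; return True if training should stop."""
        if val_auc > self.best_auc:
            self.best_auc, self.best_epoch, self.bad_epochs = val_auc, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy AUC trace (made-up numbers, for illustration only) with patience 3.
stopper = EarlyStopping(patience=3)
decisions = [stopper.step(e, auc) for e, auc in enumerate([0.50, 0.60, 0.59, 0.58, 0.57], start=1)]
```

The two `max_epochs` values sum to 55, which matches the epoch ranges 1–15 and 16–55 given for the two phases.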
6.4. Evaluation Protocol
7. Results and Analysis
7.1. Headline Test Performance

7.2. Classification Report
7.3. Confusion Matrix
7.4. Training Dynamics – Loss Curves
7.5. Training Dynamics – Accuracy Curves
7.6. Training Dynamics – Validation AUC
7.7. Analysis of Precision-Recall Asymmetry
8. Comparison with State-of-the-Art
| Work | Architecture | Accuracy | AUC | Dataset |
|---|---|---|---|---|
| Rössler et al. [1] | XceptionNet | ∼82% | 0.890 | FF++ |
| Tolosana et al. [4] | ResNet-50 | ∼79% | 0.850 | Multi |
| Coccomini et al. [8] | EffNet+ViT | ∼85% | 0.951 | DFDC |
| Naeem et al. [2] | EffNetV2-B2 | 99.9% | - | 140k |
| Proposed Method | EffNet-B2 | 87.6% | 0.962 | 140k |
9. Discussion
9.1. Impact of the Two-Phase Strategy
9.2. Domain Shift Considerations
9.3. Accessibility and Efficiency
10. Applications
- Identity Verification: Useful for fintech and banking where minimizing false positives is critical for user experience.
- Social Media Moderation: Can flag bot accounts using AI-generated profile pictures.
- Legal Forensics: Serves as a preliminary screening tool for evidence authentication.
- Anti-Phishing: Detects fake personas in targeted email campaigns.
- Journalism: Assists fact-checkers in verifying the source of viral images.
11. Limitations
- The model is specialized for StyleGAN artifacts and may not generalize to diffusion models without retraining.
- It processes static images only and cannot analyze video or audio signals.
- We did not optimize the decision threshold; tuning this could improve recall for the fake class.
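The threshold tuning mentioned in the last limitation could look like the sketch below. It is a hypothetical procedure, not something the paper reports: it assumes the model outputs a fake-class probability per image, sweeps candidate thresholds, and keeps the one with the highest fake-class recall subject to a precision floor. The scores and labels are made-up toy values, and in practice the sweep would be run on the validation set so that the test set stays untouched.

```python
def confusion(scores, labels, threshold):
    """Count TP, FP, FN for the fake class (label 1) at a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return tp, fp, fn


def pick_threshold(scores, labels, min_precision=0.9):
    """Return (threshold, recall, precision) maximising recall with precision >= min_precision."""
    best = None
    for t in sorted(set(scores)):          # every distinct score is a candidate cut-off
        tp, fp, fn = confusion(scores, labels, t)
        if tp == 0:
            continue                       # precision undefined or useless at this cut-off
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        if precision >= min_precision and (best is None or recall > best[1]):
            best = (t, recall, precision)
    return best


# Toy validation scores (illustrative only): probability that each image is fake.
toy_scores = [0.05, 0.20, 0.35, 0.40, 0.45, 0.55, 0.70, 0.90]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

default = confusion(toy_scores, toy_labels, 0.5)          # behaviour at the usual 0.5 cut-off
tuned = pick_threshold(toy_scores, toy_labels, min_precision=0.8)
```

On this toy data the default cut-off of 0.5 misses one fake image, while lowering the threshold recovers it at the cost of one false alarm, which is the same trade-off the limitation describes.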
12. Conclusions
13. Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Rössler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. FaceForensics++: Learning to detect manipulated facial images. IEEE/CVF ICCV 2019, 1–11.
2. Naeem, M.; et al. Refining digital security with EfficientNetV2-B2 deepfake detection techniques. Ain Shams Eng. J. 2025.
3. Tan, M.; Le, Q. V. EfficientNet: Rethinking model scaling for CNNs. Proc. ICML 2019, 97, 6105–6114.
4. Tolosana, R.; Vera-Rodriguez, R.; Fierrez, J.; Morales, A.; Ortega-Garcia, J. Deepfakes and beyond: A survey of face manipulation and fake detection. Inf. Fusion 2020, 64, 131–148.
5. NVIDIA Corporation. Flickr-Faces-HQ Dataset. 2019. Available online: https://github.com/NVlabs/ffhq-dataset (accessed on 17 April 2026).
6. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for GANs. Proc. IEEE/CVF CVPR 2019, 4401–4410.
7. Springer, J.; et al. An enhanced deep learning framework for deepfake detection using EfficientNet-B3. Discover Computing 2025.
8. Coccomini, D. A.; Messina, N.; Gennaro, C.; Falchi, F. Combining EfficientNet and vision transformers for video deepfake detection. arXiv 2022, arXiv:2107.02612.
9. Seferbekov, S. DFDC solution – EfficientNet ensemble (AUC: 0.981). 2020.
10. Violos, J.; Papadopoulos, S.; Kompatsiaris, I. Comparative analysis of compression and transfer learning in deepfake detection. Mathematics 2025, 13(5), 887.
11. Li, G.; et al. Beyond the benchmark: Generalisation limits of deepfake detectors in the wild. Tech. Rep., UC Berkeley 2024.
12. Kim, D.; et al. FReTAL: Generalizing deepfake detection using knowledge distillation. Proc. IEEE/CVF CVPRW 2021.
13. McCloskey, M.; Cohen, N. J. Catastrophic interference in connectionist networks. Psychol. Learn. Motiv. 1989, 24, 109–165.
14. Zeiler, M. D.; Fergus, R. Visualizing and understanding convolutional networks. Proc. ECCV 2014, 8689, 818–833.
15. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. Proc. IEEE/CVF CVPR 2016, 770–778.
16. Kaur, P.; et al. UAM-Net: Robust deepfake detection through hybrid attention. Expert Syst. 2025.
17. Ni, Y.; Zeng, W.; Xia, P.; Tan, R. Deepfake detection via Fourier transform of biological signal. CMC 2024, 79, 5295.




| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Real (0) | 0.83 | 0.95 | 0.89 | 5,492 |
| Fake (1) | 0.94 | 0.80 | 0.86 | 5,413 |
| Overall Accuracy | | | 0.88 | 10,905 |
| Macro Avg | 0.88 | 0.88 | 0.87 | 10,905 |

| | Predicted Real | Predicted Fake |
|---|---|---|
| Actual Real | 5,229 | 263 |
| Actual Fake | 1,093 | 4,320 |
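
The per-class figures in the classification report follow directly from the confusion matrix counts. The short check below recomputes them in plain Python; the variable and function names are illustrative, and "fake" is treated as the positive class.

```python
# Confusion matrix counts from the table above (positive class = fake).
TN, FP = 5229, 263     # actual real: predicted real, predicted fake
FN, TP = 1093, 4320    # actual fake: predicted real, predicted fake


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Fake class: a true positive is a fake image predicted fake.
fake_p, fake_r, fake_f1 = precision_recall_f1(TP, FP, FN)

# Real class: roles swap, so a "true positive" is a real image predicted real.
real_p, real_r, real_f1 = precision_recall_f1(TN, FN, FP)

accuracy = (TP + TN) / (TP + TN + FP + FN)
macro_f1 = (fake_f1 + real_f1) / 2

report = {
    "real": tuple(round(v, 2) for v in (real_p, real_r, real_f1)),
    "fake": tuple(round(v, 2) for v in (fake_p, fake_r, fake_f1)),
    "accuracy": round(accuracy, 4),
    "macro_f1": round(macro_f1, 2),
}
```

Rounded to two decimals these values agree with the report: 0.83 / 0.95 / 0.89 for the real class and 0.94 / 0.80 / 0.86 for the fake class. The unrounded accuracy is 9,549 / 10,905, about 0.8757. The 1,093 fake images predicted as real are what pull fake-class recall down to 0.80.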
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).