Submitted: 11 June 2025
Posted: 13 June 2025
Abstract
Keywords:
1. Introduction
- Comprehensive Benchmarking: We evaluate a range of supervised models, including logistic regression, random forest, LightGBM, ALBERT, and GRU, on the WELFake dataset, employing precision, recall, F1-score, and AUC-ROC for rigorous performance assessment.
- Statistical Validation: We apply pairwise McNemar’s tests with Holm correction to statistically quantify differences in model performance.
- Model Interpretability: We analyze and compare the most influential features identified by each machine learning model, providing insight into the lexical cues associated with fake and real news headlines.
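
The statistical validation step named above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes paired predictions on a shared test set, uses the exact (binomial) form of McNemar's test, and applies the Holm step-down correction across all model pairs. The function names, the toy labels, and the model names `model_a`, `model_b`, `model_c` are hypothetical.

```python
from itertools import combinations
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value): b counts items only model A classified
    correctly, c counts items only model B classified correctly.
    """
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    # Under H0 the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * p_values[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Toy data (hypothetical): ten headlines, three models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
preds = {
    "model_a": [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
    "model_b": [1, 0, 0, 1, 0, 1, 1, 0, 0, 1],
    "model_c": [0, 1, 0, 1, 0, 1, 0, 0, 0, 1],
}
pairs = list(combinations(preds, 2))
raw = [mcnemar_exact(y_true, preds[a], preds[b])[2] for a, b in pairs]
for (a, b), p_adj in zip(pairs, holm_adjust(raw)):
    print(f"{a} vs {b}: Holm-adjusted p = {p_adj:.3f}")
```

The exact test is chosen here only because it needs no external dependencies. On a large test set, the chi-square approximation with continuity correction is a common alternative.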
2. Problem Formulation
3. Methods
3.1. Dataset and Preprocessing
3.2. Machine Learning Models
3.3. Deep Learning Models
3.4. Evaluation Metrics
4. Results
4.1. Model Performance
4.2. Pairwise Model Comparison
4.3. Model Interpretation and Feature Importance
5. Discussions
5.1. Interpretation of Findings
5.2. Practical Implications
- Model selection: For rapid, low-resource scenarios, classical models, especially Random Forest, offer an effective, interpretable baseline. Transformer-based models such as ALBERT are preferable in high-throughput or mission-critical settings where the highest accuracy is required.
- Feature engineering: The consistently high importance of terms like “video,” city names, and emotive language indicates that both content and style are crucial for fake news identification, even with short text.
- Efficiency: The demonstrated strength of headline-only models lowers the barrier for deployment on platforms where only limited or partial data is available.
5.3. Limitations and Future Directions
- Title-only focus: Restricting input to headlines omits richer semantic context found in full articles. Future studies should quantify the incremental benefits of including article bodies or additional metadata.
- Dataset generalizability: Results are specific to the WELFake dataset; extending analysis to other datasets, languages, or domains would test the robustness of these findings.
- Interpretability of deep models: While feature importance is accessible for classical models, neural models remain a “black box.” Incorporating model-agnostic interpretation techniques would enhance transparency.
- Dynamic misinformation: All models were trained and tested statically; future work should explore online learning, domain adaptation, and real-time feedback mechanisms to handle evolving misinformation patterns.
- Emerging models: Large language models and multimodal fusion represent promising avenues for capturing more sophisticated or cross-platform fake news strategies.
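
As one illustration of the model-agnostic interpretation techniques mentioned above, the sketch below scores each word of a headline by how much the predicted fake-news probability drops when that word is removed (leave-one-word-out occlusion). It is a sketch under stated assumptions rather than a method from this study: `occlusion_importance` and `toy_predict_fake_prob` are hypothetical names, and the toy scorer with its hand-set weights merely stands in for any trained classifier that maps text to a probability.

```python
def occlusion_importance(headline, predict_fake_prob):
    """Rank tokens by the change in predicted probability when each is removed.

    predict_fake_prob is any callable mapping a string to a probability,
    so the procedure does not depend on the model's internals.
    """
    tokens = headline.split()
    base = predict_fake_prob(" ".join(tokens))
    scores = []
    for i, token in enumerate(tokens):
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        scores.append((token, base - predict_fake_prob(reduced)))
    # Largest absolute change first; ties keep their original word order.
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


def toy_predict_fake_prob(text):
    """Hypothetical stand-in for a trained model (hand-set weights)."""
    weights = {"video": 0.3, "shocking": 0.2}
    words = text.lower().split()
    score = 0.2 + sum(w for word, w in weights.items() if word in words)
    return min(1.0, score)


ranking = occlusion_importance("Shocking video shows mayor", toy_predict_fake_prob)
for token, delta in ranking:
    print(f"{token}: {delta:+.2f}")
```

In practice `toy_predict_fake_prob` would be replaced by a wrapper around the neural model's inference call. Established libraries that implement related ideas, such as LIME or SHAP, are alternatives to this hand-rolled version.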

| Model | Macro-avg. Precision | Macro-avg. Recall | Macro-avg. F1-Score | AUC |
| Logistic Regression | 0.84 | 0.84 | 0.84 | 0.92 |
| Random Forest | 0.85 | 0.85 | 0.85 | 0.93 |
| LightGBM | 0.83 | 0.83 | 0.83 | 0.92 |
| ALBERT | 0.92 | 0.93 | 0.93 | 0.98 |
| GRU | 0.80 | 0.80 | 0.90 | 0.96 |
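
The macro-averaged columns in the table above can be reproduced from raw predictions as follows. This is a minimal sketch that assumes binary labels encoded as 0 and 1 (the encoding and the function name `macro_scores` are assumptions, not taken from the study): precision, recall, and F1 are computed per class and then averaged with equal weight.

```python
def macro_scores(y_true, y_pred, labels=(0, 1)):
    """Return (macro precision, macro recall, macro F1) over the given labels."""
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))
    n = len(per_class)
    # Unweighted mean over classes: each class counts equally.
    return tuple(sum(scores[i] for scores in per_class) / n for i in range(3))


# Toy example (hypothetical labels and predictions).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because each per-class F1 is a harmonic mean of that class's precision and recall, the macro F1 computed this way cannot exceed the average of macro precision and macro recall. This gives a quick consistency check for any row of a results table.
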
| Model 1 | Model 2 | Winner | Holm-corrected p-value | Significant |
| LightGBM | Random Forest | Random Forest | <0.0001 | Yes |
| LightGBM | Logistic Regression | Logistic Regression | 0.33 | No |
| LightGBM | ALBERT | ALBERT | <0.0001 | Yes |
| LightGBM | GRU | GRU | <0.0001 | Yes |
| Random Forest | Logistic Regression | Random Forest | <0.0001 | Yes |
| Random Forest | ALBERT | ALBERT | <0.0001 | Yes |
| Random Forest | GRU | GRU | <0.0001 | Yes |
| Logistic Regression | ALBERT | ALBERT | <0.0001 | Yes |
| Logistic Regression | GRU | GRU | <0.0001 | Yes |
| ALBERT | GRU | ALBERT | <0.0001 | Yes |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).