Submitted:
31 October 2025
Posted:
03 November 2025
Abstract
Keywords:
1. Introduction
2. Related Work
2.1. Traditional Approaches to Fake News Detection
2.2. Deep Learning and LLMs for Fake News Detection
2.3. Explainability in NLP and LLMs
- Attention visualization: Internal attention weights of transformer layers highlight which words the model focuses on (Vig, 2019).
- LIME (Local Interpretable Model-agnostic Explanations): Perturbs input texts and trains local interpretable models to approximate predictions (Ribeiro et al., 2016).
- SHAP (SHapley Additive exPlanations): A game-theoretic method estimating feature contributions to model output (Lundberg & Lee, 2017).
- Integrated Gradients and Layer-wise Relevance Propagation (LRP): Attribute model predictions to specific inputs using backpropagation-based techniques (Sundararajan et al., 2017).
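To make the gradient-based attribution idea concrete, the following is a minimal sketch of Integrated Gradients on a toy logistic model. The weights, input, all-zero baseline, and step count are hypothetical values chosen for illustration; with a transformer, attributions would typically be computed over token embeddings rather than raw numeric features.

```python
import math

# Hypothetical weights of a toy logistic model over three numeric features.
WEIGHTS = [1.5, -2.0, 0.5]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x):
    """Toy model: sigmoid of a weighted sum of the inputs."""
    return sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)))


def gradient(x):
    """Analytic gradient of the toy model with respect to its inputs."""
    p = model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(x, baseline, steps=200):
    """Approximate the path integral of gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(point)):
            total[i] += g
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

# Completeness axiom: attributions sum to model(x) - model(baseline).
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))
print(attributions, completeness_gap)
```

The final check reflects the completeness property from Sundararajan et al. (2017): the attributions should add up to the difference between the model output at the input and at the baseline, up to numerical integration error.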
2.4. The Accuracy-Interpretability Tradeoff
3. Proposed Methodology
3.1. Dataset and Preprocessing
- Text Cleaning: Removal of special characters and excessive whitespace to standardize the text format.
- Handling Missing Data: Identification and appropriate handling of any missing values in the text or label columns.
- Label Encoding: Conversion of categorical labels ('real', 'fake') into numerical format (0, 1) suitable for binary classification.
- Tokenization: Utilization of the specific tokenizer associated with our chosen LLM (e.g., BERT tokenizer) to convert text into subword tokens.
- Encoding: Transformation of tokenized text into input IDs and attention masks, which are the required numerical inputs for the transformer model.
- Data Splitting: The train/test split was maintained, with part of the training set used for validation, as illustrated in Figure 1.
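The cleaning, missing-data, label-encoding, and splitting steps above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' pipeline: the exact set of characters treated as "special", the choice to drop incomplete rows, the held-out fraction, and the reading of the held-out part as a validation set (the results tables report a validation loss) are all assumptions. Tokenization and encoding are left to the model's own tokenizer and are not shown.

```python
import random
import re

LABEL_MAP = {"real": 0, "fake": 1}  # encoding stated in the text


def clean_text(text):
    """Drop special characters and collapse runs of whitespace."""
    # Assumption: "special characters" means anything other than word
    # characters, whitespace, and basic punctuation.
    text = re.sub(r"[^\w\s.,!?'-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(rows):
    """Turn (text, label) pairs into cleaned (text, int_label) pairs."""
    out = []
    for text, label in rows:
        # Assumption: rows with missing text or an unknown label are dropped.
        if not text or label not in LABEL_MAP:
            continue
        cleaned = clean_text(text)
        if cleaned:
            out.append((cleaned, LABEL_MAP[label]))
    return out


def split_train_val(examples, val_fraction=0.1, seed=42):
    """Hold out part of the training set; the test set is left untouched."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Made-up rows for illustration only.
raw = [
    ("Vaccines   contain ### microchips!!", "fake"),
    ("WHO publishes updated guidance  ", "real"),
    (None, "fake"),               # missing text
    ("Unlabelled claim", None),   # missing label
    ("Masks reduce transmission, study finds", "real"),
]
examples = preprocess(raw)
train, val = split_train_val(examples, val_fraction=0.34)
print(len(examples), len(train), len(val))
```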
3.2. Integration of Explainability Techniques
- LIME (Local Interpretable Model-agnostic Explanations): A local interpretability method that quantifies the contribution of individual words or features to a specific prediction (Garreau & von Luxburg, 2020). This allows us to understand why the model classified a particular instance as fake or real.
- BERT Attention Weights: A global, model-intrinsic approach that captures the interactions between tokens across layers and attention heads (Clark et al., 2019). By visualizing these token-to-token relationships, we can observe which parts of the input the model prioritizes when forming its predictions.
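The perturb-and-observe idea behind local explanations can be illustrated with a leave-one-word-out sketch. This is a deliberate simplification and not LIME itself: LIME samples many random perturbations and fits a proximity-weighted linear surrogate, whereas the code below removes one word at a time and records the change in the predicted probability. The classifier `toy_fake_probability` and its cue words are hypothetical stand-ins for the fine-tuned model.

```python
def toy_fake_probability(text):
    """Hypothetical stand-in for the fine-tuned classifier's P(fake)."""
    cue_weights = {"miracle": 0.35, "cure": 0.25, "secret": 0.2}
    score = 0.1 + sum(cue_weights.get(w.lower(), 0.0) for w in text.split())
    return min(score, 1.0)


def leave_one_out_importance(text, predict):
    """Score each word by the drop in P(fake) when that word is removed."""
    words = text.split()
    base = predict(text)
    scores = []
    for i, word in enumerate(words):
        perturbed = " ".join(words[:i] + words[i + 1:])
        scores.append((word, base - predict(perturbed)))
    # Largest drop first: these words push the prediction toward "fake".
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = leave_one_out_importance(
    "Secret miracle cure stops the virus", toy_fake_probability
)
for word, delta in ranking:
    print(f"{word:>8s}  {delta:+.2f}")
```

Words with a large positive score are those whose removal lowers the predicted probability of the fake class the most, which is the kind of per-instance evidence a local method is meant to surface.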
3.3. Implementation Details
3.4. Evaluation Metrics
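The results tables report accuracy, precision, recall, and F1-score. As a minimal sketch of the standard definitions, the code below computes them from confusion-matrix counts. The counts are illustrative and not taken from the paper, and which class is treated as positive is an assumption. The last lines check that the per-epoch F1 values in the results table are consistent with the harmonic mean of the reported precision and recall.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) for binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1_from(precision, recall)


# Illustrative confusion counts (not from the paper).
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(f"acc={acc:.4f} prec={prec:.4f} rec={rec:.4f} f1={f1:.4f}")

# (precision, recall, F1) per epoch, copied from the results table.
reported = [
    (0.9840, 0.9676, 0.9758),
    (0.9662, 0.9804, 0.9732),
    (0.9792, 0.9706, 0.9749),
]
# Largest difference between recomputed and reported F1 across epochs.
max_gap = max(abs(f1_from(p, r) - f) for p, r, f in reported)
print(f"max F1 gap: {max_gap:.6f}")
```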
4. Results and Analysis
4.1. Model Performance
4.2. Model Interpretability
5. Conclusions
¹ INVALIZARE, Covid-19 Fake News Dataset. Kaggle, 2021. Available: https://www.kaggle.com/datasets/invalizare/covid-19-fake-news-dataset.
References
- Jacovi, A., & Goldberg, Y. (2020). Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness? Proceedings of ACL 2020.
- Kaliyar, R. K., Goswami, A., & Narang, P. (2021). FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Journal of Computational Science, 38, 101545.
- Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (NeurIPS).
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?”: Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD.
- Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., & Müller, K.-R. (2021). Explaining deep neural networks and beyond: A review of methods and applications. Pattern Recognition, 107733.
- Vig, J. (2019). A Multiscale Visualization of Attention in the Transformer Model. ACL Demo.
- Abraham, T. (2025). Leveraging data analytics for detection and impact evaluation of fake news. Humanities and Social Sciences Communications, 12(3), 1–12. https://www.nature.com/articles/s41599-025-05389-4.
- Al-alshaqi, M. (2025). A survey of large language models in fake news detection. Computers, 14(6), 237. https://www.mdpi.com/2073-431X/14/6/237.
- Cavus, F. (2024). Real-time fake news detection in online social networks. Scientific Reports, 14, 76102. https://www.nature.com/articles/s41598-024-76102-9.
- Harris, J. (2024). Fake news detection revisited: An extensive review. Technologies, 12(11), 222. https://www.mdpi.com/2227-7080/12/11/222.
- Hu, L. (2025). An overview of fake news detection: From a new perspective. Computers & Society, 17(2), 55–70. https://www.sciencedirect.com/science/article/pii/S2667325824000414.
- Rustam, F. (2024). Fake news detection using enhanced features through text transformation. Education and Information Technologies, 29, 1–17. https://link.springer.com/article/10.1007/s10791-024-09490-1.
- Thakar, A. (2024). Fake news detection: Recent trends and challenges. Social Network Analysis and Mining, 14(1), 1–15. https://link.springer.com/article/10.1007/s13278-024-01344-4.
- Zhou, X., Wu, J., & Zafarani, R. (2020). SAFE: Similarity-aware multi-modal fake news detection. Proceedings of WSDM 2020.
- Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL.
- Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach.
- Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic Attribution for Deep Networks. ICML.
- Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning.
- Nasir, A. N., et al. (2022). Multilingual Fake News Detection using RoBERTa. IEEE Access, 10, 123456.
- Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake News Detection on Social Media: A Data Mining Perspective. ACM SIGKDD Explorations, 19(1).
- Garreau, D., & von Luxburg, U. (2020). Explaining the explainer: A first theoretical analysis of LIME. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 1287–1296). Proceedings of Machine Learning Research. https://proceedings.mlr.press/v108/garreau20a.html.
- Clark, K., Khandelwal, U., Levy, O., & Manning, C. D. (2019). What does BERT look at? An analysis of BERT's attention. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (pp. 276–286). Association for Computational Linguistics. https://aclanthology.org/W19-4828/.
- Patwa, P., Sharma, S., Pykl, S., Guptha, V., Kumari, G., Akhtar, M. S., Ekbal, A., Das, A., & Chakraborty, T. (2020). Fighting an Infodemic: COVID-19 Fake News Dataset. arXiv.






| Epoch | Training Loss | Validation Loss | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|
| 1 | 0.0010 | 0.1565 | 0.9771 | 0.9840 | 0.9676 | 0.9758 |
| 2 | 0.0024 | 0.1349 | 0.9743 | 0.9662 | 0.9804 | 0.9732 |
| 3 | 0.0036 | 0.1322 | 0.9762 | 0.9792 | 0.9706 | 0.9749 |

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| DT (Decision Tree) | 85.37 | 85.47 | 85.37 | 85.39 |
| LR (Logistic Regression) | 91.96 | 92.01 | 91.96 | 91.96 |
| SVM (Support Vector Machine) | 93.32 | 93.33 | 93.32 | 93.32 |
| GBDT (Gradient Boosted Decision Tree) | 86.96 | 87.24 | 86.96 | 86.96 |
| BERT (Proposed Model) | 97.66 | 97.92 | 97.06 | 97.49 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).