Fake News Detection Through LLM-Driven Text Augmentation Across Media and Languages

Abdul Sittar; Mateja Smiljanic; Alenka Guček; Marko Grobelnik

doi:10.20944/preprints202603.0360.v1

Submitted: 04 March 2026

Posted: 05 March 2026


Abstract

The proliferation of fake news across social media, headlines, and news articles poses major challenges for automated detection, particularly in multilingual and cross-media settings affected by data imbalance. We propose a fake news detection framework based on LLM-driven, feature-guided text augmentation. The method generates realistic synthetic samples across languages, media types, and text granularities while preserving factual structure and stylistic coherence. Experiments with classical and transformer-based models (Random Forest, Logistic Regression, BERT, XLM-R) across social media, headline, and multilingual news datasets show consistent improvements in performance. LLM-based augmentation improves overall accuracy by up to 1.6% over imbalanced baselines and increases minority-class F1-scores by up to 2.4% in low-resource languages such as Swahili. Hybrid fact- and style-based models achieve up to 93.8% accuracy with more balanced class-wise F1-scores and reduced language-related disparities, demonstrating improved robustness and cross-lingual generalization.
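As a rough illustration of the idea summarized above, the sketch below shows one way class imbalance could be measured and a feature-guided generation prompt assembled. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `augmentation_plan` and `build_prompt`, the prompt wording, the style cues, and the toy labels are all hypothetical, and no LLM is actually called.

```python
from collections import Counter


def augmentation_plan(labels):
    """Return how many synthetic samples each class needs to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


def build_prompt(seed_text, language, medium, style_hints):
    """Assemble a feature-guided prompt (hypothetical wording, not taken from the paper)."""
    hints = "; ".join(style_hints)
    return (
        f"Write one new {medium} in {language} that keeps the factual structure "
        f"of the example below and follows these style cues: {hints}.\n"
        f"Example: {seed_text}"
    )


# Toy, made-up label distribution: 5 majority-class items, 2 minority-class items.
labels = ["real"] * 5 + ["fake"] * 2
plan = augmentation_plan(labels)

# Placeholder seed text and style cues, used only to show the prompt shape.
prompt = build_prompt(
    "Example headline text",
    "Swahili",
    "headline",
    ["short sentences", "neutral tone"],
)
```

In this sketch, `plan` gives the number of synthetic samples to request per class, and `prompt` would be sent to whichever LLM is used for generation; the generated text would then be added to the minority class before training a classifier.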

Keywords: fake news detection; low-resource languages; data imbalance; synthetic data generation; prompt engineering; style-based and fact-based features

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
