Preprint
Article

This version is not peer-reviewed.

Adversarially Robust Real-Time Fake News Detection Using RoBERTa with Continuous Learning and Browser-Native Deployment

Submitted: 12 May 2026

Posted: 13 May 2026


Abstract
Automated fake news detection has advanced substantially through transformer-based classification, yet two critical gaps persist in the literature: static models degrade as misinformation tactics evolve, and high-performing systems rarely reach end users in accessible forms. This paper addresses both gaps through a system that couples RoBERTa-based classification with a post-deployment continuous learning pipeline and a browser-native Chrome extension. We curate a corpus of 70,556 unique articles from three established benchmark datasets (ISOT, WELFake, and the COVID-19 Constraint dataset) after eliminating 42.9% of the initially gathered samples as cross-dataset duplicates. A systematic comparison of XGBoost (95.88% accuracy), DistilBERT (97.74%), and RoBERTa-base (98.51%) establishes the production model, with selection driven primarily by false negative rate: RoBERTa achieves 1.09%, a 69% reduction relative to XGBoost and 28% relative to DistilBERT. A documented vulnerability of transformer classifiers is susceptibility to formally worded misinformation that mimics journalistic style. We construct a dedicated adversarial training set of 70 examples spanning health misinformation, suppression narratives, and election fraud claims, and show that targeted fine-tuning raises adversarial detection accuracy from approximately 40% to 95.71% while maintaining 98.60% accuracy on standard benchmarks, with experience replay preventing catastrophic forgetting. For deployment, ONNX INT8 quantization reduces model size from 500 MB to 125 MB at the cost of a modest accuracy reduction, enabling inference on free CPU infrastructure. A GitHub Actions pipeline collects fresh labeled articles nightly, and a FastAPI service running on Hugging Face Spaces serves predictions with 150–200 ms latency. A Chrome extension providing paragraph-level hover detection, LIME-based word attribution, source credibility scoring, and multilingual support across 19 languages makes the system accessible to non-technical users.
End-to-end evaluation across 50 curated articles yields 98% accuracy; research-backed adversarial testing across seven categories achieves 91.7%, with perfect detection on adversarial attacks, AI-generated misinformation, temporal domain shifts, and multilingual content.

1. Introduction

Misinformation spreads six times faster than accurate news on social media platforms [1], reaching peak distribution within hours of publication—long before professional fact-checkers can issue corrections. The consequences are concrete and severe: during the COVID-19 pandemic, false claims about cures and vaccine safety measurably increased hesitancy and delayed public health responses; in electoral contexts, fabricated narratives have influenced voter perceptions and eroded institutional trust. The volume and velocity of online content production make manual verification operationally infeasible at scale, motivating automated detection systems capable of assessing credibility at the point of consumption.
Transformer-based language models have dramatically improved automated text classification, and their application to fake news detection has produced accuracy figures routinely exceeding 95% on standard benchmark datasets [2]. Yet two problems receive insufficient attention in the literature. First, temporal drift: reported evaluations typically use static, in-distribution test sets that do not reflect the continuous evolution of misinformation tactics in deployment. A model trained on 2016–2021 content and evaluated on 2026 content faces a distribution shift that static benchmarks cannot capture. Second, deployment accessibility: high-performing research systems rarely become tools the general public can use, so their societal impact is limited to indirect effects through platform-level implementations that users cannot directly inspect or interact with.
Beyond these, a more subtle vulnerability has emerged: adversarially crafted content using formal academic language and investigative journalism conventions can evade transformer classifiers that learned to associate professional register with legitimate news [3]. This creates a paradox in which the misinformation most likely to mislead educated readers is also the misinformation most likely to evade automated detection.
This paper addresses all three problems. The contributions are:
  • A rigorous comparative study of XGBoost, DistilBERT, and RoBERTa on a 70,556-article deduplicated corpus, with false negative rate as the primary selection criterion—a deliberate departure from accuracy-centric evaluation that better reflects the asymmetric cost of missed fake news.
  • A post-deployment continuous learning system comprising a nightly GitHub Actions scraper, experience replay fine-tuning, automatic evaluation gating, and versioned deployment—the first such pipeline reported for fake news detection to our knowledge.
  • An adversarial training procedure that constructs formally-worded misinformation examples and demonstrates 95.71% adversarial accuracy versus approximately 40% for the base model, with no degradation on standard benchmarks.
  • Production deployment via ONNX INT8 quantization, FastAPI hosting on free CPU infrastructure, and a Chrome extension providing paragraph-level detection, LIME explainability, source credibility fusion, multilingual support, and OCR-based image text analysis.
  • Comprehensive evaluation across 50 curated articles (98% accuracy) and a seven-category adversarial test suite (91.7% overall, 100% in five of seven categories).
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 describes dataset construction. Section 4 presents model architectures and training. Section 5 details the continuous learning pipeline. Section 6 examines adversarial robustness. Section 7 covers system deployment. Section 8 presents evaluation. Section 9 discusses the results and limitations, and Section 10 concludes.

3. Dataset Construction

3.1. Source Datasets

The training corpus draws from three complementary benchmark datasets. The ISOT dataset [12] comprises 44,898 articles (authentic news sourced from Reuters.com and fabricated content flagged by PolitiFact), providing a strong foundation for political misinformation detection. WELFake [13] aggregates 72,134 articles from four independent sources (Kaggle, McIntire, Reuters, and BuzzFeed Political) and was specifically designed to prevent classifier overfitting to single-source artifacts. The COVID-19 Constraint dataset [14] contributes 10,700 social media posts from Twitter and Facebook addressing pandemic misinformation, introducing a distinct register and temporal context absent from the news article datasets. Table 1 summarizes dataset characteristics.

3.2. Preprocessing and Deduplication

Raw text underwent standard preprocessing: HTML tags, URLs, and special characters were stripped; whitespace was normalized; encoding was standardized. Crucially, exact deduplication after normalization identified 52,980 duplicate articles—42.9% of the initial corpus—arising primarily from Reuters content appearing independently in both ISOT and WELFake. Deduplication is essential not merely for data quality but for evaluation integrity: models tested on articles seen during training exhibit inflated metrics that do not generalize to deployment. After filtering, the final corpus comprises 70,556 unique articles with near-perfect class balance: 50.8% real (35,841) and 49.2% fake (34,715).
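A minimal sketch of normalization followed by exact deduplication is shown below, assuming a SHA-256 hash over the normalized text. The regular expressions, the lowercasing step, and the helper names `normalize` and `deduplicate` are illustrative assumptions rather than the authors' code; duplicates are resolved by keeping the first occurrence, and label conflicts between duplicates are not handled.

```python
import hashlib
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
SPECIAL_RE = re.compile(r"[^\w\s.,;:!?'\"-]")  # anything outside this set is "special"
WS_RE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Strip HTML tags, URLs and special characters, then collapse whitespace."""
    text = html.unescape(text)        # decode entities such as &amp;
    text = TAG_RE.sub(" ", text)      # drop HTML tags
    text = URL_RE.sub(" ", text)      # drop URLs
    text = SPECIAL_RE.sub(" ", text)  # drop remaining special characters
    # Lowercasing is an assumption; it makes case-only variants count as duplicates.
    return WS_RE.sub(" ", text).strip().lower()


def deduplicate(records):
    """Keep the first occurrence of each normalized text (exact match).

    `records` is an iterable of (text, label) pairs; returns (kept, n_removed).
    """
    seen = set()
    kept = []
    removed = 0
    for text, label in records:
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in seen:
            removed += 1
            continue
        seen.add(digest)
        kept.append((text, label))
    return kept, removed
```

Hashing the normalized string rather than the raw text is what lets the same Reuters story, scraped with different markup or trailing URLs, collapse to a single entry.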
Table 2. Corpus statistics after deduplication and preprocessing
Statistic | Value | Detail
Total unique articles | 70,556 | After deduplication
Real articles | 35,841 (50.8%) | Label: 0
Fake articles | 34,715 (49.2%) | Label: 1
Duplicates removed | 52,980 (42.9%) | Cross-dataset overlap
Median article length | 2,283 characters | Range: 50–12,000 characters
Temporal range | 2016–2021 | Six-year span
Training split (70%) | 49,388 | Stratified
Validation split (15%) | 10,584 | Stratified
Test split (15%) | 10,584 | Stratified
The 70/15/15 train-validation-test split was applied through stratified sampling, preserving class balance across all partitions. This avoids the evaluation artifacts that arise when class proportions differ between training and evaluation subsets.
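The stratified 70/15/15 partition can be sketched in pure Python as follows. `stratified_split` is a hypothetical helper and the seed is arbitrary; applying scikit-learn's `train_test_split(..., stratify=labels)` twice would be the more common choice in practice. Because rounding happens per class, split sizes may differ by an article or two from those in Table 2.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    splits = ([], [], [])
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class before cutting
        n = len(indices)
        n_train = round(n * fractions[0])
        n_val = round(n * fractions[1])
        splits[0].extend(indices[:n_train])
        splits[1].extend(indices[n_train:n_train + n_val])
        splits[2].extend(indices[n_train + n_val:])  # remainder goes to test
    for part in splits:
        rng.shuffle(part)  # avoid class-sorted order inside each split
    return splits
```

Cutting each class separately is what guarantees that the real/fake ratio in every partition matches the corpus ratio, rather than only matching it in expectation.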

4. Model Architecture and Training

4.1. Comparative Study

Three architectures spanning the range from traditional machine learning to full-scale transformer models were evaluated, enabling principled selection rather than assumption of transformer superiority.

4.1.1. XGBoost with TF-IDF Features

Text was represented using TF-IDF vectorization with 35,000 features incorporating unigrams, bigrams, and trigrams with sublinear scaling. XGBoost was trained using histogram-based gradient boosting with early stopping (patience = 100 rounds), tree depth 6, and learning rate 0.1, running for 1,058 rounds before convergence.
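To make "sublinear scaling" concrete, the following simplified, pure-Python sketch builds unigram-to-trigram TF-IDF features where the raw term count tf is replaced by 1 + ln(tf), with smoothed IDF and L2 normalization. It roughly mirrors scikit-learn's `TfidfVectorizer(ngram_range=(1, 3), sublinear_tf=True, max_features=35000)`; the whitespace tokenizer and helper names are assumptions, and the paper does not state which implementation was used.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=3):
    """Yield all word n-grams with n_min <= n <= n_max."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def fit_vocabulary(docs, max_features=35_000):
    """Keep the max_features most frequent n-grams and compute smoothed IDF."""
    term_freq = Counter()  # total occurrences across the corpus
    doc_freq = Counter()   # number of documents containing the term
    for doc in docs:
        counts = Counter(ngrams(doc.lower().split()))
        term_freq.update(counts)
        doc_freq.update(counts.keys())
    vocab = [t for t, _ in term_freq.most_common(max_features)]
    n_docs = len(docs)
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
    return {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}


def transform(doc, idf):
    """Sublinear TF-IDF vector (as a sparse dict), L2-normalized."""
    counts = Counter(t for t in ngrams(doc.lower().split()) if t in idf)
    # Sublinear scaling: a term seen 10 times weighs ~3.3x, not 10x, a single hit
    weights = {t: (1.0 + math.log(c)) * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {t: w / norm for t, w in weights.items()}
```

Sublinear scaling dampens the influence of repeated words, which matters for sensationalist fake news that often repeats the same charged terms.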

4.1.2. DistilBERT

DistilBERT [15] compresses BERT through knowledge distillation to 67M parameters across six transformer layers, retaining 97% of BERT's language understanding capability with 40% fewer parameters. Fine-tuning used a learning rate of 2 × 10⁻⁵, batch size 16, the AdamW optimizer with weight decay 0.01, and a linear warmup schedule over five epochs with a maximum sequence length of 512 tokens.

4.1.3. RoBERTa-Base

RoBERTa-base employs twelve transformer encoder layers, 768-dimensional hidden states, twelve attention heads, and 125M parameters. Its byte-pair encoding tokenizer maintains a vocabulary of 50,265 tokens, about 65% larger than DistilBERT’s WordPiece vocabulary of 30,522, providing finer-grained subword representations for the varied linguistic register of news content.
Fine-tuning applied label smoothing (ε = 0.1) to prevent overconfidence [7], transforming hard labels into soft targets:
$$y^{\text{smooth}}_{i,c} = (1-\varepsilon)\, y_{i,c} + \frac{\varepsilon}{C}, \qquad \mathcal{L} = -\sum_{i}\sum_{c} y^{\text{smooth}}_{i,c}\, \log \hat{y}_{i,c}$$
with ε = 0.1 and C = 2 classes. A cosine annealing learning rate schedule with 10% linear warmup provided stable convergence. Gradient accumulation over two steps yielded effective batch size 32. Dropout (0.1) and gradient clipping (max norm 1.0) provided additional regularization.
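The label smoothing objective and the warmup-plus-cosine schedule described above can be sketched as follows. This is an illustrative pure-Python version with hypothetical helper names; the schedule has the same shape as `get_cosine_schedule_with_warmup` in Hugging Face Transformers, but whether that function was used is an assumption. Assuming a per-step batch of 16 (implied by the stated effective batch size), accumulating gradients over two steps gives 16 × 2 = 32 examples per optimizer update.

```python
import math


def smooth_labels(label, num_classes=2, epsilon=0.1):
    """y_smooth = (1 - eps) * one_hot + eps / C."""
    return [(1 - epsilon) * (1.0 if c == label else 0.0) + epsilon / num_classes
            for c in range(num_classes)]


def smoothed_cross_entropy(probs, label, epsilon=0.1):
    """Cross-entropy of predicted probabilities against smoothed targets."""
    target = smooth_labels(label, len(probs), epsilon)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


def cosine_lr(step, total_steps, peak_lr, warmup_frac=0.10):
    """Linear warmup over the first warmup_frac of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With ε = 0.1 and C = 2, the targets become 0.95 and 0.05. The loss is minimized when the predicted probability equals the smoothed target, so pushing the true-class probability from 0.95 toward 0.99 increases the loss; this is the mechanism that discourages overconfidence.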

4.2. Results and Model Selection

Table 3 presents comprehensive results on the held-out test set of 10,584 articles. Figure 1 visualizes the key metrics.
Model selection rationale. RoBERTa was selected for production not solely on accuracy grounds but on false negative rate, the primary safety-critical metric. Since FNR is the fraction of fake articles classified as real, RoBERTa would miss approximately 10,900 out of one million fake articles, versus 35,100 for XGBoost, a difference with direct consequences for users exposed to undetected misinformation. The training cost of 409 minutes is incurred once; subsequent fine-tuning cycles complete in 10–15 minutes.
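The per-million figures and relative reductions follow directly from the reported false negative rates. Because FNR is measured over fake articles only, the counts below refer to one million fake articles; the dictionary simply restates the rates shown in Figure 1.

```python
# False negative rates on the held-out test set (from Figure 1).
FNR = {"XGBoost": 0.0351, "DistilBERT": 0.0152, "RoBERTa": 0.0109}


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


# Expected fake articles missed per one million fake articles.
missed_per_million = {model: round(rate * 1_000_000) for model, rate in FNR.items()}

print(missed_per_million)
print(round(relative_reduction(FNR["XGBoost"], FNR["RoBERTa"]), 2))     # vs. XGBoost
print(round(relative_reduction(FNR["DistilBERT"], FNR["RoBERTa"]), 2))  # vs. DistilBERT
```

The two relative reductions come out at 0.69 and 0.28, matching the 69% and 28% quoted in the abstract.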
The performance differential between DistilBERT and RoBERTa likely reflects, at least in part, the contribution of pre-training scale. RoBERTa’s larger vocabulary enables more precise tokenization of neologisms and domain-specific terminology prevalent in political and health misinformation, while its 160 GB pre-training corpus produces richer contextual representations of the subtle language patterns characteristic of sophisticated fake news.

4.3. RoBERTa Training Dynamics

Table 4 documents training progression across five epochs. Figure 2 shows the corresponding loss and accuracy curves.
Validation loss stabilizes at approximately 0.23 after the second epoch, indicating effective convergence. The 1.35 percentage point gap between training accuracy (99.86%) and test accuracy (98.51%) is consistent with well-regularized transformer fine-tuning and suggests that the model generalizes rather than memorizes the training distribution.

5. Continuous Learning Pipeline

5.1. Motivation and Architecture

A static fake news detector trained on historical content faces inevitable performance degradation as misinformation tactics and topical domains evolve. The gap between training distribution and deployment distribution widens continuously in a field where adversarial content creators actively adapt to detection systems. We address this through an automated continuous learning system requiring no manual intervention after initial deployment. Figure 3 illustrates the complete pipeline architecture.

5.2. Automated Data Collection

A GitHub Actions workflow executes nightly at 22:00 IST, collecting approximately 420 articles per run from two source categories:
Real news: RSS feed parsing from verified outlets including Reuters, BBC, Al Jazeera, NPR, The Hindu, and NDTV. Articles are labeled REAL (label=0) based on the established editorial standards of the source.
Fake news claims: The Google Fact Check Tools API returns claims rated as false, misleading, or inaccurate by IFCN-certified fact-checking organizations. Claims are labeled FAKE (label=1) based on this multi-authority verification.
Collected articles are deduplicated against the existing corpus by URL matching and committed automatically to the data repository. Over the 13 operational nights documented in this work, the pipeline collected 1,264 unique new articles without manual intervention, spanning real-world events in April–May 2026.
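The collection step could look roughly like the standard-library sketch below. It assumes RSS 2.0 `<item>` elements with `<link>`, `<title>`, and `<description>`, and a Fact Check Tools `claims:search` JSON response with `claims[].claimReview[].textualRating`; the set of ratings treated as FAKE (`FALSE_RATINGS`) and all helper names are hypothetical, and HTTP fetching and the automatic git commit are omitted. GitHub Actions cron schedules are interpreted in UTC by default, so a 22:00 IST run corresponds to `cron: '30 16 * * *'`.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: textual ratings treated as FAKE (exact, case-insensitive match).
FALSE_RATINGS = {"false", "misleading", "inaccurate"}


def parse_rss(xml_text):
    """Extract REAL (label=0) records from an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        url = (item.findtext("link") or "").strip()
        title = item.findtext("title") or ""
        desc = item.findtext("description") or ""
        if url:
            records.append({"url": url, "text": f"{title} {desc}".strip(), "label": 0})
    return records


def parse_fact_checks(response):
    """Map a claims:search response to FAKE (label=1) records keyed by review URL."""
    records = []
    for claim in response.get("claims", []):
        for review in claim.get("claimReview", []):
            rating = review.get("textualRating", "").strip().lower()
            if rating in FALSE_RATINGS and review.get("url"):
                records.append({"url": review["url"], "text": claim.get("text", ""), "label": 1})
    return records


def new_records(candidates, known_urls):
    """Drop records whose URL is already known (in the corpus or earlier in this batch)."""
    fresh = []
    for rec in candidates:
        if rec["url"] not in known_urls:
            known_urls.add(rec["url"])
            fresh.append(rec)
    return fresh
```

Real textual ratings are free text ("Mostly false", "Pants on Fire", and so on), so an exact-match set is a simplification; a production scraper would need a broader normalization of rating strings.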

5.3. Experience Replay Fine-Tuning

Naïve fine-tuning on new data alone risks catastrophic forgetting—the overwriting of previously learned representations by the update distribution. We apply experience replay: each update cycle combines new scraped articles with a 2,000-sample random subset of the original 70,556-article corpus, 70 adversarial training examples (Section 6), and 217 health misinformation examples, yielding approximately 3,551 total training articles per cycle.
Fine-tuning employs a conservative learning rate of 5 × 10⁻⁶ over two epochs, preventing large parameter updates while permitting meaningful adaptation. The learning rate is deliberately lower than in initial fine-tuning (2 × 10⁻⁵) to avoid destabilizing the established decision boundary.
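The replay mixture can be sketched as below, assuming each item is a (text, label) tuple and that a fresh uniform sample of 2,000 original articles is drawn every cycle. The paper does not specify the sampling scheme, and `build_replay_batch` is a hypothetical helper. With 1,264 new articles, the cycle size works out to 1,264 + 2,000 + 70 + 217 = 3,551.

```python
import random


def build_replay_batch(new_articles, original_corpus, adversarial, health,
                       replay_size=2_000, seed=0):
    """Mix new data with a uniform replay sample of the original corpus."""
    rng = random.Random(seed)
    # Sample without replacement so no original article appears twice.
    replay = rng.sample(original_corpus, min(replay_size, len(original_corpus)))
    batch = list(new_articles) + replay + list(adversarial) + list(health)
    rng.shuffle(batch)  # interleave sources so no epoch segment is single-source
    return batch
```

Keeping the replay sample large relative to the adversarial and health subsets is what keeps the old decision boundary represented in every gradient update.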

5.4. Model Versioning and Deployment

Each candidate model is evaluated against held-out validation sets before deployment. Deployment proceeds only if accuracy ≥ 97% and false negative rate ≤ 2.0%; otherwise the previous version is automatically retained. Approved models are uploaded to Hugging Face Hub, from which the production API downloads the latest checkpoint on restart. Table 5 tracks the version history.
V2 reaches 99.58% accuracy on fresh content (no fresh-content measurement exists for V1), which suggests that experience replay transfers to content beyond the original 2016–2021 training distribution. V2’s FNR on fresh content (0.24%) is also lower than V1’s FNR on the original test set (1.09%); because these figures come from different test sets, they indicate rather than prove that the model learns contemporary misinformation patterns that differ from historical training examples.
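The evaluation gate reduces to a few lines once FAKE is treated as the positive class (label 1, as in Table 2). The sketch below computes accuracy and FNR from predictions and applies the ≥ 97% and ≤ 2.0% thresholds; the function names are hypothetical, and which validation sets feed it is left to the caller.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn) with FAKE (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = len(y_true) - tp - fn - fp
    return tp, fn, fp, tn


def should_deploy(y_true, y_pred, min_accuracy=0.97, max_fnr=0.02):
    """Apply the deployment gate; returns (approved, accuracy, fnr)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    fnr = fn / (tp + fn) if (tp + fn) else 0.0  # missed fakes / all fakes
    return accuracy >= min_accuracy and fnr <= max_fnr, accuracy, fnr
```

Checking FNR separately from accuracy matters: a candidate can clear the accuracy bar while missing more fake articles, and the gate rejects that case.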

6. Adversarial Robustness

6.1. Vulnerability Analysis

A fundamental limitation of transformer classifiers trained on conventional fake news datasets is their learned association between formal register and legitimate news. Training sets such as ISOT and WELFake contain fake news characterized by sensationalist language, conspiracy markers, and vague attribution, stylistic features the model learns to flag. Content that mimics academic or investigative conventions while conveying false claims therefore often evades detection. Initial evaluation on a deliberately constructed adversarial set confirmed this: accuracy was approximately 40%, below the 50% expected from random guessing on a binary task.

6.2. Adversarial Dataset Construction

We constructed a dataset of 70 adversarial examples across five categories, paired with 25 real news examples to maintain balance:
Formal-language misinformation (15 examples): Fake claims expressed through the stylistic conventions of scientific literature: “Medical professionals have confirmed that 5G wireless networks are responsible for coronavirus spread.” These mimic academic attribution without verifiable sources.
Suppression narratives (15 examples): Claims structured around institutional concealment: “Documents obtained by this publication reveal that senior officials systematically concealed vaccine adverse event data.” The investigative journalism format lends false credibility.
Election misinformation (10 examples): False electoral claims using the language of documentation and eyewitness testimony: “Statistical analysis proves vote counts were manipulated in key swing states.”
Science denialism (10 examples): Formally expressed rejections of scientific consensus that avoid the crude markers of obvious conspiracy content.
Technology misinformation (10 examples): Fabricated health claims about consumer technology using the language of suppressed research.

6.3. Adversarial Fine-Tuning Results

Fine-tuning V3 on the combined adversarial dataset plus experience replay (3,551 examples in total, learning rate 5 × 10⁻⁶, 2 epochs) produced V3-Robust. Table 6 documents the before-after comparison.
The result is striking: adversarial accuracy improves from approximately 40% to 95.71%, a gain of about 55.7 percentage points, while accuracy on the original and fresh test sets improves marginally (+0.34% and +0.08%, respectively). We attribute the absence of catastrophic forgetting to experience replay: fine-tuning on the 70 adversarial examples alone would likely dominate the gradient signal and degrade standard performance.
Figure 4. Adversarial fine-tuning results: V3 versus V3-Robust. (a) Overall accuracy comparison showing the 55.7 percentage point gain on adversarial content while standard test accuracy is maintained. (b) Per-category breakdown across five adversarial misinformation types.

7. System Deployment

7.1. ONNX INT8 Quantization

The 500 MB PyTorch RoBERTa model requires GPU infrastructure for real-time inference, making free-tier hosting infeasible. ONNX (Open Neural Network Exchange) export followed by dynamic INT8 quantization, which converts 32-bit floating-point weights to 8-bit integers, reduces model size from 500 MB to 125 MB (a 74.8% reduction). Inference latency on a commodity CPU is 150–200 ms per prediction, within the sub-second threshold for interactive use, while accuracy on the original test set drops modestly to 97%.
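Why INT8 yields roughly a fourfold size reduction is easiest to see with a simplified symmetric, per-tensor quantizer: each float32 weight (4 bytes) is replaced by an int8 code (1 byte) plus one shared scale. This is an illustration only, not ONNX Runtime's implementation; in practice `onnxruntime.quantization.quantize_dynamic` performs the conversion on the exported model, and its exact scheme (for example, per-channel or asymmetric options) may differ.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [scale * v for v in q]
```

Each restored weight is within half a quantization step (scale / 2) of the original, which is why accuracy typically degrades only slightly; the residual rounding error is one plausible contributor to the gap between PyTorch and ONNX accuracy.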

7.2. FastAPI Backend

A REST API deployed on Hugging Face Spaces serves predictions continuously. Beyond the ONNX classifier, the prediction pipeline applies three sequential components:
Temperature scaling (T = 1.5) calibrates raw logits before the softmax, producing more reliable confidence estimates: p̂ = softmax(z/T).
Pattern detection evaluates 12 linguistic categories including unverified statistical claims, vague source attribution (“anonymous sources”, “whistleblowers”), sensationalist language, conspiracy markers, suppression narratives, and leading-question formats. Detected patterns adjust a suspicion score that contributes 20% of the final verdict weight.
Source credibility fusion maintains a database of 200+ domain credibility ratings, combining model confidence (60%), credibility score (20%), and Google Fact Check API verification (20%) into the final score.
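The calibration and fusion steps can be sketched as below. The 60/20/20 weights and T = 1.5 come from the text; everything else is an assumption: logits ordered as [REAL, FAKE], credibility in [0, 1] with 1 meaning most credible, a fact-check score in [0, 1] with 1 meaning the claim was rated false, and the output read as a probability-like FAKE score. How the pattern-based suspicion score enters the verdict is not specified precisely enough to model here, so it is omitted.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: softmax(z / T)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fused_fake_score(logits, source_credibility, fact_check_score,
                     temperature=1.5, weights=(0.6, 0.2, 0.2)):
    """Weighted fusion of calibrated model output, source risk and fact-check signal."""
    p_fake = softmax(logits, temperature)[1]  # assumed index 1 = FAKE
    source_risk = 1.0 - source_credibility    # credible source -> low risk
    w_model, w_source, w_fc = weights
    return w_model * p_fake + w_source * source_risk + w_fc * fact_check_score
```

Because dividing all logits by the same T preserves their order, temperature scaling changes the reported confidence but never the predicted class.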

7.3. Chrome Browser Extension

The extension (submitted to the Chrome Web Store and pending review) runs on all websites under Manifest V3. The primary modality is paragraph-level hover detection: when the cursor rests on a paragraph for 800 ms, its text is extracted and sent to the API, and the result is displayed as a floating popup with the verdict, a confidence bar, the source credibility rating, and any detected patterns. Figure 5 shows the user interface.
Secondary modalities include: text selection tooltip for on-demand analysis of highlighted text; right-click OCR invocation of Google Cloud Vision API for image-embedded text extraction; and multilingual support via Google Translate API covering 19 languages with detected language shown in the UI. Smart site exclusion prevents spurious analysis on video platforms, email clients, and development tools. Figure 6 provides the complete system architecture.

8. Evaluation

8.1. Benchmark Performance

The production model (V3-Robust in ONNX INT8 format) was evaluated on the held-out test set of 10,584 articles, none of which was seen in any training phase. It achieved 98.60% accuracy with an FNR of 1.61% and an FPR of 2.46%, and an F1-score of 0.9860 indicates balanced performance across classes. These figures are in line with state-of-the-art results on comparable benchmark datasets, even after the model has additionally been fine-tuned on adversarial examples.

8.2. 50-Article Curated Evaluation

A manually curated evaluation set of 50 articles was constructed across four categories to assess system-level performance beyond standard benchmarks: obvious fake news employing sensationalist language; subtle fake news using academic register; verified real news from established outlets; and borderline cases presenting genuine classification ambiguity. Table 7 presents results.
The single misclassified article is a whistleblower claim about workplace discrimination, a case on which even professional fact-checkers disagree about ground truth; any binary classifier is likely to struggle with such content.

8.3. Research-Backed Seven-Category Adversarial Testing

Drawing from published adversarial benchmarking frameworks [8], a second evaluation constructed 36 test cases across seven categories representing the range of challenges documented in the fake news detection literature. Table 8 presents results; Figure 7 visualizes the performance profile.
Discussion of failures. The three remaining misclassifications represent distinct and acknowledged limitation classes. The source credibility failure arises from a deliberate design trade-off: the trusted-source override prevents pattern-based mislabeling of legitimate institutional content on high-credibility domains, at the cost of occasionally failing to flag suspicious content on those same domains. The two edge case failures involve a statistics-only text without narrative context and a leading-question format (“Could vaccines cause autism?”)—cases that present genuine classification difficulty even for human annotators, as the claims are neither affirmative nor clearly falsifiable in the absence of surrounding context.

8.4. Improvement Trajectory

Table 9 tracks system improvement across the development iterations on the seven-category adversarial test suite, illustrating the contribution of each component.
Figure 8. System improvement trajectory. (a) Overall accuracy on the seven-category adversarial test suite: 66.7% (V3) → 86.1% (V3-Robust) → 91.7% (final system). (b) Per-category improvement in percentage points; adversarial robustness shows the largest gain (+60 pp), followed by multilingual content (+50 pp).

9. Discussion

9.1. The False Negative Priority

The central design decision in this work—prioritizing false negative rate over accuracy as the selection criterion—deserves explicit justification. In misinformation detection, the two error types carry asymmetric costs. A false positive (legitimate news flagged as fake) creates momentary friction that an informed user can resolve by seeking a second source. A false negative (fake news classified as real) exposes the user to undetected misinformation, potentially influencing beliefs and decisions without their awareness. This asymmetry is particularly severe for the formally-worded fake news that motivated our adversarial training: it is precisely the content most likely to be trusted by educated readers and least likely to trigger skepticism.

9.2. Practical Contributions and Limitations

The continuous learning pipeline addresses temporal drift without human intervention, but its effectiveness over extended periods requires longitudinal study. The 13-night observation window in this work validates operational reliability; it does not provide evidence about performance after months of deployment. The experience replay strategy prevents catastrophic forgetting in the short term, but the optimal replay ratio and update frequency may require tuning as the cumulative corpus grows.
The ONNX quantization approach achieves free-tier deployment at the cost of a modest accuracy reduction (98.6% PyTorch to 97% ONNX). For applications where that gap is critical, GPU inference remains available at increased hosting cost.
The Chrome extension’s broad host permissions (<all_urls>) are necessary for the system to function across the full range of news sources and social media platforms where misinformation appears—a restriction to specific domains would eliminate coverage of precisely the novel sites where misinformation tends to originate. This permission necessitates extended review by the Chrome Web Store, which is an acknowledged friction point for deployment.

9.3. Comparison with Related Work

Relative to Aletheia [11], which achieves evidence-grounded explanations through RAG but requires paid LLM infrastructure, the present system achieves comparable functionality through quantized inference at zero hosting cost. The LIME-based attribution provided by this work, while less evidence-rich than RAG explanations, operates at 150–200 ms versus the several seconds typical of LLM generation, making it more compatible with natural reading pace.
Relative to prior adversarial robustness work [3,9], this paper contributes a concrete and reproducible adversarial fine-tuning procedure—constructing a small, targeted adversarial set rather than relying on automated perturbation—and validates it on a production system rather than a research prototype.

10. Conclusion

This paper presented a system for real-time fake news detection that addresses three documented gaps in the literature: static model decay, adversarial vulnerability to formal-language misinformation, and the absence of accessible deployment for non-technical users.
On a 70,556-article corpus, RoBERTa-base achieves 98.51% accuracy with a 1.09% false negative rate (the primary selection criterion), a 69% reduction relative to XGBoost. Adversarial fine-tuning raises detection accuracy on formally worded fake news from approximately 40% to 95.71% while maintaining standard benchmark performance, enabled by experience replay that prevents catastrophic forgetting. ONNX INT8 quantization enables inference on free CPU infrastructure, and a continuously operating nightly scraper provides fresh labeled data without human intervention. End-to-end evaluation yields 98% accuracy on curated real-world content and 91.7% across seven adversarial categories, with 100% accuracy in five of seven.
The system’s principal limitations are the accuracy reduction of roughly 1.6 percentage points from ONNX quantization relative to PyTorch inference (98.6% to 97%), the short observation window for the continuous learning pipeline, and residual difficulty on context-free edge cases (short texts, statistics-only content, leading-question format). These represent concrete directions for future work alongside multimodal extension to images and video, direct multilingual modeling without translation, and a longitudinal study of pipeline effectiveness over extended deployment.

Data Availability Statement

The production API is publicly accessible at https://anantsingh12-fakenews-detector.hf.space. The trained model weights are available at https://huggingface.co/AnantSingh12/fakenews-roberta-v2. The data collection pipeline is available at https://github.com/AnantSingh12/fake-news-detection-pipeline.

Acknowledgments

The authors thank the open-source communities behind Hugging Face Transformers, PyTorch, and the ONNX runtime for the tools that made this work possible. The ISOT, WELFake, and COVID-19 Constraint dataset creators are acknowledged for making their data publicly available. Training was conducted on Google Colab using NVIDIA Tesla T4 GPU hardware. The Google Fact Check Tools API was instrumental in the automated data collection pipeline.

References

  1. Vosoughi, S.; Roy, D.; Aral, S. The spread of true and false news online. Science 2018, 359(6380), 1146–1151.
  2. Kaliyar, R. K.; Goswami, A.; Narang, P. FakeBERT: Fake news detection in social media with a BERT-based deep learning approach. Multimed. Tools Appl. 2021, 80(8), 11765–11788.
  3. Xu, C.; Pan, F.; Qiu, Y. Fake news detectors are biased against texts generated by large language models. arXiv 2023, arXiv:2309.08674.
  4. Shu, K.; Sliva, A.; Wang, S.; Tang, J.; Liu, H. Fake news detection on social media: A data mining perspective. ACM SIGKDD Explor. Newsl. 2017, 19(1), 22–36.
  5. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 5998–6008.
  6. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. Proc. NAACL-HLT 2019, 4171–4186.
  7. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A robustly optimized BERT pretraining approach. arXiv 2019, arXiv:1907.11692.
  8. Przybyła, P.; Shardlow, A.; Zerva, S.; Nawaz, M.; Ives, M.; Ananiadou, S.; Procter, R. BODEGA: Benchmark for adversarial example generation in credibility assessment. Proc. LREC-COLING 2024, 15411–15422.
  9. Tahmasebi, S.; Müller-Budack, E.; Ewerth, R. Robust fake news detection using large language models under adversarial sentiment attacks. arXiv 2025, arXiv:2601.15277.
  10. McCloskey, M.; Cohen, N. J. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation; 1989; Vol. 24, pp. 109–165.
  11. Sallami, D.; Aïmeur, E. Verify as you go: An LLM-powered browser extension for fake news detection. arXiv 2026, arXiv:2603.05519.
  12. Ahmed, H.; Traore, I.; Saad, S. Detection of online fake news using n-gram analysis and machine learning techniques. Proc. ISDCS 2017, 127–138.
  13. Verma, P. K.; Agrawal, P.; Amorim, I.; Prodan, R. WELFake: Word embedding over linguistic features for fake news detection. IEEE Trans. Comput. Soc. Syst. 2021, 8(4), 881–893.
  14. Patwa, P.; Sharma, S.; Pykl, S.; Guptha, V.; Kumari, G.; Akhtar, M. S.; Ekbal, A.; Das, A.; Chakraborty, T. Fighting an infodemic: COVID-19 fake news dataset. Combat. Online Hostile Posts Reg. Lang. Dur. Emerg. Situat. 2021, 21–29.
  15. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108.
Figure 1. Comparative model performance across XGBoost, DistilBERT, and RoBERTa. (a) Accuracy, Precision, Recall, and F1-Score. (b) False negative counts showing 69% reduction from XGBoost to RoBERTa. (c) False negative rate (3.51% → 1.52% → 1.09%).
Figure 2. RoBERTa-base training dynamics and test set performance. (a) Loss curves: validation loss stabilizes at about 0.23 after epoch 2. (b) Accuracy progression, ending with a 1.19 percentage-point gap between training and validation accuracy at epoch 5. (c) Confusion matrix: 57 false negatives (FNR = 1.09%), 101 false positives (FPR = 1.88%).
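False negatives and false positives are normalized by different denominators: FNR divides by all fake articles, FPR by all real articles. This is why 57 misses correspond to 1.09% while 101 false alarms correspond to 1.88%. A minimal sketch of the standard definitions, assuming FAKE is the positive class (the convention implied by treating missed misinformation as a false negative); the `Confusion` class is illustrative and not part of the described system:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion matrix with FAKE as the positive class (assumed convention)."""

    tp: int  # fake articles flagged as fake
    fn: int  # fake articles passed as real (missed misinformation)
    fp: int  # real articles flagged as fake (false alarms)
    tn: int  # real articles passed as real

    def fnr(self) -> float:
        # Denominator: all fake articles.
        return self.fn / (self.fn + self.tp)

    def fpr(self) -> float:
        # Denominator: all real articles.
        return self.fp / (self.fp + self.tn)

    def recall(self) -> float:
        # Recall on the FAKE class equals 1 - FNR.
        return self.tp / (self.tp + self.fn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def accuracy(self) -> float:
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total


# Hypothetical counts, chosen only to illustrate the formulas.
example = Confusion(tp=90, fn=10, fp=5, tn=95)
print(f"FNR={example.fnr():.2%}  FPR={example.fpr():.2%}  F1={example.f1():.4f}")
```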
Figure 3. Continuous learning pipeline model version progression. (a) Accuracy on the original (2016–21) and fresh (2024–26) test sets across all four versions. (b) False negative rate trajectory: the fresh-test FNR rises from 0.24% (V2) to 0.59% (V3), then falls to 0.40% (V3-Robust).
Figure 5. Chrome extension user interface. Left: hover popup showing FAKE verdict at 99.68% confidence with detected suspicious patterns. Right: detailed analysis sidebar with LIME word attribution, source credibility score, and web verification results.
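The sidebar's word attribution comes from LIME, which explains a single prediction by removing words at random, querying the classifier on each perturbed text, and fitting a locally weighted linear model whose coefficients serve as word importances. The sketch below is a simplified NumPy re-implementation of that idea, not the `lime` package and not necessarily the extension's configuration. `toy_predict_fake` is a hypothetical keyword scorer standing in for the RoBERTa model, and the sample count and kernel width are assumed values:

```python
import numpy as np

# Hypothetical stand-in for the classifier: returns P(FAKE) from keyword cues.
CUE_WEIGHTS = {"miracle": 0.4, "suppressed": 0.3}


def toy_predict_fake(tokens):
    return 0.1 + sum(CUE_WEIGHTS.get(t.lower(), 0.0) for t in tokens)


def lime_style_word_weights(words, predict_fake, num_samples=500, kernel_width=0.25, seed=0):
    """Fit a locally weighted linear surrogate over word-removal perturbations.

    Returns (word, weight) pairs; positive weights push the prediction toward FAKE.
    Simplified LIME-style sketch; sampling and kernel settings are assumptions.
    """
    rng = np.random.default_rng(seed)
    d = len(words)
    # Each row is a binary mask: 1 keeps the word, 0 removes it.
    masks = rng.integers(0, 2, size=(num_samples, d))
    masks[0, :] = 1  # first sample is the unperturbed text
    preds = np.array([
        predict_fake([w for w, keep in zip(words, row) if keep]) for row in masks
    ])
    # Cosine distance between each mask and the all-ones original vector.
    distances = 1.0 - np.sqrt(masks.sum(axis=1) / d)
    # Exponential kernel: perturbations closer to the original count more.
    weights = np.sqrt(np.exp(-(distances ** 2) / kernel_width ** 2))
    # Weighted least squares with an intercept column.
    design = np.hstack([np.ones((num_samples, 1)), masks])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], preds * root_w, rcond=None)
    return list(zip(words, coef[1:]))


words = "Doctors hide this miracle cure suppressed by officials".split()
for word, weight in sorted(lime_style_word_weights(words, toy_predict_fake), key=lambda p: -abs(p[1])):
    print(f"{word:>12s} {weight:+.3f}")
```

Because the toy scorer is exactly linear in word presence, the surrogate recovers its weights (0.4 for "miracle", 0.3 for "suppressed", about 0 elsewhere). With a real transformer the surrogate is only a local approximation, and attributions vary with the random seed and the number of samples.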
Figure 6. End-to-end system architecture spanning four layers: data collection (GitHub Actions nightly scraper), training (experience replay fine-tuning with ONNX quantization), inference (FastAPI on Hugging Face Spaces), and user interface (Chrome extension with multilingual support).
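Experience replay, named in the training layer above, counters catastrophic forgetting [10] by mixing a sample of the original training data into each fine-tuning round on fresh articles. A minimal sketch of the data-mixing step, assuming examples are (text, label) pairs with label 1 = FAKE; `build_replay_batch` and its `replay_ratio` parameter are hypothetical, since the mixing ratio used for V2, V3, and V3-Robust is not stated here:

```python
import random
from typing import List, Sequence, Tuple

# (article text, label); label 1 = FAKE is an assumed encoding.
Example = Tuple[str, int]


def build_replay_batch(
    fresh: Sequence[Example],
    original: Sequence[Example],
    replay_ratio: float = 1.0,
    seed: int = 0,
) -> List[Example]:
    """Mix fresh examples with a random sample of the original training data.

    replay_ratio (hypothetical parameter) is the number of replayed original
    examples per fresh example; the sample is capped at the original pool size.
    """
    if replay_ratio < 0:
        raise ValueError("replay_ratio must be non-negative")
    rng = random.Random(seed)
    n_replay = min(len(original), round(replay_ratio * len(fresh)))
    # Sampling without replacement: no original example is replayed twice per round.
    replayed = rng.sample(list(original), n_replay)
    mixed = list(fresh) + replayed
    rng.shuffle(mixed)  # interleave old and new examples before batching
    return mixed


# Toy usage with hypothetical data: 3 fresh articles, 2 replayed per fresh one.
fresh_articles = [(f"fresh article {i}", i % 2) for i in range(3)]
original_articles = [(f"original article {i}", i % 2) for i in range(100)]
round_data = build_replay_batch(fresh_articles, original_articles, replay_ratio=2.0)
print(len(round_data))
```

Capping the replay sample at the size of the original pool keeps the sketch valid for large ratios, and a fixed seed makes each round's mixture reproducible.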
Figure 7. System performance profile across seven adversarial testing categories. V3-Robust (dark polygon) achieves 100% on five of seven axes versus V3 baseline (gray polygon) at 66.7% overall, demonstrating targeted improvements without regression on previously strong categories.
Table 1. Source datasets and their characteristics
Dataset               Total     Real     Fake     Year      Type
ISOT                  44,898    21,417   23,481   2017      News
WELFake               72,134    35,028   37,106   2021      News
COVID-19 Constraint   10,700    5,600    5,100    2021      Social
Combined              127,732   62,045   65,687   2016–21   Mixed
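The abstract reports that 42.9% of the initially gathered samples were removed as cross-dataset duplicates; the deduplication procedure itself is not detailed in this section. One plausible approach, sketched here as an assumption, hashes a normalized form of each article and keeps the first occurrence. `normalize` and `deduplicate` are hypothetical helpers:

```python
import hashlib
import re
from typing import Iterable, List, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace non-word characters with spaces, collapse whitespace (assumed normalization)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records: Iterable[Tuple[str, str, int]]) -> List[Tuple[str, str, int]]:
    """Keep the first occurrence of each normalized article across all source datasets.

    records: (source dataset, article text, label) triples.
    """
    seen = set()
    unique = []
    for source, text, label in records:
        key = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append((source, text, label))
    return unique


# Toy usage with hypothetical records: the first two differ only in case and punctuation.
records = [
    ("ISOT", "Senate passes the bill.", 0),
    ("WELFake", "senate  passes the bill", 0),
    ("COVID", "Vaccines contain microchips!", 1),
]
print(len(deduplicate(records)))
```

Exact-match hashing only catches duplicates that differ in case, punctuation, or whitespace; near-duplicates such as edited headlines or truncated bodies would need a fuzzy method such as MinHash. The sketch also keeps the first label it sees, so conflicting labels for the same text across datasets go unnoticed unless checked separately.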
Table 3. Comparative performance on test set (10,584 samples)
Metric            XGBoost   DistilBERT   RoBERTa
Accuracy          95.88%    97.74%       98.51%
Precision         95.21%    96.98%       98.08%
Recall            96.49%    98.48%       98.91%
F1-Score          95.84%    97.72%       98.49%
False Negatives   183       79           57
FN Rate           3.51%     1.52%        1.09%
False Positives   253       160          101
Parameters        n/a       67 M         125 M
Training Time     59 min    207 min      409 min
Model Size        100 MB    268 MB       500 MB
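The transformer model sizes in Table 3 are consistent with storing every parameter as a 32-bit float (4 bytes): 67 M × 4 B = 268 MB and 125 M × 4 B = 500 MB. The same arithmetic explains the 500 MB to 125 MB reduction from ONNX INT8 quantization reported in the abstract, since INT8 stores one byte per weight. A back-of-the-envelope sketch, assuming decimal megabytes:

```python
def model_size_mb(num_params: float, bytes_per_weight: int) -> float:
    """Approximate weight storage in decimal megabytes (10**6 bytes)."""
    return num_params * bytes_per_weight / 1e6


FP32_BYTES, INT8_BYTES = 4, 1

for name, params in (("DistilBERT", 67e6), ("RoBERTa-base", 125e6)):
    fp32 = model_size_mb(params, FP32_BYTES)
    int8 = model_size_mb(params, INT8_BYTES)
    print(f"{name}: FP32 ~ {fp32:.0f} MB, INT8 ~ {int8:.0f} MB")
```

Real exports deviate slightly from this estimate: INT8 quantization typically leaves some tensors (for example biases and normalization parameters) in floating point, and model files carry graph metadata, so the 4:1 ratio is approximate.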
Table 4. RoBERTa training progression across five epochs
Epoch   Train Loss   Train Acc   Val Loss   Val Acc   Time (min)
1       0.3163       91.55%      0.2538     97.35%    81.8
2       0.2319       98.25%      0.2298     98.35%    81.9
3       0.2143       99.22%      0.2303     98.55%    81.9
4       0.2052       99.69%      0.2299     98.65%    81.9
5       0.2019       99.86%      0.2291     98.67%    81.9
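The 1.19 figure in Figure 2(b) is the epoch-5 difference between training and validation accuracy, in percentage points. The sketch below recomputes the per-epoch gap from Table 4 and applies one common checkpoint rule, lowest validation loss. That rule is an assumption, since the checkpoint selection criterion is not stated in this section:

```python
# (epoch, train_acc %, val_loss, val_acc %) as reported in Table 4.
HISTORY = [
    (1, 91.55, 0.2538, 97.35),
    (2, 98.25, 0.2298, 98.35),
    (3, 99.22, 0.2303, 98.55),
    (4, 99.69, 0.2299, 98.65),
    (5, 99.86, 0.2291, 98.67),
]


def generalization_gaps(rows):
    """Train minus validation accuracy, in percentage points, per epoch."""
    return {epoch: round(train_acc - val_acc, 2) for epoch, train_acc, _, val_acc in rows}


def best_epoch_by_val_loss(rows):
    """Epoch with the lowest validation loss (one common checkpoint-selection rule)."""
    return min(rows, key=lambda row: row[2])[0]


gaps = generalization_gaps(HISTORY)
print(gaps)
print("lowest validation loss at epoch", best_epoch_by_val_loss(HISTORY))
```

The gap is negative in epoch 1, most likely because training accuracy is averaged over an epoch in which the model is still improving, while validation accuracy is measured at its end.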
Table 5. Model version performance across test sets
Version            Acc. Original (2016–21)   Acc. Fresh (2024–26)   FNR Original   FNR Fresh
V1 (baseline)      98.51%                    n/a                    1.09%          n/a
V2 (replay)        98.53%                    99.58%                 1.15%          0.24%
V3 (nightly)       98.26%                    99.52%                 1.44%          0.59%
V3-Robust (adv.)   98.60%                    99.60%                 1.61%          0.40%
Table 6. Adversarial fine-tuning results: V3 versus V3-Robust
Test Set               V3       V3-Robust   Change
Original (2016–21)     98.26%   98.60%      +0.34 pp
Fresh (2024–26)        99.52%   99.60%      +0.08 pp
Adversarial (70 ex.)   ≈40%     95.71%      ≈+55 pp
Table 7. 50-article curated evaluation results by category
Category               Correct / Total   Accuracy   Notes
Obvious fake news      10 / 10           100%       Sensationalist language
Subtle fake news       10 / 10           100%       Academic register
Real news (verified)   20 / 20           100%       Established outlets
Borderline cases       9 / 10            90%        Inherent ambiguity
Overall                49 / 50           98%
Table 8. Research-backed adversarial testing across seven categories
Category                 Correct / Total   Accuracy   Key Finding
Adversarial robustness   5 / 5             100%       Formal language detected
AI-generated fake news   4 / 4             100%       LLM content detected
Temporal domain shift    5 / 5             100%       COVID, election, climate
Multilingual content     4 / 4             100%       Hindi, Spanish, Hinglish
Source credibility       4 / 5             80%        Trusted domain override
Domain-specific          8 / 8             100%       Medical, political, science
Edge cases               3 / 5             60%        Short text, stats-only
Overall                  33 / 36           91.7%
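The Overall row pools all 36 cases (micro-averaging). Because each category holds only four to eight examples, a single error moves that category's accuracy by 12.5 to 25 percentage points, so the unweighted mean across categories (macro-averaging) can differ from the pooled figure. A short sketch comparing the two on the Table 8 counts:

```python
# (correct, total) per category, as reported in Table 8.
RESULTS = {
    "Adversarial robustness": (5, 5),
    "AI-generated fake news": (4, 4),
    "Temporal domain shift": (5, 5),
    "Multilingual content": (4, 4),
    "Source credibility": (4, 5),
    "Domain-specific": (8, 8),
    "Edge cases": (3, 5),
}


def micro_accuracy(results):
    """Pool all cases, so larger categories carry more weight."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct / total


def macro_accuracy(results):
    """Average per-category accuracies, weighting every category equally."""
    return sum(c / t for c, t in results.values()) / len(results)


print(f"micro: {micro_accuracy(RESULTS):.1%}, macro: {macro_accuracy(RESULTS):.1%}")
```

Here the two averages are close (91.7% pooled versus 91.4% macro), but both rest on very few examples per category.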
Table 9. System improvement trajectory on seven-category adversarial test suite
System Version              Score           Gain (pp)   Main Improvement
V3 (initial)                24/36 = 66.7%   n/a         n/a
V3-Robust (adv. training)   31/36 = 86.1%   +19.4       Adversarial categories
V3-Robust + pattern fixes   33/36 = 91.7%   +5.6        Edge cases, domain-specific
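The gains in Table 9 are differences in percentage points between consecutive versions. Expressed as failures, the suite went from 12 misses to 5 and then to 3 out of 36. A small sketch reproducing the column; `score_trajectory` is a hypothetical helper:

```python
def score_trajectory(correct_counts, total):
    """Return (percent, gain in percentage points vs. previous version) per version."""
    rows, previous = [], None
    for correct in correct_counts:
        pct = 100.0 * correct / total
        gain = None if previous is None else pct - previous
        rows.append((pct, gain))
        previous = pct
    return rows


versions = ["V3 (initial)", "V3-Robust", "V3-Robust + pattern fixes"]
trajectory = score_trajectory([24, 31, 33], total=36)
for name, (pct, gain) in zip(versions, trajectory):
    suffix = "" if gain is None else f" ({gain:+.1f} pp)"
    print(f"{name}: {pct:.1f}%{suffix}")
```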
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
