Submitted: 01 June 2025
Posted: 03 June 2025
Abstract
Keywords:
1. Introduction
- We propose LLM-as-Critic, a large language model fine-tuning paradigm that repurposes an LLM’s intrinsic linguistic understanding for AI-generated text detection.
- We introduce a training methodology that combines contrastive learning and adversarial training to sharpen the model’s sensitivity to the subtle but consistent linguistic artifacts of AI generation.
- We demonstrate through extensive experiments on diverse real-world datasets that LLM-as-Critic outperforms current state-of-the-art detection methods in both accuracy and robustness.
2. Related Work
2.1. Large Language Models
2.2. Generative AI Detection
3. Method
3.1. Overall Architecture and Human-Likeness Scoring
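As a minimal illustration of the scoring step (the function names and the 0.5 decision threshold are assumptions for this sketch, not specified by the paper), the critic’s raw output logit can be mapped to a human-likeness score in (0, 1) with a sigmoid and then thresholded for classification:

```python
import math

def human_likeness_score(logit: float) -> float:
    """Map the critic's raw output logit to a human-likeness score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def classify(logit: float, threshold: float = 0.5) -> str:
    """Label a text as human-written when its score clears the threshold."""
    return "human" if human_likeness_score(logit) >= threshold else "ai-generated"
```

In this framing, a higher score means the text reads as more human-like to the critic; the threshold can be tuned on a validation set rather than fixed at 0.5.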
3.2. Training Objective
3.3. Details of Learning Strategies
Supervised Fine-tuning with Binary Cross-Entropy Loss
Contrastive Learning Loss
Adversarial Training Process
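The loss components above can be sketched in scalar form. This is an illustrative reconstruction under common formulations (standard binary cross-entropy and a margin-based contrastive loss), with `lam_cl` and `margin` as hypothetical hyperparameters; the adversarial component alters which examples the model sees rather than adding a scalar term, so it does not appear here:

```python
import math

def bce_loss(p: float, y: int) -> float:
    """Standard binary cross-entropy for predicted human-probability p and label y."""
    eps = 1e-12  # numerical guard against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))

def contrastive_loss(dist: float, same_class: int, margin: float = 1.0) -> float:
    """Margin-based contrastive term: pull same-class embedding pairs together,
    push different-class pairs at least `margin` apart."""
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2

def total_loss(p: float, y: int, dist: float, same_class: int,
               lam_cl: float = 0.5) -> float:
    """Combined supervised + contrastive objective; adversarial training enters
    through the sampling of hard AI-generated examples, not as an extra term."""
    return bce_loss(p, y) + lam_cl * contrastive_loss(dist, same_class)
```

A perfectly confident correct prediction drives the BCE term to zero, while the contrastive term only vanishes when same-class pairs coincide and different-class pairs are separated by at least the margin.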
4. Experiments
4.1. Comparative Performance Analysis
4.2. Ablation Study for Method Validity
- LLM-as-Critic (BCE only): a baseline fine-tuned solely with the standard Binary Cross-Entropy loss, representing straightforward supervised learning without our enhancements.
- LLM-as-Critic (+CL): trained with the BCE loss plus our proposed Contrastive Learning loss, to quantify the effect of explicitly widening inter-class boundaries in the model’s feature space.
- LLM-as-Critic (+Adv): trained with the BCE loss plus our integrated Adversarial Training scheme, to isolate the robustness gained from exposing the model to iteratively harder AI-generated content.
- Full LLM-as-Critic: our complete proposed model, combining the BCE loss, the Contrastive Learning loss, and the Adversarial Training scheme.
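The four ablation variants differ only in which loss terms and training procedures are enabled. As a sketch (the dictionary keys and variant names below are ours, purely illustrative), the configuration space can be written as:

```python
# Hypothetical ablation configurations: which training components
# are active in each variant of LLM-as-Critic.
ABLATION_VARIANTS = {
    "BCE only": {"bce": True, "contrastive": False, "adversarial": False},
    "+CL":      {"bce": True, "contrastive": True,  "adversarial": False},
    "+Adv":     {"bce": True, "contrastive": False, "adversarial": True},
    "Full":     {"bce": True, "contrastive": True,  "adversarial": True},
}

def active_terms(variant: str) -> list:
    """Return the names of the training components enabled for a variant."""
    cfg = ABLATION_VARIANTS[variant]
    return [name for name, on in cfg.items() if on]
```

Running all four configurations over the same data then attributes each F1 gain in the ablation table to a single added component.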
4.3. Human Evaluation Analysis
4.4. Further Analysis and Discussion
4.4.1. Generalization Across Different AI Generators
4.4.2. Performance on Adversarially Attacked Texts
4.4.3. Error Analysis and False Positives/Negatives
5. Conclusion
Table 1. Comparative F1 scores of LLM-as-Critic and baseline detectors across six datasets (Section 4.1).

| Dataset | LLM-as-Critic F1 | PPL Detector F1 | Stylometric Feat. F1 | FT RoBERTa F1 |
|---|---|---|---|---|
| News | 0.95 | 0.88 | 0.85 | 0.89 |
| Creative Writing | 0.92 | 0.84 | 0.81 | 0.86 |
| Student Papers | 0.94 | 0.87 | 0.83 | 0.88 |
| Code | 0.96 | 0.90 | 0.87 | 0.91 |
| Yelp Reviews | 0.93 | 0.86 | 0.82 | 0.87 |
| arXiv Abstracts | 0.97 | 0.91 | 0.89 | 0.92 |
Table 2. Ablation study: F1 scores of each LLM-as-Critic variant (Section 4.2).

| Dataset | LLM-as-Critic (BCE only) | LLM-as-Critic (+CL) | LLM-as-Critic (+Adv) | Full LLM-as-Critic |
|---|---|---|---|---|
| News | 0.89 | 0.92 | 0.93 | 0.95 |
| Creative Writing | 0.85 | 0.88 | 0.89 | 0.92 |
| Student Papers | 0.87 | 0.90 | 0.91 | 0.94 |
| arXiv Abstracts | 0.92 | 0.94 | 0.95 | 0.97 |
Table 3. Human evaluation results (Section 4.3).

| Dataset | LLM-as-Critic (%) | PPL Detector (%) | Avg. Human Confidence |
|---|---|---|---|
| News | 91.5 | 85.2 | 4.3 |
| Creative Writing | 88.1 | 79.5 | 4.1 |
| Student Papers | 89.3 | 82.8 | 4.2 |
| Code | 92.0 | 86.7 | 4.4 |
| Yelp Reviews | 89.8 | 80.1 | 4.0 |
| arXiv Abstracts | 93.4 | 88.9 | 4.5 |
Table 4. Generalization to unseen AI generators (Section 4.4.1).

| Dataset (Unseen Generator) | LLM-as-Critic F1 | Fine-tuned RoBERTa Classifier F1 |
|---|---|---|
| News (GPT-4 Turbo) | 0.90 | 0.83 |
| Creative Writing (Claude 3 Opus) | 0.86 | 0.78 |
| Student Papers (Gemini 1.5 Pro) | 0.88 | 0.81 |
| Code (CoPilot) | 0.92 | 0.85 |
Table 5. F1 scores on adversarially attacked texts (Section 4.4.2).

| Dataset (Adversarial Attack Type) | LLM-as-Critic F1 | PPL Detector F1 | FT RoBERTa F1 |
|---|---|---|---|
| News (Synonym Substitution) | 0.91 | 0.75 | 0.82 |
| Creative Writing (Paraphrasing) | 0.88 | 0.70 | 0.79 |
| Student Papers (Grammatical Restructuring) | 0.89 | 0.72 | 0.80 |
Table 6. False positive and false negative rates of LLM-as-Critic per dataset.

| Dataset | False Positive Rate (%) | False Negative Rate (%) |
|---|---|---|
| News | 2.5 | 3.0 |
| Creative Writing | 4.0 | 4.5 |
| Student Papers | 3.5 | 4.0 |
| Code | 2.0 | 2.5 |
| Yelp Reviews | 3.0 | 3.5 |
| arXiv Abstracts | 1.5 | 2.0 |
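As a consistency check on these error rates, precision, recall, and F1 can be recovered from the false-positive and false-negative rates, assuming balanced classes (an assumption of this sketch, not stated in the paper):

```python
def f1_from_error_rates(fpr: float, fnr: float) -> float:
    """Recover F1 from per-class error rates, assuming balanced classes.
    fpr: fraction of human texts wrongly flagged as AI-generated.
    fnr: fraction of AI-generated texts the detector misses."""
    recall = 1.0 - fnr
    precision = recall / (recall + fpr)  # equal class priors cancel out
    return 2 * precision * recall / (precision + recall)

# Example: News dataset, FPR = 2.5%, FNR = 3.0%
news_f1 = f1_from_error_rates(0.025, 0.030)
```

For the News dataset this yields an F1 of roughly 0.97, in the same range as the comparative results reported above.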
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).