Submitted: 09 November 2024
Posted: 12 November 2024
Abstract
Keywords:
1. Introduction
- Evaluating AI Models: We will assess the effectiveness of advanced AI models, particularly Large Language Models (LLMs), in classifying news articles and identifying fake news.
- Comparative Performance Analysis: The research will compare the performance of various models both before and after fine-tuning using few-shot learning techniques. This comparison will provide insights into the adaptability and robustness of these models in real-world scenarios.
- Exploring Unanswered Questions: We aim to investigate critical questions that have yet to be thoroughly examined in previous studies:
- Q1: Are traditional NLP and CNN models or LLMs more accurate in spam detection tasks?
- Q2: Among the GPT-4 Omni family, which model performs best prior to fine-tuning?
- Q3: After fine-tuning with few-shot learning, which model in the GPT-4 Omni family demonstrates superior performance?
- Q4: What is the significance of the costs associated with fine-tuning LLMs, and how do these costs impact performance in the news sector?
- Q5: How can LLMs be effectively leveraged to assess fake news, and what transformative effects can they have on the news industry through automated detection and actionable insights?
2. Literature Review
2.1. Feature-Based Detection Approaches
2.2. Deep Learning Techniques
2.3. Multi-Modal and Hybrid Approaches
2.4. NLP and Machine Learning
2.5. Network-Based Detection Approaches
2.6. Meta-Analytic and Comparative Studies
2.7. Specialized Detection Models
2.8. Emerging Trends and Novel Techniques
2.9. Augmentation and Transfer Learning
2.10. Cooperative and Feedback-Based Models
2.11. Toxic News and Multiclass Classification
3. Materials and Methods
3.1. Dataset Cleaning, Preprocessing, and Splitting
3.1.1. Dataset Preprocessing
- Column Removal: The first step involved removing the “Unnamed: 0” column, which was deemed redundant and irrelevant to our analysis.
- Empty Row Removal: We performed a thorough check across the “Title,” “Text,” and “Label” columns to identify and delete any rows with missing values. This ensured that all remaining entries were complete and would contribute fully to the model’s training and validation processes.
- Column Merging: We combined the “Title” and “Text” columns into a new consolidated column, named “Text.” This allowed the model to process the headline and article content together, enhancing the context available for classification.
- Label Standardization: The “label” column was standardized and renamed “Label” to ensure consistency across our data and streamline integration with our modeling pipeline.
- Text Length Restriction: We restricted the “Text” column entries to a maximum length of 2,560 characters. Limiting input size at 2,560 characters is especially effective for training CNN and BERT models, as it provides sufficient contextual information while keeping memory and processing efficiency manageable. After this truncation, the dataset contained 7,573 entries labeled as 1 and 7,313 entries labeled as 0.
- Data Standardization: Following the character limit restriction, we standardized the dataset to improve feature consistency and facilitate model convergence. Post-standardization, we re-checked for any empty rows that may have emerged and removed them, resulting in 7,568 entries labeled as 1 and 7,313 entries labeled as 0.
- Balanced Sampling: To ensure a balanced dataset, we selected 5,000 entries with stratified sampling based on the “Label” column, yielding 2,500 entries per class (Label 1 and Label 0).
- ID Addition: We introduced a unique identifier (ID) for each entry to aid in referencing and error tracking throughout the modeling process.
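The preprocessing steps above can be sketched as a single pandas pipeline. This is a minimal illustration, not the authors' exact code: the incoming column names ("Unnamed: 0", "Title", "Text", "label") follow the paper's description, and the lowercase "label" on input is an assumption.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, max_chars: int = 2560, per_class: int = 2500) -> pd.DataFrame:
    """Sketch of the cleaning pipeline described above; column names follow the paper."""
    # Column removal: drop the redundant index column if present.
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")
    # Empty-row removal across title, body, and label.
    df = df.dropna(subset=["Title", "Text", "label"])
    # Column merging: headline + body into one consolidated "Text" field.
    df = df.assign(Text=df["Title"].str.strip() + " " + df["Text"].str.strip())
    # Label standardization: rename "label" -> "Label".
    df = df.rename(columns={"label": "Label"}).drop(columns=["Title"])
    # Text length restriction to 2,560 characters.
    df = df.assign(Text=df["Text"].str.slice(0, max_chars))
    # Re-check for rows that became empty after truncation/standardization.
    df = df[df["Text"].str.strip().astype(bool)]
    # Balanced stratified sampling on "Label" (capped at group size for small inputs).
    df = pd.concat(
        g.sample(n=min(per_class, len(g)), random_state=42)
        for _, g in df.groupby("Label")
    ).reset_index(drop=True)
    # ID addition for referencing and error tracking.
    df.insert(0, "ID", df.index)
    return df
```

The `random_state` and the cap via `min()` are illustrative choices; the paper fixes only the 2,560-character limit and the 2,500-per-class sample size.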
3.1.2. Dataset Splitting
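The split procedure itself is not detailed here; one common approach is a stratified train/validation/test split. The sketch below is a generic illustration only, and the 80/10/10 ratios and seed are assumptions, not values taken from the paper.

```python
import random

def stratified_split(rows, label_key="Label", train=0.8, val=0.1, seed=42):
    """Stratified train/validation/test split over a list of dict records.
    Ratios (80/10/10) and seed are illustrative assumptions."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    splits = {"train": [], "val": [], "test": []}
    # Split each class independently so label proportions are preserved.
    for group in by_class.values():
        rng.shuffle(group)
        n = len(group)
        n_train, n_val = int(n * train), int(n * val)
        splits["train"] += group[:n_train]
        splits["val"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]
    return splits
```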
3.2. LLM Prompt Engineering
- Content Independent of Model Architecture: We designed the prompt to be versatile and not dependent on any single model’s framework. This flexibility ensures that it can be applied across different LLMs with minimal adjustment, focusing on clear communication of the task with relevant context and instructions interpretable by any LLM.
- Structured Output for Accessibility: Recognizing the importance of usability, we created a response format that aligns with coding and accessibility standards. The output was organized in compliance with the JSON standard, offering a logical, intuitive structure that meets both human readability and machine processing requirements.
Listing 1. Model-agnostic prompt.
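A hypothetical prompt of this kind, with a JSON-structured response contract, might look as follows. The exact wording and the `prediction` field name are illustrative assumptions, not the paper's actual prompt.

```python
import json

def build_prompt(article_text: str) -> str:
    # Task instructions are model-agnostic: plain language plus an explicit
    # JSON response contract that any LLM can interpret.
    return (
        "You are a news verification assistant. Classify the article below "
        "as fake news (1) or real news (0).\n"
        'Respond ONLY with a JSON object of the form {"prediction": 0 or 1}.\n\n'
        f"Article: {article_text}"
    )

def parse_response(raw_reply: str) -> int:
    # The JSON contract keeps replies both human-readable and machine-parsable.
    return int(json.loads(raw_reply)["prediction"])
```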
3.3. Model Deployment, Fine-Tuning, and Predictive Evaluation
3.3.1. GPT Model Deployment and Fine-Tuning
Listing 2. Prompt and completion pairs – JSONL files.
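For OpenAI-style chat fine-tuning, each JSONL line holds a `messages` array pairing a user prompt with the expected assistant completion. The sketch below assumes that schema; the exact prompt wording and completion format used in the study are not reproduced here.

```python
import json

def to_jsonl(samples) -> str:
    """Serialize (text, label) pairs as chat-style prompt/completion records,
    one JSON object per line, following the OpenAI fine-tuning file format."""
    lines = []
    for text, label in samples:
        record = {
            "messages": [
                {"role": "user",
                 "content": f"Classify as fake (1) or real (0) news: {text}"},
                {"role": "assistant",
                 "content": json.dumps({"prediction": label})},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)
```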
3.3.2. BERT Model Deployment and Fine-Tuning
3.3.3. CNN Model Deployment and Fine-Tuning
4. Results
4.1. Overview of Fine-Tuning Metrics
4.2. Model Evaluation Phase
4.2.1. Pre-Fine-Tuning Evaluation
4.2.2. Post-Fine-Tuning Evaluation
5. Discussion
5.1. Evaluating Traditional NLP Models vs. LLMs in Fake News Detection
- Research Question 1: Are traditional NLP and CNN models or LLMs more accurate in spam detection tasks?
- Research Statement 1: Fine-tuned LLMs outperform traditional NLP and CNN models in spam detection, achieving near-perfect accuracy.
- The role of pretraining and architecture: Unlike CNNs, which are primarily designed for pattern recognition in structured data like images [44], LLMs are built with transformer-based architectures that allow for deep attention mechanisms and sequence-based learning. These transformers, pretrained on vast and diverse datasets, are adept at capturing language patterns, idiomatic expressions, and subtle semantic relationships. In spam detection, this translates to a model that can understand nuanced phrasing or stylistic cues typical of spam, even when these cues are subtle or context-dependent.
- CNN limitations: The CNN model (ft:cnn_adam) in the study achieved only 58.6% accuracy, which is markedly lower than the transformer-based models. CNNs are effective at identifying repetitive, structured patterns but fall short when tasked with understanding the complexities of human language, especially when spam content relies on nuanced or indirect language. Since CNNs don’t inherently process sequential information as effectively as transformers, they struggle to recognize the sequential and contextual patterns often necessary for distinguishing spam. Furthermore, CNNs require substantial labeled data tailored to the target task to perform well in NLP tasks, given their lack of extensive pretraining on varied textual data [45].
- Comparing BERT and GPT models in spam detection: The BERT model (ft:bert-adam), while achieving a respectable 97.5% accuracy, still fell short of the fine-tuned GPT-4 Omni models. This difference, although minor, may be attributed to the GPT-4 Omni models’ extensive pretraining and perhaps larger scale compared to BERT. Additionally, while both BERT and GPT are transformer-based, GPT models are autoregressive, which means they are trained to predict the next word in a sequence, potentially enhancing their understanding of sentence flow and structure—elements that are crucial for detecting deceptive or misleading language. BERT’s bidirectional nature gives it a slight advantage in understanding context but might limit its proficiency in tasks requiring generation or classification of nuanced language.
5.2. Pre-Fine-Tuning Performance Assessment Within the GPT-4 Omni Family
- Research Question 2: Among the GPT-4 Omni family, which model performs best prior to fine-tuning?
- Research Statement 2: Prior to fine-tuning, GPT-4 Omni models perform poorly in spam detection, highlighting the necessity of task-specific training.
- Baseline performance and lack of task-specific knowledge: The low accuracy scores of 13.9% for GPT-4o and 14.7% for GPT-4o-mini underscore that both models lack the task-specific knowledge required for effective spam detection in a zero-shot setting. These results suggest that while LLMs have extensive general language understanding, applying this to a nuanced, specialized task like spam detection is challenging without specific tuning. Spam classification often relies on recognizing subtle cues, phrasing patterns, and contextual red flags that are challenging for general-purpose models to identify without tailored training.
- The minimal performance gap between models: The 0.8% difference in accuracy between GPT-4o and GPT-4o-mini is marginal, indicating that, prior to fine-tuning, neither model’s scale provides a significant advantage. This close performance suggests that the larger parameter count of GPT-4o does not inherently improve its capability in zero-shot spam detection. The lack of significant disparity in results points to an underlying need for task-specific adaptation that even a larger model cannot overcome in a zero-shot context. This result aligns with findings in NLP research showing that model size alone doesn’t necessarily enhance performance in specialized classification tasks without targeted training.
- Challenges of zero-shot spam detection: Spam detection is a complex task that requires not only general language understanding but also the ability to differentiate between legitimate and deceptive communication. Spam content often imitates legitimate language, which makes it difficult to classify correctly without exposure to examples during training [39]. Zero-shot models, despite their general versatility, lack the fine-grained knowledge to identify these distinctions [46]. This is especially true in domains like spam detection, where subtle stylistic or structural cues might signal spam, and understanding these cues requires domain-specific data exposure.
- Implications of prompt complexity: Attempts to simplify prompts did not result in significant improvements in zero-shot performance, suggesting that prompt engineering alone may not be sufficient to bridge the knowledge gap in specialized tasks [47]. While prompt optimization can enhance zero-shot performance in some general tasks, its limited impact here implies that spam detection requires more than refined prompting; it needs models that have been trained on data specific to the task. This finding emphasizes that while LLMs are powerful, there are limits to what can be achieved through zero-shot learning alone in cases where the task requires deep contextual familiarity.
5.3. Fine-Tuning Impact on GPT-4 Omni Models with Few-Shot Learning
- Research Question 3: After fine-tuning with few-shot learning, which model in the GPT-4 Omni family demonstrates superior performance?
- Research Statement 3: After fine-tuning with few-shot learning, GPT-4o and GPT-4o-mini both achieve 98.8% accuracy, with GPT-4o-mini offering a resource-efficient alternative.
- High accuracy and comparable performance: Both GPT-4o and GPT-4o-mini achieved a remarkable accuracy of 98.8% after fine-tuning, suggesting that fine-tuning with few-shot learning equipped both models with a deep understanding of spam-related cues and patterns. This high accuracy indicates that fine-tuning enabled these models to internalize task-specific patterns, transforming general-purpose models into highly competent classifiers. The minimal difference between the two models implies that few-shot learning was sufficient for the task, effectively compensating for any initial knowledge gaps.
- The marginal advantage of GPT-4o: The 0.8% advantage of GPT-4o over GPT-4o-mini could be attributed to its larger parameter size, which theoretically allows for more detailed representations of data patterns. The larger model may have a slight edge in capturing subtle distinctions in spam characteristics, such as variations in tone, language, or structure. However, this difference in performance is marginal, implying that while larger models may have more capacity, they don’t always translate that into substantial performance gains for specific tasks, especially when smaller models perform almost as well after fine-tuning.
- Fine-tuning efficacy across model sizes: Fine-tuning proved equally effective for both the large and smaller model, suggesting that even a model with fewer parameters, like GPT-4o-mini, can achieve high accuracy when task-specific knowledge is provided through fine-tuning. This reinforces that model scaling is not always necessary for high performance in specialized tasks if effective fine-tuning methods, like few-shot learning, are applied. It also demonstrates that a smaller model, given the right training, can leverage its pre-existing language understanding to learn task-specific requirements efficiently.
- Scalability and flexibility in model deployment: The fact that GPT-4o-mini can achieve comparable performance to GPT-4o after fine-tuning suggests that smaller models in the GPT-4 Omni family can be scaled down without sacrificing substantial accuracy. This scalability is particularly beneficial for businesses or developers looking to deploy multiple models across various tasks, as smaller models require less computational power for deployment and can be trained more quickly [48]. Organizations that need to adapt quickly to new spam detection patterns, for instance, might find GPT-4o-mini advantageous, as it combines high performance with adaptability and cost-effectiveness.
- Strategic model selection for application needs: For organizations with stringent accuracy standards in spam detection, both models offer strong choices. However, GPT-4o-mini’s close performance to GPT-4o and its lower computational footprint make it particularly suitable for real-time spam detection systems, mobile applications, or cloud deployments where resource limitations are a concern [49]. By achieving high accuracy with fewer resources, GPT-4o-mini serves as an example of how model selection can be aligned with specific operational and budgetary needs without compromising on task accuracy [50].
5.4. Cost-Performance Analysis of Fine-Tuning LLMs for Spam Detection
- Research Question 4: What is the significance of the costs associated with fine-tuning LLMs, and how do these costs impact performance in the news sector?
- Research Statement 4: Fine-tuning LLMs like GPT-4o incurs high costs, but GPT-4o-mini offers nearly equal performance, making it a cost-effective and sustainable choice for the news sector.
- Cost-performance trade-offs: Fine-tuning costs can vary dramatically between models, particularly as model size and parameter count increase. While larger models like GPT-4o may offer slight accuracy improvements, these benefits often come with exponentially higher computational costs due to the additional resources needed for training and storage. The results of this study suggest that smaller models like GPT-4o-mini can achieve nearly the same accuracy (98.8%) as larger models, meaning that news organizations can achieve high performance without committing to the costs associated with the largest models. For resource-constrained sectors, this cost-performance balance is essential, allowing organizations to access LLM capabilities without overwhelming financial investments.
- Scalability and resource allocation in newsrooms: Many newsrooms, especially smaller or independent ones, operate on limited budgets, making high-cost fine-tuning of large models unfeasible. GPT-4o-mini’s near-parity in performance with GPT-4o after fine-tuning suggests that news organizations could allocate their resources more efficiently by selecting smaller models that require fewer computational resources. By doing so, they can implement robust AI solutions across multiple tasks—such as spam detection, fake news analysis, and content moderation—without incurring prohibitive costs. This approach makes AI-powered solutions more scalable and accessible across diverse newsroom environments.
- Sustainability and environmental impacts: Computationally intensive fine-tuning contributes to energy consumption, which has significant environmental implications [51]. The use of a smaller model like GPT-4o-mini, which requires less power and computational time, aligns with sustainability goals by reducing the carbon footprint associated with model training. For news organizations committed to minimizing their environmental impact, smaller models represent a more sustainable alternative that still delivers high performance. This consideration is becoming increasingly important for industries striving to balance technological advancement with environmental responsibility.
5.5. Harnessing LLMs for Fake News Detection: Impact and Industry Transformation
- Research Question 5: How can LLMs be effectively leveraged to assess fake news, and what transformative effects can they have on the news industry through automated detection and actionable insights?
- Research Statement 5: LLMs can revolutionize fake news detection in the news industry by automating fact-checking, analyzing misinformation patterns, and optimizing journalistic workflows.
- Automated fake news detection and verification: LLMs excel in detecting subtle linguistic cues, including tone, intent, and inconsistencies in phrasing that may indicate misinformation. By analyzing text with high sensitivity to such patterns, these models can flag potentially deceptive articles, posts, or statements [52]. Automating fake news detection enables near-instant identification of suspicious content, providing journalists and editors with a tool to screen and verify information before it reaches the public. This real-time verification can significantly reduce the spread of fake news by catching it early in the content distribution pipeline.
- Analyzing patterns and trends in misinformation: LLMs can analyze large datasets to identify recurring patterns in misinformation [53]. For instance, they can detect repeated themes, sources, or specific phrasing commonly associated with fake news, which helps newsrooms understand how misinformation is structured and spread. These insights allow media organizations to better understand the origins and propagation mechanisms of fake news, helping them create targeted counter-narratives and education campaigns to inform the public. Moreover, such analysis can assist journalists in investigating and debunking trends in misinformation at their root, reducing their overall impact.
- Efficient allocation of journalistic resources: Fake news detection traditionally requires extensive time and effort from journalists to verify sources, cross-check facts, and consult experts. With LLMs automating much of this initial verification process, journalists are free to focus on in-depth investigative reporting or nuanced storytelling. LLMs can serve as frontline tools, handling large volumes of content for preliminary screening and allowing human editors to prioritize the content that truly needs expert analysis [54]. This efficiency can lead to increased productivity in newsrooms, allowing them to cover more stories and provide richer, more balanced perspectives.
- Content moderation and community engagement: News outlets can deploy LLMs to moderate user-generated content, such as comments on articles or social media platforms, where misinformation often proliferates. By filtering out or flagging misleading comments in real-time, LLMs could enable news organizations to maintain respectful and informative discussions around their content. This content moderation creates a safer, more reliable environment for audience engagement, reducing misinformation on news platforms and fostering healthier community discourse [55].
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Papageorgiou, E.; Chronis, C.; Varlamis, I.; Himeur, Y. A Survey on the Use of Large Language Models (LLMs) in Fake News. Future Internet 2024, 16, 298. [Google Scholar] [CrossRef]
- Shu, K.; Sliva, A.; Wang, S.; Tang, J.; Liu, H. Fake News Detection on Social Media. ACM SIGKDD Explorations Newsletter 2017, 19, 22–36. [CrossRef]
- Sakas, D.P.; Reklitis, D.P.; Trivellas, P. Social Media Analytics for Customer Satisfaction Based on User Engagement and Interactions in the Tourism Industry. Springer Proceedings in Business and Economics 2024, 103–109. [Google Scholar] [CrossRef]
- Poulopoulos, V.; Vassilakis, C.; Wallace, M.; Antoniou, A.; Lepouras, G. The Effect of Social Media Trending Topics Related to Cultural Venues’ Content. Proceedings - 13th International Workshop on Semantic and Social Media Adaptation and Personalization, SMAP 2018 2018, 7–12. [CrossRef]
- Reis, J.C.S.; Correia, A.; Murai, F.; Veloso, A.; Benevenuto, F.; Cambria, E. Supervised Learning for Fake News Detection. IEEE Intell Syst 2019, 34, 76–81. [Google Scholar] [CrossRef]
- Pérez-Rosas, V.; Kleinberg, B.; Lefevre, A.; Mihalcea, R. Automatic Detection of Fake News. COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings 2017, 3391–3401.
- Al Asaad, B.; Erascu, M. A Tool for Fake News Detection. Proceedings - 2018 20th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC 2018 2018, 379–386. [CrossRef]
- Thota, A.; Tilak, P.; Ahluwalia, S.; Lohia, N. Fake News Detection: A Deep Learning Approach. SMU Data Science Review 2018, 1. [Google Scholar]
- Kaliyar, R.K.; Goswami, A.; Narang, P.; Sinha, S. FNDNet – A Deep Convolutional Neural Network for Fake News Detection. Cogn Syst Res 2020, 61, 32–44. [Google Scholar] [CrossRef]
- Yang, Y.; Zheng, L.; Zhang, J.; Cui, Q.; Zhang, X.; Li, Z.; Yu, P.S. TI-CNN: Convolutional Neural Networks for Fake News Detection. 2018.
- Singhal, S.; Shah, R.R.; Chakraborty, T.; Kumaraguru, P.; Satoh, S. SpotFake: A Multi-Modal Framework for Fake News Detection. Proceedings - 2019 IEEE 5th International Conference on Multimedia Big Data, BigMM 2019 2019, 39–47. [CrossRef]
- Devarajan, G.G.; Nagarajan, S.M.; Amanullah, S.I.; Mary, S.A.S.A.; Bashir, A.K. AI-Assisted Deep NLP-Based Approach for Prediction of Fake News from Social Media Users. IEEE Trans Comput Soc Syst 2024, 11, 4975–4985. [Google Scholar] [CrossRef]
- Almarashy, A.H.J.; Feizi-Derakhshi, M.R.; Salehpour, P. Enhancing Fake News Detection by Multi-Feature Classification. IEEE Access 2023, 11, 139601–139613. [Google Scholar] [CrossRef]
- Oshikawa, R.; Qian, J.; Wang, W.Y. A Survey on Natural Language Processing for Fake News Detection. LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings 2018, 6086–6093.
- Mehta, D.; Patel, M.; Dangi, A.; Patwa, N.; Patel, Z.; Jain, R.; Shah, P.; Suthar, B. Exploring the Efficacy of Natural Language Processing and Supervised Learning in the Classification of Fake News Articles. Advances in Robotic Technology 2024, 2, 1–6. [Google Scholar] [CrossRef]
- Madani, M.; Motameni, H.; Roshani, R. Fake News Detection Using Feature Extraction, Natural Language Processing, Curriculum Learning, and Deep Learning. Int. J. Inf. Technol. Decis. Mak. 2023, 23, 1063–1098. [CrossRef]
- Zhou, X.; Zafarani, R. Network-Based Fake News Detection. ACM SIGKDD Explorations Newsletter 2019, 21, 48–60. [CrossRef]
- Conroy, N.J.; Rubin, V.L.; Chen, Y. Automatic Deception Detection: Methods for Finding Fake News. Proceedings of the Association for Information Science and Technology 2015, 52, 1–4. [Google Scholar] [CrossRef]
- Kozik, R.; Pawlicka, A.; Pawlicki, M.; Choraś, M.; Mazurczyk, W.; Cabaj, K. A Meta-Analysis of State-of-the-Art Automated Fake News Detection Methods. IEEE Trans Comput Soc Syst 2024, 11, 5219–5229. [Google Scholar] [CrossRef]
- Farhangian, F.; Cruz, R.M.O.; Cavalcanti, G.D.C. Fake News Detection: Taxonomy and Comparative Study. Information Fusion 2024, 103, 102140. [Google Scholar] [CrossRef]
- Alghamdi, J.; Lin, Y.; Luo, S. Towards COVID-19 Fake News Detection Using Transformer-Based Models. Knowl Based Syst 2023, 274, 110642. [Google Scholar] [CrossRef]
- Mahmud, M.A.I.; Talha Talukder, A.A.; Sultana, A.; Bhuiyan, K.I.A.; Rahman, M.S.; Pranto, T.H.; Rahman, R.M. Toward News Authenticity: Synthesizing Natural Language Processing and Human Expert Opinion to Evaluate News. IEEE Access 2023, 11, 11405–11421. [Google Scholar] [CrossRef]
- Yang, S.; Shu, K.; Wang, S.; Gu, R.; Wu, F.; Liu, H. Unsupervised Fake News Detection on Social Media: A Generative Approach. Proceedings of the AAAI Conference on Artificial Intelligence 2019, 33, 5644–5651. [Google Scholar] [CrossRef]
- Liu, Y.; Wu, Y.F.B. FNED: A Deep Network for Fake News Early Detection on Social Media. ACM Transactions on Information Systems (TOIS) 2020, 38. [Google Scholar] [CrossRef]
- Wani, M.A.; Elaffendi, M.; Shakil, K.A.; Abuhaimed, I.M.; Nayyar, A.; Hussain, A.; El-Latif, A.A.A. Toxic Fake News Detection and Classification for Combating COVID-19 Misinformation. IEEE Trans Comput Soc Syst 2024, 11, 5101–5118. [Google Scholar] [CrossRef]
- Kapusta, J.; Držik, D.; Šteflovič, K.; Nagy, K.S. Text Data Augmentation Techniques for Word Embeddings in Fake News Classification. IEEE Access 2024, 12, 31538–31550. [Google Scholar] [CrossRef]
- Raja, E.; Soni, B.; Borgohain, S.K. Fake News Detection in Dravidian Languages Using Transfer Learning with Adaptive Finetuning. Eng Appl Artif Intell 2023, 126, 106877. [Google Scholar] [CrossRef]
- Liu, Y.; Zhu, J.; Zhang, K.; Tang, H.; Zhang, Y.; Liu, X.; Liu, Q.; Chen, E. Detect, Investigate, Judge and Determine: A Novel LLM-Based Framework for Few-Shot Fake News Detection. 2024.
- Mallick, C.; Mishra, S.; Senapati, M.R. A Cooperative Deep Learning Model for Fake News Detection in Online Social Networks. J Ambient Intell Humaniz Comput 2023, 14, 4451–4460. [Google Scholar] [CrossRef] [PubMed]
- Shushkevich, E.; Alexandrov, M.; Cardiff, J. Improving Multiclass Classification of Fake News Using BERT-Based Models and ChatGPT-Augmented Data. Inventions 2023, 8, 112. [Google Scholar] [CrossRef]
- Models - OpenAI API. Available online: https://platform.openai.com/docs/models (accessed on 11 October 2024).
- Verma, P.K.; Agrawal, P.; Amorim, I.; Prodan, R. WELFake: Word Embedding Over Linguistic Features for Fake News Detection. IEEE Trans Comput Soc Syst 2021, 8, 881–893. [Google Scholar] [CrossRef]
- Fake News Classification. Available online: https://www.kaggle.com/datasets/saurabhshahane/fake-news-classification (accessed on 30 October 2024).
- Zhang, K.; Zhou, F.; Wu, L.; Xie, N.; He, Z. Semantic Understanding and Prompt Engineering for Large-Scale Traffic Data Imputation. Information Fusion 2024, 102, 102038. [Google Scholar] [CrossRef]
- Zheng, Y.; Cai, R.; Maimaiti, M.; Abiderexiti, K. Chunk-BERT: Boosted Keyword Extraction for Long Scientific Literature via BERT with Chunking Capabilities. 2023 IEEE 4th International Conference on Pattern Recognition and Machine Learning, PRML 2023 2023, 385–392. [CrossRef]
- Bert-Base-Uncased · Hugging Face. Available online: https://huggingface.co/bert-base-uncased (accessed on 17 December 2023).
- Pretrained Models — Transformers 3.3.0 Documentation. Available online: https://huggingface.co/transformers/v3.3.1/pretrained_models.html (accessed on 17 December 2023).
- BERT — Transformers 3.0.2 Documentation. Available online: https://huggingface.co/transformers/v3.0.2/model_doc/bert.html (accessed on 5 November 2024).
- Roumeliotis, K.I.; Tselikas, N.D.; Nasiopoulos, D.K. Next-Generation Spam Filtering: Comparative Fine-Tuning of LLMs, NLPs, and CNN Models for Email Spam Classification. Electronics 2024, 13, 2034. [Google Scholar] [CrossRef]
- Tqdm · PyPI. Available online: https://pypi.org/project/tqdm/ (accessed on 17 December 2023).
- GitHub - Applied-AI-Research-Lab/CNN-BERT-and-GPT-Models-for-Robust-Fake-News-Classification-and-Spam-Detection. Available online: https://github.com/Applied-AI-Research-Lab/CNN-BERT-and-GPT-Models-for-Robust-Fake-News-Classification-and-Spam-Detection/tree/main (accessed on 9 November 2024).
- Garcia, C.I.; Grasso, F.; Luchetta, A.; Piccirilli, M.C.; Paolucci, L.; Talluri, G. A Comparison of Power Quality Disturbance Detection and Classification Methods Using CNN, LSTM and CNN-LSTM. Applied Sciences 2020, 10, 6755. [Google Scholar] [CrossRef]
- Roumeliotis, K.I.; Tselikas, N.D.; Nasiopoulos, D.K. LLMs and NLP Models in Cryptocurrency Sentiment Analysis: A Comparative Classification Study. Big Data and Cognitive Computing 2024, 8, 63. [Google Scholar] [CrossRef]
- Amiri, Z.; Heidari, A.; Navimipour, N.J.; Unal, M.; Mousavi, A. Adventures in Data Analysis: A Systematic Review of Deep Learning Techniques for Pattern Recognition in Cyber-Physical-Social Systems. Multimed Tools Appl 2024, 83, 22909–22973. [Google Scholar] [CrossRef]
- Bhatti, U.A.; Tang, H.; Wu, G.; Marjan, S.; Hussain, A. Deep Learning with Graph Convolutional Networks: An Overview and Latest Applications in Computational Intelligence. International Journal of Intelligent Systems 2023, 2023, 8342104. [Google Scholar] [CrossRef]
- Rojas-Galeano, S. Zero-Shot Spam Email Classification Using Pre-Trained Large Language Models. 2025, 3–18. [CrossRef]
- Mu, Y.; Wu, B.P.; Thorne, W.; Robinson, A.; Aletras, N.; Scarton, C.; Bontcheva, K.; Song, X. Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science. 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings 2023, 12074–12086.
- OpenAI Launches GPT-4o Mini, a Slimmer, Cheaper AI Model for Developers -- Pure AI. Available online: https://pureai.com/Articles/2024/07/18/OpenAI-Launches-GPT-4o-Mini.aspx (accessed on 9 November 2024).
- GPT-4o vs. GPT-4o-Mini: Which AI Model to Choose? Available online: https://anthemcreation.com/en/artificial-intelligence/comparative-gpt-4o-gpt-4o-mini-open-ai/ (accessed on 9 November 2024).
- A Guide to GPT4o Mini: OpenAI’s Smaller, More Efficient Language Model. Available online: https://kili-technology.com/large-language-models-llms/a-guide-to-gpt4o-mini-openai-s-smaller-more-efficient-language-model (accessed on 9 November 2024).
- Huang, K.; Yin, H.; Huang, H.; Gao, W. Towards Green AI in Fine-Tuning Large Language Models via Adaptive Backpropagation. 12th International Conference on Learning Representations, ICLR 2024, 2023. [Google Scholar]
- Papageorgiou, E.; Chronis, C.; Varlamis, I.; Himeur, Y. A Survey on the Use of Large Language Models (LLMs) in Fake News. Future Internet 2024, 16, 298. [Google Scholar] [CrossRef]
- Teo, T.W.; Chua, H.N.; Jasser, M.B.; Wong, R.T.K. Integrating Large Language Models and Machine Learning for Fake News Detection. 2024 20th IEEE International Colloquium on Signal Processing and Its Applications, CSPA 2024 - Conference Proceedings 2024, 102–107. [CrossRef]
- Kumar, R.; Goddu, B.; Saha, S.; Jatowt, A. Silver Lining in the Fake News Cloud: Can Large Language Models Help Detect Misinformation? IEEE Transactions on Artificial Intelligence 2024. [Google Scholar] [CrossRef]
- Ma, H.; Zhang, C.; Fu, H.; Zhao, P.; Wu, B. Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-Tuning. 2023.
| Model | Resources | Training Loss | Validation Loss | Training Time (Seconds) | Training Cost |
|---|---|---|---|---|---|
| ft:gpt-4o | API | 0.0000 | 0.0073 | 2,804 | $31.08 |
| ft:gpt-4o-mini | API | 0.0000 | 0.0077 | 1,462 | $1.16 |
| ft:bert-adam | Tesla V100-SXM2-16 GB | 0.0294 | 0.0386 | 877 | $2.54 |
| ft:cnn_adam | Tesla V100-SXM2-16 GB | 0.6253 | 0.5884 | 47.90 | $0.14 |
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| base:gpt-4o-2024-08-06 | 0.139 | 0.1243 | 0.139 | 0.1305 |
| base:gpt-4o-mini-2024-07-18 | 0.147 | 0.125 | 0.147 | 0.1343 |
| ft:gpt-4o | 0.988 | 0.988 | 0.988 | 0.988 |
| ft:gpt-4o-mini | 0.988 | 0.9881 | 0.988 | 0.988 |
| ft:bert-adam | 0.975 | 0.9758 | 0.975 | 0.975 |
| ft:cnn_adam | 0.586 | 0.6334 | 0.586 | 0.5457 |
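The accuracy, precision, recall, and F1 values in the table above are consistent with class-weighted averaging over the two labels. A minimal pure-Python sketch of that computation (equivalent in spirit to scikit-learn's `average="weighted"` mode, which the study may or may not have used):

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus class-weighted precision, recall, and F1."""
    labels = sorted(set(y_true))
    support = Counter(y_true)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec_c = tp / (tp + fp) if tp + fp else 0.0
        rec_c = tp / (tp + fn) if tp + fn else 0.0
        f1_c = 2 * prec_c * rec_c / (prec_c + rec_c) if prec_c + rec_c else 0.0
        w = support[c] / n  # weight each class by its share of the true labels
        precision += w * prec_c
        recall += w * rec_c
        f1 += w * f1_c
    return accuracy, precision, recall, f1
```

Note that with weighted averaging, recall equals accuracy on a fully labeled test set, which matches the identical Accuracy and Recall columns in the table.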
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

