Submitted: 17 August 2023
Posted: 18 August 2023
Abstract
Keywords:
Introduction
Contribution
Results
The Outcome of ChatGPT Bigram Analysis
The Structural Analysis of ChatGPT Bigram Network
Classification of ChatGPT Fake Articles
Methods
Data Collection
Code Availability
Exploratory Analysis of ChatGPT Articles
ChatGPT Bigram-Similarity Analysis
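As a rough illustration of the kind of comparison this section concerns, the sketch below counts the distinct word bigrams that one set of texts shares with a baseline set. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `bigrams` and `overlap_with_baseline`, the regex tokenizer, and the choice to normalize by the fold's own distinct bigram count are all hypothetical and chosen only for illustration.

```python
import re


def bigrams(text):
    """Return the adjacent lowercase word pairs found in text."""
    # Assumed tokenizer: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def overlap_with_baseline(fold_texts, baseline_texts):
    """Return (shared distinct bigrams, share of the fold's distinct bigrams)."""
    fold = {bg for text in fold_texts for bg in bigrams(text)}
    base = {bg for text in baseline_texts for bg in bigrams(text)}
    shared = fold & base
    # Assumption: the ratio is taken over the fold's own bigram set.
    ratio = len(shared) / len(fold) if fold else 0.0
    return len(shared), ratio
```

For example, comparing the single text "cognitive impairment in older adults" against the baseline text "Alzheimer's disease patients show cognitive impairment" yields one shared bigram out of four, that is, `(1, 0.25)`.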
ChatGPT Network Structural Analysis
Algorithm 1. Calibration steps, computed over 10 individual folds for comparison against the training baseline.
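The per-fold network measures reported for this analysis (number of nodes, edges, and connected components) could be computed along the following lines. This is a hedged sketch only: it assumes each word is a node and each distinct bigram is an undirected edge, which the source does not confirm, and the name `network_stats` is hypothetical. A small union-find structure is used so the example needs no third-party graph library.

```python
def network_stats(bigram_pairs):
    """Return (nodes, edges, connected components) of an undirected word graph."""
    parent = {}

    def find(word):
        # Register unseen words as their own root, then walk up to the root.
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving
            word = parent[word]
        return word

    edges = set()
    for a, b in bigram_pairs:
        # Assumption: (a, b) and (b, a) count as the same undirected edge.
        edges.add(frozenset((a, b)))
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    components = len({find(word) for word in list(parent)})
    return len(parent), len(edges), components
```

Running the same function on each of the 10 folds and on the ChatGPT-generated text would give one row of measures per data source, which can then be compared against the training baseline.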
Classification of ChatGPT Fake Articles
Algorithm 2. A detailed description of the xFakeBibs algorithm.
Limitations
Discussion
Conclusions
- Utilizing ChatGPT APIs to generate complete publications and comparing them with archived full-text articles.
- Testing the xFakeBibs algorithm with publications in various subject areas.
- Fact-checking ChatGPT’s responses for well-known questions that require reasoning (ongoing) [38].
- Training ChatGPT to provide answers specific to certain domains, such as clinical, medical, chemical, and biological applications.
Author contributions statement
Acknowledgments
Conflicts of Interest
References
- ChatGPT, 2023. Accessed August 15, 2023.
- Synnestvedt, M.B.; Chen, C.; Holmes, J.H. CiteSpace II: visualization and knowledge discovery in bibliographic databases. AMIA annual symposium proceedings. American Medical Informatics Association, 2005, Vol. 2005, p. 724.
- Holzinger, A.; Ofner, B.; Stocker, C.; Calero Valdez, A.; Schaar, A.K.; Ziefle, M.; Dehmer, M. On graph entropy measures for knowledge discovery from publication network data. Availability, Reliability, and Security in Information Systems and HCI: IFIP WG 8.4, 8.9, TC 5 International Cross-Domain Conference, CD-ARES 2013, Regensburg, Germany, September 2-6, 2013. Proceedings 8. Springer, 2013, pp. 354–362.
- Usai, A.; Pironti, M.; Mital, M.; Aouina Mejri, C. Knowledge discovery out of text data: a systematic review via text mining. Journal of knowledge management 2018, 22, 1471–1488.
- Thaler, A.D.; Shiffman, D. Fish tales: Combating fake science in popular media. Ocean & Coastal Management 2015, 115, 88–91.
- Hopf, H.; Krief, A.; Mehta, G.; Matlin, S.A. Fake science and the knowledge crisis: ignorance can be fatal. Royal Society open science 2019, 6, 190161.
- Ho, S.S.; Goh, T.J.; Leung, Y.W. Let’s nab fake science news: Predicting scientists’ support for interventions using the influence of presumed media influence model. Journalism 2022, 23, 910–928.
- Frederickson, R.M.; Herzog, R.W. Addressing the big business of fake science. Molecular Therapy 2022, 30, 2390.
- Rocha, Y.M.; de Moura, G.A.; Desidério, G.A.; de Oliveira, C.H.; Lourenço, F.D.; de Figueiredo Nicolete, L.D. The impact of fake news on social media and its influence on health during the COVID-19 pandemic: A systematic review. Journal of Public Health 2021, 1–10.
- Walter, N.; Brooks, J.J.; Saucier, C.J.; Suresh, S. Evaluating the impact of attempts to correct health misinformation on social media: A meta-analysis. Health communication 2021, 36, 1776–1784.
- Loomba, S.; de Figueiredo, A.; Piatek, S.J.; de Graaf, K.; Larson, H.J. Measuring the impact of COVID-19 vaccine misinformation on vaccination intent in the UK and USA. Nature human behaviour 2021, 5, 337–348.
- Lewandowsky, S.; Ecker, U.K.; Seifert, C.M.; Schwarz, N.; Cook, J. Misinformation and its correction: Continued influence and successful debiasing. Psychological science in the public interest 2012, 13, 106–131.
- Myers, M.; Pineda, D. Misinformation about vaccines. Vaccines for biodefense and emerging and neglected diseases 2009, 225–270.
- Matthews, S.; Spencer, B. Government orders review into vitamin D’s role in Covid-19, 2020.
- Abdeen, M.A.; Hamed, A.A.; Wu, X. Fighting the COVID-19 Infodemic in News Articles and False Publications: The NeoNet Text Classifier, a Supervised Machine Learning Algorithm. Applied Sciences 2021, 11, 7265.
- Eysenbach, G.; others. The role of ChatGPT, generative language models, and artificial intelligence in medical education: a conversation with ChatGPT and a call for papers. JMIR Medical Education 2023, 9, e46885.
- IEEE Special Issue on Education in the World of ChatGPT and other Generative AI, 2023. Accessed April 13, 2023.
- Financial Innovation. Accessed April 13, 2023.
- Special Issue "Language Generation with Pretrained Models", Year. Accessed April 13, 2023.
- Call for Papers for the Special Focus Issue on ChatGPT and Large Language Models (LLMs) in Biomedicine and Health, Year. Accessed July 4, 2023.
- Do you allow the use of ChatGPT or other generative language models and how should this be reported?, Year. Published March 16, 2023. Accessed April 13, 2023.
- Null, N. The PNAS Journals Outline Their Policies for ChatGPT and Generative AI. PNAS Updates 2023. Published online.
- As scientists explore AI-written text, journals hammer out policies, Year. Accessed April 13, 2023.
- Fuster, V.; Bozkurt, B.; Chandrashekhar, Y.; Grapsa, J.; Ky, B.; Mann, D.L.; Moliterno, D.J.; Shivkumar, K.; Silversides, C.K.; Turco, J.V.; others. JACC Journals’ Pathway Forward With AI Tools: The Future Is Now, 2023.
- Flanagin, A.; Bibbins-Domingo, K.; Berkwits, M.; Christiansen, S.L. Nonhuman “authors” and implications for the integrity of scientific publication and medical knowledge. JAMA 2023, 329, 637–639.
- ChatGPT plugins, Year. Accessed April 13, 2023.
- Gilson, A.; Safranek, C.W.; Huang, T.; Socrates, V.; Chi, L.; Taylor, R.A.; Chartash, D.; others. How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Medical Education 2023, 9, e45312.
- Aizawa, A. An information-theoretic perspective of tf–idf measures. Information Processing & Management 2003, 39, 45–65.
- Qaiser, S.; Ali, R. Text mining: use of TF-IDF to examine the relevance of words to documents. International Journal of Computer Applications 2018, 181, 25–29.
- Ramos, J.; others. Using tf-idf to determine word relevance in document queries. Proceedings of the first instructional conference on machine learning. Citeseer, 2003, Vol. 242,1, pp. 29–48.
- Trstenjak, B.; Mikac, S.; Donko, D. KNN with TF-IDF based framework for text categorization. Procedia Engineering 2014, 69, 1356–1364.
- Wu, H.C.; Luk, R.W.P.; Wong, K.F.; Kwok, K.L. Interpreting TF-IDF term weights as making relevance decisions. ACM Transactions on Information Systems (TOIS) 2008, 26, 1–37.
- Zhang, W.; Yoshida, T.; Tang, X. A comparative study of TF*IDF, LSI and multi-words for text classification. Expert systems with applications 2011, 38, 2758–2765.
- Tan, C.M.; Wang, Y.F.; Lee, C.D. The use of bigrams to enhance text categorization. Information processing & management 2002, 38, 529–546.
- Hirst, G.; Feiguina, O. Bigrams of syntactic labels for authorship discrimination of short texts. Literary and Linguistic Computing 2007, 22, 405–417.
- Hamed, A.A.; Ayer, A.A.; Clark, E.M.; Irons, E.A.; Taylor, G.T.; Zia, A. Measuring climate change on Twitter using Google’s algorithm: Perception and events. International Journal of Web Information Systems 2015, 11, 527–544.
- Kitsak, M.; Ganin, A.A.; Eisenberg, D.A.; Krapivsky, P.L.; Krioukov, D.; Alderson, D.L.; Linkov, I. Stability of a giant connected component in a complex network. Physical Review E 2018, 97, 012309.
- Hamed, A.A.; Crimi, A.; Misiak, M.M.; Lee, B.S. Establishing Trust in ChatGPT BioMedical Generated Text: An Ontology-Based Knowledge Graph to Validate Disease-Symptom Links, 2023. arXiv:cs.AI/2308.03929.






| Data Source | Number of Bigram Overlaps | Percent to Self (%) | Similarity to Training (%) |
|---|---|---|---|
| Fold-1 | 202 | 0.21 | 0.22 |
| Fold-2 | 183 | 0.19 | 0.20 |
| Fold-3 | 178 | 0.22 | 0.19 |
| Fold-4 | 178 | 0.20 | 0.19 |
| Fold-5 | 180 | 0.21 | 0.20 |
| Fold-6 | 180 | 0.19 | 0.20 |
| Fold-7 | 179 | 0.19 | 0.19 |
| Fold-8 | 181 | 0.20 | 0.20 |
| Fold-9 | 184 | 0.20 | 0.20 |
| Fold-10 | 201 | 0.23 | 0.22 |
| GPT-Test | 81 | 0.16 | 0.09 |

| Index | Data Source | No. Nodes | No. Edges | No. Connected Components | Connected Components Percent |
|---|---|---|---|---|---|
| 0 | F-1 | 1086 | 1624 | 855 | 0.67 |
| 1 | F-2 | 1078 | 1625 | 867 | 0.69 |
| 2 | F-3 | 1019 | 1494 | 777 | 0.51 |
| 3 | F-4 | 1058 | 1570 | 827 | 0.61 |
| 4 | F-5 | 1068 | 1533 | 825 | 0.61 |
| 5 | F-6 | 1102 | 1624 | 860 | 0.68 |
| 6 | F-7 | 1077 | 1593 | 871 | 0.70 |
| 7 | F-8 | 1108 | 1580 | 846 | 0.65 |
| 8 | F-9 | 1108 | 1580 | 846 | 0.65 |
| 9 | F-10 | 1075 | 1540 | 848 | 0.65 |
| 10 | ChatGPT | 801 | 1312 | 632 | 0.23 |

| Fold | F-1 | F-2 | F-3 | F-4 | F-5 | F-6 | F-7 | F-8 | F-9 | F-10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Avg. Bigrams | 24.93 | 25.12 | 21.57 | 23.80 | 22.49 | 22.91 | 23.52 | 22.65 | 23.73 | 21.96 |

| Source | Bigram-1 | Bigram-2 | Bigram-3 | Bigram-4 | Bigram-5 |
|---|---|---|---|---|---|
| ChatGPT | AD patients | Cognitive impairment | older adults | Alzheimer’s disease | Risk factors |
| Real Bibs | Alzheimer’s disease | Disease AD | AD patients | Cognitive impairment | Increased risk |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).