Preprint Article · Version 1 · Preserved in Portico · This version is not peer-reviewed

Improving Detection of ChatGPT-Generated Fake Science Using Real Publication Text: Introducing xFakeBibs a Supervised Learning Network Algorithm

Version 1 : Received: 14 April 2023 / Approved: 14 April 2023 / Online: 14 April 2023 (04:41:04 CEST)
Version 2 : Received: 16 April 2023 / Approved: 17 April 2023 / Online: 17 April 2023 (08:12:18 CEST)
Version 3 : Received: 17 August 2023 / Approved: 18 August 2023 / Online: 18 August 2023 (11:19:23 CEST)

How to cite: Hamed, A.A. Improving Detection of ChatGPT-Generated Fake Science Using Real Publication Text: Introducing xFakeBibs a Supervised Learning Network Algorithm. Preprints 2023, 2023040350. https://doi.org/10.20944/preprints202304.0350.v1

Abstract

Background: ChatGPT is becoming a new reality. Where do we go from here? Objective: To show how ChatGPT-generated publications can be distinguished from counterparts produced by scientists. Methods: Using a newly devised algorithm, called xFakeBibs, we show that the content and structure of bigram networks built from ChatGPT-generated fake publications differ significantly from those built from real publications. Specifically, we prompted ChatGPT to generate 100 publications related to Alzheimer's disease and comorbidity. Using TF-IDF, we constructed a network of bigrams and compared it with 10 other networks constructed from real PubMed publications. Each of those networks was built from exactly one of 10 folds, and each fold comprised exactly 100 publications to ensure fairness. We trained the xFakeBibs algorithm on the 10 folds, which were then used to test the ChatGPT fake publications; the algorithm assigns a POSITIVE label to real publications and a NEGATIVE label to fake ones. Results: When comparing the bigrams of the training set against each of the 10 folds, the similarities fluctuated between 19% and 21%, whereas the bigram similarity of the ChatGPT publications was only 8%. Additionally, when testing how the bigrams alter the structure of the training model, all 10 folds contributed 51%-70% new bigrams, while ChatGPT contributed only 23%, less than half of any of the 10 folds. Calibrating the xFakeBibs algorithm on the 10 folds of real publications showed that they contribute 21.96-24.93 new edges on average. When xFakeBibs classified the individual articles, 98 of the 100 ChatGPT publications were detected as fake, while 2 articles escaped detection and were classified as real. Conclusions: It is indeed possible to distinguish, in bulk, a dataset of ChatGPT-generated publications from a counterpart dataset of real publications, as was the case for the Alzheimer's dataset. Classifying individual articles as fake can be done with a high degree of accuracy, but it may not be possible to detect every ChatGPT-generated article automatically. ChatGPT may seem to be a useful tool, yet it presents a clear threat to the integrity of authentic science. This work is a step in the right direction toward countering fake science and misinformation.
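The abstract above outlines the approach only at a high level. The following Python sketch is a hypothetical, simplified illustration of the bigram-network idea, not the authors' implementation: the whitespace tokenization, the omission of the TF-IDF weighting step, and the classification thresholds (which simply reuse the per-article edge averages quoted above) are assumptions made purely for illustration.

```python
# Hypothetical sketch of a bigram-network comparison in the spirit of xFakeBibs.
# Not the published algorithm: tokenization, thresholds, and the skipped TF-IDF
# weighting are illustrative assumptions.
from itertools import tee


def bigrams(text):
    """Return the set of lowercase word bigrams (edges) of one article."""
    words = text.lower().split()
    a, b = tee(words)
    next(b, None)
    return set(zip(a, b))


def corpus_bigrams(docs):
    """Union of bigram edges over a fold (a list of articles)."""
    edges = set()
    for doc in docs:
        edges |= bigrams(doc)
    return edges


def overlap_ratio(test_docs, training_edges):
    """Fraction of a test fold's edges already present in the training model
    (the abstract reports 19%-21% for real folds vs. 8% for ChatGPT text)."""
    test_edges = corpus_bigrams(test_docs)
    return len(test_edges & training_edges) / max(len(test_edges), 1)


def classify(doc, training_edges, lo=21.96, hi=24.93):
    """Label one article 'real' if its new-edge contribution falls inside the
    calibrated range, else 'fake'. The lo/hi values reuse the per-article
    averages quoted in the abstract; the real calibration is not shown here."""
    new_edges = len(bigrams(doc) - training_edges)
    return "real" if lo <= new_edges <= hi else "fake"


if __name__ == "__main__":
    # Toy data only, to show the call pattern; real folds hold 100 abstracts each.
    train = [
        "alzheimer disease is associated with several comorbidities",
        "patients with alzheimer disease often present cardiovascular comorbidity",
    ]
    model = corpus_bigrams(train)
    print(classify("alzheimer disease comorbidity was generated text", model))
```

The design choice being illustrated is that a classifier of this kind needs no access to the generator itself: it only compares how much new graph structure an article adds relative to a model trained on real publications.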

Keywords

ChatGPT; Generative AI; Fake Publications; Human-Generated Publications; Supervised Learning; ML Algorithm; Fake Science; NeoNet Algorithm

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
