Search | Preprints.org

Preprint ARTICLE | doi:10.20944/preprints202304.0350.v3

Improving Detection of ChatGPT-Generated Fake Science Using Real Publication Text: Introducing xFakeBibs a Supervised-Learning Network Algorithm

Ahmed Abdeen Hamed

Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: ChatGPT; Generative AI; Fake Publications; Human-Generated Publications; Supervised Learning; ML Algorithm; Fake Science; NeoNet Algorithm

Online: 18 August 2023 (11:19:23 CEST)

Background: ChatGPT is becoming a new reality. Where do we go from here?
Objective: To show how ChatGPT-generated publications can be distinguished from counterparts produced by scientists.
Methods: Using a new algorithm called xFakeBibs, we show the significant difference between ChatGPT-generated fake publications and real publications. Specifically, we prompted ChatGPT to generate 100 publications related to Alzheimer’s disease and comorbidity. Applying TF-IDF to the real publications, we constructed a training network of bigrams from 100 publications. Using 10 folds of 100 publications each, we also constructed 10 calibrating networks to derive lower and upper bounds for classifying articles as real or fake. The final step was to test xFakeBibs against each of the ChatGPT-generated articles and predict its class. The algorithm successfully assigned the POSITIVE label to real publications and the NEGATIVE label to fake ones.
Results: When comparing the bigrams of the training set against each of the 10 calibrating folds, we found that the similarities fluctuated between 19% and 21%. In contrast, the bigram similarity for the ChatGPT-generated set was only 8%. Additionally, when comparing the new bigrams contributed by the 10 calibrating folds with those contributed by ChatGPT, we found that all 10 calibrating folds contributed 51%-70% new bigrams, while ChatGPT contributed only 23%, which is less than 50% of the contribution of any of the 10 calibrating folds. For the final classification, xFakeBibs set lower and upper bounds of 21.96 and 24.93 on the number of new edges added to the training mode without contributing new nodes. Using this calibration range, the algorithm predicted 98 of the 100 publications as fake, while 2 articles failed the test and were classified as real publications.
Conclusions: This work provides clear evidence of how to distinguish, in bulk, ChatGPT-generated fake publications from real publications. We also introduced an algorithmic approach that detected fake articles with a high degree of accuracy. However, detecting all fake records remains challenging. ChatGPT may seem to be a useful tool, but it certainly presents a threat to our authentic knowledge and real science. This work is indeed a step in the right direction to counter fake science and misinformation.
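
The calibrate-then-classify idea described in the abstract can be illustrated with a minimal Python sketch. This is not the xFakeBibs implementation: the function names (`bigrams`, `build_network`, `new_edge_ratio`, `calibrate`, `classify`) are hypothetical, the TF-IDF selection step is omitted, and the score is assumed here to be the percentage of a candidate's bigrams that form new edges between words already present in the training network. The abstract does not specify the exact definition, so treat that reading as an assumption.

```python
import re


def bigrams(text):
    """Return the set of adjacent word pairs (edges) in a text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return set(zip(tokens, tokens[1:]))


def build_network(docs):
    """Union of bigram edges over a list of documents (TF-IDF filtering omitted)."""
    edges = set()
    for doc in docs:
        edges |= bigrams(doc)
    return edges


def new_edge_ratio(training_edges, candidate_edges):
    """Percentage of candidate edges that are new to the training network
    while connecting only words (nodes) the network already contains.
    ASSUMPTION: this is one plausible reading of 'new edges without new nodes'."""
    if not candidate_edges:
        return 0.0
    nodes = {word for edge in training_edges for word in edge}
    new = {
        (a, b)
        for (a, b) in candidate_edges
        if (a, b) not in training_edges and a in nodes and b in nodes
    }
    return 100.0 * len(new) / len(candidate_edges)


def calibrate(training_edges, folds):
    """Derive (lower, upper) bounds from the scores of real-publication folds."""
    scores = [new_edge_ratio(training_edges, build_network(fold)) for fold in folds]
    return min(scores), max(scores)


def classify(training_edges, text, bounds):
    """Label a text 'real' if its score falls inside the calibrated range."""
    lower, upper = bounds
    score = new_edge_ratio(training_edges, bigrams(text))
    return "real" if lower <= score <= upper else "fake"


# Toy data, invented purely for illustration.
training = build_network([
    "amyloid plaques drive memory loss",
    "memory loss tracks tau pathology",
])
folds = [
    ["tau pathology drive memory loss"],
    ["amyloid plaques tracks memory loss"],
]
bounds = calibrate(training, folds)
label = classify(training, "tau pathology drive memory loss", bounds)
```

In this toy setup, a text that contributes no new edges at all scores below the calibrated range and is labeled fake, which loosely mirrors the abstract's observation that the ChatGPT-generated set contributed fewer new bigrams than any calibrating fold.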

Preprint ARTICLE | doi:10.20944/preprints202106.0482.v3

Fighting the COVID-19 Infodemic in News articles and False Publications: The NeoNet Text Classifier, a Supervised Machine Learning Algorithm

Mohammad AR Abdeen, Ahmed Abdeen Hamed, Xindong Wu

Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: COVID-19 Infodemic; Text Classification; TFIDF Features; Network Training modes; Supervised Learning; Misinformation; News Classification; False Publications; PubMed; Anomaly Detection

Online: 26 July 2021 (12:06:04 CEST)

Preprint ARTICLE | doi:10.20944/preprints202208.0305.v1

Mining Literature-Based Knowledge Graph for Predicting Combination Therapeutics: A COVID-19 Use Case

Ahmed Abdeen Hamed, Jakub Jonczyk, Mohammad Zaiyan Alam, Ewa Deelman, Byung Suk Lee

Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: drug repurposing; combination therapeutics; PubMed; ChEBI; disease ontology; gene ontology; drug interaction; MeSH terms; COVID-19

Online: 17 August 2022 (05:51:53 CEST)

Search Results

3 articles found