ARTICLE | doi:10.20944/preprints202304.0350.v3
Subject: Computer Science And Mathematics, Artificial Intelligence And Machine Learning Keywords: ChatGPT; Generative AI; Fake Publications; Human-Generated Publications; Supervised Learning; ML Algorithm; Fake Science; NeoNet Algorithm
Online: 18 August 2023 (11:19:23 CEST)
Background: ChatGPT is becoming a new reality. Where do we go from here? Objective: To show how we can distinguish ChatGPT-generated publications from counterparts produced by scientists. Methods: By means of a new algorithm, called xFakeBibs, we show the significant difference between ChatGPT-generated fake publications and real publications. Specifically, we triggered ChatGPT to generate 100 publications related to Alzheimer's disease and comorbidity. Using TF-IDF over the real publications, we constructed a training network of bigrams from 100 publications. From 10 folds of 100 publications each, we also constructed 10 calibrating networks to derive lower/upper bounds for classifying articles as real or fake. The final step was to test xFakeBibs against each of the ChatGPT-generated articles and predict its class. The algorithm assigns the POSITIVE label to real publications and NEGATIVE to fake ones. Results: When comparing the bigrams of the training set against all 10 calibrating folds, we found that the similarities fluctuated between 19% and 21%. In contrast, the bigram similarity of the ChatGPT-generated articles was only 8%. Additionally, when testing how many new bigrams the 10 calibrating folds contributed relative to the training network, we found that each calibrating fold contributed 51%-70% new bigrams, while ChatGPT contributed only 23%, less than half of any of the 10 calibrating folds. The final classification with xFakeBibs set a lower/upper bound of 21.96-24.93 new edges added to the training model without contributing new nodes. Using this calibration range, the algorithm predicted 98 of the 100 publications as fake, while 2 articles failed the test and were classified as real publications. Conclusions: This work provides clear evidence of how to distinguish, in bulk, ChatGPT-generated fake publications from real ones.
We also introduced an algorithmic approach that detected fake articles with a high degree of accuracy. However, it remains challenging to detect all fake records. ChatGPT may seem to be a useful tool, but it certainly presents a threat to our authentic knowledge and real science. This work is a step in the right direction toward countering fake science and misinformation.
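The classification idea described above can be illustrated with a minimal sketch: build a network of bigram "edges" from real publications, then label a test document by counting how many of its bigrams are new to that network, accepting only counts inside a calibrated lower/upper bound. All function names, the tokenizer, and the toy bounds here are illustrative assumptions, not the published xFakeBibs implementation.

```python
import re

def bigrams(text):
    # Lowercase word tokens -> set of adjacent-pair bigrams (simplified
    # tokenizer; the actual paper derives bigrams via TF-IDF features).
    words = re.findall(r"[a-z]+", text.lower())
    return set(zip(words, words[1:]))

def build_network(docs):
    # Union of bigram edges across the training documents.
    edges = set()
    for d in docs:
        edges |= bigrams(d)
    return edges

def classify(doc, training_edges, lower, upper):
    # Count bigrams in the test document that are absent from the training
    # network; a count within the calibrated [lower, upper] range is
    # labeled POSITIVE (real), otherwise NEGATIVE (fake).
    new_edges = len(bigrams(doc) - training_edges)
    return "POSITIVE" if lower <= new_edges <= upper else "NEGATIVE"

# Toy usage with hypothetical documents and bounds.
train = ["alzheimer disease comorbidity study",
         "comorbidity in alzheimer patients"]
net = build_network(train)
print(classify("alzheimer disease comorbidity study", net, 0, 2))  # -> POSITIVE
print(classify("chatgpt generated fake text here", net, 0, 2))     # -> NEGATIVE
```

In the paper the calibrated range (21.96-24.93 new edges) plays the role of the toy `lower`/`upper` bounds here: real publications add a moderate number of new edges, while ChatGPT-generated text falls outside the range.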
ARTICLE | doi:10.20944/preprints202106.0482.v3
Subject: Computer Science And Mathematics, Algebra And Number Theory Keywords: COVID-19 Infodemic; Text Classification; TFIDF Features; Network Training modes; Supervised Learning; Misinformation; News Classification; False Publications; PubMed; Anomaly Detection
Online: 26 July 2021 (12:06:04 CEST)
The spread of the Coronavirus pandemic has been accompanied by an infodemic. The false information embedded in the infodemic undermines people's ability to access safety information and follow proper procedures to mitigate risks. This research targets the falsehood component of the infodemic, which proliferates prominently in news articles and false medical publications. Here, we present NeoNet, a novel supervised machine learning text-mining algorithm that analyzes the content of a document (a news article or a medical publication) and assigns a label to it. The algorithm is trained on TFIDF bigram features, which contribute to a network training model. The algorithm was tested on two different real-world datasets: CBC news articles and COVID-19 publications. In five different fold comparisons, the algorithm predicted the label of an article with a precision of 97%-99%. When compared with prominent algorithms such as Neural Networks, SVM, and Random Forests, NeoNet surpassed them. The analysis highlights the promise of NeoNet in detecting disputed online content that may contribute negatively to the COVID-19 pandemic.
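The TFIDF bigram features that NeoNet trains on can be sketched with standard definitions: term frequency of each bigram within a document, weighted by inverse document frequency across the corpus. This stdlib-only sketch is an assumption about the feature computation, not NeoNet's actual code; the tokenizer and weighting variant are simplified.

```python
import math
import re
from collections import Counter

def bigram_counts(text):
    # Lowercase word tokens -> Counter of adjacent-pair bigrams.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))

def tfidf(docs):
    # For each document, weight every bigram by (term frequency) *
    # log(N / document frequency). Bigrams present in all documents
    # receive weight 0 under this plain variant.
    counts = [bigram_counts(d) for d in docs]
    n = len(docs)
    df = Counter()
    for c in counts:
        df.update(set(c))
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append({b: (f / total) * math.log(n / df[b])
                        for b, f in c.items()})
    return weights

# Toy corpus: the shared bigram gets weight 0, unique ones get positive weight.
w = tfidf(["covid news report", "covid news update"])
print(w[0][("covid", "news")])   # -> 0.0
```

In the described pipeline, the highest-weighted bigrams per class would then contribute edges to the network training model that drives classification.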
ARTICLE | doi:10.20944/preprints202208.0305.v1
Subject: Medicine And Pharmacology, Immunology And Allergy Keywords: drug repurposing; combination therapeutics; PubMed; ChEBI; disease ontology; gene ontology; drug interaction; MeSH terms; COVID-19
Online: 17 August 2022 (05:51:53 CEST)
This paper presents a computational approach for constructing and querying a literature-based knowledge graph to predict novel drug therapeutics. The main objective is to offer a platform that discovers drug combinations from FDA-approved drugs and accelerates their investigation by domain scientists. Specifically, the paper introduces the following algorithms: (1) an algorithm for constructing the knowledge graph from drug, gene, and disease mentions in the biomedical literature; (2) an algorithm for vetting the knowledge graph against drug combinations that may pose a risk of drug interaction; and (3) two querying algorithms for searching the knowledge graph by a single drug or a combination of drugs. The resulting knowledge graph contained 844 drugs, 306 gene/protein features, and 19 disease mentions. The original number of drug combinations generated was 2,001. We queried the knowledge graph to eliminate noise introduced by chemicals that are not drugs, which left 614 drug combinations. Vetting the knowledge graph to eliminate potentially risky drug combinations further reduced this set to 200 predicted combinations. Our domain expert manually eliminated an additional 54 combinations, leaving 146 candidate combinations. Our three-layered knowledge graph, empowered by these algorithms, offers a tool that predicts drug-combination therapeutics for scientists, who can further investigate them from the viewpoint of drug targets and side effects.
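The construct-then-vet pipeline described above can be sketched as follows: build a graph linking drugs to the genes and diseases they are co-mentioned with, pair drugs that target the same disease, and filter out pairs flagged for interaction risk. Everything here is a simplified assumption; entity extraction from the literature, the ontology layers (ChEBI, MeSH, disease/gene ontologies), and the actual vetting criteria are out of scope.

```python
from collections import defaultdict

def build_graph(mentions):
    # mentions: (drug, gene, disease) co-mention triples assumed to have
    # been extracted from the biomedical literature upstream.
    graph = defaultdict(set)
    for drug, gene, disease in mentions:
        graph[drug].add(("gene", gene))
        graph[drug].add(("disease", disease))
    return graph

def drug_combinations(graph, disease):
    # Pair up every two drugs linked to the same disease.
    drugs = sorted(d for d, links in graph.items()
                   if ("disease", disease) in links)
    return [(a, b) for i, a in enumerate(drugs) for b in drugs[i + 1:]]

def vet(pairs, risky):
    # Drop combinations flagged as potential drug-drug interactions
    # (the `risky` set is a stand-in for a real interaction resource).
    return [p for p in pairs
            if p not in risky and (p[1], p[0]) not in risky]

# Toy usage with hypothetical entity names.
g = build_graph([("drugA", "gene1", "covid"),
                 ("drugB", "gene1", "covid"),
                 ("drugC", "gene2", "flu")])
print(drug_combinations(g, "covid"))  # -> [('drugA', 'drugB')]
```

This mirrors the paper's funnel at miniature scale: candidate pairs generated from the graph, then successively filtered (non-drug chemicals, interaction risk, expert review) down to vetted candidates.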