Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Extraction of the Relations between Significant Pharmacological Entities in Russian-Language Internet Reviews on Medications

Version 1 : Received: 18 November 2021 / Approved: 19 November 2021 / Online: 19 November 2021 (10:40:10 CET)

How to cite: Sboev, A.; Selivanov, A.; Moloshnikov, I.; Rybka, R.; Gryaznov, A.; Sboeva, S.; Rylkov, G. Extraction of the Relations between Significant Pharmacological Entities in Russian-Language Internet Reviews on Medications. Preprints 2021, 2021110344 (doi: 10.20944/preprints202111.0344.v1).

Abstract

Analyzing online media to predict society’s reaction to events or processes is a task of great relevance, particularly where information on healthcare problems is concerned. Internet sources contain a large amount of pharmacologically meaningful information that is useful for pharmacovigilance and for drug repurposing. Analyzing information on this scale requires automatic methods, and developing them requires a corpus with labeled relations among entities. Until now, no such Russian-language dataset has existed. This paper presents the first Russian-language dataset in which labeled entity pairs are divided into multiple contexts within a single text (by the drugs used, by different users, by cases of use, etc.), together with a method based on the XLM-RoBERTa language model, previously trained on medical texts, to evaluate the state-of-the-art accuracy on the task of identifying four types of relations among entities: ADR–Drugname, Drugname–Diseasename, Drugname–SourceInfoDrug, Diseasename–Indication. On the presented dataset, drawn from the Russian Drug Review Corpus, the developed method achieves an F1-score of 81.2% (obtained using cross-validation and averaged over the four relation types), which is 7.8% higher than the basic classifiers.
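As an illustration of how an F1-score averaged over the four relation types can be computed, the following is a minimal sketch. It assumes that each candidate entity pair is labeled 1 (relation present) or 0 (no relation), that F1 is taken for the positive class, and that the per-type scores are averaged with equal weight; the abstract does not specify these details. The function names and the toy labels are hypothetical and are not taken from the paper or its corpus.

```python
from typing import Dict, List, Tuple


def f1_positive(gold: List[int], pred: List[int]) -> float:
    """F1 for the positive class (1 = relation present) of a binary labeling."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: define F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(per_type: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Unweighted mean of the per-relation-type F1 scores."""
    scores = [f1_positive(gold, pred) for gold, pred in per_type.values()]
    return sum(scores) / len(scores)


# Hypothetical toy labels (gold, predicted) for the four relation types.
toy = {
    "ADR-Drugname": ([1, 1, 0, 0], [1, 0, 0, 0]),
    "Drugname-Diseasename": ([1, 0], [1, 0]),
    "Drugname-SourceInfoDrug": ([1, 1, 0], [1, 1, 1]),
    "Diseasename-Indication": ([0, 1, 1], [0, 1, 1]),
}

print(f"mean F1 over relation types: {mean_f1(toy):.3f}")
```

Under cross-validation, the same computation would be repeated on each held-out fold and the fold results averaged.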

Keywords

pharmacological text corpus; automatic relation extraction; natural language processing; deep learning

Subject

MATHEMATICS & COMPUTER SCIENCE, Artificial Intelligence & Robotics
