Preprint
Review

This version is not peer-reviewed.

Artificial Intelligence and Machine Learning in Pharmacovigilance: A Systematic Review of Models for Drug Safety in Polypharmacy

Submitted:

05 February 2026

Posted:

06 February 2026


Abstract
Background: Polypharmacy is increasingly prevalent worldwide and is strongly associated with adverse drug reactions (ADRs) and drug–drug interactions (DDIs). Traditional pharmacovigilance (PV) systems rely heavily on spontaneous reporting and manual signal detection, which suffer from substantial underreporting, delayed signal identification, and limited capacity to manage complex multidrug interactions. Artificial intelligence (AI) and machine learning (ML) offer scalable, data-driven approaches that may enhance the detection, assessment, and prevention of medication-related harms in polypharmacy populations. This systematic review evaluates current AI/ML models in PV for polypharmacy, compares their predictive performance, and identifies key methodological facilitators and challenges. Methods: A systematic review was conducted in accordance with PRISMA 2020 guidelines and a pre-registered PROSPERO protocol (CRD420251134685). Seven electronic databases, including grey literature sources, were searched from inception to 2025. Eligible studies were primary computational modeling investigations that developed AI/ML models using polypharmacy-focused datasets and reported performance using the area under the receiver operating characteristic curve (AUROC) and the area under the precision–recall curve (AUPRC). Study quality was appraised using the Mixed Methods Appraisal Tool (MMAT). Results: Of 7,513 records identified, 23 studies met the inclusion criteria. All studies employed computational modeling designs using large pharmacovigilance and biomedical datasets such as TWOSIDES, Decagon, STITCH, SIDER, and DrugBank. Reported AUROC values ranged from 0.553 to 0.9993, while AUPRC values ranged from 0.112 to 0.999. High-performing models included PU-MLP, DPSP, SimVec, TriVec, and SimplE, many of which achieved AUROC and AUPRC values exceeding 0.95. Key strengths of AI/ML approaches included high predictive accuracy, effective handling of class imbalance, and integration of heterogeneous data sources. Major challenges included data sparsity, limited interpretability, cold-start prediction problems, and a lack of external or clinical validation. Conclusion: AI and ML models demonstrate strong potential to enhance pharmacovigilance in polypharmacy settings by improving the detection of ADRs and DDIs. Future research should prioritize standardized benchmarking, model interpretability, and real-world clinical validation to support safe and effective regulatory and clinical implementation.

1. Introduction

The World Health Organization defines Pharmacovigilance (PV) as the science and activities related to the detection, assessment, understanding, and prevention of adverse effects and other drug-related problems [1]. Despite advances in medication safety, adverse drug reactions (ADRs) and medication-related harms remain a major global public health concern [2]. A critical limitation of current PV systems is the substantial underreporting of ADRs, with a median global underreporting rate estimated at approximately 94%, which delays signal detection and exposes patients to preventable harm [3].
Polypharmacy, commonly defined as the concurrent use of five or more medications, is a major contributor to medication-related risk, particularly among older adults and patients with multiple chronic conditions [4]. Evidence suggests that up to 70% of geriatric patients experience polypharmacy and that these patients have nearly twice the risk of ADRs compared with non-polypharmacy populations, with a substantial proportion of ADRs considered preventable [5,6]. The complexity of multiple drug combinations, comorbidities, and physiological changes makes the detection of ADRs and drug–drug interactions (DDIs) in polypharmacy especially challenging for traditional PV approaches.
Conventional PV systems rely largely on spontaneous reporting, manual review, and rule-based signal detection. While these approaches have contributed to drug safety monitoring, they are reactive, resource-intensive, and poorly suited to the analysis of large, heterogeneous datasets such as electronic health records, molecular interaction networks, and post-marketing safety databases [7,8]. As medication use continues to expand in scale and complexity, these limitations have become increasingly apparent.
Artificial intelligence (AI) and machine learning (ML) have emerged as promising tools to address these challenges. AI refers to computational systems capable of performing tasks that typically require human intelligence, while ML enables algorithms to learn patterns from data and improve performance over time [9]. In pharmacovigilance, AI/ML models can analyze structured and unstructured data, identify complex non-linear relationships, and detect rare or multifactorial adverse events that may be missed by traditional methods [10,11]. Recent studies have demonstrated that advanced ML models, including graph-based neural networks, tensor factorization methods, transformer architectures, and positive–unlabeled learning approaches, can achieve high predictive performance in detecting ADRs and DDIs, particularly in polypharmacy contexts [12,13,14,15].
Despite growing interest and promising results, important methodological and implementation challenges remain. These include limited external validation, poor interpretability of complex models, concerns about data quality and representativeness, and uncertainty regarding integration into clinical and regulatory workflows [16,17,18]. To date, there is no consensus on the most effective AI/ML approaches for pharmacovigilance in polypharmacy populations.
The objective of this systematic review is to comprehensively evaluate existing AI and ML applications in pharmacovigilance with a specific focus on polypharmacy. The review aims to (1) identify and classify AI/ML models used in polypharmacy-focused PV, (2) assess their effectiveness using standardized performance metrics, and (3) synthesize reported advantages, challenges, and future directions for research and practice.

2. Materials and Methods

Study Design
This study employed a mixed-methods systematic review design, integrating quantitative performance outcomes with a qualitative synthesis of reported advantages and challenges. The review followed the PRISMA 2020 reporting guidelines [19].
Research Questions
  • What AI and ML models are currently used in pharmacovigilance to detect and prevent ADRs and DDIs in polypharmacy settings?
  • How effective are these models based on reported AUROC and AUPRC metrics?
  • What are the key advantages, limitations, and implementation challenges of AI/ML-based pharmacovigilance systems?
The review question was structured using the PICO framework [20] as follows: P (Population): patients with polypharmacy; I (Intervention): use of artificial intelligence (AI) and machine learning (ML) in pharmacovigilance; C (Comparator): traditional pharmacovigilance methods, no AI/ML use, or other AI/ML models; O (Outcome): improved detection, assessment, and prevention of ADRs and DDIs, together with the implementation, advantages, and challenges of AI/ML.
Eligibility Criteria
The inclusion criteria for this review included primary computational modeling studies that used polypharmacy-related datasets and focused on developing original artificial intelligence or machine learning models. Eligible studies were required to report key performance metrics, including AUROC and AUPRC, and be published in English between 2015 and 2025. Exclusion criteria included reviews, editorials, commentaries, opinion pieces, non-computational studies, research not centered on polypharmacy, studies lacking the required performance metrics, and conference abstracts without full-text availability.
Effect Measures
The primary performance metrics assessed were the area under the receiver operating characteristic curve (AUROC) and the area under the precision–recall curve (AUPRC). AUROC reflects the overall discriminative ability of a binary classification model, indicating how well it distinguishes between positive and negative cases across different thresholds [21]. Its values range from 0 to 1, where a score close to 1 denotes excellent discrimination, 0.5 suggests random guessing, and values near 0 indicate performance systematically worse than chance. AUPRC, in contrast, is particularly informative for imbalanced datasets, such as those encountered in pharmacovigilance, where true ADRs and DDIs are rare [22]. Unlike AUROC, AUPRC is based on precision and recall, with baseline performance equal to the prevalence of the positive class (e.g., 8% positives → baseline ≈ 0.08). Higher AUPRC values indicate better model performance, with 1.0 representing perfect precision and recall. Given the challenges posed by imbalanced data in polypharmacy research, AUPRC was considered particularly relevant for evaluating AI/ML models in this context.
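To make these definitions concrete, the short example below shows how AUROC and AUPRC are typically computed with scikit-learn; the labels and scores are simulated for illustration (roughly 8% positives, matching the baseline example above) and are not drawn from any included study.

```python
# Illustrative only: simulated labels/scores, not data from any included study.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(42)

# Simulate an imbalanced pharmacovigilance-style task: ~8% positives (true ADR/DDI signals).
n = 10_000
y_true = rng.binomial(1, 0.08, size=n)

# Hypothetical model scores: positives tend to receive higher scores than negatives.
y_score = np.where(y_true == 1,
                   rng.normal(0.7, 0.15, size=n),
                   rng.normal(0.4, 0.15, size=n))

auroc = roc_auc_score(y_true, y_score)            # threshold-free discrimination; 0.5 = chance
auprc = average_precision_score(y_true, y_score)  # summarizes the precision-recall curve

print(f"AUROC = {auroc:.3f}")
print(f"AUPRC = {auprc:.3f} (baseline ≈ prevalence = {y_true.mean():.3f})")
```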
In addition, this study carried out a narrative synthesis of the advantages and challenges of AI and/or ML.
Information Sources and Search Strategy
Seven electronic databases (PubMed, EMBASE, CINAHL, Web of Science, Scopus, ProQuest Central, and ProQuest Dissertations and Theses Global) were searched from inception to 2025. Search strategies combined controlled vocabulary and keywords across three domains: (1) artificial intelligence and machine learning, (2) pharmacovigilance and drug safety, and (3) polypharmacy. Boolean operators were used to optimize sensitivity and specificity.
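For illustration only, a query following this three-domain structure might look as follows in a PubMed-style syntax; the specific terms shown are representative examples rather than the registered search strategy, which varied by database.

```
("artificial intelligence" OR "machine learning" OR "deep learning")
AND ("pharmacovigilance" OR "drug safety" OR "adverse drug reaction*" OR "drug-drug interaction*")
AND ("polypharmacy" OR "multiple medication*" OR "drug combination*")
```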
Study Selection and Data Management
All retrieved records were imported into EndNote for deduplication and then screened using Rayyan. Title and abstract screening was followed by full-text review by independent reviewers, with disagreements resolved by consensus.
Data Extraction
Data extracted included authors, year, country, datasets used, model type, performance metrics (AUROC and AUPRC), and reported advantages and challenges.
Quality Assessment
Study quality was assessed using the Mixed Methods Appraisal Tool (MMAT) 2018 [23]. No study was excluded based on the quality assessment.

3. Results

Study Selection and Characteristics
The literature search identified 7,513 records. After removal of duplicates and screening, 23 studies met all inclusion criteria and were included in the final synthesis [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46] (Figure 1).
All included studies employed computational modeling designs and were published between 2018 and 2025 (Table 1). Studies originated from Asia, North America, and Europe, with the highest representation from China, the United States, and Iran (Figure 2). Datasets commonly used to represent DDIs and ADRs included TWOSIDES, Decagon, STITCH, SIDER, OFFSIDES, and DrugBank. Many studies integrated additional biological data, such as protein–protein interaction (PPI) networks and chemical structure representations, drawing on resources such as STITCH, PPI datasets, KEGG, and PubChem. Some studies also incorporated LINCS L1000 gene expression data, SMILES-based chemical representations, and the Inxight Drugs API.
AI/ML Models and Performance
A wide range of AI/ML architectures were identified, including graph neural networks, tensor factorization models, embedding-based approaches, transformer-based models, and positive–unlabeled learning frameworks (Figure 3, Table 2). AUROC values ranged from 0.553 to 0.9993, while AUPRC values ranged from 0.112 to 0.999. The highest-performing models included PU-MLP (AUROC 0.992; AUPRC 0.999), DPSP (AUROC up to 0.999; AUPRC up to 0.977), and SimVec (AUROC 0.975; AUPRC 0.968) (Figure 4). These models demonstrated strong ability to detect ADRs and DDIs despite severe class imbalance. Lower performance was observed in cold-start scenarios involving previously unseen drug combinations.
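To make the prediction task concrete, the sketch below illustrates how most included models frame polypharmacy side-effect detection: each (drug, drug, side effect) triple is scored for the probability that the combination causes that effect, and the resulting scores are then evaluated with AUROC/AUPRC. The embedding-plus-linear scorer here is a deliberately simplified, hypothetical stand-in rather than any published architecture; the drug and side-effect counts merely echo the Decagon/TWOSIDES scale reported in Table 1.

```python
# Simplified sketch (hypothetical architecture): score (drug, drug, side-effect) triples
# by combining learned embeddings; real models in the review use far richer encoders.
import torch
import torch.nn as nn

class TripleScorer(nn.Module):
    def __init__(self, n_drugs: int, n_effects: int, dim: int = 32):
        super().__init__()
        self.drug = nn.Embedding(n_drugs, dim)
        self.effect = nn.Embedding(n_effects, dim)
        self.out = nn.Linear(3 * dim, 1)

    def forward(self, d1, d2, e):
        # Concatenate the two drug embeddings and the side-effect embedding,
        # then map to a single interaction logit.
        x = torch.cat([self.drug(d1), self.drug(d2), self.effect(e)], dim=-1)
        return self.out(x).squeeze(-1)

# Toy example: 645 drugs and 963 side-effect types, matching the scale of the
# Decagon/TWOSIDES-derived datasets used by many included studies (inputs here are random).
model = TripleScorer(n_drugs=645, n_effects=963)
d1 = torch.randint(0, 645, (8,))
d2 = torch.randint(0, 645, (8,))
e = torch.randint(0, 963, (8,))
probs = torch.sigmoid(model(d1, d2, e))  # probability each drug pair causes the side effect
print(probs)
```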
Advantages and Challenges
The advantages and challenges associated with applying AI and ML in polypharmacy-focused pharmacovigilance were systematically identified from the included studies and are summarized in Table 3. Reported advantages include high predictive accuracy and enhanced sensitivity to rare adverse events, effective handling of imbalanced datasets, and the ability to integrate heterogeneous pharmacological and biological data sources. These models also demonstrate scalability, enabling analysis of large datasets and supporting high-throughput screening. However, several challenges persist. Model performance is highly dependent on data quality and completeness, and the interpretability of complex algorithms remains limited, posing barriers to clinical adoption. Additionally, reduced effectiveness in cold-start scenarios, lack of external and clinical validation, and practical constraints related to integration into routine clinical workflows continue to hinder widespread implementation.
Quality Assessment
The quality of the included studies was appraised using the Mixed Methods Appraisal Tool (MMAT), which enables standardized, consistent appraisal across studies with heterogeneous methodologies; full details are presented in Table 4 in the appendix. Twenty-two of the 23 studies met all appraisal criteria and received a full score, indicating strong methodological quality. The remaining study received a score of 4 out of 5 because it did not meet one criterion. No study was excluded due to methodological weakness.

4. Discussion

This systematic review evaluated the current applications of artificial intelligence and machine learning in pharmacovigilance, focusing on their capacity to detect, assess, and predict adverse drug reactions and drug-drug interactions in polypharmacy contexts. Among the 23 studies reviewed, AI/ML models consistently exhibited strong predictive performance in identifying and preventing ADRs, DDIs and other medication-related issues in data from polypharmacy patients, with most reporting AUROC and AUPRC values exceeding 0.90. These results support the expanding literature that identifies AI as a valuable computational complement to traditional pharmacovigilance systems, providing increased sensitivity, earlier detection of safety signals, and improved management of complex polypharmacy data [47,48]. This advancement is especially significant given the persistent limitations of spontaneous reporting systems, which are affected by substantial underreporting estimated at nearly 94% globally and consequent delays in recognizing clinically significant ADRs [3].
Beyond demonstrating high predictive performance across studies, this review identified that AI and machine learning systems were developed using a range of computational tools and programming environments. Most models were implemented in Python, with additional reliance on frameworks and optimization tools such as TensorFlow and the Adam optimizer. These platforms enabled the creation of diverse architectures capable of distinguishing complex risk signals within high-dimensional polypharmacy datasets. Furthermore, the findings support the growing consensus that traditional pharmacovigilance approaches, which rely primarily on spontaneous reporting, are insufficient for the timely detection of medication-related harms, especially among populations exposed to multiple drugs simultaneously.
The performance of these models underscores the significant analytical advances enabled by contemporary AI architectures. Notably, models such as TriVec, SimplE, SimVec, and PU-MLP achieved high effectiveness by integrating multiple sources of biomedical data, including chemical fingerprints, drug-target interactions, phenotype-based side effects, and molecular similarity embeddings. This multidimensional approach enabled the identification of interaction patterns that are often undetectable through human evaluation or conventional statistical methods. These results are consistent with broader AI-in-healthcare research demonstrating that combining diverse data streams enhances predictive accuracy, and they illustrate the ongoing shift toward precision-focused, data-intensive pharmacovigilance frameworks [49].
Both AUROC and AUPRC were consistently reported as primary evaluation metrics across all included studies. AUPRC was identified as more clinically meaningful for polypharmacy risk prediction, as adverse drug reactions and harmful interactions are rare but critical outcomes in highly imbalanced datasets. AUROC may be artificially inflated by the prevalence of true negatives, whereas AUPRC, which emphasizes precision and recall in the positive class, provides a more reliable measure of a model's ability to detect clinically relevant risks while minimizing false positives [50,51]. Among the evaluated models, PU-MLP demonstrated the highest overall performance, followed by DPSP, SimVec, and TriVec. PU-MLP combines random forest, clustering techniques, graph neural networks, and positive–unlabeled (PU) learning to address the data sparsity and uncertainty inherent in pharmacovigilance datasets, achieving strong predictive capability when assessed on benchmark datasets such as TWOSIDES and OFFSIDES.
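For readers unfamiliar with the positive–unlabeled setting, the sketch below illustrates the basic Elkan–Noto correction (the PU classifier listed for PU-MLP in Table 2) using a generic random forest on synthetic data; it is a minimal, hypothetical illustration of the principle, not a reproduction of the PU-MLP pipeline.

```python
# Minimal Elkan-Noto positive-unlabeled sketch on synthetic data (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic ground truth: y = 1 for "true" interactions, but only a fraction are labeled.
X = rng.normal(size=(5000, 10))
y = (X[:, 0] + X[:, 1] > 0.5).astype(int)
labeled = (y == 1) & (rng.random(5000) < 0.3)   # s = 1 only for some true positives

# Step 1: train a classifier to predict "labeled vs unlabeled" (s, not y).
s = labeled.astype(int)
X_tr, X_hold, s_tr, s_hold = train_test_split(X, s, test_size=0.2, random_state=0, stratify=s)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, s_tr)

# Step 2: estimate c = P(s=1 | y=1) as the mean score on held-out labeled positives.
c = clf.predict_proba(X_hold[s_hold == 1])[:, 1].mean()

# Step 3: correct scores to approximate P(y=1 | x) = P(s=1 | x) / c.
p_y = np.clip(clf.predict_proba(X)[:, 1] / c, 0, 1)
print(f"estimated c = {c:.2f}; mean corrected P(y=1) = {p_y.mean():.2f} "
      f"(true prevalence = {y.mean():.2f})")
```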
Despite general alignment with external literature, several notable challenges and divergences persist. The cold-start problem, characterized by poor model performance when predicting interactions involving novel or underrepresented drugs, remains a significant limitation and reflects issues commonly reported in computational pharmacology. Additionally, none of the reviewed models underwent prospective or external clinical validation, resulting in a substantial gap between high in-silico accuracy and practical, real-world application. The absence of standardized datasets, inconsistent preprocessing methods, and variability in performance metrics further contributed to heterogeneity across studies and hindered precise model comparisons. These observations are consistent with previous systematic reviews that highlight the necessity for harmonized reporting standards and improved benchmarking in AI-driven pharmacovigilance.
Methodologically, the included studies demonstrated several strengths that enhance the internal validity of the evidence base. Most employed clear analytical workflows, reproducible code, publicly available datasets, and robust validation techniques. The high MMAT scores (22 of 23 studies achieving 5/5) reflect strong adherence to computational research standards. Despite this, important methodological limitations persist. Heavy reliance on secondary datasets introduces uncertainties arising from labeling errors, reporting bias, noise, incompleteness, and imbalanced class distributions, all of which significantly affect the reliability of predictions. High-performing deep neural networks and transformer-based models also suffer from limited interpretability, raising ethical concerns regarding transparency and accountability. These challenges underscore the need for explainable AI frameworks that allow clinicians and regulators to understand how predictions are generated.
The analysis identified several advantages of artificial intelligence and machine learning models in improving pharmacovigilance, particularly in detecting complex, rare, or multi-drug interactions that traditional systems frequently overlook. The integration of diverse data sources, including drug-protein interaction networks and chemical structural information, enabled more comprehensive modeling of polypharmacy-related risks. However, numerous studies have indicated that data sparsity, class imbalance, and reliance on incomplete or noisy public datasets adversely affect predictive performance, particularly for rare interactions. These results help clarify the variability observed across models, even when similar datasets and evaluation metrics are employed.
Ethical considerations are also critical for interpreting these findings. In the absence of clinical validation, the premature deployment of AI models may lead to misclassification of ADRs, the overlooking of safety signals, or the perpetuation of biases present in training datasets. Models predominantly trained on data from the United States, Europe, or East Asia may not generalize effectively to underrepresented populations, raising concerns about fairness. Moreover, the lack of standardized performance reporting undermines reproducibility and may hinder regulatory assessment. Addressing these ethical and methodological challenges is essential to ensure the safe and equitable integration of AI tools into clinical and regulatory settings.
Strengths and limitations
This review demonstrates several strengths that enhance the reliability and relevance of its findings. The search strategy was comprehensive and systematically implemented across multiple major scientific databases, ensuring extensive coverage of literature concerning AI and ML applications in pharmacovigilance. Rigorous screening and data extraction procedures were conducted in accordance with PRISMA guidelines, which supports methodological consistency and transparency. Quality appraisal using the MMAT indicated that nearly all included studies achieved the highest possible score, reflecting strong methodological rigor within the primary evidence base. The review synthesizes findings across a range of AI models, datasets, and analytical approaches, providing a detailed overview of the current technological landscape in adverse drug reaction and drug-drug interaction prediction. By integrating results from multiple high-performing architectures and emphasizing both their strengths and limitations, this review offers a balanced, evidence-based assessment of AI's role in contemporary pharmacovigilance.
However, several limitations of this review should be acknowledged. The search was limited to English-language publications, potentially excluding relevant studies in other languages and introducing selection bias. Considerable heterogeneity was present among the included studies regarding model architectures, data sources, data preprocessing pipelines, and performance metrics. This variability restricted the ability to conduct direct comparisons or quantitative synthesis and may have contributed to inconsistencies in reported performance outcomes. The lack of standardized reporting frameworks across AI and ML pharmacovigilance studies further exacerbated this challenge. Furthermore, as all included studies relied on secondary in-silico datasets rather than real-world clinical data, external validity is limited, and findings cannot be readily generalized to patient populations or real-time pharmacovigilance systems. These limitations underscore the need for harmonized methodologies, multilingual search strategies, and rigorous prospective validation in future research.
Recommendations
Based on the challenges and insights identified in this review, several targeted recommendations are proposed to advance AI-enabled pharmacovigilance. Given the strong performance of leading models such as PU-MLP, DPSP, and SimVec, it is essential to evaluate them in real-world clinical settings to determine whether their high AUROC and AUPRC values are reproducible outside controlled research environments. This recommendation aligns with the research questions proposed below for future investigation and underscores the importance of assessing the effectiveness of these systems in identifying adverse drug reactions and drug–drug interactions within actual patient populations.
Future research should also investigate how AI and machine learning models can adapt to evolving medication patterns and the ongoing introduction of new drugs. Maintaining model responsiveness to changes in prescribing trends is critical for ensuring long-term accuracy and clinical relevance.
First, prospective clinical validation should be conducted by evaluating AI and machine learning models with real-world data from electronic health records, hospital medication systems, and active surveillance networks to establish external validity. Second, harmonization of datasets, performance metrics, and reporting guidelines is necessary to reduce methodological variability and facilitate accurate cross-study comparisons. Third, explainable AI approaches should be prioritized to improve transparency and clinician trust, especially for high-performing models with opaque internal reasoning. Policymakers and regulators should also establish robust frameworks for the evaluation, approval, and monitoring of AI tools in medication safety to ensure fairness, accountability, and patient protection. Finally, training and capacity-building initiatives for pharmacists, clinicians, and healthcare regulators should be expanded to support the safe and informed integration of AI systems into pharmacovigilance workflows. Simultaneously, improvements in data quality, model interpretability, and clinical usability are essential. The development of AI and machine learning tools that can be seamlessly integrated into routine clinical workflows will help ensure that these technologies effectively support medication safety in polypharmacy. Collectively, these actions will enable AI-enabled pharmacovigilance to transition from theoretical potential to practical, safe, and equitable implementation in real-world healthcare settings.
In response to the challenges identified in this review, two primary research questions are proposed for future investigation:
  • How does the performance of advanced AI and machine learning models, such as PU-MLP, differ when evaluated on real-world clinical data that are noisy, incomplete, and heterogeneous, as opposed to curated benchmark datasets?
  • Does incorporating additional real-world information, such as patient history or prescribing context, enhance the predictive accuracy of AI and machine learning models for rare adverse drug reactions and drug-drug interactions in patients receiving polypharmacy?

5. Conclusions

This review shows that AI and ML models have strong potential to improve pharmacovigilance, particularly in polypharmacy, where the risk of adverse drug reactions and drug–drug interactions is high. Across 23 studies, model performance was consistently strong, indicating reliable prediction of harmful medication combinations. The best-performing models—PU-MLP, DPSP, and SimVec—achieved near-perfect scores, highlighting the ability of advanced algorithms to capture complex drug relationships and predict adverse outcomes with high precision. Common advantages included strong predictive capability, integration of diverse data sources, and scalability for large-scale screening. However, challenges remain, such as dependence on data quality, limited interpretability, lack of external validation, and practical barriers to clinical integration. Overall, AI and ML offer promising tools to enhance medication safety in polypharmacy. Future efforts should focus on improving data quality, model transparency, and real-world validation to enable successful clinical adoption.

Author Contributions

Conceptualization, Visualization, Writing (Original Draft)— KT, MG, RT, MIMI; Writing (Reviewing and Editing)— RA, MIMI; Project Administration— all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The article processing charge (APC) was supported by Qatar University.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADRs Adverse drug reactions
DDIs Drug–drug interactions
AI Artificial intelligence
ML Machine learning
PV Pharmacovigilance
AUROC Area under the receiver operating characteristic curve
AUPRC Area under the precision–recall curve

References

  1. World Health Organization. The importance of pharmacovigilance: safety monitoring of medicinal products; WHO: Geneva, 2002. [Google Scholar]
  2. Edwards, IR; Aronson, JK. Adverse drug reactions: definitions, diagnosis, and management. Lancet 2000, 356(9237), 1255–1259. [Google Scholar] [CrossRef]
  3. Hazell, L; Shakir, SAW. Under-reporting of adverse drug reactions: a systematic review. Drug Saf. 2006, 29(5), 385–396. [Google Scholar]
  4. Maher, RL; Hanlon, J; Hajjar, ER. Clinical consequences of polypharmacy in elderly. Expert Opin Drug Saf. 2014, 13(1), 57–65. [Google Scholar] [CrossRef] [PubMed]
  5. Alhawassi, TM; Krass, I; Bajorek, BV; Pont, LG. A systematic review of the prevalence and risk factors for adverse drug reactions in the elderly in the acute care setting. Clin Interv Aging 2014, 9, 2079–2086. [Google Scholar]
  6. Beijer, HJ; de Blaey, CJ. Hospitalisations caused by adverse drug reactions (ADR): a meta-analysis of observational studies. Pharm World Sci. 2002, 24(2), 46–54. [Google Scholar] [CrossRef] [PubMed]
  7. Waller, P; Evans, SJW. A model for the future conduct of pharmacovigilance. Pharmacoepidemiol Drug Saf. 2003, 12(1), 17–29. [Google Scholar]
  8. Bate, A; Evans, SJW. Quantitative signal detection using spontaneous ADR reporting. Pharmacoepidemiol Drug Saf. 2009, 18(6), 427–436. [Google Scholar] [CrossRef]
  9. Russell, S; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Pearson, 2010. [Google Scholar]
  10. Harpaz, R; DuMouchel, W; LePendu, P; Bauer-Mehren, A; Ryan, P; Shah, NH. Performance of pharmacovigilance signal-detection algorithms for the FDA adverse event reporting system. Clin Pharmacol Ther. 2013, 93(6), 539–546. [Google Scholar] [PubMed]
  11. Sarker, A; Gonzalez, G. Portable automatic text classification for adverse drug reaction detection via multi-corpus training. J Biomed Inform. 2015, 53, 196–207. [Google Scholar] [PubMed]
  12. Zitnik, M; Agrawal, M; Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 2018, 34(13), i457–i466. [Google Scholar] [CrossRef]
  13. Zhang, W; Chen, Y; Liu, F; Luo, F; Tian, G; Li, X. Predicting drug–drug interactions based on multi-source information fusion. J Cheminform 2017, 9, 3. [Google Scholar]
  14. Deng, Y; Xu, X; Qiu, Y; Xia, J; Zhang, W; Liu, S. A multimodal deep learning framework for predicting drug–drug interactions. Bioinformatics 2020, 36(15), 4316–4322. [Google Scholar] [CrossRef] [PubMed]
  15. Zhou, T; Wang, X; Li, J. PU-learning based framework for polypharmacy side effect prediction. Brief Bioinform. 2025, 26(1), bbad123. [Google Scholar]
  16. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. 2019, 1, 206–215. [Google Scholar] [CrossRef]
  17. Doshi-Velez, F; Kim, B. Towards a rigorous science of interpretable machine learning. arXiv 2017, arXiv:1702.08608. [Google Scholar] [CrossRef]
  18. Topol, EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019, 25, 44–56. [Google Scholar] [CrossRef]
  19. Page, MJ; Moher, D; Bossuyt, PM; Boutron, I; Hoffmann, TC; Mulrow, CD; et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ 2021, 372, n160. [Google Scholar] [CrossRef]
  20. Critical Appraisal Skills Programme (CASP). How to use the PICO framework. Available online: https://casp-uk.net/pico-framework/.
  21. GeeksforGeeks. AUC-ROC Curve in Machine Learning. 2025. Available online: https://www.geeksforgeeks.org/machine-learning/auc-roc-curve/.
  22. GeeksforGeeks. Precision-Recall Curve in Machine Learning. 2025. Available online: https://www.geeksforgeeks.org/machine-learning/precision-recall-curve-ml/.
  23. Hong, QN; Pluye, P; Fàbregues, S; Bartlett, G; Boardman, F; Cargo, M; Dagenais, P; et al. Mixed Methods Appraisal Tool (MMAT), version 2018.
  24. Dasgupta, S; Jayagopal, A; Hong, ALJ; Mariappan, R; Rajan, V. Adverse Drug Event Prediction Using Noisy Literature-Derived Knowledge Graphs: Algorithm Development and Validation. 2021, e32730.
  25. Deng, Z; Xu, J; Feng, Y; Dong, L; Zhang, Y; Deng, Z; et al. MAVGAE: a multimodal framework for predicting asymmetric drug-drug interactions based on variational graph autoencoder. Computer Methods In Biomechanics And Biomedical Engineering 2025, 28(7), 1098–110. [Google Scholar] [CrossRef] [PubMed]
  26. Gromova, AA; Maida, AS. ChemBERTaDDI: Transformer Driven Molecular Structures and Clinical Data for predicting drug-drug interactions. bioRxiv 2025, 2025.01.22, 634309. [Google Scholar] [CrossRef]
  27. Gromova, AA; Maida, AS. ChemBERTaDDI: Transforming Drug-Drug Interaction Prediction with Transformers and Clinical Insights. 2025. [Google Scholar]
  28. Burkhardt, HA; Subramanian, D; Mower, J; Cohen, T. Predicting Adverse Drug-Drug Interactions with Neural Embedding of Semantic Predications. AMIA Annual Symposium proceedings AMIA Symposium 2019, 2019, 992–1001. [Google Scholar]
  29. Keshavarz, A; Lakizadeh, A. PU-MLP: A PU-learning based method for polypharmacy side-effects detection based on multi-layer perceptron and feature extraction techniques. Intell Based Med 2025, 12. [Google Scholar] [CrossRef]
  30. Keshavarz, A; Lakizadeh, A. PU-GNN: A Positive-Unlabeled Learning Method for Polypharmacy Side-Effects Detection Based on Graph Neural Networks. International Journal Of Intelligent Systems 2024. [Google Scholar] [CrossRef]
  31. Kim, E; Nam, H. DeSIDE-DDI: interpretable prediction of drug-drug interactions using drug-induced gene expressions. Journal Of Cheminformatics 2022, 14(1). [Google Scholar] [CrossRef]
  32. Kim, J; Shin, M. A Knowledge Graph Embedding Approach for Polypharmacy Side Effects Prediction. Applied Sciences-Basel 2023, 13(5). [Google Scholar] [CrossRef]
  33. Lakizadeh, A; Babaei, M. Detection of polypharmacy side effects by integrating multiple data sources and convolutional neural networks. Molecular Diversity 2022, 26(6), 3193–203. [Google Scholar] [CrossRef] [PubMed]
  34. Lin, S; Zhang, G; Dong-Qing, W; Xiong, Y. DeepPSE: Prediction of polypharmacy side effects by fusing deep representation of drug pairs and attention mechanism. 2022.
  35. Liu, Q; Yao, E; Liu, C; Zhou, X; Li, Y; Xu, M; et al. M2GCN: multi-modal graph convolutional network for modeling polypharmacy side effects. Applied Intelligence 2023, 53(6), 6814–25. [Google Scholar] [CrossRef]
  36. Liu, T; Cui, J; Zhuang, H; Wang, H. Modeling polypharmacy effects with heterogeneous signed graph convolutional networks. Applied Intelligence 2021, 51(11), 8316–33. [Google Scholar] [CrossRef]
  37. Lukashina, N; Kartysheva, E; Spjuth, O; Virko, E; Shpilman, A. SimVec: predicting polypharmacy side effects for new drugs. 2022.
  38. Masumshah, R; Aghdam, R; Eslahchi, C. A neural network-based method for polypharmacy side effects prediction. 2021.
  39. Masumshah, R; Eslahchi, C. DPSP: a multimodal deep learning framework for polypharmacy side effects prediction. Bioinformatics Advances 2023, 3(1). [Google Scholar] [CrossRef] [PubMed]
  40. Lloyd, O; Liu, Y; Gaunt, TR. Fast polypharmacy side effect prediction using tensor factorization. Bioinformatics (Oxford, England) 2024, 40(12). [Google Scholar] [CrossRef] [PubMed]
  41. Ragkousis, A; Flogera, O; Megalooikonomou, V. MFSE: A Meta-Fusion Model for Polypharmacy Side-Effect Prediction with Graph Neural Networks. 2022, 563–70.
  42. Stock, M; De Baets, B. Cold-Start Problems in Data-Driven Prediction of Drug–Drug Interaction Effects. 2021.
  43. Nováček, V; Mohamed, SK. Predicting Polypharmacy Side-effects Using Knowledge Graph Embeddings. AMIA Joint Summits on Translational Science proceedings AMIA Joint Summits on Translational Science 2020, 2020, 449–58. [Google Scholar]
  44. Wang, N; Taylor, CO. DrIVeNN: Drug Interaction Vectors Neural Network. Journal Of Computational Biology 2025. [Google Scholar] [CrossRef]
  45. Wang, Y; Ma, H; Zhang, R; Gao, Z. Gorge: graph convolutional networks on heterogeneous multi-relational graphs for polypharmacy side effect prediction. 2023.
  46. Zitnik, M; Agrawal, M; Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 2018, 34(13), i457–i466. [Google Scholar] [CrossRef]
  47. Rao, PCV. Adverse Drug Reactions and Poly-Pharmacy in Geriatric Patients: A Pharmacovigilance-Based Observational Study. Journal of Population Therapeutics & Clinical Pharmacology (JPTCP) 2025, 32(3), 181–8. [Google Scholar]
  48. Pathuri, V. Revolutionizing Drug Safety: The Role of Artificial Intelligence and Machine Learning in Pharmacovigilance. International Journal of Computer Science and Information Technology Research (IJCSITR) 2025, 6(1), 51–61. [Google Scholar]
  49. Al-Nafjan, A; Aljuhani, A; Alshebel, A; et al. Artificial Intelligence in Predictive Healthcare: A Systematic Review. Journal of Clinical Medicine 2025, 14(19), 6752. [Google Scholar] [CrossRef] [PubMed]
  50. Lee, S. Mastering AUPRC in Biomedical ML: Unlock the Power of Precision-Recall Curves for Enhanced Model Evaluation. In Number Analytics Blog; 30 June 2025. [Google Scholar]
  51. Li, Y. AUROC and AUPRC. Medium. Available online: https://medium.com/@yukims19/auroc-and-auprc-b25183827e5a.
Figure 1. PRISMA flow chart.
Figure 2. Country Distribution of Computational Studies.
Figure 3. Top High-Performance Models.
Figure 4. Comparison of Model Performance grouped by country origin.
Table 1. Characteristics of the included studies (n=23).
Study number Authors/year Country of study Study design Sample characteristics
Study 1 Zitnik M et al.

2018
United States Computational modeling study
  • ▪ From a heterogeneous polypharmacy dataset: 645 drugs, 19,085 proteins, 4,651,131 drug-drug interactions (polypharmacy side effects), 715,612 protein-protein interactions, and 18,596 protein-drug interactions.
Study 2 Burkhardt H et al.

2020
United States Computational modeling study
  • ▪ TWOSIDES: contained 9,643,506 triples, which included 7,323,790 drug-drug interaction triples, 2,289,960 protein-protein interaction pairs, and 29,756 drug target pairs.
  • ▪ The analysis focused on 963 side effect types (those that occurred in 500 or more drug interaction triples).
Study 3 Nováček V, Mohamed S

2020
Ireland Computational modeling study
  • ▪ The study used the Decagon dataset. The final dataset splits were:
  • ▪ Training data: ~32,000 entities, 967 relations, 4.7 million triples.
  • ▪ Validation data: 643 entities, 963 relations, 459,000 triples.
  • ▪ Testing data: 643 entities, 963 relations, 459,000 triples.
Study 4 Dasgupta S et al.

2021
Singapore and

India
Computational modeling study
  • ▪ ADE prediction: 4,484 drug–disease pairs (EU-ADR + OMOP).
  • ▪ Polypharmacy prediction: ≈ 8,270,000 interaction samples (7.32 M training + 0.91 M testing).
Study 5 Masumshah R et al.

2021
Iran Computational modeling study
  • ▪ TWOSIDES database: 645 drugs and 63,473 drug pairs.
  • ▪ 8934 proteins; 18,690 drug–protein interactions (STITCH).
  • ▪ 10,184 mono side effects (SIDER + OFFSIDES)
  • ▪ 964 polypharmacy side effects used for training and testing.
Study 6 Dewulf P et al.

2021
Belgium Computational modeling study
  • ▪ 645 drugs, 963 adverse effects, 63,473 drug–drug combinations (2% of possible triplets labeled as occurring effects, caused by 70% of all drug pairs).
Study 7 Liu T et al.

2021
China Computational modeling study
  • ▪ Decagon: 645 drugs; 7,795 proteins; 4,576,785 drug-drug interactions; 18,690 drug-protein interactions.
  • ▪ TWOSIDES/PubChem: 548 drugs; 97,168 drug-drug interactions.
Study 8 Lin S et al.

2022
China Computational modeling study
  • ▪ TWOSIDES database: 645 drugs and 63,473 drug pairs (964 polypharmacy side effects).
  • ▪ 10,184 mono side effects from SIDER + OFFSIDES; 7,795 proteins from STITCH.
Study 9 Kim E, Nam H

2022
Republic of Korea Computational modeling study
  • ▪ Dataset 1 (LINCS L1000): 19,156 compounds, 978 landmark genes per signature.
  • ▪ Dataset 2 (TWOSIDES): 4,576,287 interactions, 63,472 drug combinations, 963 side-effect types (only those with ≥ 500 combinations retained).
  • ▪ Dataset 3 (DrugBank):
  • ▪ Version 5.0.0 → 33,497 interactions among 1,129 drugs.
  • ▪ Version 5.1.7 → 782,405 interactions among 2,616 drugs.
Study 10 Lakizadeh A, Babaei M

2022
Iran Computational modeling study
  • ▪ 643 drugs, 964 side effect types, 731 proteins, and 17,435 drug–protein interactions derived from integrated public databases (STITCH, DrugBank, KEGG, PubChem, OFFSIDES, SIDER, and TWOSIDES).
Study 11 Liu Q et al.

2022
China and Singapore Computational modeling study
  • ▪ Conducted experiments on a publicly available dataset compiled by Zitnik M et al.
  • ▪ Drug-drug interactions: From SIDER, OFFSIDES, and TWOSIDES.
  • ▪ Protein-protein Interactions: From the PPI dataset.
  • ▪ Drug-protein Interactions: From STITCH.
Study 12 Ragkousis A et al.

2022
Greece Computational modeling study
  • ▪ BioSNAP-Decagon polypharmacy dataset: 645 drugs; 19,081 proteins; 4,625,608 side-effect edges; 715,612 protein-protein interactions edges.
Study 13 Lukashina N et al.2022 Russia and

Sweden
Computational modeling study
  • ▪ The study used the preprocessed dataset from the Decagon study, which contains 645 drug nodes.
  • ▪ The new "weak nodes split" created for the evaluation consisted of 98 "new" drugs and 20,000 "weak triples"
Study 14 Kim J, Shin M

2023
Republic of Korea Computational modeling study
  • ▪ Publicly available data provided by Zitnik M et al.
  • ▪ 284 drugs linked to ≥ 1 protein; 19,089 proteins (715,612 protein-protein and 18,690 drug-protein edges); 14,247 labeled drug pairs × 1308 side effects (+ 244 single drugs labeled with the same side effects).
Study 15 Masumshah R, Eslahchi C 2023 Iran Computational modeling study
  • ▪ Databases used: DrugBank, KEGG, PubChem, OFFSIDES, SIDER, TWOSIDES.
  • ▪ Dataset 1: 572 drugs, 37,264 drug–drug interactions, 65 event types, 9991 side effects, 1162 targets, 202 enzymes, 881 chemical substructures, 957 pathways.
  • ▪ Dataset 2: 1258 drugs, 161,770 drug–drug interactions, 100 event types, 1651 targets, 316 enzymes, 2040 chemical substructures.
  • ▪ Dataset 3: 645 drugs, 63,473 drug–drug interactions, 185 adverse effects, 10,184 mono side effects, 8934 targets.
Study 16 Wang Y et al.

2023
China Computational modeling study
  • ▪ 645 drugs and 964 common drug side effects are collected from SIDER database and protein-protein interactions are obtained from STITCH database. Overall, it contains 4,651,131 drug combination-side effect associations.
  • ▪ The study focuses on predicting the 964 commonly occurring types of polypharmacy side effects that each occurred in at least 500 drug combinations.
Study 17 Lloyd O et al.

2024
United Kingdom Computational comparative analysis of tensor factorization (TF) models
  • ▪ Using data provided by Zitnik M et al.
Study 18 Keshavarz H, Lakizadeh A

2024
Iran Computational modeling study
  • ▪ Dataset 1 (DrugBank and KEGG): 572 drugs, 37,264 drug-drug interactions, and 47 event types.
  • ▪ Dataset 2 (TWOSIDES): 645 drugs, 4,649,441 drug-drug interactions, and 963 event types.
Study 19 Gromova A, Maida A

2025
United States Computational modeling study
  • ▪ 645 drugs, 63,473 drug pairs, 964 polypharmacy side effects, 10,184 mono side effects, and 8,934 protein targets.
  • ▪ TWOSIDES database: Total of 4,576,785 labeled associations.
Study 20 Gromova A, Maida A

2025
United States Computational modeling study
  • ▪ 645 drugs, 63,473 drug pairs, 964 polypharmacy side effects, 10,184 mono side effects, and 8,934 protein targets (TWOSIDES + SIDER + OFFSIDES + STITCH + PubChem).
Study 21 Wang N, Taylor C

2025
United States Computational modeling study
  • ▪ Datasets used: STITCH, SIDER, OFFSIDES, SMILES, and the Inxight Drugs API, compiled and preprocessed by Zitnik M et al.
Study 22 Deng Z et al.

2025
China Computational modeling study
  • ▪ DrugBank version 5.1.10 → 1,974 drugs and 603,816 asymmetric drug–drug interactions. After removing drugs with invalid SMILES or missing fingerprints, they kept 1,752 drugs and 504,468 asymmetric interactions.
Study 23 Keshavarz H, Lakizadeh A

2025
Iran Computational modeling study
  • ▪ Dataset 1 (DrugBank, KEGG): 645 drugs, 963 side effects, and 4,649,441 drug-drug side effect associations.
  • ▪ Dataset 2 (TWOSIDES, OFFSIDES): 572 drugs, 47 side effects, and 37,264 drug-drug side effect associations.
Table 2. Models and Performance Metrics Outcomes.
Study Number Authors Year of Publication Model Programs and Applications used AUROC AUPRC
1 Zitnik M et al. 2018 Decagon
  • ▪ Adam optimizer
  • ▪ T-SNE package
  • ▪ 0.872
  • ▪ 0.832
2 Burkhardt H et al. 2020 ESP (4 epochs)
  • ▪ Semantic Vector package
  • ▪ Anaconda2
  • ▪ Python
  • ▪ Scikit-learn package
  • ▪ Umap-learn package
  • ▪ Decagon code from GitHub
  • ▪ 0.896
  • ▪ 0.868
ESP (8 epochs)
  • ▪ 0.903
  • ▪ 0.875
3 Nováček V, Mohamed S 2020 TriVec
  • ▪ TensorFlow
  • ▪ Python
  • ▪ 0.975
  • ▪ 0.966
4 Dasgupta S et al. 2021 Weighted TransE
Weighted DeepWalk
  • ▪ Python
  • ▪ Scikit-learn
  • ▪ DeepWalk Implementation
  • ▪ TransE Implementation
  • ▪ Polypharmacy prediction:
  • ▪ Weighted TransE: 0.932
  • ▪ Weighted DeepWalk: 0.935
  • ▪ Weighted TransE: 0.896
  • ▪ Weighted DeepWalk: 0.913
5 Masumshah R et al. 2021 NNPS
  • ▪ PCA (Principal Component Analysis)
  • ▪ SGD (Stochastic Gradient Descent)
  • ▪ 0.966
  • ▪ 0.953
6 Dewulf P et al. 2021 Three-Step Kernel Ridge Regression (3S-KRR)
  • ▪ N/A
  • ▪ Unknown drug-drug effect: 0.957
  • ▪ Unknown drug-drug pair: 0.919
  • ▪ Unknown drug: 0.910
  • ▪ Two unknown drugs: 0.843
  • ▪ Unknown drug-drug effect: 0.557
  • ▪ Unknown drug-drug pair: 0.286
  • ▪ Unknown drug: 0.221
  • ▪ Two unknown drugs: 0.112
7 Liu T et al. 2021 SC-DDIS
  • ▪ TensorFlow
  • ▪ Adam optimizer
  • ▪ 0.947
  • ▪ 0.93
8 Lin S et al. 2022 DeepPSE
  • ▪ N/A
  • ▪ 0.93
  • ▪ 0.92
9 Kim E, Nam H 2022 DeSIDE-DDI
  • ▪ Mordred
  • ▪ Unseen interactions: 0.889
  • ▪ One-unseen interaction: 0.640
  • ▪ Both unseen interaction: 0.553
  • ▪ Unseen interactions: 0.915
  • ▪ One-unseen interaction: 0.627
  • ▪ Both unseen interaction: 0.556
10 Lakizadeh A, Babaei M 2022 PSECNN
  • ▪ Python
  • ▪ TensorFlow
  • ▪ Unbalanced data: 0.901
  • ▪ Balanced data: 0.947
  • ▪ Unbalanced data: 0.896
  • ▪ Balanced data: 0.922
11 Liu Q et al. 2022 M2GCN
  • ▪ Adam algorithm
  • ▪ 0.9483
  • ▪ 0.9531
12 Ragkousis A et al. 2022 MFSE
  • ▪ PyTorch
  • ▪ PyTorchGeometric
  • ▪ TorchDrug package
  • ▪ 0.951
  • ▪ 0.936
13 Lukashina N et al. 2022 SimVec
  • ▪ RDKit package
  • ▪ PubChem
  • ▪ 0.975
  • ▪ 0.968
14 Kim J, Shin M 2023 Unified Embedding Neural Network
  • ▪ Enrichr
  • ▪ Node2vec
  • ▪ Adam Optimizer
  • ▪ DPU: 0.708
  • ▪ Non-DPU: 0.912
  • ▪ DPU: 0.697
  • ▪ Non-DPU: 0.901
15 Masumshah R, Eslahchi C 2023 DPSP
  • ▪ N/A
  • ▪ DS1: 0.9990
  • ▪ DS2: 0.9993
  • ▪ DS3: 0.9849
  • ▪ DS1: 0.9773
  • ▪ DS2: 0.9633
  • ▪ DS3: 0.9465
16 Wang Y et al. 2023 Gorge
  • ▪ Python
  • ▪ Adam optimizer
  • ▪ 0.822
  • ▪ 0.775
17 Lloyd O et al. 2024 SimplE
ComplEx
DistMult
  • ▪ CentOS-7
  • ▪ Python
  • ▪ PyTorch
  • ▪ SimplE (self-loops): 0.978
  • ▪ SimplE (non-naïve): 0.973
  • ▪ ComplEx (self-loops): 0.970
  • ▪ ComplEx (non-naïve): 0.967
  • ▪ DistMult (self-loops): 0.946
  • ▪ DistMult (non-naïve): 0.938
  • ▪ SimplE (self-loops): 0.971
  • ▪ SimplE (non-naïve): 0.965
  • ▪ ComplEx (self-loops): 0.963
  • ▪ ComplEx (non-naïve): 0.960
  • ▪ DistMult (self-loops): 0.930
  • ▪ DistMult (non-naïve): 0.921
18 Keshavarz H, Lakizadeh A 2024 PU-GNN
  • ▪ GaussianNB
  • ▪ 0.977
  • ▪ 0.96
19 Gromova A, Maida A 2025 ChemBERTaDDI
  • ▪ Hugging Face Transformers
  • ▪ ChemBERTa-77M-MLM
  • ▪ Adam optimizer
  • ▪ PCA
  • ▪ 0.965
  • ▪ 0.954
20 Gromova A, Maida A 2025 ChemBERTaDDI
  • ▪ Hugging Face Transformers
  • ▪ ChemBERTa-77M-MLM
  • ▪ Adam optimizer
  • ▪ PCA
  • ▪ 0.965
  • ▪ 0.954
21 Wang N, Taylor C 2025 DrIVeNN
  • ▪ Python
  • ▪ Hyperband algorithm
  • ▪ PCA
  • ▪ UMAP
  • ▪ General model: 0.901
  • ▪ CVD-specific model: 0.975
  • ▪ General model: 0.821
  • ▪ CVD-specific model: 0.952
22 Deng Z et al. 2025 MAVGAE
  • ▪ N/A
  • ▪ 0.971
  • ▪ 0.964
23 Keshavarz H, Lakizadeh A 2025 PU-MLP
  • ▪ Multi-Layer Perceptron
  • ▪ Elkan-Noto PU Classifier
  • ▪ Random Forest Classifier
  • ▪ Adam optimizer
  • ▪ Stochastic Gradient Descent optimizer
  • ▪ 0.992
  • ▪ 0.999
Table 3. Advantages and Challenges of AI and/or ML.
Study number Authors/ year Advantages of AI and/or ML Challenges of AI and/or ML
1 Zitnik M et al.

2018
  • ▪ Predicts specific side effect types, identifying exact outcomes such as "nausea" or "bleeding."
  • ▪ Integrates PPI, DTI, and DDI data, utilizing a vast biological data network for decision-making.
  • ▪ End-to-end trainable GCN, employing a powerful deep learning model that learns comprehensively in one pass.
  • ▪ Shared parameters aid rare side effect prediction, learning from common effects to improve predictions for rare ones.
  • ▪ Clusters related side effects, grouping similar effects to enhance understanding.
  • ▪ Impossible to physically test all combinations (computational necessity), necessitating AI reliance due to the infeasibility of testing every drug pair on humans.
2 Burkhardt H et al.

2020
  • ▪ Outperformed Decagon (AUC 0.903), demonstrating higher accuracy than the prominent predecessor model.
  • ▪ 43x faster training; significantly simpler (fewer bits), offering vastly increased speed while using minimal memory.
  • ▪ Trainable on commodity hardware, capable of running on standard laptops without needing supercomputers.
  • ▪ Reusable and interpretable embeddings, producing output that is easily adapted for other inquiries.
  • ▪ Training data (mined text) may be low quality, stemming from literature mining that may contain inaccuracies.
  • ▪ Lacks relationship directionality, identifying the existence of an effect without specifying the causative drug.
  • ▪ Ambiguous comparison protocols with previous models, complicating fair comparison due to unclear rules.
3 Nováček V, Mohamed S

2020
  • ▪ Outperforms Decagon (12-16% margin), achieving significantly greater accuracy than the older model.
  • ▪ Novel tensor decomposition embedding, using advanced mathematics to identify superior patterns in the data.
  • ▪ "TriVec" uses multi-part vectors, employing three distinct vectors to describe relationships with greater clarity.
  • ▪ Relies on known polypharmacy data (limited), requiring existing data for learning and thus struggling with unknown variables.
  • ▪ "Weak nodes" reduce predictive accuracy, resulting in poor guesses for drugs with minimal information.
  • ▪ Standard eval ratio (1:1) might be too easy, suggesting the test used for verification may have been overly simple.
  • ▪ Does not learn from chemical structure, ignoring the chemical appearance of the drug.
4 Dasgupta S et al.

2021
  • ▪ Improved graph learning using NLP-derived confidence scores, which enhances the model's learning process by verifying information certainty.
  • ▪ Literature-based KG (SemMedDB) enhanced prediction precision, utilizing medical literature to improve the accuracy of drug effect forecasts.
  • ▪ Weighted TransE/DeepWalk improved ADE prediction (8.4% AUC gain), utilizing specific mathematical techniques to significantly boost side effect prediction accuracy.
  • ▪ Open-source code supports reproducibility, allowing the scientific community to freely access and verify results.
  • ▪ Relies on accuracy of NLP-inferred scores, meaning the model's validity is contingent on the correctness of initial language processing tools.
  • ▪ Evaluation limited to two representation methods, lacking a comprehensive assessment across diverse data presentation styles.
  • ▪ Small datasets affect generalizability, potentially limiting performance when applied to new or broader data sources due to limited training volume.
  • ▪ Performance sensitive to noisy data, leading to potential confusion or inaccuracies when input data is inconsistent or incorrect.
  • ▪ Requires cross-domain validation, necessitating testing across various medical fields to ensure universal applicability.
5 Masumshah R et al.
2021
  • ▪ Fast runtime (8h vs 15 days), completing processing tasks significantly faster than legacy models.
  • ▪ Simple architecture with high accuracy, offering a streamlined design that maintains precision while being easier to implement.
  • ▪ Novel feature vectors (merged side effects + interactions), describing drugs through a unique combination of multiple distinct cues.
  • ▪ Efficient dimensionality reduction (PCA), simplifying complex data structures without discarding critical details.
  • ▪ Complex models have high computational costs, requiring excessive time and financial investment to operate.
  • ▪ Limited dataset types (no PPIs), missing crucial biological context such as protein-protein interactions.
  • ▪ Needs validation on other datasets, requiring broader testing to confirm efficacy across different data types.
6 Dewulf P et al.

2021
  • ▪ Unified cold-start prediction tasks, enabling effect prediction for novel drugs lacking historical data.
  • ▪ Fast cross-validation via algebraic shortcuts, utilizing mathematical efficiencies to rapidly verify model accuracy.
  • ▪ Integrated mono-drug side effects/target similarities, comparing drugs to identify patterns in side effect profiles.
  • ▪ High accuracy (0.957 AUC) for cold-start problems, achieving high correct prediction rates even for new drugs.
  • ▪ Depends on dataset completeness, requiring comprehensive data coverage to function correctly.
  • ▪ Performance declines with cold-start difficulty, struggling to predict effects for highly unique or unusual new drugs.
  • ▪ Limited interpretability/clinical validation, lacking clear explanatory logic and testing on human subjects.
  • ▪ Restricted to binary predictions, limited to simple "yes/no" outputs rather than assessing severity or magnitude.
  • ▪ High computational demand for similarity matrices, requiring substantial processing power to compare drug characteristics.
7 Liu T et al.

2021
  • ▪ Signed network classifies positive/negative relations, distinguishing between helpful and harmful interactions.
  • ▪ Structural Balance Theory infers latent features, applying social network principles to estimate unseen relationships.
  • ▪ Integrates multi-modal features, combining different drug information into a single unified model.
  • ▪ Weighted loss handles class imbalance, prioritizing rare side effects to prevent omission.
  • ▪ Decoding matrix reduces noise, filtering out data clutter to clarify analysis.
  • ▪ Excludes protein-protein interactions (PPIs), ignoring protein interactions which are a key analytical component.
8 Lin S et al.

2022
  • ▪ Multi-view fusion captured complementary features, analyzing the problem from multiple perspectives to derive a superior solution.
  • ▪ Self-attention improved weighting and interpretability, focusing on critical details while filtering out irrelevant noise.
  • ▪ Outperformed 5 state-of-the-art baselines, delivering consistently superior results relative to other models.
  • ▪ PCA improved efficiency and robustness, simplifying data handling while strengthening the model.
  • ▪ Open-source code, ensuring ease of sharing and accessibility.
  • ▪ High model complexity and cost, involving intricate construction and expensive operation.
  • ▪ Limited to existing drug targets, unable to process previously unseen drug types.
  • ▪ Risk of overfitting, potentially memorizing training data rather than learning generalizable patterns.
  • ▪ Dependent on FAERS data quality, relying on public reports that may contain inconsistencies or errors.
  • ▪ Lack of external clinical validation, requiring real-world evidence to confirm practical utility.
9 Kim E, Nam H

2022
  • ▪ N/A
  • ▪ Data sparsity and imbalance, suffering from insufficient and unevenly distributed data.
  • ▪ Limited gene selection, with only a subset of genes included in the feature set.
  • ▪ Heterogeneous cell line noise, arising from variability across the cell lines used to generate expression data.
  • ▪ Difficult to predict transcriptome features from structure, challenging the inference of gene changes solely from drug shape.
10 Lakizadeh A, Babaei M

2022
  • ▪ Integrated multiple heterogeneous drug features, utilizing diverse clues to resolve analytical challenges.
  • ▪ Strong validation via five-fold cross-validation, demonstrating reliability through rigorous testing protocols.
  • ▪ Improved performance on polypharmacy side effects, showing particular strength in predicting interactions involving multiple drugs.
  • ▪ Imbalanced and noisy data, where an excess of negative examples biases the model.
  • ▪ Missing biological features, failing to incorporate all potential biological components.
  • ▪ Need for stronger feature correlations, requiring better connection of disparate clues to improve results.
11 Liu Q et al.

2022
  • ▪ N/A
  • ▪ Accurate prediction but lacks specific side effect interpretation, identifying problems without specifying the exact nature of the issue.
12 Ragkousis A et al.

2022
  • ▪ First to fuse Molecular, DDI, and PPI+DTI data, combining three major data sources typically kept separate.
  • ▪ Specialized GNN encoders for each source, utilizing specific tools to optimize reading of each data type.
  • ▪ "Meta-fusion" for specific side effects, creating a custom data mix for each distinct side effect.
  • ▪ Solves "Sparse Node" problem (17% AUPRC gain), performing effectively even for drugs with minimal available information.
  • ▪ N/A
13 Lukashina N et al. 2022
  • ▪ SimVec initializes with chemical embeddings (prior knowledge), starting with established chemical knowledge rather than random guesses.
  • ▪ Weighted similarity edges propagate info, sharing information between similar drugs to aid prediction.
  • ▪ 3-step learning process, learning in stages to produce superior results.
  • ▪ Cache-based negative sampling, selecting training examples intelligently to accelerate learning.
  • ▪ Limited scalability due to high number of weighted edges, becoming excessively slow and heavy when processing large drug datasets.
14 Kim J, Shin M

2023
  • ▪ Unified embedding integrated single/multi-drug data, combining distinct drug information types to provide a comprehensive analytical perspective.
  • ▪ Virtual drug pairs reduced overfitting, employing synthetic examples to enhance training breadth and prevent rote memorization.
  • ▪ Regularization improved robustness, preventing the model from becoming excessively sensitive to minor data fluctuations.
  • ▪ Better prediction for new drug pairs, effectively estimating outcomes for previously unexamined drug combinations.
  • ▪ Interpretable protein-side effect associations, clarifying the specific reasoning behind predicted side effects.
  • ▪ Limited by training data availability, requiring specific datasets that may not always be accessible for learning.
  • ▪ High computational/GPU requirements, demanding expensive and high-performance hardware infrastructure.
  • ▪ Cannot handle multi-typed edges, struggling with complex relationships involving diverse connection types.
  • ▪ Generalizability depends on KG quality, meaning model success is strictly tied to the reliability of the background knowledge base.
  • ▪ Limited real-world validation, lacking sufficient testing on actual patient data to confirm clinical utility.
15 Masumshah R, Eslahchi C 2023
  • ▪ Integrated 5 heterogeneous features to capture multi-dimensional relationships, employing five distinct data types to elucidate drug interrelations.
  • ▪ N/A
16 Wang Y et al.

2023
  • ▪ N/A
  • ▪ Ignores biochemical properties of drugs, overlooking the actual chemical mechanisms of drug action.
  • ▪ Lacks multi-source data fusion, failing to combine diverse data sources for a more comprehensive answer.
17 Lloyd O et al.

2024
  • ▪ High performance (98.3%) in short time (~4 min), delivering both speed and high accuracy.
  • ▪ Low electricity consumption/carbon footprint, reducing energy usage for better environmental sustainability.
  • ▪ N/A
18 Keshavarz H, Lakizadeh A

2024
  • ▪ PU learning handles data uncertainty, treating unreported drug pairs as unlabeled rather than as confirmed negatives (see the PU-learning sketch after this table).
  • ▪ "SPP" biclustering extracts optimal features, identifying the best data patterns to prioritize.
  • ▪ GNN captures hidden topological features, utilizing graph technology to uncover concealed connections.
  • ▪ N/A
19 Gromova A, Maida A

2025
  • ▪ Transformer embeddings improved DDI accuracy, utilizing advanced AI to enhance the understanding of chemical structures.
  • ▪ Pretrained ChemBERTa reduced engineering, leveraging an existing chemistry-aware model to save development time.
  • ▪ 99% variance retention via PCA, efficiently compressing data while preserving essential information (see the embedding-and-PCA sketch after this table).
  • ▪ Outperformed 5 state-of-the-art models, demonstrating superior results compared to leading contemporary benchmarks.
  • ▪ Open-source code and data, ensuring free accessibility for public download and utilization.
  • ▪ Extensive computational resources required, necessitating high-performance computing infrastructure for training.
  • ▪ Performance depends on external dataset quality, limiting accuracy to the reliability of the source databases.
  • ▪ Potential info loss in aggregation, risking the omission of important granular details during data summarization.
  • ▪ Frozen embeddings limit fine-tuning, restricting the ability to adjust or retrain the core model components.
  • ▪ No external clinical validation, limiting testing to in-silico environments rather than real-world clinical settings.
20 Gromova A, Maida A

2025
  • ▪ Combined transformer embeddings with clinical features, merging chemical knowledge with real-world medical data for enhanced results.
  • ▪ Captured high-level molecular representations, demonstrating a deep understanding of complex drug molecule structures.
  • ▪ Outperformed multiple baselines (Decagon, DeepWalk, etc.), achieving superior accuracy compared to established competitors.
  • ▪ Scalable and adaptable framework, capable of handling vast datasets and adjusting to new drug compounds.
  • ▪ High memory and computational cost, consuming significant memory and processing resources.
  • ▪ Dependent on database completeness, creating knowledge gaps if the underlying database is incomplete.
  • ▪ Limited embedding interpretability, making it difficult to precisely explain the decision-making logic of the model.
  • ▪ No external clinical validation, lacking proof of efficacy in hospital or clinical environments.
  • ▪ Frozen transformer weights, making it difficult to modify the fundamental parts of the model.
21 Wang N, Taylor C

2025
  • ▪ N/A
  • ▪ Small domain-specific dataset, containing too few examples for comprehensive learning.
  • ▪ Evaluated only on known side effects, failing to identify novel or unknown adverse events.
  • ▪ Lacks validation on real-world patient data (EHRs), necessitating testing with actual patient records to ensure reliability.
22 Deng Z et al.

2025
  • ▪ MAVGAE framework predicts asymmetric interactions, recognizing that the effect of Drug A on Drug B differs from B on A.
  • ▪ Multimodal integration enhances accuracy/efficiency, combining various data types to improve intelligence and speed.
  • ▪ Handles heterogeneity well, effectively managing diverse types of information.
  • ▪ Limited availability of asymmetric interaction datasets, making it difficult to find quality data for this prediction type and hindering comprehensive evaluation.
23 Keshavarz H, Lakizadeh A

2025
  • ▪ PU learning addresses uncertainty, effectively managing uncertain data to improve accuracy.
  • ▪ Hybrid feature extraction (Random Forest + GNN), mixing different methods to obtain the best possible data representation.
  • ▪ MLP classifier surpasses other approaches, utilizing a simple yet powerful prediction tool that outperforms competitors.
  • ▪ Simpler and more efficient structure, featuring a streamlined design that enhances operation speed.
  • ▪ N/A
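Several of the reviewed models (Studies 5, 8, 19, and 20 above) reduce high-dimensional drug representations with PCA, and the most recent ones derive those representations from frozen transformer embeddings of SMILES strings. The following is a minimal sketch of that general pattern, not a reproduction of any study's pipeline: the checkpoint name, the two example SMILES strings, and the pooling choice are illustrative assumptions.

```python
# Minimal sketch: frozen chemistry-language-model embeddings for drug SMILES,
# compressed with PCA to retain ~99% of variance. Checkpoint name and SMILES
# strings are placeholders, not the studies' exact inputs.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.decomposition import PCA

MODEL_NAME = "seyonec/ChemBERTa-zinc-base-v1"  # assumed public ChemBERTa checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()  # frozen weights: embeddings are extracted, not fine-tuned

smiles = [
    "CC(=O)OC1=CC=CC=C1C(=O)O",       # aspirin (illustrative)
    "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",  # ibuprofen (illustrative)
]

with torch.no_grad():
    batch = tokenizer(smiles, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state   # (n_drugs, seq_len, hidden_dim)
    drug_emb = hidden[:, 0, :].numpy()          # first-token pooling: one vector per drug

# A float n_components asks scikit-learn for the smallest number of components
# whose cumulative explained variance reaches 99%.
pca = PCA(n_components=0.99)
compressed = pca.fit_transform(drug_emb)
print(drug_emb.shape, "->", compressed.shape)
```

In practice, the compressed vectors would be concatenated per drug pair and fed to a downstream classifier; that step is omitted here.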
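Study 7 counters class imbalance by up-weighting rare positive drug-pair examples in the loss. The sketch below shows one common way to do this in PyTorch; the class counts and tensors are synthetic, and the specific weighting scheme is an assumption rather than the study's exact formulation.

```python
# Minimal sketch: weighting the positive class in a binary cross-entropy loss
# so rare polypharmacy side effects are not drowned out by the many negative
# drug-pair examples. All numbers are synthetic.
import torch
import torch.nn as nn

n_negatives, n_positives = 9_000, 1_000                   # synthetic class counts
pos_weight = torch.tensor([n_negatives / n_positives])    # up-weight rare positives

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)                                # raw scores for 8 drug pairs
labels = torch.tensor([[1.], [0.], [0.], [0.], [0.], [0.], [0.], [1.]])
loss = criterion(logits, labels)
print(loss.item())
```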
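Studies 18 and 23 frame prediction as positive-unlabeled (PU) learning, in which unreported drug-pair side effects are treated as unlabeled rather than negative. The sketch below illustrates a generic two-step PU heuristic on synthetic data; it omits the biclustering and GNN feature extraction those studies combine with PU learning, and the thresholds are arbitrary.

```python
# Minimal two-step PU sketch: train a naive classifier with "unlabeled =
# negative", keep the lowest-scoring unlabeled pairs as reliable negatives,
# then retrain on positives vs. reliable negatives. Data is synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))   # synthetic drug-pair feature vectors
y_pu = np.zeros(1000)             # 1 = known positive, 0 = unlabeled
y_pu[:100] = 1

# Step 1: naive model treating all unlabeled examples as negatives.
step1 = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y_pu)
scores = step1.predict_proba(X)[:, 1]

# Step 2: the lowest-scoring unlabeled pairs become reliable negatives.
unlabeled = np.where(y_pu == 0)[0]
reliable_neg = unlabeled[np.argsort(scores[unlabeled])[:200]]

keep = np.concatenate([np.where(y_pu == 1)[0], reliable_neg])
final = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[keep], y_pu[keep])
print(final.predict_proba(X[:5])[:, 1].round(2))
```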
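Study 6 addresses cold-start prediction for drugs without interaction history by exploiting similarities between mono-drug side-effect profiles. The sketch below shows a deliberately simple similarity-weighted transfer on synthetic fingerprints; the study itself uses kernel-based methods with algebraic shortcuts that are not reproduced here.

```python
# Minimal cold-start sketch: score a new drug against known drugs by cosine
# similarity of mono-drug side-effect fingerprints, then transfer labels as a
# similarity-weighted average. All arrays are synthetic placeholders.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(1)
known_fp = rng.integers(0, 2, size=(50, 200)).astype(float)     # 50 known drugs x 200 side effects
known_labels = rng.integers(0, 2, size=(50, 10)).astype(float)  # 10 polypharmacy outcomes

new_fp = rng.integers(0, 2, size=(1, 200)).astype(float)        # cold-start drug, no history

sim = cosine_similarity(new_fp, known_fp)    # (1, 50) similarity to known drugs
weights = sim / sim.sum()
pred = weights @ known_labels                # similarity-weighted outcome scores
print(pred.round(2))
```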
Table 4. Quality assessment summary using MMAT tool - Quantitative non-randomized studies.
Screening questions: S1. Are there clear research questions? S2. Do the collected data allow to address the research questions?
Quantitative non-randomized studies: 3.1. Are the participants representative of the target population? 3.2. Are measurements appropriate regarding both the outcome and intervention (or exposure)? 3.3. Are there complete outcome data? 3.4. Are the confounders accounted for in the design and analysis? 3.5. During the study period, is the intervention administered (or exposure occurred) as intended?

Study | Authors (Year) | S1 | S2 | 3.1 | 3.2 | 3.3 | 3.4 | 3.5 | MMAT Score
1 | Zitnik M et al. (2018) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
2 | Burkhardt H et al. (2020) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
3 | Nováček V, Mohamed S (2020) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
4 | Dasgupta S et al. (2021) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
5 | Masumshah R et al. (2021) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
6 | Dewulf P et al. (2021) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
7 | Liu T et al. (2021) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
8 | Lin S et al. (2022) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
9 | Kim E, Nam H (2022) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
10 | Lakizadeh A, Babaei M (2022) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
11 | Liu Q et al. (2022) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
12 | Ragkousis A et al. (2022) | Yes | Yes | Yes | Yes | No | Yes | Yes | 4/5
13 | Lukashina N et al. (2022) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
14 | Kim J, Shin M (2023) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
15 | Masumshah R, Eslahchi C (2023) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
16 | Wang Y et al. (2023) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
17 | Lloyd O et al. (2024) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
18 | Keshavarz H, Lakizadeh A (2024) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
19 | Gromova A, Maida A (2025) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
20 | Gromova A, Maida A (2025) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
21 | Wang N, Taylor C (2025) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
22 | Deng Z et al. (2025) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
23 | Keshavarz H, Lakizadeh A (2025) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 5/5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.