Submitted: 09 June 2025
Posted: 10 June 2025
Abstract

Keywords:
1. Introduction
2. Methods
Framework Development Approach
Literature Analysis and Evidence Synthesis
Implementation Science Perspective Analysis
Theoretical Framework Application
Multi-Dimensional Comparative Assessment
Integration Strategy Development
Framework Validation and Testing Considerations
3. Results
3.1. Empirical Evidence on Automated Synthesis Capabilities
3.2. Identified Limitations and Concerns
3.3. EPIS-Guided Integration Framework
3.3.1. Framework Overview
3.3.2. Phase-Specific Implementation Guidance
3.3.3. Core Principles Integration
3.3.4. Implementation Considerations
4. Discussion
4.1. Implications for Implementation Science Practice
4.2. Theoretical Contributions and Field Evolution
4.3. Implementation Challenges and Organizational Considerations

4.4. Framework Limitations and Future Development Needs
4.5. Research Priorities and Future Directions
Conclusion
Acknowledgements
References
- Bauer MS, Damschroder L, Hagedorn H, Smith J, Kilbourne AM. An introduction to implementation science for the non-specialist. BMC Psychology. 2015;3(1):32.
- Borah R, Brown AW, Capers PL, Kaiser KA. Analysis of the time and workers needed to conduct systematic reviews of medical interventions using data from the PROSPERO registry. BMJ Open. 2017;7(2):e012545.
- Bastian H, Glasziou P, Chalmers I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLOS Medicine. 2010;7(9):e1000326.
- Landhuis E. Scientific literature: Information overload. Nature. 2016;535(7612):457-458.
- Shojania KG, Sampson M, Ansari MT, Ji J, Doucette S, Moher D. How quickly do systematic reviews go out of date? A survival analysis. Annals of Internal Medicine. 2007;147(4):224-233.
- Wang Z, Lin H, Zhang P, Yu L, Sun J. TrialMind: a human-machine teaming framework for evidence synthesis. JAMA Network Open. 2024;7(2):e2355683.
- Trad C, Mohamad El-Hajj H, Khanji MYG, Nasr R, Kahale LA, Akl EA. Artificial intelligence for abstract and full-text screening of articles for a systematic review. BMJ Evidence-Based Medicine. 2024;29(3):145-151.
- Sanghera R, Soltan AA, Jaskulski S, et al. Large language model ensemble for systematic review article screening: speed, accuracy and at scale. medRxiv. 2024.
- van de Schoot R, de Bruin J, Schram R, et al. An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence. 2021;3(2):125-133.
- Cao C, Chi G, Ma Z, McCrae C, Bobrovitz N. Using large language models in systematic reviews: a generalizable framework for accelerating literature screening. Journal of Clinical Epidemiology. 2024;167:153-169.
- Damschroder LJ, Aron DC, Keith RE, Kirsh SR, Alexander JA, Lowery JC. Fostering implementation of health services research findings into practice: a consolidated framework for advancing implementation science. Implementation Science. 2009;4:50.
- Nilsen P. Making sense of implementation theories, models and frameworks. Implementation Science. 2015;10:53.
- Moullin JC, Dickson KS, Stadnick NA, Rabin B, Aarons GA. Systematic review of the Exploration, Preparation, Implementation, Sustainment (EPIS) framework. Implementation Science. 2019;14(1):1.
- Tran VT, Riveros C, Ravaud P. Automation of systematic review screening using large language models. Artificial Intelligence in Medicine. 2023;141:102581.
- Guo E, Goh E, Barrow E, et al. Automated paper screening for clinical reviews using large language models: data analysis study. Journal of Medical Internet Research. 2023;25:e48996.
- Gartlehner G, Wagner G, Grootendorst D, et al. Assessing the accuracy of machine learning assisted abstract screening with Claude: a pilot study. Research Synthesis Methods. 2023;14(6):763-769.
- Przybyła P, Brockmeier AJ, Kontonatsios G, et al. Prioritising references for systematic reviews with RobotAnalyst: a user study. Research Synthesis Methods. 2018;9(3):470-488.
- Rathbone J, Hoffmann T, Glasziou P. Faster title and abstract screening? Evaluating Abstrackr, a semi-automated online screening program for systematic reviewers. Systematic Reviews. 2015;4:80.
- Khan H, Kiong JZH, Das A, et al. Assessment of large language models for data extraction in living systematic reviews. JAMA Network Open. 2025;8(1):e2453892.
- Akl EA, El Khoury R, Khamis AM, et al. The life and death of living systematic reviews: a methodological survey. Journal of Clinical Epidemiology. 2023;156:11-21.
- Page MJ, McKenzie JE, Bossuyt PM, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
- Kislov R, Pope C, Martin GP, Wilson PM. Harnessing the power of theorising in implementation science. Implementation Science. 2019;14(1):103.
- Budhwar K, Bitterman A. Towards a more equitable future: understanding and addressing bias in artificial intelligence for healthcare. Journal of Medical Systems. 2022;46(11):81.
- Chiang T, Roberts K, Perer A. Challenges in equity-centered research in machine learning for healthcare. medRxiv. 2022.
- Moullin JC, Dickson KS, Stadnick NA, Rabin B, Aarons GA. Systematic review of the exploration, preparation, implementation, sustainment (EPIS) framework. Implementation Science. 2019;14(1):1-16.
- Damschroder LJ, Reardon CM, Widerquist MAO, Lowery J. The updated Consolidated Framework for Implementation Research based on user feedback. Implementation Science. 2022;17:75.
- Powell BJ, Waltz TJ, Chinman MJ, et al. A refined compilation of implementation strategies: results from the Expert Recommendations for Implementing Change (ERIC) project. Implementation Science. 2015;10:21.
- Khandelwal AR, Adhikari PR, Martinez D, Gordon C. AI-generated content in evidence synthesis: a study on accuracy, hallucination, and citations by large language models. Journal of Medical Internet Research. 2023;25:e49122.
- Morrison A, Polisena J, Husereau D, et al. Reference accuracy in the medical artificial intelligence literature: critical need for improvement. Journal of Medical Internet Research. 2023;25:e46265.
- Tsamados A, Aggarwal N, Cowls J, et al. The ethics of algorithms: key problems and solutions. AI & Society. 2022;37:215-230.
- Parasuraman R, Manzey DH. Complacency and bias in human use of automation: an attentional integration. Human Factors. 2010;52(3):381-410.
- Hassan M, Kushniruk A, Borycki E. Barriers to and facilitators of artificial intelligence adoption in health care: scoping review. JMIR Human Factors. 2024;11:e48633.
- Liu Y, Liu C, Xu H, et al. Assessing the feasibility and accuracy of Claude-2 large language model for data extraction from randomized controlled trials: a proof-of-concept study. JMIR Medical Informatics. 2025;13(1):e52914.
- Hamel C, Kelly SE, Thavorn K, et al. An evaluation of DistillerSR's machine learning-based prioritization tool for title/abstract screening -- impact on reviewer-relevant outcomes. BMC Medical Research Methodology. 2020;20:256.
- Gorelik A, Ridley D, Shaffer R, et al. Modeling the cost and effectiveness of text mining for rapid systematic reviews. BMC Medical Informatics and Decision Making. 2020;20:1-9.
- Ali S, Swarup S, Wang H, et al. Separability as a practical indicator for machine learning automation potential in education-focused systematic reviews. Review of Educational Research. 2025;95(1):105-138.
- Wang Z, Nayfeh T, Tetzlaff J, et al. Error rates of human reviewers during abstract screening in systematic reviews. PLOS ONE. 2020;15(1):e0227742.
- Ferdinands G, Schram R, de Bruin J, et al. Performance of active learning models for screening prioritization in systematic reviews: a simulation study. Systematic Reviews. 2023;12:38.
- Ames HMR, Glenton C, Lewin S, et al. Accuracy and efficiency of machine learning-assisted risk-of-bias assessments in "real-world" systematic reviews. BMC Medical Research Methodology. 2022;22:288.
- Qureshi R, Shaughnessy D, Gill KAR, et al. Are ChatGPT and large language models "the answer" to bringing us closer to systematic review automation? Systematic Reviews. 2023;12:72.
- Marshall IJ, Wallace BC, Noel-Storr A, et al. Generating living, breathing evidence syntheses: RobotReviewer live for continuous updating. Journal of Clinical Epidemiology. 2022;144:126-133.
- Shemilt I, Noel-Storr A, Thomas J, et al. Cost and value of different approaches to searching for and identifying studies for a systematic evidence map of research on COVID-19. Research Synthesis Methods. 2021;12(6):742-754.
- Wallace BC, Small K, Brodley CE, Lau J, Trikalinos TA. Deploying an interactive machine learning system in an evidence-based practice center: abstrackr. In: Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium. 2012:819-823.
- Oami T, Suwa H, Oshima K, et al. The efficiency and efficacy of large language models for title and abstract screening in systematic reviews. Systematic Reviews. 2024;13(1):49.
- Sun L, Kim CJ, Baumer EP, et al. Using large language models in software engineering: an exploration of use cases and implications. Studies in Big Data. 2025;117:1-28.
- Liu X, Shi J, Maglalang DD, et al. Qualitative data analysis using large language models: an examination of ChatGPT performance. Journal of Technology in Behavioral Science. 2023;8(3):429-439.
- Bittermann A, Grant S. Exploring the potential of large language models in qualitative data analysis: promises and perils. International Journal of Qualitative Methods. 2024;23:1-14.
- Greenhalgh T, Pawson R, Wong G, et al. Realist methods in review in action: the case of complex health interventions. Journal of Advanced Nursing. 2013;69(7):1453-1464.
- Paparini S, Green J, Papoutsi C, et al. Case study research for better evaluations of complex interventions: rationale and challenges. BMC Medicine. 2020;18(1):301.
- Touvron H, Lavril T, Izacard G, et al. LLaMA: Open and efficient foundation language models. arXiv preprint. 2023;arXiv:2302.13971.
- Ji Z, Lee N, Frieske R, et al. Survey of hallucination in natural language generation. ACM Computing Surveys. 2023;55(12):1-38.
- Owens B. The potential and pitfalls of AI for global health equity. The Lancet Digital Health. 2023;5(3):e116-e117.
- Wieringa S, Engebretsen E, Heggen K, Greenhalgh T. How and why AI explanations matter in medicine: interview study of the sociotechnical context. Journal of Medical Internet Research. 2023;25:e49197.
- Edwards P, Green S, Clarke M, et al. ADVISE: automated data-driven value of information synthesis to evaluate policy decisions in sustainable development. Frontiers in Research Metrics and Analytics. 2023;8:1123996.
- Bienefeld N, Keller E, Grote G. Human-AI teaming in critical care: a comparative analysis of data scientists' and clinicians' perspectives on AI augmentation and automation. Journal of Medical Internet Research. 2024;26:e50130.
- Danaher J. The threat of algocracy: reality, resistance and accommodation. Philosophy & Technology. 2016;29:245-268.
- Farzaneh N, Williamson CA, Gryak J, Najarian K. Collaborative strategies for deploying AI-based physician decision support systems: challenges and deployment approaches. npj Digital Medicine. 2023;6:137.
- Shea BJ, Reeves BC, Wells G, et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ. 2017;358:j4008.



| Dimension | Benefits (Evidence) | Concerns (Evidence/Considerations) |
|---|---|---|
| Time Efficiency | 50-95% reduction in synthesis time [6,14,15,16,17,18]; screening tasks completed in days vs. months [6] | Quality may be compromised for speed; reduced engagement with literature nuances [44,46] |
| Comprehensiveness | Expanded scope of evidence inclusion; reduced cost for including additional sources [35,36] | Over-inclusion of irrelevant evidence; misinterpretation of diverse study types [44,45] |
| Consistency | Higher inter-rater reliability; reduced variability in application of criteria [37,38,39] | May consistently apply wrong or biased criteria; algorithmic rigidity [30,51] |
| Living Evidence | Enables continuous evidence surveillance and synthesis; supports dynamic adaptation [20,41,42] | Might create information overload; potential for premature adaptation based on single studies [31] |
| Resource Equity | Democratizes access to synthesis capabilities across diverse settings [54,55] | May create new technological divides; requires infrastructure and expertise [52] |
| Contextual Sensitivity | Can process more contextual information than humans when properly directed | Risk of losing critical contextual nuance and implementation-relevant details [11,12,44,45,46,47] |
| Trustworthiness | Standardized, reproducible processes | "Hallucinations" and reference inaccuracies [38,39]; black-box processing [50,51] |
| Stakeholder Engagement | Frees human resources for stakeholder collaboration | May reduce meaningful stakeholder input in synthesis process; technological mediation of evidence [53,56] |
| Human Expertise | Augments human capabilities; handles routine tasks | Risk of skill atrophy; reduced development of critical appraisal abilities [30,31] |
| Equity | Potential for more comprehensive representation of diverse evidence | May amplify existing biases in literature; perpetuate gaps in underrepresented populations [23,24] |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).