Submitted: 22 March 2026
Posted: 23 March 2026
Abstract
Keywords:
1. Introduction
2. The Expert Systems Era (1965–1989)
2.1. The Birth of Knowledge Engineering
2.2. Modelling Scientific Discovery
2.3. The Shift from Rules to Data
3. Science Becomes Searchable (1989–2004)
3.1. From Rules to Patterns
3.2. The Preprint Revolution
3.3. The Data Flood
4. When Scholarship Became a Score (2004–2016)
4.1. From Citation Counts to Incentive Systems
4.2. The Robot in the Laboratory
4.3. The Scholarly Stack Takes Shape
5. Science in the Age of Oracles (2016–2022)
5.1. AlphaFold
5.2. ASReview and the Case for Augmentation
5.3. The Open Science Policy
6. The Emergence of Scientific Agents (2022–Present)
6.1. Scientific Agents
6.2. The Prototype Researcher
6.3. The Tool Landscape
6.4. The Integrity Crisis
6.5. The Democratisation
7. The Oldest Argument
7.1. Automation, Augmentation, and the Unstable Proportion
7.2. Preprints, Metrics, and the Scholar Profile
7.3. Historical and Future
7.4. Open Problems
8. Conclusions
References
- Baumgart, M.; Wegmeth, L.; Vente, T.; Beel, J.: Evaluating Sakana’s AI Scientist for autonomous research: Wishful thinking or an emerging reality towards “Artificial Research Intelligence” (ARI)? ACM SIGIR Forum 2025, arXiv:2502.14297.
- Bran, A.M.: ChemCrow: Augmenting large language models with chemistry tools. Nature Machine Intelligence 2024, 6, 525–535.
- Buchanan, B.G.; Smith, D.H.; White, W.C.: Applications of artificial intelligence for chemical inference. 22. Automatic rule formation in mass spectrometry by means of the Meta-DENDRAL program. Journal of the American Chemical Society 1976, 98(20), 6168–6178.
- Chen, C.: CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology 2006, 57(3), 359–377.
- Crevier, D. AI: The Tumultuous History of the Search for Artificial Intelligence; Basic Books: New York, 1993.
- Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P.: From data mining to knowledge discovery in databases. AI Magazine 1996, 17(3), 37–54.
- FutureHouse: Platform and agent suite: Crow, Owl, Phoenix, Falcon, Finch, Robin. 2024–2025. Available online: https://platform.futurehouse.org.
- Ginsparg, P.: ArXiv at 20. Nature 2011, 476, 145–147.
- GPTZero: GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers (2026). Available online: https://gptzero.me/news/neurips/.
- The Fourth Paradigm: Data-Intensive Scientific Discovery; Hey, T., Tansley, S., Tolle, K., Eds.; Microsoft Research: Redmond, 2009.
- Hirsch, J.E.: An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences 2005, 102(46), 16569–16572.
- Huang, Y.; Chen, Y.; Zhang, H.: Deep research agents: A systematic examination and roadmap. arXiv 2025, arXiv:2506.18096.
- Jumper, J.: Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589.
- Karthik, V.; Anand, I.S.; Mahanta, U.; Sharma, G.: GScholarLens. arXiv 2025, arXiv:2509.04124.
- King, R.D.; Rowland, J.; Oliver, S.G.: The automation of science. Science 2009, 324(5923), 85–89.
- King, R.D.; Liakata, M.; Lu, C.; Oliver, S.G.; Soldatova, L.N.: On the formalization and reuse of scientific research. Journal of the Royal Society Interface 2011, 8(63), 1440–1448.
- Lander, E.S.: Initial sequencing and analysis of the human genome. Nature 2001, 409, 860–921.
- Langley, P.; Simon, H.A.; Bradshaw, G.L.; Zytkow, J.M. Scientific Discovery: Computational Explorations of the Creative Processes; MIT Press: Cambridge, MA, 1987.
- Liang, W.: Monitoring AI-modified content at scale: A case study on the impact of ChatGPT on AI conference peer reviews. arXiv 2024, arXiv:2403.07183.
- Lin, Z.: Hidden prompts in manuscripts exploit AI peer review. arXiv 2025, arXiv:2507.06185.
- Lindsay, R.K.; Buchanan, B.G.; Feigenbaum, E.A.; Lederberg, J. Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project; McGraw-Hill: New York, 1980.
- Lindsay, R.K.; Buchanan, B.G.; Feigenbaum, E.A.; Lederberg, J.: DENDRAL: A case study of the first expert system for scientific hypothesis formation. Artificial Intelligence 1993, 61(2), 209–261.
- Lu, C.; Lu, C.; Lange, R.T.; Foerster, J.; Clune, J.; Ha, D.: The AI Scientist: Towards fully automated open-ended scientific discovery. arXiv 2024, arXiv:2408.06292.
- Martín-Martín, A.; Thelwall, M.; Orduna-Malea, E.; Delgado López-Cózar, E.: Google Scholar, Microsoft Academic, Scopus, Dimensions, Web of Science, and OpenCitations’ COCI: A multidisciplinary comparison of coverage via citations. Scientometrics 2021, 126, 871–906.
- McKiernan, E.C.: How open science helps researchers succeed. eLife 2016, 5, e16800.
- Nelson, A.: Ensuring free, immediate, and equitable access to federally funded research. White House OSTP Memorandum, 2022.
- OpenAI: GPT-4 Technical Report. arXiv, 2023; arXiv:2303.08774.
- Open Science Collaboration: Estimating the reproducibility of psychological science. Science 2015, 349(6251), aac4716.
- Park, M.; Leahey, E.; Funk, R.J.: Papers and patents are becoming less disruptive over time. Nature 2023, 613, 138–144.
- Retraction Watch: How to juice your Google Scholar h-index, preprint by preprint (2025). Available online: https://retractionwatch.com/2025/12/08/.
- Schmidgall, S.: Agent Laboratory: Using LLM agents as research assistants (2025). Available online: https://agentlaboratory.github.io/.
- Schmidt, M.; Lipson, H.: Distilling free-form natural laws from experimental data. Science 2009, 324(5923), 81–85.
- Shortliffe, E.H. Computer-Based Medical Consultations: MYCIN; Elsevier: New York, 1976.
- TIGER Lab: ScholarCopilot: Training LLMs for academic writing with accurate citations (2025). Available online: https://tiger-ai-lab.github.io/ScholarCopilot/.
- UNESCO. UNESCO Recommendation on Open Science (2021). Available online: https://unesdoc.unesco.org/ark:/48223/pf0000379949.
- van Eck, N.J.; Waltman, L.: Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics 2010, 84(2), 523–538.
- van de Schoot, R.: An open source machine learning framework for efficient and transparent systematic reviews. Nature Machine Intelligence 2021, 3, 125–133.
- Van Noorden, R.: How big is science’s fake-paper problem? Nature 2023, 623(7987), 466–467.
- Villaescusa-Navarro, F.: The Denario project: Deep knowledge AI agents for scientific discovery. arXiv 2025, arXiv:2510.26887.
- Waltman, L.; van Eck, N.J.: The inconsistency of the h-index. Journal of the American Society for Information Science and Technology 2012, 63(2), 406–415.
- Wang, H.: Scientific discovery in the age of artificial intelligence. Nature 2023, 620, 47–60.
- Wu, Y.C.: LabClaw: Skill library for autonomous biomedical research (2025). Available online: https://github.com/wu-yc/LabClaw.
1. Biographical details in this paragraph are drawn from van de Schoot’s personal account: https://www.rensvandeschoot.com/story-of-asreview/.
2. Thammasat University, founded in 1934, is Thailand’s second oldest university. The author is affiliated with the Department of Electrical and Computer Engineering in the Thammasat School of Engineering: https://engr.tu.ac.th.

| System | Year | Function |
|---|---|---|
| DENDRAL | 1965 | Molecular structure inference from mass spectra |
| Meta-DENDRAL | 1976 | Automated chemical rule learning from data |
| MYCIN | 1976 | Infectious disease diagnosis |
| BACON / DALTON | 1987 | Rediscovery of scientific laws from data |
| Robot Scientist Adam | 2009 | Autonomous hypothesis testing in yeast genomics |

| Component | Year | Function | Access |
|---|---|---|---|
| Preprint servers | | | |
| arXiv | 1991 | Preprints: physics, CS, mathematics | Open access |
| SSRN | 1994 | Preprints: social sciences, law, economics | Free [a] |
| bioRxiv | 2013 | Preprints: biology | Open access |
| PeerJ | 2013 | Open access publishing and preprints | Open access |
| Preprints.org | 2016 | Preprints: multidisciplinary | Open access |
| medRxiv | 2019 | Preprints: health sciences | Open access |
| TechRxiv | 2018 | Preprints: engineering | Open access |
| AgentRxiv | 2025 | Preprints: agent-generated research | Open access |
| Data and code repositories | | | |
| Dryad | 2008 | Curated research data archiving | Open access |
| figshare | 2012 | Figures, datasets, media, and code | Open access |
| Zenodo | 2013 | General-purpose research archive (CERN/EU) | Open access |
| OSF | 2013 | Project management, preregistration, data sharing | Open access |
| Identifiers, search, and metrics | | | |
| CrossRef | 2000 | Persistent DOIs for publications | Free |
| Google Scholar | 2004 | Universal citation search and profiles | Free |
| h-index | 2005 | Single-number impact metric | — |
| Google Scholar Citations | 2011 | Public, trackable author profiles | Free |
| ORCID | 2012 | Unique author identifiers | Free |
| Semantic Scholar | 2015 | ML-powered paper discovery | Free |
| OpenAlex | 2022 | Open catalogue of the global research system | Open access [b] |

| Framework | Function | Access |
|---|---|---|
| LangChain | LLM orchestration and tool chaining | Open source |
| LangGraph | Stateful multi-agent graph workflows | Open source |
| AutoGen (AG2) | Multi-agent conversation framework | Open source |
| CrewAI | Role-based multi-agent workflows | Open source |
| HuggingFace | Model hub, datasets, and inference API | Open source |

| System | Function | Access |
|---|---|---|
| The AI Scientist | Full pipeline: ideation to manuscript | Open source |
| Kosmos (FutureHouse) | Literature synthesis, hypothesis, protocol | Partial [a] |
| Agent Laboratory | Code, experiments, manuscript drafting | Open source |
| Denario | Data analysis, simulation workflows | Open source |
| AI Researcher (HKUDS) | Automated research generation | Open source |
| LabClaw | More than 240 biomedical lab procedures | Open source |

| System | Function | Access |
|---|---|---|
| Literature and systematic review | | |
| ASReview | Systematic review screening | Open source |
| Elicit | Literature search, claim extraction | Freemium |
| ScholarCopilot | Citation suggestion, draft assistance | Open source |
| Semantic Scholar | ML-powered paper discovery | Free |
| Scite.ai | Smart citation context analysis | Freemium |
| ResearchRabbit | Citation-based paper discovery | Free |
| Litmaps | Citation network visualisation | Freemium |
| Connected Papers | Visual literature exploration | Free |
| GScholarLens | Position-weighted h-index computation | Open source |
| Knowledge mapping | | |
| OpenKnowledgeMaps | Visual research landscape overview | Open source |
| VOSviewer | Bibliometric network mapping | Open source |
| CiteSpace | Emerging trend detection in literature | Open source |
| Domain oracles | | |
| AlphaFold | Protein structure prediction | Open source |
| ChemCrow | Chemistry tool-augmented reasoning | Open source |
| BenevolentAI | Drug repurposing and target discovery | Proprietary |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).