Submitted:
19 November 2025
Posted:
20 November 2025
Abstract
Keywords:
1. Introduction
2. Literature
2.1. AI Peer Review: From Smart Assistants to Autonomous Referees
2.1.1. Automated Desk Review
2.1.2. AI-Assisted Deep Review
2.1.3. Meta-Review Synthesis
2.2. Adversarial Roots: Lessons from Attacks in AI Systems
2.2.1. Categories and Mechanisms of Adversarial Attacks
- Evasion Attacks. The most studied type of attack, evasion attacks embed slight perturbations into legitimate inputs at test time to induce errors [58]. The resulting “adversarial examples” look benign to humans but cause misclassification [59]. For example, a face-recognition system may misidentify a person wearing specially designed glasses or small stickers. Based on the attacker’s knowledge of the model, evasion attacks can be divided into two types: white-box and black-box. In the white-box setting, the attacker has full knowledge of the model’s structure and gradient information, enabling efficient perturbation methods [55,60,61]. A classic white-box illustration involves adding subtle perturbations to handwritten digit images: a human still sees a ‘3’, but the digit-recognition model confidently classifies it as an ‘8’. In the black-box setting, only queries and outputs are available to the attacker [62,63,64]. This process is similar to repeatedly trying combinations on a lock without knowing its mechanism, learning from each attempt until it opens.
- Exploratory Attacks. Rather than directly intervening in model training or inference, the attacker probes a deployed model through repeated interactions to infer confidential internal information or private features of the training data [65]. Model inversion is a typical technique that reconstructs sensitive information from training data by reversing model outputs. Researchers have shown that a model trained on facial data can be used to recover recognizable images of individuals from only partial outputs [66]. Another influential line of work is membership inference, which determines whether a specific record is included in a model’s training set. This capability poses a threat to systems handling sensitive information, for example by revealing whether a particular patient’s or customer’s record is included in the medical or financial data used for model training [67]. Such disclosure can expose private health conditions or financial behaviors, enabling discrimination or targeted scams against those individuals. Similarly, model extraction attacks can steal and replicate the structure and parameters of a target model through large-scale input-output queries. Tramèr et al. [68] demonstrate that repeatedly querying commercial APIs allows an attacker to reconstruct a local model that mimics the proprietary service. Moreover, attribute inference attacks can uncover private, unlabeled attributes in training samples, such as gender, accent, or user preferences [69].
- Poisoning Attacks. Poisoning attacks tamper with training data to degrade accuracy, bias decisions, or implant backdoors [70,71]. For example, attackers may insert fake purchase records into a recommendation system, leading the model to incorrectly promote specific products as popular. Poisoning attacks can take various forms. Backdoor attacks train models to behave normally but misfire when a secret trigger appears, allowing an attacker to control their output under certain conditions [72,73]. For instance, imagine training a workplace-security system to correctly classify everyone wearing a black badge as a technician and everyone wearing a white badge as a manager. A hidden backdoor can then cause the system to misclassify any technician wearing a white badge as a manager. Other forms include directly injecting fabricated data or modifying the labels of existing samples, so that the model learns the wrong associations [74]. Attackers can also create poisoned samples that appear normal to humans yet mislead the model, or subtly alter hidden features and labels, making the manipulation nearly invisible [75]. All these methods share a common consequence: they contaminate the model’s core knowledge. For instance, adding perturbations to pedestrian images during training may cause the model to misidentify pedestrians, which could lead to collisions involving autonomous vehicles. Because these attacks contaminate the model at its source, their malicious effects often remain hidden until specific triggers are activated, which makes them highly stealthy.
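To make the white-box evasion setting above concrete, the following is a minimal sketch of a one-step sign-gradient perturbation (in the style of the fast gradient sign method) against a toy logistic-regression classifier. The weights `w`, bias `b`, input `x`, and budget `eps` are hypothetical values chosen for illustration, not taken from any cited work; the attacker is assumed to know the model parameters exactly.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x) -> float:
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(g: float) -> int:
    return (g > 0) - (g < 0)


def sign_gradient_perturb(w, b, x, y, eps):
    """Move each feature by at most eps in the direction that increases the loss.

    For cross-entropy loss on a logistic model, d(loss)/dx_i = (p - y) * w_i,
    so white-box access to w is enough to compute the gradient in closed form.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (assumptions for illustration only).
w = [2.0, -1.0, 0.5]
b = 0.0
x = [0.3, 0.2, 0.1]   # classified as class 1
y = 1                 # true label

x_adv = sign_gradient_perturb(w, b, x, y, eps=0.2)
print(predict_proba(w, b, x) > 0.5, predict_proba(w, b, x_adv) > 0.5)
```

Each feature moves by at most `eps`, yet the predicted class flips. A black-box attacker would have to estimate the same direction from queries alone, which is why that setting typically needs many more model evaluations.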
2.2.2. Defense Mechanisms and Techniques
- Proactive Defenses. These defenses strengthen intrinsic robustness during model design or training rather than responding after an attack occurs. Their primary goal is to build immunity before the attack happens. For instance, adversarial training [61,77] exposes models to deliberately crafted “tricky examples,” which help the model recognize and ignore subtle manipulations. The process is similar to how teachers give students difficult practice questions so they can handle real exams. Cohen et al. [78] introduce controlled randomness, which makes it harder for attackers to exploit patterns. This technique is like occasionally changing game rules so players rely on general strategies rather than memorization. In addition, Wu et al. [79] incorporate broader prior knowledge, akin to students reading widely to avoid being misled by a single tricky question. These proactive measures equip the model with internal safeguards, enabling it to better withstand unexpected attacks.
- Passive Defenses. These defenses add detectors and sanitizers around the model and data pipeline, aiming to identify potential adversarial examples or anomalous data [80]. For example, Metzen et al. [81] monitor internal signals to identify abnormal inputs. This helps the system catch potentially harmful manipulations before they affect outputs, much like airport scanners catching suspicious items in luggage. Data auditing screens training sets for poisoning or outliers before learning proceeds [82]. This allows the model to avoid learning from malicious inputs, similar to inspecting ingredients before cooking to prevent contamination. In text-based systems, Piet et al. [83] design a framework that generates task-specific models intended to resist prompt injection. This helps the system ignore malicious instructions, akin to carefully reviewing messages to prevent phishing attempts. By adding these safeguards around the model, passive defenses act as checkpoints that intercept attacks in real time, reducing the risk of damage.
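As an illustration of the “controlled randomness” idea mentioned under proactive defenses, the sketch below predicts by majority vote over Gaussian-perturbed copies of the input. It is a simplified sketch only: it omits the statistical certification step of randomized smoothing, and `base_classifier`, `sigma`, and `n_samples` are hypothetical choices made for this example.

```python
import random
from collections import Counter


def base_classifier(x) -> int:
    """Hypothetical toy classifier: class 1 if the feature sum is positive."""
    return 1 if sum(x) > 0 else 0


def smoothed_predict(classifier, x, sigma: float, n_samples: int, rng: random.Random) -> int:
    """Return the class that wins a majority vote over noisy copies of x."""
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[classifier(noisy)] += 1
    return votes.most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
label = smoothed_predict(base_classifier, [1.0, 1.0], sigma=0.25, n_samples=200, rng=rng)
print(label)
```

Because the final label depends on the vote over many noisy samples rather than on a single forward pass, a small targeted perturbation has to shift the whole vote distribution to change the outcome.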
3. Breaking the Referee: Attacks on Automated Academic Review
3.1. Where Can the Referee Be Fooled?
Training and Data Retrieval
Desk Review
Deep Review
Rebuttal
System-Wide Vulnerabilities
3.2. How to Break the Referee?
3.2.1. Attacks During the Training and Data Retrieval Phase
- Backdoor Injection. Attackers might introduce a backdoor to covertly influence the AI reviewer’s judgments. They embed subtle triggers in public documents, such as scientific preprints or published articles [114], so that a model trained on this corpus learns to associate the trigger with a particular response. For example, a faint noise pattern added to figures may cause the AI reviewer to score submissions containing that pattern more favorably [115]. Because these triggers are inconspicuous, they often evade detection, and their influence can persist [116]. When deployed at scale, these backdoors could easily be used to inflate scores for an attacker’s subsequent submissions, seriously compromising the fairness of the review [117].
- Data Contamination. This approach pollutes the training corpus used to build the AI reviewer [118,119]. An attacker could flood the training set with low-quality papers. This measure would compromise the AI reviewer’s capability to differentiate between high-impact and low-impact research. Although resource-intensive, this attack is exceptionally stealthy: individually, poisoned documents may appear harmless, but collectively they lower quality standards. In fact, even a small number of strategically designed papers may systematically skew reviewer assessments [120], inducing lasting changes in the AI reviewer’s internal representations of scientific quality and creating cascading errors in future evaluations [16]. Over time, such accumulated bias may cause the system to favor certain submission types, undermining the integrity of scientific gatekeeping.
3.2.2. Attack Analysis in the Desk Review Phase
- Abstract and Conclusion Hijacking. This attack leverages the AI reviewer’s tendency to overweight high-visibility sections. Attackers craft abstracts and conclusions that exaggerate claims and inflate contributions, thereby misrepresenting the core technical content. By using persuasive rhetoric in these sections, they may anchor the AI’s initial assessment on a favorable premise before methods and evidence are scrutinized [121], biasing the downstream evaluation.
- Structure Spoofing. This strategy creates an illusion of rigor by meticulously mimicking the architecture of a high-impact paper. Attackers design the paper’s structure, from section headings to formatting, to project an image of completeness and professionalism, regardless of the quality of the underlying content. This attack targets pattern-matching heuristics in automated systems, which are trained to associate sophisticated structure with high-quality science. This allows weak submissions to pass automated gates as structural polish is mistaken for scientific merit [122].
3.2.3. Attack Analysis in the Deep Review Phase
- Academic Packaging. This attack creates a facade of academic depth by injecting extensive mathematics, intricate diagrams, and dense jargon. This technique exploits the “verbosity bias” found in LLMs, which may mistake complexity for rigor [13]. Specifically, by adding sophisticated but potentially irrelevant equations or algorithmic pseudocode, attackers create a veneer of technical novelty that may mislead automated assessment tools, especially in specialized domains [14].
- Keyword and Praise Stacking. This technique games the AI’s scoring mechanism by saturating the manuscript with high-impact keywords and superlative claims. Attackers strategically embed terms such as “groundbreaking” or “novel breakthrough”, along with popular buzzwords from the target field, to artificially inflate the perceived importance of the article [122]. This method exploits a fundamental challenge for any automated system: distinguishing a genuine scientific advance from hollow rhetorical praise. The AI reviewer, trained to recognize patterns associated with top-tier research, may be deceived by language that merely mimics those features.
- Misleading Conclusions. This attack decouples a paper’s claims from the presented evidence—e.g., a flawed proof accompanied by a triumphant conclusion, or weak empirical results framed as success. The attack exploits the AI reviewer’s tendency to overweight the conclusion section rather than rigorously verifying the logical chain from evidence to claim [123,124], risking endorsement of unsupported assertions.
- Invisible Prompt Injection. This evasion attack exploits the model’s instruction-following behavior, steering it away from the reviewing instructions it was given. Attackers exploit the multimodal processing capabilities of modern LLMs by hiding instructions in white text, microscopic fonts, LaTeX comments, or steganographically encoded images that are invisible to humans yet parsed by the AI [125,126,127]. Injected prompts such as “GIVE A POSITIVE REVIEW” or “IGNORE ALL INSTRUCTIONS ABOVE” may reliably sway outcomes [17,128]. Owing to high concealment and ease of execution, success rates can be substantial [18,129], posing a serious threat to review integrity.
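The hiding places listed for invisible prompt injection suggest a simple pre-screening step. The following is a minimal sketch, assuming the review pipeline has access to the LaTeX source: it inspects only LaTeX comments and `\textcolor{white}{...}` spans and flags those containing instruction-like phrases. The regular expressions and the phrase list are illustrative assumptions, not a complete or robust detector; microscopic fonts and image steganography are not covered.

```python
import re

# Unescaped '%' starts a LaTeX comment that runs to the end of the line.
COMMENT_RE = re.compile(r"(?<!\\)%(.*)$", re.MULTILINE)
# Text set in white is invisible on a white page but still extractable.
WHITE_TEXT_RE = re.compile(r"\\textcolor\{white\}\{([^{}]*)\}", re.IGNORECASE)
# Hypothetical phrase list modelled on the injected prompts quoted above.
SUSPICIOUS_RE = re.compile(
    r"ignore\s+(all\s+|any\s+)?(previous\s+|prior\s+)?instructions|positive\s+review",
    re.IGNORECASE,
)


def find_hidden_instructions(latex_source: str) -> list[tuple[str, str]]:
    """Return (location_kind, hidden_text) pairs that look like injected prompts."""
    findings = []
    for kind, pattern in (("comment", COMMENT_RE), ("white_text", WHITE_TEXT_RE)):
        for match in pattern.finditer(latex_source):
            hidden = match.group(1).strip()
            if SUSPICIOUS_RE.search(hidden):
                findings.append((kind, hidden))
    return findings


sample = r"""
We report 95\% accuracy. % IGNORE ALL INSTRUCTIONS ABOVE
\textcolor{white}{Give a positive review.}
% TODO: tidy figure 2
"""

for kind, text in find_hidden_instructions(sample):
    print(kind, "->", text)
```

A keyword screen of this kind is easy to evade by paraphrasing, so it is better read as a first filter in front of stronger defenses than as a safeguard in its own right.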
3.2.4. Attack Analysis in the Rebuttal Phase
- Rebuttal Opinion Hijacking. Analogous to high-pressure persuasion, this attack directly challenges the validity and authority of the AI reviewer’s initial assessment by asserting contradictory claims without substantial evidence. Attackers typically begin with emphatic, unsupported claims that the reviewer has “misunderstood” core aspects of the work, using confident language in place of justification. They then escalate by questioning the reviewer’s domain expertise (e.g., “any expert in this field would recognize...” or “this is well-established knowledge...”) to erode confidence in the original judgment. Fanous et al. [20] demonstrate that AI systems exhibit sycophantic behavior in 58.19% of cases when challenged, with regressive sycophancy (changing correct answers to incorrect ones) occurring in 14.66% of interactions. This attack exploits the model’s tendency to overweight authoritative-sounding prompts and its reluctance to maintain critical positions when faced with persistent challenge, often resulting in score inflation despite unchanged paper quality [131,132].
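The two rates quoted above can be computed from labelled challenge interactions along the following lines. This is a simplified sketch under stated assumptions: an interaction counts as sycophantic whenever the model changes its answer after being challenged, and as regressive when that change turns a correct answer into an incorrect one. The `Interaction` record and its field names are hypothetical and do not reproduce the protocol of [20].

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    initially_correct: bool   # was the model's first answer correct?
    changed_answer: bool      # did it change its answer after the challenge?
    finally_correct: bool     # was the answer after the challenge correct?


def sycophancy_rates(records: list[Interaction]) -> tuple[float, float]:
    """Return (overall sycophancy rate, regressive sycophancy rate)."""
    n = len(records)
    changed = sum(r.changed_answer for r in records)
    regressive = sum(
        r.changed_answer and r.initially_correct and not r.finally_correct
        for r in records
    )
    return changed / n, regressive / n


# Four made-up interactions, for illustration only.
records = [
    Interaction(True, False, True),    # held a correct position
    Interaction(True, True, False),    # regressive: correct -> incorrect
    Interaction(False, True, True),    # changed, but toward the correct answer
    Interaction(False, False, False),  # held an incorrect position
]
overall, regressive = sycophancy_rates(records)
print(overall, regressive)
```

For an AI referee, the analogous measurement would replace answer correctness with whether a criticism was justified, which generally requires expert annotation.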
3.2.5. Attack Analysis at the System Level
- Identity Bias Exploitation. These attacks manipulate authorship information and citation patterns to trigger “authority bias” [13]. Tactics include adding prestigious coauthors or inflating citations to top-tier venues and eminent scholars, leveraging the model’s tendency to associate prestige with quality [12]. This requires minimal technical sophistication and is highly covert, as these edits resemble legitimate scholarly practice. Identity bias in academic review often stems from social cognitive biases, where reviewers are unconsciously influenced by an author’s identity and reputation [133,134,135]. This issue is not confined to human evaluation; automated systems can amplify it, favoring work from prestigious authors or venues [136,137,138]. Algorithmic mitigation attempts face significant limitations [139,140], often due to deep-seated structural issues that make the bias difficult to eradicate without effective oversight [141,142]. Therefore, in AI paper reviews, the impact of identity bias may be more severe than in traditional review systems. As automated review tools become more widely used, this problem may worsen, leading to broader injustices.
- Model Inversion. This exploratory attack uses automated submissions and systematic probing to infer model scoring functions, feature weights, and decision boundaries. Attackers apply gradient-based or black-box optimization to identify input modifications that maximally increase scores, effectively treating the AI reviewer as an optimization target [143]. This approach enables precise calibration of submission content to exploit specific model vulnerabilities, but it requires sophisticated automation infrastructure and optimization expertise.
- Malicious Collusion Attacks. Malicious collusion is particularly effective against review systems that consider topical diversity or rely on relative comparisons among similar submissions. Attackers can exploit such mechanisms in two primary ways. First, they can orchestrate a network of fictitious accounts to flood the submission pool with numerous low-quality or fabricated papers on a specific topic. This creates an artificial saturation of the topic. As a result, when the system attempts to balance topic distribution, it may reject high-quality, genuine submissions in that area simply because the topic appears over-represented, thereby squeezing out legitimate competition [144]. Second, attackers can use this method to fabricate an academic “consensus” within a niche field. By submitting a series of inter-citing papers and reviews from a controlled network of accounts, they can create the illusion of a burgeoning research area. Their target paper is then positioned as a pivotal contribution to this artificially created field, manipulating scoring mechanisms to inflate its perceived value and ranking [145]. At its core, this strategy exploits the system’s reliance on aggregate signals and community feedback to establish evaluation baselines. While individual steps are not technically demanding, the attack depends on significant coordination and infrastructure to manage multiple accounts.
4. Experiments
4.1. Experimental Setup
- Identity Bias Exploitation: In the initial Desk Review phase, where first impressions are formed, we tested whether contextual cues about author prestige could systematically bias the AI’s judgment. This probe investigates the model’s susceptibility to the “authority bias” heuristic.
- Sensitivity to Assertion Strength: During the Deep Review, we explored the AI’s vulnerability to rhetorical manipulation. By programmatically altering the confidence of a paper’s claims, we assessed whether the model’s evaluation is swayed by the style of argumentation, independent of the underlying evidence.
- Sycophancy in the Rebuttal: In the Interactive Phase, we simulated an attack on the model’s conversational reasoning. We confronted the AI referee with an authoritative but evidence-free rebuttal to its own criticisms to measure its tendency toward sycophantic agreement.
- Contextual Poisoning: To emulate the insidious threat of a training-phase Poisoning Attack, we manipulated the informational context surrounding a submission, providing curated summaries that framed the research field in either a positive or negative light, and measured the resulting shift in evaluation.
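All four probes share a paired-comparison structure: score each paper once as-is and once with a single manipulation applied, then examine the shift. The sketch below illustrates that structure only. `toy_reviewer` is a stand-in with a deliberately built-in bias so that the example runs without a model; `add_prestige_cue` and its wording are likewise hypothetical and are not the prompts used in the experiments.

```python
from statistics import mean
from typing import Callable, Sequence


def mean_score_shift(
    review: Callable[[str], float],
    papers: Sequence[str],
    manipulate: Callable[[str], str],
) -> float:
    """Average change in score when the same paper is shown with a manipulation."""
    return mean(review(manipulate(p)) - review(p) for p in papers)


def add_prestige_cue(paper: str) -> str:
    """Hypothetical manipulation: prepend an author-prestige statement."""
    return "Authors: a renowned lab at a top-tier university.\n" + paper


def toy_reviewer(text: str) -> float:
    """Stand-in reviewer, NOT a real model: base score plus a hard-coded bias."""
    score = 5.0
    if "renowned" in text.lower():
        score += 1.0
    return score


papers = ["Paper A: method and results.", "Paper B: method and results."]
print(mean_score_shift(toy_reviewer, papers, add_prestige_cue))
```

Holding the paper text fixed and varying one cue at a time is what allows a score shift to be attributed to the manipulation rather than to differences between papers; with a real model, repeated runs per condition would also be needed to separate the effect from sampling noise.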
4.2. Authority Bias Distorts Initial Assessments
4.3. Systematic Penalty for Cautious Language
4.4. AI Referees Yield to Authoritative Rebuttals
4.5. Biased Informational Context Skews Evaluative Judgment
References
- Sample, I. Quality of scientific papers questioned as academics ’overwhelmed’ by the millions published. The Guardian 2025.
- Adam, D. The peer-review crisis: how to fix an overloaded system. Nature 2025, 644, 24–27. [CrossRef]
- Bergstrom, C.T.; Bak-Coleman, J. AI, peer review and the human activity of science. Nature 2025. Career Column. [CrossRef]
- Khalifa, M.; Albadawy, M. Using artificial intelligence in academic writing and research: An essential productivity tool. Computer Methods and Programs in Biomedicine Update 2024, 5, 100145.
- Chen, Q.; Yang, M.; Qin, L.; Liu, J.; Yan, Z.; Guan, J.; Peng, D.; Ji, Y.; Li, H.; Hu, M.; et al. AI4Research: A Survey of Artificial Intelligence for Scientific Research. arXiv preprint arXiv:2507.01903 2025.
- Luo, Z.; Yang, Z.; Xu, Z.; Yang, W.; Du, X. Llm4sr: A survey on large language models for scientific research. arXiv preprint arXiv:2501.04306 2025.
- Liang, W.; Izzo, Z.; Zhang, Y.; Lepp, H.; Cao, H.; Zhao, X.; Chen, L.; Ye, H.; Liu, S.; Huang, Z.; et al. Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. In Proceedings of the 41st International Conference on Machine Learning; Salakhutdinov, R.; Kolter, Z.; Heller, K.; Weller, A.; Oliver, N.; Scarlett, J.; Berkenkamp, F., Eds. PMLR, 21–27 Jul 2024, Vol. 235, Proceedings of Machine Learning Research, pp. 29575–29620.
- Wu, D. Researchers are using AI for peer reviews — and finding ways to cheat it. The Washington Post 2025.
- Tong, T.; Wang, F.; Zhao, Z.; Chen, M. BadJudge: Backdoor Vulnerabilities of LLM-As-A-Judge. In Proceedings of the Thirteenth International Conference on Learning Representations (ICLR 2025), 2025. Poster.
- Gibney, E. Scientists hide messages in papers to game AI peer review. Nature 2025, 643, 887–888. [CrossRef]
- Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.J.; Madotto, A.; Fung, P. Survey of hallucination in natural language generation. ACM computing surveys 2023, 55, 1–38.
- Jin, Y.; Zhao, Q.; Wang, Y.; Chen, H.; Zhu, K.; Xiao, Y.; Wang, J. AgentReview: Exploring Peer Review Dynamics with LLM Agents, 2024, [arXiv:cs.CL/2406.12708].
- Ye, J.; Wang, Y.; Huang, Y.; Chen, D.; Zhang, Q.; Moniz, N.; Gao, T.; Geyer, W.; Huang, C.; Chen, P.Y.; et al. Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge, 2024, [arXiv:cs.CL/2410.02736].
- Lin, T.L.; Chen, W.C.; Hsiao, T.F.; Liu, H.I.; Yeh, Y.H.; Chan, Y.K.; Lien, W.S.; Kuo, P.Y.; Yu, P.S.; Shuai, H.H. Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks, 2025, [arXiv:cs.CL/2506.11113].
- Li, Y.; Jiang, Y.; Li, Z.; Xia, S.T. Backdoor Learning: A Survey, 2022, [arXiv:cs.CR/2007.08745].
- Zhang, Y.; Rando, J.; Evtimov, I.; Chi, J.; Smith, E.M.; Carlini, N.; Tramèr, F.; Ippolito, D. Persistent Pre-Training Poisoning of LLMs, 2024, [arXiv:cs.CR/2410.13722].
- Perez, F.; Ribeiro, I. Ignore Previous Prompt: Attack Techniques For Language Models, 2022, [arXiv:cs.CL/2211.09527].
- Shayegani, E.; Mamun, M.A.A.; Fu, Y.; Zaree, P.; Dong, Y.; Abu-Ghazaleh, N. Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks, 2023, [arXiv:cs.CL/2310.10844].
- Sharma, M.; Tong, M.; Korbak, T.; Duvenaud, D.; Askell, A.; Bowman, S.R.; Cheng, N.; Durmus, E.; Hatfield-Dodds, Z.; Johnston, S.R.; et al. Towards Understanding Sycophancy in Language Models, 2025, [arXiv:cs.CL/2310.13548].
- Fanous, A.; Goldberg, J.; Agarwal, A.A.; Lin, J.; Zhou, A.; Daneshjou, R.; Koyejo, S. SycEval: Evaluating LLM Sycophancy, 2025, [arXiv:cs.AI/2502.08177].
- Nuijten, M.B.; van Assen, M.A.L.M.; Hartgerink, C.H.J.; Epskamp, S.; Wicherts, J.M. The Validity of the Tool “statcheck” in Discovering Statistical Reporting Inconsistencies. PsyArXiv, 2017. [CrossRef]
- Shanahan, D. A peerless review? Automating methodological and statistical review. Springer Nature BMC Blog, Research in Progress, 2016. Blog post.
- Checco, A.; Bracciale, L.; Loreti, P.; Bianchi, G. AI-assisted peer review. Humanities and Social Sciences Communications 2021, 8. [CrossRef]
- Charlin, L.; Zemel, R.S. The Toronto Paper Matching System: An Automated Paper–Reviewer Assignment System. In Proceedings of the NIPS 2013 Workshop on Bayesian Nonparametrics: Hope or Hype? (and related workshops on peer review), 2013. Widely used reviewer–paper matching system; workshop write-up.
- Leyton-Brown, K.; Nandwani, Y.; Zarkoob, H.; Cameron, C.; Newman, N.; Raghu, D. Matching papers and reviewers at large conferences. Artificial Intelligence 2024, 331, 104119. [CrossRef]
- Cyranoski, D. Artificial intelligence is selecting grant reviewers in China. Nature 2019, 569, 316–317. [CrossRef]
- Liu, R.; Shah, N.B. ReviewerGPT? An exploratory study on using large language models for paper reviewing. arXiv preprint arXiv:2306.00622 2023.
- Gao, T.; Brantley, K.; Joachims, T. Reviewer2: Optimizing Review Generation through Prompt Generation. arXiv preprint arXiv:2402.10886 2024.
- Yu, J.; Ding, Z.; Tan, J.; Luo, K.; Weng, Z.; Gong, C.; Zeng, L.; Cui, R.; Han, C.; Sun, Q.; et al. Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024, pp. 10164–10184.
- Wang, Q.; Zeng, Q. ReviewRobot: Explainable Paper Review Generation Based on Knowledge Synthesis. In Proceedings of the 13th International Conference on Natural Language Generation, 2020, pp. 215–226. [CrossRef]
- Weng, Y.; Zhu, M.; Bao, G.; Zhang, H.; Wang, J.; Zhang, Y.; Yang, L. CycleResearcher: Improving Automated Research via Automated Review. arXiv preprint arXiv:2411.XXXXX 2024. Preprint; automated review loop.
- D’Arcy, M.; Hope, T.; Birnbaum, L.; Downey, D. MARG: Multi-Agent Review Generation for Scientific Papers. arXiv preprint arXiv:2401.04259 2024.
- Taechoyotin, P.; Wang, G.; Zeng, T.; Sides, B.; Acuna, D. MAMORX: Multi-agent multi-modal scientific review generation with external knowledge. In Proceedings of the NeurIPS 2024 Workshop Foundation Models for Science: Progress, Opportunities, and Challenges, 2024.
- Skarlinski, M.D.; Cox, S.; Laurent, J.M.; Braza, J.D.; Hinks, M.; Hammerling, M.J.; et al. Language Agents Achieve Superhuman Synthesis of Scientific Knowledge. arXiv preprint arXiv:2409.13740 2024.
- Xiao, L.; Li, X.; Shi, Y.; Li, Y.; Wang, J.; Li, Y. SchNovel: Retrieval-Augmented Novelty Assessment in Academic Writing. In Proceedings of the 2nd Workshop on AI for Scientific Discovery (AISD 2025), 2025.
- Radensky, M.; Shahid, S.; Fok, R.; Siangliulue, P.; Hope, T.; Weld, D.S. SciDeator: Human-LLM Scientific Idea Generation Grounded in Research-Paper Facet Recombination. arXiv preprint arXiv:2409.14634 2024.
- Wijnhoven, J.; Wijmans, E.; van de Wouw, N.; Wijnhoven, F. RelevAI-Reviewer: How Relevant are AI Reviewers to Scientific Peer Review? arXiv preprint arXiv:2406.10294 2024.
- Rahman, M.; et al. LimGen: Probing LLMs for Generating Suggestive Limitations of Research Papers. arXiv preprint arXiv:2403.15529 2024.
- Sun, L.; Chan, A.; Chang, Y.S.; Dow, S.P. ReviewFlow: Intelligent Scaffolding to Support Academic Peer Reviewing. In Proceedings of the 29th International Conference on Intelligent User Interfaces. ACM, 2024, pp. 120–137. [CrossRef]
- Zyska, D.; Dycke, N.; Buchmann, J.; Kuznetsov, I.; Gurevych, I. CARE: Collaborative AI-Assisted Reading Environment. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023, pp. 291–303. [CrossRef]
- Mathur, P.; Siu, A.; Manjunatha, V.; Sun, T. DocPilot: Copilot for Automating PDF Edit Workflows in Documents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 2024, pp. 232–246. [CrossRef]
- Bhatia, C.; Pradhan, T.; Pal, S. MetaGen: An Academic Meta-Review Generation System. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 1653–1656. [CrossRef]
- Shen, C.; Cheng, L.; Zhou, R.; Bing, L.; You, Y.; Si, L. MReD: A Meta-Review Dataset for Structure-Controllable Text Generation. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022, pp. 2521–2535. [CrossRef]
- Zeng, Q.; Sidhu, M.; Blume, A.; Chan, H.P.; Wang, L.; Ji, H. Scientific Opinion Summarization: Paper Meta-Review Generation Dataset, Methods, and Evaluation. In Artificial General Intelligence and Beyond: Selected Papers from IJCAI 2024; Springer Nature Singapore, 2024; pp. 20–38. [CrossRef]
- Li, M.; Hovy, E.; Lau, J.H. Summarizing Multiple Documents with Conversational Structure for Meta-Review Generation. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023, pp. 7089–7112. Introduces RAMMER model and PEERSUM dataset. [CrossRef]
- Sun, L.; Tao, S.; Hu, J.; Dow, S.P. MetaWriter: Exploring the Potential and Perils of AI Writing Support in Scientific Peer Review. Proceedings of the ACM on Human-Computer Interaction 2024, 8, 1–32. [CrossRef]
- Darrin, M.; Arous, I.; Piantanida, P.; Cheung, J.C.K. GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024, pp. 12737–12752. [CrossRef]
- Sukpanichnant, P.; Rapberger, A.; Toni, F. PeerArg: Argumentative Peer Review with LLMs. arXiv preprint arXiv:2409.16813 2024.
- Hossain, E.; Sinha, S.K.; Bansal, N.; Knipper, A.; Sarkar, S.; Salvador, J.; Mahajan, Y.; Guttikonda, S.; Akter, M.; Hassan, M.M.; et al. LLMs as Meta-Reviewers’ Assistants: A Case Study 2025. Forthcoming; preprint available.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Communications of the ACM 2012, 60, 84–90.
- Hinton, G.E.; Deng, L.; Yu, D.; Dahl, G.E.; rahman Mohamed, A.; Jaitly, N.; Senior, A.W.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition. IEEE Signal Processing Magazine 2012, 29, 82.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the North American Chapter of the Association for Computational Linguistics, 2019.
- Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.J.; Fergus, R. Intriguing properties of neural networks. CoRR 2013, abs/1312.6199.
- Biggio, B.; Roli, F. Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security 2017.
- Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. CoRR 2014, abs/1412.6572.
- Athalye, A.; Carlini, N.; Wagner, D.A. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. In Proceedings of the International Conference on Machine Learning, 2018.
- Barreno, M.; Nelson, B.; Sears, R.; Joseph, A.D.; Tygar, J.D. Can machine learning be secure? In Proceedings of the ACM Asia Conference on Computer and Communications Security, 2006.
- Biggio, B.; Corona, I.; Maiorca, D.; Nelson, B.; Srndic, N.; Laskov, P.; Giacinto, G.; Roli, F. Evasion Attacks against Machine Learning at Test Time. ArXiv 2013, abs/1708.06131.
- Carlini, N.; Wagner, D.A. Towards Evaluating the Robustness of Neural Networks. 2017 IEEE Symposium on Security and Privacy (SP) 2016, pp. 39–57.
- Papernot, N.; Mcdaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.B.; Swami, A. The Limitations of Deep Learning in Adversarial Settings. 2016 IEEE European Symposium on Security and Privacy (EuroS&P) 2015, pp. 372–387.
- Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. ArXiv 2017, abs/1706.06083.
- Papernot, N.; Mcdaniel, P.; Goodfellow, I.J.; Jha, S.; Celik, Z.B.; Swami, A. Practical Black-Box Attacks against Machine Learning. Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security 2016.
- Chen, P.Y.; Zhang, H.; Sharma, Y.; Yi, J.; Hsieh, C.J. ZOO: Zeroth Order Optimization Based Black-box Attacks to Deep Neural Networks without Training Substitute Models. Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security 2017.
- Ilyas, A.; Engstrom, L.; Athalye, A.; Lin, J. Black-box Adversarial Attacks with Limited Queries and Information. In Proceedings of the International Conference on Machine Learning, 2018.
- Papernot, N.; Mcdaniel, P.; Sinha, A.; Wellman, M.P. SoK: Security and Privacy in Machine Learning. 2018 IEEE European Symposium on Security and Privacy (EuroS&P) 2018, pp. 399–414.
- Fredrikson, M.; Jha, S.; Ristenpart, T. Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security 2015.
- Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership Inference Attacks Against Machine Learning Models. 2017 IEEE Symposium on Security and Privacy (SP) 2016, pp. 3–18.
- Tramèr, F.; Zhang, F.; Juels, A.; Reiter, M.K.; Ristenpart, T. Stealing Machine Learning Models via Prediction APIs. In Proceedings of the USENIX Security Symposium, 2016.
- Yeom, S.; Giacomelli, I.; Fredrikson, M.; Jha, S. Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting. 2018 IEEE 31st Computer Security Foundations Symposium (CSF) 2017, pp. 268–282.
- Biggio, B.; Nelson, B.; Laskov, P. Poisoning Attacks against Support Vector Machines. In Proceedings of the International Conference on Machine Learning, 2012.
- Tolpegin, V.; Truex, S.; Gursoy, M.E.; Liu, L. Data Poisoning Attacks Against Federated Learning Systems. In Proceedings of the European Symposium on Research in Computer Security, 2020.
- Gu, T.; Dolan-Gavitt, B.; Garg, S. BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. ArXiv 2017, abs/1708.06733.
- Chen, X.; Liu, C.; Li, B.; Lu, K.; Song, D.X. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. ArXiv 2017, abs/1712.05526.
- Shafahi, A.; Huang, W.R.; Najibi, M.; Suciu, O.; Studer, C.; Dumitras, T.; Goldstein, T. Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks. In Proceedings of the Neural Information Processing Systems, 2018.
- Zhang, J.; Chen, B.; Cheng, X.; Binh, H.T.T.; Yu, S. PoisonGAN: Generative Poisoning Attacks Against Federated Learning in Edge Computing Systems. IEEE Internet of Things Journal 2021, 8, 3310–3322. [CrossRef]
- Carlini, N.; Athalye, A.; Papernot, N.; Brendel, W.; Rauber, J.; Tsipras, D.; Goodfellow, I.J.; Madry, A.; Kurakin, A. On Evaluating Adversarial Robustness. ArXiv 2019, abs/1902.06705.
- Tramèr, F.; Kurakin, A.; Papernot, N.; Boneh, D.; McDaniel, P. Ensemble Adversarial Training: Attacks and Defenses. ArXiv 2017, abs/1705.07204.
- Cohen, J.M.; Rosenfeld, E.; Kolter, J.Z. Certified Adversarial Robustness via Randomized Smoothing. ArXiv 2019, abs/1902.02918.
- Wu, D.; Xia, S.; Wang, Y. Adversarial Weight Perturbation Helps Robust Generalization. ArXiv 2020.
- Chen, T.; Liu, S.; Chang, S.; Cheng, Y.; Amini, L.; Wang, Z. Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020, pp. 696–705.
- Metzen, J.H.; Genewein, T.; Fischer, V.; Bischoff, B. On Detecting Adversarial Perturbations. ArXiv 2017, abs/1702.04267.
- Steinhardt, J.; Koh, P.W.; Liang, P. Certified Defenses for Data Poisoning Attacks. In Proceedings of the Neural Information Processing Systems, 2017.
- Piet, J.; Alrashed, M.; Sitawarin, C.; Chen, S.; Wei, Z.; Sun, E.; Alomair, B.; Wagner, D. Jatmo: Prompt Injection Defense by Task-Specific Finetuning, 2024, [arXiv:cs.CR/2312.17673].
- Doskaliuk, B.; Zimba, O.; Yessirkepov, M.; Klishch, I.; Yatsyshyn, R. Artificial intelligence in peer review: enhancing efficiency while preserving integrity. Journal of Korean medical science 2025, 40.
- Mann, S.P.; Aboy, M.; Seah, J.J.; Lin, Z.; Luo, X.; Rodger, D.; Zohny, H.; Minssen, T.; Savulescu, J.; Earp, B.D. AI and the Future of Academic Peer Review, 2025, [arXiv:cs.CY/2509.14189].
- Maturo, F.; Porreca, A.; Porreca, A. The risks of artificial intelligence in research: ethical and methodological challenges in the peer review process. AI Ethics 2025, 5, 5389–5396. [CrossRef]
- Keuper, J. Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications, 2025, [arXiv:cs.LG/2509.10248].
- Verma, P. Researchers are using AI for peer reviews — and finding ways to cheat it. The Washington Post 2025.
- Media, V. Scientists reportedly hiding AI text prompts in academic papers to receive positive peer reviews, 2025. Public media reports.
- Shi, J.; Yuan, Z.; Liu, Y.; Huang, Y.; Zhou, P.; Sun, L.; Gong, N.Z. Optimization-based Prompt Injection Attack to LLM-as-a-Judge, 2025, [arXiv:cs.CR/2403.17710].
- Zhao, Y.; Liu, H.; Yu, D.; Kung, S.Y.; Mi, H.; Yu, D. One Token to Fool LLM-as-a-Judge, 2025, [arXiv:cs.LG/2507.08794].
- Collu, M.G.; Salviati, U.; Confalonieri, R.; Conti, M.; Apruzzese, G. Publish to Perish: Prompt Injection Attacks on LLM-Assisted Peer Review, 2025, [arXiv:cs.CR/2508.20863].
- Ye, R.; Pang, X.; Chai, J.; Chen, J.; Yin, Z.; Xiang, Z.; Dong, X.; Shao, J.; Chen, S. Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review, 2024, [arXiv:cs.CL/2412.01708].
- Dong, Y.; Jiang, X.; Liu, H.; Jin, Z.; Gu, B.; Yang, M.; Li, G. Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models, 2024, [arXiv:cs.CL/2402.15938].
- Goldblum, M.; Tsipras, D.; Xie, C.; Chen, X.; Schwarzschild, A.; Song, D.; Madry, A.; Li, B.; Goldstein, T. Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses, 2021, [arXiv:cs.LG/2012.10544].
- Borgeaud, S.; Mensch, A.; Hoffmann, J.; Cai, T.; Rutherford, E.; Millican, K.; Van Den Driessche, G.B.; Lespiau, J.B.; Damoc, B.; Clark, A.; et al. Improving Language Models by Retrieving from Trillions of Tokens. In Proceedings of the 39th International Conference on Machine Learning. PMLR, 17–23 Jul 2022, Vol. 162, Proceedings of Machine Learning Research, pp. 2206–2240.
- Souly, A.; Rando, J.; Chapman, E.; Davies, X.; Hasircioglu, B.; Shereen, E.; Mougan, C.; Mavroudis, V.; Jones, E.; Hicks, C.; et al. Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples, 2025, [arXiv:cs.LG/2510.07192].
- Wen, J.; Si, C.; Chen, Y.H.; He, H.; Feng, S. Predicting Empirical AI Research Outcomes with Language Models, 2025, [arXiv:cs.AI/2506.00794].
- Bereska, L.; Gavves, E. Mechanistic Interpretability for AI Safety – A Review, 2024, [arXiv:cs.AI/2404.14082].
- Lo, L.Y.H.; Qu, H. How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations?, 2024, [arXiv:cs.HC/2407.17291].
- Tonglet, J.; Zimny, J.; Tuytelaars, T.; Gurevych, I. Is this chart lying to me? Automating the detection of misleading visualizations, 2025, [arXiv:cs.CL/2508.21675].
- Gallegos, I.O.; Rossi, R.A.; Barrow, J.; Tanjim, M.M.; Kim, S.; Dernoncourt, F.; Yu, T.; Zhang, R.; Ahmed, N.K. Bias and Fairness in Large Language Models: A Survey, 2024, [arXiv:cs.CL/2309.00770].
- Liu, Y.; Deng, G.; Li, Y.; Wang, K.; Wang, Z.; Wang, X.; Zhang, T.; Liu, Y.; Wang, H.; Zheng, Y.; et al. Prompt Injection attack against LLM-integrated Applications, 2024, [arXiv:cs.CR/2306.05499].
- Zhou, X.; Qiang, Y.; Zade, S.Z.; Khanduri, P.; Zhu, D. Hijacking Large Language Models via Adversarial In-Context Learning, 2025, [arXiv:cs.LG/2311.09948].
- Gong, Y.; Chen, Z.; Chen, M.; Yu, F.; Lu, W.; Wang, X.; Liu, X.; Liu, J. Topic-FlipRAG: Topic-Orientated Adversarial Opinion Manipulation Attacks to Retrieval-Augmented Generation Models, 2025, [arXiv:cs.CL/2502.01386].
- Schwinn, L.; Dobre, D.; Günnemann, S.; Gidel, G. Adversarial Attacks and Defenses in Large Language Models: Old and New Threats, 2023, [arXiv:cs.AI/2310.19737].
- Raina, V.; Liusie, A.; Gales, M. Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment, 2024, [arXiv:cs.CL/2402.14016].
- Guo, Y.; Guo, M.; Su, J.; Yang, Z.; Zhu, M.; Li, H.; Qiu, M.; Liu, S.S. Bias in Large Language Models: Origin, Evaluation, and Mitigation, 2024, [arXiv:cs.CL/2411.10915].
- Navigli, R.; Conia, S.; Ross, B. Biases in Large Language Models: Origins, Inventory, and Discussion. J. Data and Information Quality 2023, 15. [CrossRef]
- Angrist, J.D. The Perils of Peer Effects. Labour Economics 2014. [CrossRef]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.T.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, 2021, [arXiv:cs.CL/2005.11401].
- Schwarzschild, A.; Goldblum, M.; Gupta, A.; Dickerson, J.P.; Goldstein, T. Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks. In Proceedings of the 38th International Conference on Machine Learning; Meila, M.; Zhang, T., Eds. PMLR, 18–24 Jul 2021, Vol. 139, Proceedings of Machine Learning Research, pp. 9389–9398.
- Goldblum, M.; Tsipras, D.; Xie, C.; Chen, X.; Schwarzschild, A.; Song, D.; Madry, A.; Li, B.; Goldstein, T. Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses, 2020. [CrossRef]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models, 2023, [arXiv:cs.CL/2302.13971].
- Bowen, D.; Murphy, B.; Cai, W.; Khachaturov, D.; Gleave, A.; Pelrine, K. Scaling Trends for Data Poisoning in LLMs, 2025, [arXiv:cs.CR/2408.02946].
- Liu, T.; Zhang, Y.; Feng, Z.; Yang, Z.; Xu, C.; Man, D.; Yang, W. Beyond Traditional Threats: A Persistent Backdoor Attack on Federated Learning, 2024, [arXiv:cs.CR/2404.17617].
- Zhu, C.; Li, Y.; Rao, B.; Zhang, J.; Mao, Y.; Zhong, S. SPA: Towards More Stealth and Persistent Backdoor Attacks in Federated Learning, 2025, [arXiv:cs.CR/2506.20931].
- Tian, Z.; Cui, L.; Liang, J.; Yu, S. A Comprehensive Survey on Poisoning Attacks and Countermeasures in Machine Learning. ACM Comput. Surv. 2022, 55. [CrossRef]
- Zhao, P.; Zhu, W.; Jiao, P.; Gao, D.; Wu, O. Data Poisoning in Deep Learning: A Survey, 2025, [arXiv:cs.CR/2503.22759].
- Muñoz-González, L.; Biggio, B.; Demontis, A.; Paudice, A.; Wongrassamee, V.; Lupu, E.C.; Roli, F. Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization, 2017, [arXiv:cs.LG/1708.08689].
- Nourani, M.; Roy, C.; Block, J.E.; Honeycutt, D.R.; Rahman, T.; Ragan, E.; Gogate, V. Anchoring Bias Affects Mental Model Formation and User Reliance in Explainable AI Systems. In Proceedings of the 26th International Conference on Intelligent User Interfaces, New York, NY, USA, 2021; IUI ’21, pp. 340–350. [CrossRef]
- Shi, F.; Chen, X.; Misra, K.; Scales, N.; Dohan, D.; Chi, E.; Schärli, N.; Zhou, D. Large Language Models Can Be Easily Distracted by Irrelevant Context, 2023, [arXiv:cs.CL/2302.00093].
- Dougrez-Lewis, J.; Akhter, M.E.; Ruggeri, F.; Löbbers, S.; He, Y.; Liakata, M. Assessing the Reasoning Capabilities of LLMs in the context of Evidence-based Claim Verification. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2025; Che, W.; Nabende, J.; Shutova, E.; Pilehvar, M.T., Eds., Vienna, Austria, 2025; pp. 20604–20628. [CrossRef]
- Hong, R.; Zhang, H.; Pang, X.; Yu, D.; Zhang, C. A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning, 2024, [arXiv:cs.AI/2311.07954].
- OWASP Foundation. OWASP Top 10 for Large Language Model Applications, 2023. Accessed in 2025. See LLM01: Prompt Injection. URL: https://owasp.org/www-project-top-10-for-large-language-model-applications/.
- Liang, W.; Zhang, Y.; Cao, H.; Wang, B.; Ding, D.; Yang, X.; Vodrahalli, K.; He, S.; Smith, D.; Yin, Y.; et al. Can large language models provide useful feedback on research papers? A large-scale empirical analysis, 2023, [arXiv:cs.LG/2310.01783].
- Zhou, Z.; Li, Z.; Zhang, J.; Zhang, Y.; Wang, K.; Liu, Y.; Guo, Q. CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models, 2025, [arXiv:cs.CL/2502.14529].
- Zhu, S.; Zhang, R.; An, B.; Wu, G.; Barrow, J.; Huang, F.; Sun, T. AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models, 2024.
- Zizzo, G.; Cornacchia, G.; Fraser, K.; Hameed, M.Z.; Rawat, A.; Buesser, B.; Purcell, M.; Chen, P.Y.; Sattigeri, P.; Varshney, K. Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs, 2025, [arXiv:cs.CR/2502.15427].
- Malmqvist, L. Sycophancy in Large Language Models: Causes and Mitigations, 2024, [arXiv:cs.CL/2411.15287].
- Bozdag, N.B.; Mehri, S.; Tur, G.; Hakkani-Tür, D. Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models, 2025, [arXiv:cs.CL/2503.01829].
- Salvi, F.; Horta Ribeiro, M.; Gallotti, R.; West, R. On the conversational persuasiveness of GPT-4. Nature Human Behaviour 2025, 9, 1645–1653. [CrossRef]
- Liu, Y.; Yang, K.; Liu, Y.; Drew, M.G.B. The Shackles of Peer Review: Unveiling the Flaws in the Ivory Tower. 2023.
- Nisbett, R.E.; Wilson, T.D. The halo effect: Evidence for unconscious alteration of judgments. Journal of Personality and Social Psychology 1977, 35, 250–256.
- Zhang, J.; Zhang, H.; Deng, Z.; Roth, D. Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach, 2022, [arXiv:cs.CY/2211.06398].
- Fox, C.W.; Meyer, J.A.; Aimé, E. Double-blind peer review affects reviewer ratings and editor decisions at an ecology journal. Functional Ecology 2023.
- Sun, M.; Danfa, J.B.; Teplitskiy, M. Does double-blind peer review reduce bias? Evidence from a top computer science conference. J. Assoc. Inf. Sci. Technol. 2021, 73, 811–819.
- Jin, Y.; Zhao, Q.; Wang, Y.; Chen, H.; Zhu, K.; Xiao, Y.; Wang, J. AgentReview: Exploring Peer Review Dynamics with LLM Agents. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2024.
- Verharen, J.P.H. ChatGPT identifies gender disparities in scientific peer review. eLife 2023, 12.
- Hosseini, M.; Horbach, S.P. Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review. Research Integrity and Peer Review 2023, 8.
- Soneji, A.; Kokulu, F.B.; Rubio-Medrano, C.E.; Bao, T.; Wang, R.; Shoshitaishvili, Y.; Doupé, A. “Flawed, but like democracy we don’t have a better system”: The Experts’ Insights on the Peer Review Process of Evaluating Security Papers. 2022 IEEE Symposium on Security and Privacy (SP) 2022, pp. 1845–1862.
- Schramowski, P.; Turan, C.; Andersen, N.; Rothkopf, C.A.; Kersting, K. Large pre-trained language models contain human-like biases of what is right and wrong to do. Nature Machine Intelligence 2021, 4, 258–268. [CrossRef]
- Li, H.; Ji, Y.; Lyu, C.; Zhang, C. Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), 2022.
- Koo, R.; Lee, M.; Raheja, V.; Park, J.I.; Kim, Z.M.; Kang, D. Benchmarking Cognitive Biases in Large Language Models as Evaluators, 2024, [arXiv:cs.CL/2309.17012].
- Bartos, O.J.; Wehr, P. Using Conflict Theory; Cambridge University Press: Cambridge, 2002.
| Work | External Tools | System Orchestration | Failure Modes | Focus Criteria | |||||||||
| Single | Multi | HITL | H | B | L | C | T | N | Q | F | R | ||
| Phase A — Automated Desk Review | |||||||||||||
| Statcheck Nuijten et al. [21] | Ethics checklists | ✓ | ✓ | ✓ | |||||||||
| StatReviewer Shanahan [22] | Ethics checklists | ✓ | ✓ | ✓ | |||||||||
| Penelope/UNSILO Checco et al. [23] | Ethics checklists | ✓ | ✓ | ✓ | |||||||||
| TPMS Charlin and Zemel [24] | Literature corpus | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| LCM Leyton-Brown et al. [25] | Literature corpus | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| NSFC pilot Cyranoski [26] | - | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| Phase B — AI-assisted Deep Review | |||||||||||||
| ReviewerGPT Liu and Shah [27] | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| Reviewer2 Gao et al. [28] | - | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| SEA Yu et al. [29] | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| ReviewRobot Wang and Zeng [30] | Knowledge graph | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| CycleResearcher Weng et al. [31] | Literature corpus | ✓ | ✓ | ✓ | |||||||||
| MARG D’Arcy et al. [32] | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| MAMORX Taechoyotin et al. [33] | Literature corpus | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| Skarlinski et al. [34] | Literature corpus | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| SchNovel Xiao et al. [35] | Literature corpus | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| Scideator Radensky et al. [36] | Literature corpus | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| RelevAI-Reviewer Wijnhoven et al. [37] | Literature corpus | ✓ | ✓ | ✓ | ✓ | ||||||||
| LimGen Rahman et al. [38] | - | ✓ | ✓ | ✓ | ✓ | ✓ | |||||||
| ReviewFlow Sun et al. [39] | PDF/Vis parse | ✓ | ✓ | ✓ | ✓ | ||||||||
| CARE Zyska et al. [40] | PDF/Vis parse | ✓ | ✓ | ✓ | ✓ | ||||||||
| DocPilot Mathur et al. [41] | PDF/Vis parse | ✓ | ✓ | ✓ | ✓ | ||||||||
| Phase C — Meta-review Synthesis | |||||||||||||
| MetaGen Bhatia et al. [42] | - | ✓ | ✓ | ✓ | ✓ | ||||||||
| MReD Shen et al. [43] | - | ✓ | ✓ | ✓ | ✓ | ||||||||
| Zeng et al. [44] | - | ✓ | ✓ | ✓ | ✓ | ||||||||
| RAMMER Li et al. [45] | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| MetaWriter Sun et al. [46] | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| GLIMPSE Darrin et al. [47] | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| PeerArg Sukpanichnant et al. [48] | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| Hossain et al. [49] | - | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||

| Phase | Method | Mechanism | Target | Required preparation | Concealment | Difficulty |
| Training & Data Retrieval | Poisoning | ▸Data contamination | Training data / online data | Contaminable training data sources | ▴ | ▵ |
| Training & Data Retrieval | Poisoning | ▸Backdoor injection | Training data | Trigger-output pairs | ▴ | ▴ |
| Desk Review | Evasion | ▸Abstract & Conclusion hijacking | Abstract; conclusion | Text editing | ▵ | ▿ |
| Desk Review | Evasion | ▸Structure spoofing | Article typesetting | Text editing | ▵ | ▿ |
| Deep Review | Evasion | ▸Academic packaging | Main text content | Formula template library | ▵ | ▿ |
| Deep Review | Evasion | ▸Keyword & compliment stacking | Main text content | List of high-frequency keywords | ▵ | ▿ |
| Deep Review | Evasion | ▸Misleading conclusions | Main text content | Data & formula generation | ▵ | ▵ |
| Deep Review | Evasion | ▸Invisible prompt injection | Text, metadata, images, hyperlinks | Text / image editing | ▴ | ▿ |
| Rebuttal | Evasion | ▸Rebuttal opinion hijacking | Model feedback | Hijacking dialogue strategy | ▿ | ▿ |
| System | Exploratory | ▸Identity bias exploitation | Author list | Senior researcher list | ▿ | ▿ |
| System | Exploratory | ▸Model inversion | Model preferences | Historical review data | ▴ | ▵ |
| System | Poisoning | ▸Malicious collusion | System | Multiple fake accounts for collaborative attacks | ▿ | ▵ |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).