Preprint
Review

This version is not peer-reviewed.

Meta-Research on Backdoors: Dataset and Threat Model Shifts in Multimodal Backdoor Attacks

Submitted:

06 April 2026

Posted:

07 April 2026

Abstract
Backdoor attacks enable adversaries to embed malicious behavior into machine learning models by poisoning training data with triggers. Research to date has focused largely on backdoors in unimodal models. However, the rise of multimodal systems, e.g., vision–language models (VLMs) and multimodal large language models (MLLMs), has significantly expanded the attack surface. Multimodal backdoors can exploit cross-modal triggers, representation-level manipulation, instruction-conditioned behaviors, and test-time activation pathways that are not available in unimodal models. Nevertheless, quantifying progress in this field remains challenging due to fragmented datasets, inconsistent threat models, and the lack of standardized evaluation protocols. This methodological inconsistency limits comparative analysis and impedes a systematic understanding of robustness in multimodal settings. This paper presents a meta-research study of multimodal backdoor attacks and analyzes how methodological fragmentation undermines reproducibility and cumulative scientific understanding. We argue that standardized benchmarks and backward-compatible evaluation protocols are necessary for reliable and systematic advancement of multimodal backdoor research.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
