Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Collaborative Modality Fusion for Mitigating Language Bias in Visual Question Answering

Version 1 : Received: 15 January 2024 / Approved: 16 January 2024 / Online: 16 January 2024 (08:38:12 CET)

A peer-reviewed article of this Preprint also exists.

Lu, Q.; Chen, S.; Zhu, X. Collaborative Modality Fusion for Mitigating Language Bias in Visual Question Answering. J. Imaging 2024, 10, 56.

Abstract

Language bias is a significant concern in Visual Question Answering (VQA): models tend to rely on spurious correlations between questions and answers when making predictions. This reliance prevents them from generalizing effectively and degrades performance. To address this bias, we propose a novel modality-fusion collaborative de-biasing algorithm (CoD). In our approach, bias is treated as the model’s neglect of information from a particular modality during prediction. We employ collaborative training to facilitate mutual modeling between the modalities, achieving efficient feature fusion and enabling the model to fully leverage multi-modal knowledge for prediction. Experiments on several datasets, including VQA-CP v2, VQA v2, and VQA-VS, under different validation strategies demonstrate the effectiveness of our approach. Notably, with only a basic baseline model, the approach reaches an accuracy of 60.14% on VQA-CP v2.

Keywords

visual question answering; collaborative learning; language bias

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
