Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

Goal-Driven Visual Question Generation from Radiology Images

Version 1 : Received: 15 July 2021 / Approved: 16 July 2021 / Online: 16 July 2021 (16:18:56 CEST)

A peer-reviewed article of this Preprint also exists.

Sarrouti, M.; Ben Abacha, A.; Demner-Fushman, D. Goal-Driven Visual Question Generation from Radiology Images. Information 2021, 12, 334.


Visual Question Generation (VQG) from images is a rising research topic in both natural language processing and computer vision. Although there have been recent efforts towards generating questions from images in the open domain, the VQG task in the medical domain has not been well studied so far due to the lack of labeled data. In this paper, we introduce a goal-driven VQG approach for radiology images, called VQGRaD, that generates questions targeting specific image aspects such as modality and abnormality. In particular, we study generating natural language questions based on the visual content of the image and on additional information such as the image caption and the question category. VQGRaD encodes the dense vectors of the different inputs into two latent spaces, which allows generating, for a specific question category, relevant questions about the images, with or without their captions. We also explore the impact of domain knowledge incorporation (e.g., medical entities and semantic types) and data augmentation techniques on visual question generation in the medical domain. Experiments performed on the VQA-RAD dataset of clinical visual questions showed that VQGRaD achieves a BLEU score of 61.86% and outperforms strong baselines. We also performed a blinded human evaluation of the grammaticality, fluency, and relevance of the generated questions. The human evaluation demonstrated the better quality of VQGRaD outputs and showed that incorporating medical entities improves the quality of the generated questions. Using the test data and evaluation process of the ImageCLEF 2020 VQA-Med challenge, we found that the proposed data augmentation technique, which generates new training samples by applying different kinds of transformations, can mitigate the lack of data, avoid overfitting, and bring a substantial improvement in medical VQG.
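
For readers unfamiliar with the BLEU metric reported above, the following is a minimal sketch of a sentence-level BLEU computation (clipped n-gram precision combined with a brevity penalty). It is an illustration only: the abstract does not state which BLEU variant, n-gram order, tokenization, or smoothing the authors used, so the function name `bleu`, the whitespace tokenization, the uniform weights up to 4-grams, and the absence of smoothing are all assumptions made here, not the paper's evaluation code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (assumed setup)."""
    cand = candidate.lower().split()          # assumption: whitespace tokenization
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:                        # candidate shorter than n tokens
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:                      # without smoothing, the score collapses to 0
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty, using the reference length closest to the candidate length.
    closest = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > closest else math.exp(1 - closest / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical generated question compared against a hypothetical reference question.
score = bleu("is there a fracture", ["is there a fracture in the image"])
```

In this sketch a generated question that matches a reference exactly scores 1.0, and a generated question that is a correct but shorter prefix of the reference is penalized only through the brevity penalty. Published scores are usually computed at corpus level with an established toolkit, so values from this sketch are not directly comparable to the figure reported in the abstract.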


Visual Question Generation; Visual Question Answering; Variational Autoencoders; Radiology Images; Domain Knowledge; UMLS; Data Augmentation; Computer Vision; Natural Language Processing; Artificial Intelligence; Medical Domain.


Computer Science and Mathematics, Algebra and Number Theory
