Preprint Article, Version 1. Preserved in Portico. This version is not peer-reviewed.

What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams

Version 1: Received: 20 May 2021 / Approved: 21 May 2021 / Online: 21 May 2021 (07:47:18 CEST)

How to cite: Jin, D.; Pan, E.; Oufattole, N.; Weng, W.; Fang, H.; Szolovits, P. What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams. Preprints 2021, 2021050498 (doi: 10.20944/preprints202105.0498.v1).

Abstract

Open-domain question answering (OpenQA) tasks have recently attracted growing attention from the natural language processing (NLP) community. In this work, we present MedQA, the first free-form multiple-choice OpenQA dataset for solving medical problems, collected from professional medical board exams. It covers three languages, English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Our experiments show that even the current best method achieves only 36.7%, 42.0%, and 70.1% test accuracy on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to pose a substantial challenge to existing OpenQA systems and hope that it can serve as a platform for developing much stronger OpenQA models in the NLP community.
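
As a rough illustration of what "sequentially combining a document retriever and a machine comprehension model" can look like for a multiple-choice question, the sketch below retrieves evidence for each question-plus-option query and picks the option with the best-supported evidence. It is a minimal toy under stated assumptions, not the authors' implementation: the function names (`tokenize`, `build_idf`, `retrieve`, `answer`), the smoothed IDF overlap score, and the example corpus are all hypothetical, and a simple score comparison stands in for a real machine comprehension model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    """Compute a smoothed inverse document frequency for every term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(1 + n / count) for term, count in df.items()}


def retrieve(query, docs, idf, k=1):
    """Retriever step: return the k (score, doc) pairs with the highest
    IDF-weighted term overlap with the query."""
    q_terms = set(tokenize(query))
    scored = []
    for doc in docs:
        overlap = q_terms & set(tokenize(doc))
        scored.append((sum(idf.get(t, 0.0) for t in overlap), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def answer(question, options, docs, k=1):
    """Reader stand-in (assumption): score each option by the summed
    retrieval score of its top-k evidence and return the best option."""
    idf = build_idf(docs)
    best_option, best_score = None, float("-inf")
    for option in options:
        hits = retrieve(question + " " + option, docs, idf, k)
        score = sum(s for s, _ in hits)
        if score > best_score:
            best_option, best_score = option, score
    return best_option


if __name__ == "__main__":
    # Toy corpus and question, invented purely for illustration.
    docs = [
        "Metformin is a first line drug for type 2 diabetes mellitus.",
        "Amoxicillin is an antibiotic used for bacterial infections.",
        "Lisinopril is an ACE inhibitor used for hypertension.",
    ]
    question = "Which drug is first line for type 2 diabetes?"
    options = ["Amoxicillin", "Metformin", "Lisinopril"]
    print(answer(question, options, docs))
```

The default `k=1` is deliberate for a corpus this small: summing scores over many retrieved documents tends to wash out the difference between options, because most of each score comes from the shared question text. A neural reader would replace the score comparison in `answer` with a model that reads the retrieved passages.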

Subject Areas

Natural Language Processing; Open-domain Question Answering; Multi-choice Question Answering; Clinical Question Answering
