# Medical Question Answering
## Overview of directory
MedicalQADataset.ipynb provides instructions for loading and inspecting a collection of medical question answering datasets.
HeadQA_tutorial.ipynb is a tutorial notebook that applies BertForMultipleChoice to the HeadQA dataset.
Both notebooks were developed in Google Colab.
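As background for the tutorial, here is a minimal sketch of how BertForMultipleChoice scores a question against its candidate choices; the question and choices below are made up for illustration, and the classification head is untrained until the notebook fine-tunes it:

```python
import torch
from transformers import BertTokenizer, BertForMultipleChoice

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMultipleChoice.from_pretrained("bert-base-uncased")

# Illustrative question/choices; HeadQA items look similar.
question = "Which vitamin deficiency causes scurvy?"
choices = ["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"]

# Pair the question with every choice; BertForMultipleChoice expects
# tensors of shape (batch_size, num_choices, seq_len).
encoding = tokenizer([question] * len(choices), choices,
                     return_tensors="pt", padding=True)
inputs = {k: v.unsqueeze(0) for k, v in encoding.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, num_choices)
print(choices[logits.argmax(dim=-1).item()])
```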
## Collection of Medical QA Datasets
### HeadQA
HEAD-QA: A Healthcare Dataset for Complex Reasoning [paper](https://aclanthology.org/P19-1092.pdf)
- Corpus of multiple choice questions
- Dataset for general reasoning (given a question and 4/5 choices, select the correct choice)
- In training set, each question has 5 choices
- In validation and test set, each question has 4 choices
- Answer choices are not drawn from given passages/contexts, i.e., the task is not span extraction as in SQuAD
- Available in English and Spanish
- Source: annual exams taken to apply for specialization positions in the Spanish public healthcare system
- Healthcare areas: medicine, pharmacology, psychology, nursing, biology, chemistry
- Data can be downloaded from the Hugging Face datasets hub (loading sketch below)
- train/val/test split: 2657/2742/1366
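A hedged loading sketch with the `datasets` library; the configuration and field names (`qtext`, `answers`, `ra`) follow the Hugging Face `head_qa` dataset card, so verify them against your installed version:

```python
from datasets import load_dataset

# "en" and "es" configurations are available.
head_qa = load_dataset("head_qa", "en")
print(head_qa)  # train/validation/test splits

example = head_qa["train"][0]
print(example["qtext"])            # question text
for answer in example["answers"]:  # candidate choices
    print(answer["aid"], answer["atext"])
print("right answer id:", example["ra"])
```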
### BioASQ
- Task Synergy on Biomedical Semantic QA for Covid-19
- Given unanswered questions, models provide answers, which are then evaluated by experts
- Involves IR, QA, summarization, etc.
- Task A: Large-Scale Online Biomedical Semantic Indexing
- Classify new PubMed documents before PubMed curators annotate them manually
- Classes come from MeSH hierarchy
- Task B: Biomedical Semantic QA
- 4 types of questions:
- Yes/No
- Factoid: requires a particular entity name, number or short expression
- List: a list of entity names, numbers or short expressions
- Summary: produce a short text summarization of the most relevant information
- For Task 9b, there are 3,743 questions (1,091 factoid, 1,033 yes/no, 899 summary, 719 list); a parsing sketch follows
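A hedged sketch of iterating over the BioASQ Task B training JSON; the file name is illustrative, and the key names (`questions`, `type`, `body`) follow the task's published format, so check them against the release you download:

```python
import json
from collections import Counter

# File name is illustrative; use the file from the BioASQ download area.
with open("BioASQ-training9b.json") as f:
    data = json.load(f)

# Count questions per type (factoid / yesno / summary / list).
print(Counter(q["type"] for q in data["questions"]))

for q in data["questions"][:3]:
    print(q["type"], "-", q["body"])
```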
### MedQuAD
A question-entailment approach to question answering [paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3119-4)
- Proposes RQE (Recognizing Question Entailment): answer a new question by retrieving entailed questions that already have associated answers
- Dataset:
- Collected from 12 trusted medical websites, one per folder; each folder contains multiple XML files (parsing sketch below)
- Contains question-answer pairs
- Task: generate a ranked list of answers for a given premise question by ranking the recognized hypothesis questions
- Evaluation: compare hybrid entailment-based approach, the IR method, and other QA systems participating in LiveQA
- Remark:
- The dataset is essentially auxiliary to LiveQA; it serves as the pool of hypothesis questions
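A hedged sketch of reading one MedQuAD XML file with the standard library; the file path and the tag names (`QAPair`, `Question`, `Answer`) are assumptions based on the release's layout, so verify them against your copy:

```python
import xml.etree.ElementTree as ET

# Path is illustrative; each of the 12 source folders holds files like this.
tree = ET.parse("1_CancerGov_QA/0000001_1.xml")
root = tree.getroot()

for pair in root.iter("QAPair"):          # assumed tag name
    question = pair.findtext("Question")  # assumed tag name
    answer = pair.findtext("Answer")      # assumed tag name
    print(question, "->", (answer or "")[:80])
```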
### LiveQA
Overview of the Medical Question Answering Task at TREC 2017 LiveQA
- Task: provide automatic answers to consumer health questions received by the U.S. National Library of Medicine (question posts; each question can have multiple sub-questions)
- Systems are free to use any medical website to find relevant answers
- Dataset:
- 2 training sets with 634 pairs of medical questions and answers in total
- QA pairs constructed from FAQs on trusted websites of NIH
- Additional annotations for Question Focus and Question Type for each subquestion
- 1st training set contains 388 question-answer pairs, corresponding to 200 NLM questions
- 2nd training set contains 246 question-answer pairs, corresponding to 246 NLM questions (retrieved manually)
- Test set contains 104 NLM questions (subquestion focus and type annotations not provided)
- For each test question, one or more reference answers are manually collected
### MEDIQA
Overview of the MEDIQA 2019 Shared Task on Textual Inference, Question Entailment and Question Answering
- 3 subtasks:
- Natural Language Inference (NLI)
- classify the relationship between two sentences as Entailment, Neutral, or Contradiction
- Recognizing Question Entailment (RQE)
- “a question A entails a question B if every answer to B is also a complete or partial answer to A”
- Question Answering (QA)
- filter and improve the ranking of automatically retrieved answers; input ranks are generated by CHiQA
- Task 1 dataset
- MedNLI, derived from MIMIC-III; access via GCP or AQS
- 14,049 text-hypothesis training pairs, 405 test pairs.
- Tutorial notebook in inference directory
- Task 2 dataset
- available on GitHub as XML files; train/val/test = 8,890/302/230
- Task 3 dataset
- available on GitHub; each answer carries {system\_rank, reference\_rank, reference\_score}
- two training datasets:
- 104 consumer health questions from LiveQA, 839 answers retrieved by CHiQA and manually rated and re-ranked
- 104 simple questions about the most frequent diseases (from Alexa), 862 answers
- validation set:
- 25 consumer health questions, 234 answers returned by CHiQA and judged manually
- test set:
- 150 consumer health questions, 1,107 answers
- MedQuAD (47k pairs) can be used to retrieve answered questions that are entailed by the original questions
- Evaluations:
- Task 1 & 2: accuracy
- Task 3: accuracy, Mean Reciprocal Rank (MRR), Precision, and Spearman’s Rank Correlation Coefficient (Spearman’s Rho); an MRR sketch follows
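For reference, Mean Reciprocal Rank averages the reciprocal rank of the first relevant answer per question; a small self-contained sketch:

```python
def mean_reciprocal_rank(ranked_relevance):
    """ranked_relevance: one list of 0/1 relevance judgments per question,
    ordered by the system's ranking (index 0 = top-ranked answer)."""
    total = 0.0
    for judgments in ranked_relevance:
        for rank, relevant in enumerate(judgments, start=1):
            if relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)

# Two questions: first relevant answer at rank 2 and at rank 1.
print(mean_reciprocal_rank([[0, 1, 0], [1, 0]]))  # (1/2 + 1) / 2 = 0.75
```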
### Medication_QA_MedInfo2019
- Answering consumer health questions about medications/drugs
- Dataset
- Selected anonymized consumer questions submitted to MedlinePlus
- Questions annotated with Question Focus and Question Type
- Reference answers annotated by manually retrieving a correct and complete answer, together with its URL and section title
- xlsx file with columns = {Question, Focus (Drug), Question Type, Answer, Section Title, URL}; a loading sketch follows
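A minimal loading sketch with pandas; the file name is illustrative, and reading .xlsx files requires the openpyxl engine:

```python
import pandas as pd

# File name is illustrative; use the spreadsheet from the dataset release.
df = pd.read_excel("MedInfo2019-QA-Medications.xlsx")

print(df.columns.tolist())
# Expected: ['Question', 'Focus (Drug)', 'Question Type', 'Answer',
#            'Section Title', 'URL']
print(df["Question Type"].value_counts())
```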
### BiQA
Generating Biomedical Question Answering Corpora From Q&A Forums [paper](https://ieeexplore.ieee.org/abstract/document/9184044)
- Corpus of question-article pairs
- Dataset for information retrieval (given a question, find relevant articles)
- Source: questions selected from popular questions in public forums (Biology and Medical Sciences from Stackexchange, Nutrition from Reddit)
- 7,453 questions and 14,239 question-article pairs
- csv files, one for each topic, three in total
- For each question, the PubMed ID and title of each answer article are recorded in the CSV
- Code for getting and filtering additional posts available
- Code for retrieving documents using PMIDs is available; a hedged retrieval sketch follows
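One way to fetch an abstract for a PMID is the public NCBI E-utilities efetch endpoint; a hedged sketch (the PMID is illustrative, and this is not necessarily the retrieval method used by the BiQA code):

```python
import requests

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def fetch_abstract(pmid):
    """Fetch a plain-text abstract for one PubMed ID."""
    params = {"db": "pubmed", "id": str(pmid),
              "rettype": "abstract", "retmode": "text"}
    resp = requests.get(EFETCH, params=params, timeout=30)
    resp.raise_for_status()
    return resp.text

print(fetch_abstract(31462809)[:300])  # PMID is illustrative
```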
### MASHQA
Multiple Answer Spans Healthcare Question Answering dataset from the consumer health domain [paper](https://people.cs.vt.edu/mingzhu/papers/conf/emnlp2020.pdf)
- Dataset:
- Context, question, and multiple highlighted spans of the context as answers; all the spans together form one complete answer
- Source: queries from WebMD, answers curated by healthcare experts
- For each question, the answer is split into sentences, and those sentences are located in the source context as answer spans (toy sketch below)
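To make the multi-span format concrete, a toy sketch of assembling a complete answer from sentence-level labels; the data layout is illustrative, not the exact MASHQA JSON schema:

```python
# Toy example: the context is pre-split into sentences, and an answer
# is a set of (possibly non-contiguous) sentence indices.
context_sentences = [
    "Iron supports oxygen transport.",      # 0
    "It is found in red meat and beans.",   # 1
    "Unrelated sentence about vitamin C.",  # 2
    "Deficiency can cause anemia.",         # 3
]
answer_spans = [0, 1, 3]  # multiple spans jointly form the answer

complete_answer = " ".join(context_sentences[i] for i in answer_spans)
print(complete_answer)
```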
### EPIC QA
- Task
- Given a question and context passages, identify consecutive answer sentences (from a start sentence id to an end sentence id); see the sketch after this list
- Gold answers are now available
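A toy sketch of the answer format: given a passage pre-split into id-indexed sentences, an answer is the inclusive range from a start sentence id to an end sentence id (the sentences below are made up):

```python
# Sentences of one context passage, keyed by sentence id.
passage = {
    0: "Coronaviruses are enveloped RNA viruses.",
    1: "They can cause respiratory illness in humans.",
    2: "Vaccines target the spike protein.",
}

def extract_answer(passage, start_id, end_id):
    """Return the consecutive sentences from start_id to end_id, inclusive."""
    return " ".join(passage[i] for i in range(start_id, end_id + 1))

print(extract_answer(passage, 0, 1))
```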
### PubMedQA
PubMedQA: A Dataset for Biomedical Research Question Answering [paper](https://arxiv.org/pdf/1909.06146.pdf)
- Answer research questions with yes/no/maybe
- Three sets (loading sketch at the end of this section)
- PQA-Labeled (1k)
- PQA-Unlabeled (211.3k)
- PQA-Artificial (61.2k)
- Dataset created from PubMed article abstracts
- Labeled
- Question: title
- Context: abstract sections excluding the conclusion
- Label: annotated yes/no/maybe
- Unlabeled
- All questions that start with wh-words or involve selecting among multiple entities are removed
- Artificial
- Instances noisily labeled with heuristics, intended for pre-training
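As with HeadQA, the dataset is mirrored on the Hugging Face hub; a hedged loading sketch (configuration and field names follow the `pubmed_qa` dataset card, so verify against your installed version):

```python
from datasets import load_dataset

# Configurations: "pqa_labeled", "pqa_unlabeled", "pqa_artificial".
pubmed_qa = load_dataset("pubmed_qa", "pqa_labeled")

example = pubmed_qa["train"][0]
print(example["question"])
print(example["context"]["contexts"][0][:100])  # an abstract section
print(example["final_decision"])                # yes / no / maybe
```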
|