# MIMIC-SPARQL

This repository provides the official implementation for building the MIMIC-SPARQL dataset introduced in the paper [Knowledge Graph-based Question Answering with Electronic Health Records](https://arxiv.org/abs/2010.09394), accepted at Machine Learning for Healthcare (MLHC) 2021.

## Example

```
NLQ: how many patients were born before the year 2060?

SQL: select count ( distinct patients."subject_id" ) from patients where patients."dob_year" < "2060"

SPARQL: select ( count ( distinct ?subject_id ) as ?agg ) where { ?subject_id </dob_year> ?dob_year. filter( ?dob_year < 2060 ) . }
```
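
To make the SQL/SPARQL correspondence in the example concrete, the following standard-library-only sketch computes the same count twice over a few made-up patient rows: once with a query shaped like the example SQL on an in-memory SQLite table, and once by emulating the SPARQL triple pattern and `filter` over (subject, predicate, object) tuples. The rows and the `/dob_year` predicate string are illustrative assumptions; the knowledge graph built by this repository may encode subjects and literals differently.

```python
import sqlite3

# Toy rows (hypothetical, not MIMIC-III data): (subject_id, dob_year).
# Subject 103 appears twice to show why the query counts *distinct* subjects.
ROWS = [(101, 2045), (102, 2070), (103, 2059), (103, 2059)]


def count_with_sql(rows, year=2060):
    """Run a query shaped like the example SQL against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('create table patients ("subject_id" integer, "dob_year" integer)')
        conn.executemany("insert into patients values (?, ?)", rows)
        # The year is bound as a parameter instead of the quoted literal used in the example.
        (count,) = conn.execute(
            'select count ( distinct patients."subject_id" ) from patients '
            'where patients."dob_year" < ?',
            (year,),
        ).fetchone()
        return count
    finally:
        conn.close()


def rows_to_triples(rows):
    """Map each row to a (subject, predicate, object) triple.

    "/dob_year" mirrors the </dob_year> IRI in the example (an assumption about
    the encoding); the set removes duplicates, as an RDF graph would.
    """
    return {(subject_id, "/dob_year", dob_year) for subject_id, dob_year in rows}


def count_with_triple_pattern(triples, year=2060):
    """Emulate `?subject_id </dob_year> ?dob_year . filter(?dob_year < year)` plus count(distinct)."""
    return len({s for (s, p, o) in triples if p == "/dob_year" and o < year})


if __name__ == "__main__":
    print(count_with_sql(ROWS))                              # prints 2
    print(count_with_triple_pattern(rows_to_triples(ROWS)))  # prints 2
```

Both functions count the distinct subjects 101 and 103, so they agree; the duplicated row shows why `distinct` (or, on the graph side, set semantics) matters.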

## Prerequisites

1. MIMIC-III

   https://mimic.physionet.org/

2. MIMICSQL

   Paper title: Text-to-SQL Generation for Question Answering on Electronic Medical Records.

   Dataset and code: https://github.com/wangpinggl/TREQS

3. ENV

```
python 3.6
networkx
rdflib
pandas
numpy
sqlite3  # part of the Python standard library since Python 2.5, so no manual installation is needed
requests
```

Set up the environment using pip:

```bash
pip install networkx rdflib pandas numpy requests
```
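
Before running the scripts, you may want to confirm that the interpreter can see every module listed above. The helper below is a small standard-library sketch, not part of this repository; `missing_modules` is a hypothetical name, and it checks importability only, not package versions.

```python
import importlib.util
import sys

# Modules from the ENV list above; sqlite3 ships with CPython, the rest come from pip.
REQUIRED = ["networkx", "rdflib", "pandas", "numpy", "requests", "sqlite3"]


def missing_modules(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(the ENV list above names Python 3.6)")
    missing = missing_modules(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```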

## Datasets

1. __MIMICSQL*__

   MIMICSQL* is an extended version of MIMICSQL. Its database consists of 9 tables from MIMIC-III.

2. __MIMIC-SPARQL__

   MIMIC-SPARQL is the graph-based counterpart of MIMICSQL*. Its knowledge graph has 173,096 triples, and the maximum number of hops is 5.

## Guide for creating MIMICSQL* and MIMIC-SPARQL

0. Prepare MIMIC-III and build mimic.db from MIMICSQL
1. Build the mimicsql* database from the mimicsql database
2. Build the mimic-sparql knowledge graph from the mimicsql* database
3. Convert mimicsql SQL queries to mimicsql* SQL queries
4. Convert mimicsql* SQL queries to mimic-sparql SPARQL queries

### 0. Prepare MIMIC-III and mimic.db from MIMICSQL

First, you need access to the MIMIC-III data, which requires certification (credentialed access) through https://mimic.physionet.org/.

Then, build `mimic.db` by following the README of https://github.com/wangpinggl/TREQS; this file is required for the next step.

### 1. Build mimicsql* database from mimicsql database

First, save mimic.db under the `mimicsql/evaluation/mimic_db` path.

Then, set the current directory to the project root folder, mimic-sparql, and run:

```
python build_mimicsqlstar_db/build_mimicstar_db_from_mimicsql_db.py
```

This builds the MIMICSQL* database and creates `mimicsqlstar.db`.
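
Two quick sanity checks around this step can be sketched with the standard library: confirm that `mimic.db` is in place before the build, and list the tables of `mimicsqlstar.db` afterwards (the Datasets section says MIMICSQL* has 9 tables). The file locations below are assumptions based on the paths mentioned in this README, and `list_tables` is a hypothetical helper; adjust both to your checkout.

```python
import os
import sqlite3

# Assumed locations (adjust to your checkout):
# - mimic.db inside mimicsql/evaluation/mimic_db, as described in step 1
# - mimicsqlstar.db in the project root (hypothetical; check where the script writes it)
MIMIC_DB = os.path.join("mimicsql", "evaluation", "mimic_db", "mimic.db")
STAR_DB = "mimicsqlstar.db"


def list_tables(db_path):
    """Return the sorted user-table names of an existing SQLite file.

    sqlite3.connect() would silently create an empty database for a missing
    path, so the file's existence is checked first.
    """
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "select name from sqlite_master "
            "where type = 'table' and name not like 'sqlite_%' order by name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    for path in (MIMIC_DB, STAR_DB):
        try:
            tables = list_tables(path)
            print(path, "->", len(tables), "tables:", ", ".join(tables))
        except FileNotFoundError:
            print(path, "-> not found")
```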

### 2. Build mimic-sparql knowledge graph from mimicsql* database

Set the current directory to the project root folder, mimic-sparql.

To build mimic-sparql* from mimicsql*:

```
python build_mimicsparql_kg/build_complex_kg_from_mimicsqlstar_db.py
```

To build mimic-sparql from mimicsql:

```
python build_mimicsparql_kg/build_simple_kg_from_mimicsql_db.py
```

These scripts build the MIMIC-SPARQL knowledge graphs and create `mimic_sparqlstar_kg.xml` and `mimic_sparql_kg.xml`, respectively.
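
To confirm that both files were written and parse as XML, the following sketch uses only `xml.etree.ElementTree`. It assumes the files are RDF/XML (suggested by the `.xml` extension, but not stated in this README) and that they sit in the current directory; `describe_rdf_xml` is a hypothetical helper. It counts top-level node elements, which is not the number of triples; for a triple count, load the file with an RDF library such as rdflib, which is already in the requirements.

```python
import os
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Output names from step 2; assumed to be in the current directory.
KG_FILES = ["mimic_sparqlstar_kg.xml", "mimic_sparql_kg.xml"]


def describe_rdf_xml(path):
    """Parse a file assumed to be RDF/XML.

    Returns (root_is_rdf, n_nodes): whether the root element is rdf:RDF, and the
    number of top-level node elements. n_nodes is NOT a triple count, since one
    node element can carry many property elements.
    """
    root = ET.parse(path).getroot()
    return root.tag == "{%s}RDF" % RDF_NS, len(root)


if __name__ == "__main__":
    for path in KG_FILES:
        if os.path.isfile(path):
            print(path, describe_rdf_xml(path))
        else:
            print(path, "not found")
```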

### 3. Convert mimicsql SQL query to mimicsql* SQL query

Set the current directory to the project root folder, mimic-sparql.

```
python convert_mimicsql2sql_dataset.py --dataset_type natural --execution False
python convert_mimicsql2sql_dataset.py --dataset_type template --execution False
```

If `--execution` is set to `True`, the execution results of the original mimicsql query and the converted mimicsql* query are compared with each other.
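
This README does not show how `--execution True` performs the comparison. One plausible approach, sketched here with hypothetical function names and not taken from `convert_mimicsql2sql_dataset.py`, is to run each query against its own SQLite database and compare the returned rows as multisets, so that row order does not matter but duplicate rows do.

```python
import sqlite3
from collections import Counter


def run_query(db_path, query):
    """Execute a query on one SQLite file and return all rows."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def _normalize(rows):
    """Turn rows into a multiset of string tuples.

    Values are compared as strings, on the assumption that the two schemas may
    store the same value with different types (e.g. 2060 vs "2060").
    """
    return Counter(tuple(str(value) for value in row) for row in rows)


def same_result(rows_a, rows_b):
    """Order-insensitive comparison that keeps duplicate rows."""
    return _normalize(rows_a) == _normalize(rows_b)


def compare_queries(db_a, query_a, db_b, query_b):
    """Run a MIMICSQL query on one database and its MIMICSQL* conversion on the other."""
    return same_result(run_query(db_a, query_a), run_query(db_b, query_b))
```

String normalization is a simplification: it treats `1` and `"1"` as equal but would still distinguish `2060` from `2060.0`.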

### 4. Convert mimicsql* SQL query to mimic-sparql SPARQL query

Set the current directory to the project root folder, mimic-sparql.

```
python convert_sql2sparql_dataset.py --dataset_type natural --complex True --execution False
python convert_sql2sparql_dataset.py --dataset_type natural --complex False --execution False

python convert_sql2sparql_dataset.py --dataset_type template --complex True --execution False
python convert_sql2sparql_dataset.py --dataset_type template --complex False --execution False
```

The `--complex` option selects the schema: `--complex True` uses the original schema (mimic-sparql*, built from mimicsql*), and `--complex False` uses the simplified schema (mimic-sparql, built from mimicsql).
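
If you prefer a single entry point over typing the four commands, the loop below reproduces them from Python using only the flags shown above. The helper names are hypothetical, and `sys.executable` stands in for `python` so that the same interpreter is reused. With `dry_run=True` it only prints the commands; run it from the project root.

```python
import itertools
import subprocess
import sys


def sql2sparql_commands():
    """Build the four convert_sql2sparql_dataset.py invocations listed above, in the same order."""
    commands = []
    for dataset_type, complex_flag in itertools.product(("natural", "template"), ("True", "False")):
        commands.append([
            sys.executable, "convert_sql2sparql_dataset.py",
            "--dataset_type", dataset_type,
            "--complex", complex_flag,
            "--execution", "False",
        ])
    return commands


def run_all(dry_run=True):
    """Print each command; execute it (stopping on the first failure) unless dry_run is set."""
    for cmd in sql2sparql_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```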

## Citation

```
@inproceedings{pmlr-v149-park21a,
  title = {Knowledge Graph-based Question Answering with Electronic Health Records},
  author = {Park, Junwoo and Cho, Youngwoo and Lee, Haneol and Choo, Jaegul and Choi, Edward},
  booktitle = {Proceedings of the 6th Machine Learning for Healthcare Conference (MLHC)},
  pages = {36--53},
  year = {2021},
  volume = {149},
  publisher = {PMLR}
}
```

This BibTeX entry will be updated once the paper is published on PMLR.