# MIMIC-SPARQL
This repository provides the official MIMIC-SPARQL dataset implementation for the following paper: [Knowledge Graph-based Question Answering with Electronic Health Records](https://arxiv.org/abs/2010.09394), accepted at Machine Learning for Healthcare (MLHC) 2021.

## Example
```
NLQ: how many patients were born before the year 2060?

SQL: select count ( distinct patients."subject_id" ) from patients  where patients."dob_year" < "2060"

SPARQL: select ( count ( distinct ?subject_id ) as ?agg ) where { ?subject_id </dob_year> ?dob_year. filter( ?dob_year < 2060 ). }
```

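To make the SQL example concrete, the sketch below runs an equivalent query against a tiny in-memory SQLite table. The three sample rows are made up for illustration and are not MIMIC-III data, the helper name `count_born_before` is hypothetical, and the year is passed as an integer parameter rather than the quoted `"2060"` used in the dataset.

```python
import sqlite3

# Hypothetical toy rows (subject_id, dob_year); not real MIMIC-III data.
TOY_PATIENTS = [(1, 2025), (2, 2059), (3, 2071)]


def count_born_before(year):
    """Count distinct patients whose dob_year is earlier than `year`."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE patients (subject_id INTEGER, dob_year INTEGER)")
        conn.executemany("INSERT INTO patients VALUES (?, ?)", TOY_PATIENTS)
        query = (
            'SELECT COUNT(DISTINCT patients."subject_id") '
            'FROM patients WHERE patients."dob_year" < ?'
        )
        return conn.execute(query, (year,)).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_born_before(2060))
```
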
## Prerequisites

1. MIMIC-III  
https://mimic.physionet.org/
2. MIMICSQL  
Paper title: Text-to-SQL Generation for Question Answering on Electronic Medical Records.  
Dataset and code: https://github.com/wangpinggl/TREQS
3. ENV
```
python 3.6
networkx
rdflib
pandas
numpy
sqlite3  # sqlite3 has been part of the Python standard library since Python 2.5, so there is no need to install it manually.
requests
```
Set up ENV using pip:
```bash
pip install networkx rdflib pandas numpy requests
```

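After installing, a quick check can confirm that every package from the ENV list is importable. This is a minimal sketch; the helper name `find_missing` is made up here and is not part of the repository.

```python
import importlib.util

# Import names taken from the ENV list above.
REQUIRED_MODULES = ["networkx", "rdflib", "pandas", "numpy", "sqlite3", "requests"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```
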
## Datasets

1. __MIMICSQL*__  
MIMICSQL* is an extended version of MIMICSQL. The database consists of 9 tables from MIMIC-III.  
2. __MIMIC-SPARQL__  
MIMIC-SPARQL is a graph-based counterpart of MIMICSQL*. The knowledge graph of this dataset has 173,096 triples, and the max hop is 5.

## Guide for creating MIMICSQL* and MIMIC-SPARQL
0. Prepare MIMIC-III and make mimic.db from MIMICSQL
1. Build mimicsql* database from mimicsql database
2. Build mimic-sparql knowledge graph from mimicsql* database
3. Convert mimicsql SQL query to mimicsql* SQL query
4. Convert mimicsql* SQL query to mimic-sparql SPARQL query

### 0. Prepare MIMIC-III and mimic.db from MIMICSQL
First, you need access to the MIMIC-III data. This requires certification from https://mimic.physionet.org/  
Then build `mimic.db` by following the README.md at https://github.com/wangpinggl/TREQS. This file is required for the next step.

### 1. Build mimicsql* database from mimicsql database
First, save `mimic.db` under the `mimicsql/evaluation/mimic_db` path.
Then set the current directory to the project root folder, mimic-sparql.
```
python build_mimicsqlstar_db/build_mimicstar_db_from_mimicsql_db.py
```
This builds the MIMICSQL* DB and creates `mimicsqlstar.db`.

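Before or after running the build, it can help to confirm that a database file is where the script expects it and to see which tables it holds. The sketch below uses only the standard library; `list_tables` is a hypothetical helper, and `DB_PATH` assumes the file is named `mimic.db` inside the directory given in this step. The same helper can be pointed at `mimicsqlstar.db`, which should list 9 tables according to the Datasets section.

```python
import os
import sqlite3

# Location given in this step; adjust if your checkout differs.
DB_PATH = os.path.join("mimicsql", "evaluation", "mimic_db", "mimic.db")


def list_tables(db_path):
    """Return the sorted table names stored in the SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Check existence first: sqlite3.connect would otherwise create an empty file.
    if os.path.exists(DB_PATH):
        print(list_tables(DB_PATH))
    else:
        print("Not found:", DB_PATH)
```
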
### 2. Build mimic-sparql knowledge graph from mimicsql* database
Set the current directory to the project root folder, mimic-sparql.
To build mimic-sparql* from mimicsql*:
```
python build_mimicsparql_kg/build_complex_kg_from_mimicsqlstar_db.py
```
To build mimic-sparql from mimicsql:
```
python build_mimicsparql_kg/build_simple_kg_from_mimicsql_db.py
```
These commands build the MIMIC-SPARQL KGs and create `mimic_sparqlstar_kg.xml` and `mimic_sparql_kg.xml`.

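The general idea of turning relational rows into knowledge-graph triples can be sketched without any dependencies. The function below is an illustration only, not the logic of the repository's build scripts: the subject naming scheme (`/<table>/<key>`), the `row_to_triples` helper, and the sample row are all assumptions, while the `/dob_year` predicate style mirrors the SPARQL query in the Example section.

```python
def row_to_triples(table, key_column, row):
    """Turn one relational row (a dict) into (subject, predicate, object) triples.

    The subject scheme "/<table>/<key>" is a made-up convention for this sketch.
    """
    subject = f"/{table}/{row[key_column]}"
    return [
        (subject, f"/{column}", value)
        for column, value in row.items()
        # Skip the key itself and NULL values, which carry no edge.
        if column != key_column and value is not None
    ]


if __name__ == "__main__":
    # Hypothetical row; not real MIMIC-III data.
    toy_row = {"subject_id": 42, "dob_year": 2059, "gender": "f"}
    for triple in row_to_triples("patients", "subject_id", toy_row):
        print(triple)
```
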
### 3. Convert mimicsql SQL query to mimicsql* SQL query
Set the current directory to the project root folder, mimic-sparql.
```
python convert_mimicsql2sql_dataset.py --dataset_type natural --execution False
python convert_mimicsql2sql_dataset.py --dataset_type template --execution False
```
If `--execution` is set to `True`, the execution results of the original and converted queries are compared with each other.

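As a rough picture of what comparing execution results can mean, the sketch below runs two queries against SQLite connections and checks that they return the same rows regardless of order. It is an assumption about the comparison, not the code in `convert_mimicsql2sql_dataset.py`; the `same_result` helper and the toy tables are hypothetical.

```python
import sqlite3
from collections import Counter


def same_result(conn_a, query_a, conn_b, query_b):
    """Return True when both queries yield the same multiset of rows."""
    rows_a = Counter(conn_a.execute(query_a).fetchall())
    rows_b = Counter(conn_b.execute(query_b).fetchall())
    return rows_a == rows_b


if __name__ == "__main__":
    # Two toy databases that store the same fact under different table names.
    flat = sqlite3.connect(":memory:")
    flat.execute("CREATE TABLE demographic (subject_id INTEGER, dob_year INTEGER)")
    flat.execute("INSERT INTO demographic VALUES (1, 2059)")

    star = sqlite3.connect(":memory:")
    star.execute("CREATE TABLE patients (subject_id INTEGER, dob_year INTEGER)")
    star.execute("INSERT INTO patients VALUES (1, 2059)")

    print(same_result(
        flat, "SELECT COUNT(*) FROM demographic WHERE dob_year < 2060",
        star, "SELECT COUNT(*) FROM patients WHERE dob_year < 2060",
    ))
```

Comparing multisets rather than lists keeps the check independent of row order while still catching duplicated or missing rows.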
### 4. Convert mimicsql* SQL query to mimic-sparql SPARQL query
Set the current directory to the project root folder, mimic-sparql.
```
python convert_sql2sparql_dataset.py --dataset_type natural --complex True --execution False
python convert_sql2sparql_dataset.py --dataset_type natural --complex False --execution False

python convert_sql2sparql_dataset.py --dataset_type template --complex True --execution False
python convert_sql2sparql_dataset.py --dataset_type template --complex False --execution False
```
The `--complex` option selects between the simplified schema (mimic-sparql from mimicsql) and the original schema (mimic-sparql* from mimicsql*).

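The four invocations above are the cross product of two dataset types and two `--complex` values, so they can be generated rather than typed out. The sketch below only builds and prints the argument lists; `build_commands` is a hypothetical helper, and the flag names and values are copied from the commands shown in this step.

```python
import itertools
import sys

DATASET_TYPES = ["natural", "template"]
COMPLEX_FLAGS = ["True", "False"]


def build_commands(execution="False"):
    """Return one argv list per (dataset_type, complex) combination."""
    return [
        [
            sys.executable, "convert_sql2sparql_dataset.py",
            "--dataset_type", dataset_type,
            "--complex", complex_flag,
            "--execution", execution,
        ]
        for dataset_type, complex_flag in itertools.product(DATASET_TYPES, COMPLEX_FLAGS)
    ]


if __name__ == "__main__":
    for command in build_commands():
        # Printed only; to execute, pass each list to
        # subprocess.run(command, check=True) from the project root.
        print(" ".join(command))
```
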
## Citation
```
@inproceedings{pmlr-v149-park21a,
  title =     {Knowledge Graph-based Question Answering with Electronic Health Records},
  author =    {Park, Junwoo and Cho, Youngwoo and Lee, Haneol and Choo, Jaegul and Choi, Edward},
  booktitle = {Proceedings of the 6th Machine Learning for Healthcare Conference (MLHC)},
  pages =     {36--53},
  year =      {2021},
  volume =    {149},
  publisher = {PMLR}
}
```
This BibTeX entry will be updated after the paper is published in PMLR.