This repository provides the official mimic-sparql dataset implementation of the following paper: Knowledge Graph-based Question Answering with Electronic Health Records, accepted at Machine Learning for Healthcare (MLHC) 2021.
NLQ: how many patients were born before the year 2060?
SQL: select count ( distinct patients."subject_id" ) from patients where patients."dob_year" < "2060"
SPARQL: select ( count ( distinct ?subject_id ) as ?agg ) where { ?subject_id </dob_year> ?dob_year. filter( ?dob_year < 2060 ) }
Python 3.6
networkx
rdflib
pandas
numpy
sqlite3 (part of the Python standard library since Python 2.5, so it does not need to be installed manually)
requests
Set up the environment using pip:
pip install networkx rdflib pandas numpy requests
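After installation, a quick sanity check can confirm that every listed package imports. The helper check_dependencies below is a hypothetical convenience script, not part of the repository; it only imports each module and reports its version.

```python
import importlib
import sys

# Packages listed above; sqlite3 ships with the Python standard library.
REQUIRED = ("networkx", "rdflib", "pandas", "numpy", "sqlite3", "requests")


def check_dependencies(names=REQUIRED):
    """Return {module name: version string, or None if it cannot be imported}."""
    status = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            status[name] = None
            continue
        # sqlite3 has no __version__; report the bundled SQLite library version instead.
        status[name] = getattr(module, "__version__",
                               getattr(module, "sqlite_version", "unknown"))
    return status


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_dependencies().items():
        print(f"{name:10s} {version or 'MISSING'}")
```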
First, you need access to the MIMIC-III data, which requires certification via https://mimic.physionet.org/
Then, generate mimic.db by following the README.md at https://github.com/wangpinggl/TREQS; mimic.db is required for the next step.
First, save mimic.db under the mimicsql/evaluation/mimic_db path.
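Before running the build script, it can help to confirm that mimic.db sits where the instructions place it and is a valid SQLite file. This sketch assumes the file ends up at mimicsql/evaluation/mimic_db/mimic.db relative to the project root; is_sqlite_db is a hypothetical helper that checks the standard 16-byte SQLite 3 file header.

```python
from pathlib import Path

# Assumed location, relative to the mimic-sparql project root.
MIMIC_DB = Path("mimicsql") / "evaluation" / "mimic_db" / "mimic.db"
# Every SQLite 3 database file begins with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite_db(path):
    """Return True if `path` is an existing file that starts with the SQLite 3 header."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as f:
        return f.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


if __name__ == "__main__":
    # Run from the project root (mimic-sparql).
    status = "OK" if is_sqlite_db(MIMIC_DB) else "missing or not a SQLite file"
    print(MIMIC_DB, status)
```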
Then, set the current directory to the project root folder, mimic-sparql.
python build_mimicsqlstar_db/build_mimicstar_db_from_mimicsql_db.py
This builds the MIMICSQL* DB and produces mimicsqlstar.db.
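To sanity-check the resulting database, the following sketch lists every table in a SQLite file together with its row count, using only the standard sqlite3 module. describe_db is a hypothetical helper; pass it the path where the build script wrote mimicsqlstar.db.

```python
import sqlite3
from contextlib import closing


def describe_db(db_path):
    """Return {table name: row count} for every table in a SQLite database file."""
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        tables = [name for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'  # quote the identifier safely
            counts[table] = conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()[0]
        return counts
```

For example, call describe_db("mimicsqlstar.db") from the directory containing that file. Note that sqlite3.connect silently creates an empty database when the path does not exist, so an empty result usually indicates a wrong path.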
Set the current directory to the project root folder, mimic-sparql.
For building mimic-sparql* from mimicsql*,
python build_mimicsparql_kg/build_complex_kg_from_mimicsqlstar_db.py
For building mimic-sparql from mimicsql,
python build_mimicsparql_kg/build_simple_kg_from_mimicsql_db.py
These scripts build the MIMIC-SPARQL KGs, producing mimic_sparqlstar_kg.xml and mimic_sparql_kg.xml, respectively.
Set the current directory to the project root folder, mimic-sparql.
python convert_mimicsql2sql_dataset.py --dataset_type natural --execution False
python convert_mimicsql2sql_dataset.py --dataset_type template --execution False
If --execution is set to True, the execution results of the original and converted queries are compared with each other.
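How the script compares execution results is not described here. One reasonable approach, sketched below under that caveat, is an order-insensitive comparison of the two result sets: each row becomes a tuple of strings and the rows are counted as a multiset, so row order and int-vs-string differences (e.g. 2060 vs "2060") are ignored while duplicate rows still matter. normalize and results_match are hypothetical helpers.

```python
from collections import Counter


def normalize(rows):
    """Turn query result rows into a multiset of string tuples."""
    return Counter(tuple(str(v) for v in row) for row in rows)


def results_match(rows_a, rows_b):
    """Order-insensitive comparison of two query result sets (duplicates count)."""
    return normalize(rows_a) == normalize(rows_b)
```

Stringifying values is deliberately lenient but not fully type-agnostic: 1.0 and 1 still compare as different.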
Set the current directory to the project root folder, mimic-sparql.
python convert_sql2sparql_dataset.py --dataset_type natural --complex True --execution False
python convert_sql2sparql_dataset.py --dataset_type natural --complex False --execution False
python convert_sql2sparql_dataset.py --dataset_type template --complex True --execution False
python convert_sql2sparql_dataset.py --dataset_type template --complex False --execution False
The --complex option selects the schema: --complex False uses the simplified schema (mimic-sparql from mimicsql), and --complex True uses the original schema (mimic-sparql* from mimicsql*).
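The four conversions above differ only in --dataset_type and --complex, so they can be driven from a small loop. This is a hypothetical convenience script, not part of the repository. It uses sys.executable rather than a bare python so every conversion runs under the same interpreter, and it defaults to a dry run that only prints the commands.

```python
import itertools
import subprocess
import sys


def conversion_commands(script="convert_sql2sparql_dataset.py", execution=False):
    """Build the four commands listed above: {natural, template} x {complex True, False}."""
    cmds = []
    for dataset_type, complex_kg in itertools.product(["natural", "template"], [True, False]):
        cmds.append([sys.executable, script,
                     "--dataset_type", dataset_type,
                     "--complex", str(complex_kg),   # passed as "True"/"False", as in the README
                     "--execution", str(execution)])
    return cmds


def run_all(dry_run=True):
    """Print each command; when dry_run is False, also execute it from the project root."""
    for cmd in conversion_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing conversion


if __name__ == "__main__":
    run_all(dry_run=True)
```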
@inproceedings{pmlr-v149-park21a,
title = {Knowledge Graph-based Question Answering with Electronic Health Records},
author = {Park, Junwoo and Cho, Youngwoo and Lee, Haneol and Choo, Jaegul and Choi, Edward},
booktitle = {Proceedings of the 6th Machine Learning for Healthcare Conference (MLHC)},
pages = {36--53},
year = {2021},
volume = {149},
publisher = {PMLR}
}
This BibTeX entry will be updated after publication on PMLR.