medGAN
=========================================
|
|
medGAN is a generative adversarial network for generating multi-label discrete patient records. It can generate both binary and count variables (i.e., medical codes such as diagnosis codes, medication codes, or procedure codes).
|
|
#### Relevant Publications
|
|
medGAN implements the algorithm introduced in the following [paper](https://arxiv.org/abs/1703.06490):
|
|
Generating Multi-label Discrete Patient Records using Generative Adversarial Networks  
Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun  
Machine Learning for Healthcare (MLHC) 2017
|
|
#### Code Description
|
|
This code trains a generative adversarial network to generate patient records. It currently handles patient records that are aggregated over time and are therefore represented as a matrix, where each row corresponds to a patient and each column to a specific medical code (e.g. a diagnosis, medication, or procedure code). Each matrix entry can be either binary (whether a specific medical code occurred in the patient's longitudinal record) or a count (how many times that code occurred in the patient's longitudinal record).
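
To illustrate this input format, the sketch below aggregates a list of `(patient_id, code)` events into a patient-by-code matrix in either binary or count mode. It is a minimal example under stated assumptions: the event list, the helper name `build_matrix`, and the sorted column order are hypothetical and do not reflect how `process_mimic.py` actually builds its matrix.

```python
import numpy as np


def build_matrix(events, data_type="binary"):
    """Aggregate (patient_id, code) events into a patient-by-code matrix.

    Hypothetical helper for illustration only; not part of medGAN.
    data_type is "binary" (code occurred or not) or "count" (number of occurrences).
    """
    if data_type not in ("binary", "count"):
        raise ValueError("data_type must be 'binary' or 'count'")

    # Give each distinct patient a row and each distinct code a column (sorted order).
    patients = sorted({pid for pid, _ in events})
    codes = sorted({code for _, code in events})
    row = {pid: i for i, pid in enumerate(patients)}
    col = {code: j for j, code in enumerate(codes)}

    matrix = np.zeros((len(patients), len(codes)), dtype=np.float32)
    for pid, code in events:
        if data_type == "binary":
            matrix[row[pid], col[code]] = 1.0   # presence only
        else:
            matrix[row[pid], col[code]] += 1.0  # accumulate occurrences
    return matrix, patients, codes


if __name__ == "__main__":
    events = [(1, "401"), (1, "401"), (1, "428"), (2, "250")]
    counts, pids, codes = build_matrix(events, "count")
    print(codes)      # ['250', '401', '428']
    print(counts[0])  # patient 1: [0. 2. 1.]
```

Sorting the codes gives a deterministic column order, so the same code list can later be used to interpret the columns of generated records.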
|
|
#### Running medGAN
|
|
**STEP 1: Installation**
|
|
1. medGAN was implemented to run on [TensorFlow](https://www.tensorflow.org/) 1.2. TensorFlow can be installed on Ubuntu by following the instructions [here](https://www.tensorflow.org/install/install_linux).
|
|
2. Download or clone the medGAN code.
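
Because medGAN targets TensorFlow 1.2, and TensorFlow 2.x moved many 1.x graph-mode APIs (such as `tf.placeholder` and `tf.Session`) under `tf.compat.v1`, it can help to confirm which major version is installed before training. This is a minimal sketch; the helper name `is_tf1` is hypothetical and not part of medGAN.

```python
import importlib


def is_tf1(version_string):
    """Return True if a TensorFlow version string belongs to the 1.x series."""
    # Hypothetical helper: only the major version number is inspected.
    return version_string.split(".")[0] == "1"


if __name__ == "__main__":
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        print("TensorFlow is not installed.")
    else:
        if is_tf1(tf.__version__):
            print("Found TensorFlow", tf.__version__, "(1.x series).")
        else:
            print("Found TensorFlow", tf.__version__, "but medGAN was written for 1.2.")
```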
|
|
**STEP 2: Fast way to test medGAN with MIMIC-III**
This step describes how to train medGAN on MIMIC-III with the minimum number of steps.
|
|
0. You will first need to request access to [MIMIC-III](https://mimic.physionet.org/gettingstarted/access/), a publicly available database of electronic health records collected from ICU patients over 11 years.
|
|
1. You can use `process_mimic.py` to process the MIMIC-III dataset and generate a training dataset suitable for medGAN.
Place the script in the same directory as the MIMIC-III CSV files, and run it.
The command is `python process_mimic.py ADMISSIONS.csv DIAGNOSES_ICD.csv <output file> <"binary"|"count">`.
The last argument determines whether a binary matrix or a count matrix is constructed.
This command extracts ICD9 diagnosis codes from MIMIC-III.
Note that the script truncates ICD9 diagnosis codes to 3 digits. If you want to use all 5 digits, see the source code of `process_mimic.py`.
|
|
2. Run medGAN on the `.matrix` file generated by `process_mimic.py`. The command is:
`python medgan.py <matrix file> <output path> --data_type=<"binary"|"count">`.
Use the same data type that you chose when running `process_mimic.py`.
|
|
3. After training, you can generate synthetic records with the following command:
`python medgan.py <matrix file> <generated output path> --model_file=<trained output path> --generate_data=True --data_type=<"binary"|"count">`.
Note that `<matrix file>` is not actually used when generating synthetic records; it is only a dummy input.
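
Once synthetic records have been generated, a quick sanity check for binary data is to compare how often each code appears in the real and synthetic matrices, similar in spirit to the dimension-wise probability comparison reported in the paper. The sketch below assumes both matrices are already loaded as NumPy arrays with the same code columns (how to load them depends on the files written by `process_mimic.py` and `medgan.py`, which this sketch does not assume). The function name `code_frequencies` and the 0.5 threshold for binarizing synthetic values are assumptions.

```python
import numpy as np


def code_frequencies(real, synthetic, threshold=0.5):
    """Per-code occurrence rates for real and synthetic binary data.

    Hypothetical helper: real entries > 0 count as present; synthetic
    entries are binarized at `threshold` (an assumed cut-off).
    """
    real = np.asarray(real, dtype=np.float64)
    synthetic = np.asarray(synthetic, dtype=np.float64)
    if real.shape[1] != synthetic.shape[1]:
        raise ValueError("real and synthetic data must have the same columns")

    # Fraction of patients that have each code, column by column.
    real_rate = (real > 0).mean(axis=0)
    synth_rate = (synthetic >= threshold).mean(axis=0)
    return real_rate, synth_rate


if __name__ == "__main__":
    real = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]])
    synthetic = np.array([[0.9, 0.2, 0.7], [0.8, 0.1, 0.4], [0.6, 0.3, 0.55]])
    real_rate, synth_rate = code_frequencies(real, synthetic)
    print("real:     ", real_rate)
    print("synthetic:", synth_rate)
    # Pearson correlation across codes; values near 1 mean similar per-code rates.
    print("correlation:", np.corrcoef(real_rate, synth_rate)[0, 1])
```

Plotting `real_rate` against `synth_rate` as a scatter plot makes codes that the generator over- or under-produces easy to spot.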