medGAN
=========================================
medGAN is a generative adversarial network for generating multi-label discrete patient records. It can generate both binary and count variables (i.e. medical codes such as diagnosis codes, medication codes or procedure codes).

#### Relevant Publications

medGAN implements the algorithm introduced in the following [paper](https://arxiv.org/abs/1703.06490):

    Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
    Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun  
    Machine Learning for Healthcare (MLHC) 2017

#### Code Description

This code trains a generative adversarial network to generate patient records. It currently handles patient records that are aggregated over time and therefore represented as a matrix, where each row corresponds to a patient and each column to a specific medical code (e.g. diagnosis code, medication code, or procedure code). Each matrix entry is either binary (i.e. whether a specific medical code occurred in the longitudinal patient record) or a count (i.e. how many times a specific medical code occurred in the longitudinal patient record).
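
The matrix layout described above can be illustrated with a small sketch. This is not code from the medGAN repository: the `build_matrix` helper, the patient IDs, and the toy codes are all hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
# Hypothetical toy input: each patient's longitudinal record, flattened
# into one list of medical codes (IDs and codes are made up).
patient_codes = {
    "patient_a": ["401", "250", "401"],
    "patient_b": ["250"],
    "patient_c": ["428", "401", "428", "428"],
}


def build_matrix(patient_codes, data_type="binary"):
    """Aggregate per-patient code lists into a patient-by-code matrix."""
    if data_type not in ("binary", "count"):
        raise ValueError("data_type must be 'binary' or 'count'")

    patients = sorted(patient_codes)
    codes = sorted({c for seq in patient_codes.values() for c in seq})
    column = {code: j for j, code in enumerate(codes)}

    # One row per patient, one column per distinct medical code.
    matrix = [[0.0] * len(codes) for _ in patients]
    for i, pid in enumerate(patients):
        for code in patient_codes[pid]:
            if data_type == "binary":
                matrix[i][column[code]] = 1.0   # code occurred at least once
            else:
                matrix[i][column[code]] += 1.0  # number of occurrences
    return matrix, patients, codes


binary_matrix, patients, codes = build_matrix(patient_codes, "binary")
count_matrix, _, _ = build_matrix(patient_codes, "count")

print(codes)
for name, row_b, row_c in zip(patients, binary_matrix, count_matrix):
    print(name, row_b, row_c)
```

Both variants share the same rows and columns; only the cell values differ, which is why a single data-type switch is enough to choose between them.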
    
#### Running medGAN

**STEP 1: Installation**  

1. medGAN was implemented to run on [TensorFlow](https://www.tensorflow.org/) 1.2. TensorFlow can be installed on Ubuntu as described [here](https://www.tensorflow.org/install/install_linux).

2. Download/clone the medGAN code.

**STEP 2: Fast way to test medGAN with MIMIC-III**  
This step describes how to train medGAN on MIMIC-III in a minimal number of steps.

0. You will first need to request access to [MIMIC-III](https://mimic.physionet.org/gettingstarted/access/), a publicly available dataset of electronic health records collected from ICU patients over 11 years.

1. You can use "process_mimic.py" to process the MIMIC-III dataset and generate a suitable training dataset for medGAN.
Place the script in the same directory as the MIMIC-III CSV files, and run it.
The execution command is `python process_mimic.py ADMISSIONS.csv DIAGNOSES_ICD.csv <output file> <"binary"|"count">`.
The last argument determines whether a binary matrix or a count matrix is constructed.
The above command will extract ICD9 diagnosis codes from MIMIC-III.
Note that this script uses only the first 3 digits of each ICD9 diagnosis code. If you want to use all 5 digits, please see the source code of "process_mimic.py".

2. Run medGAN using the ".matrix" file generated by process_mimic.py. The command is:
`python medgan.py <matrix file> <output path> --data_type=["binary", "count"]`.

3. After training, if you want to generate synthetic records, use this command:
`python medgan.py <matrix file> <generated output path> --model_file=<trained output path> --generate_data=True --data_type=["binary", "count"]`.
Note that `<matrix file>` is not actually used for generating synthetic records, so it is just a dummy input.
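
Step 1 above mentions that only the first 3 digits of each ICD9 diagnosis code are kept. The sketch below shows one way such a truncation could work; it is not taken from "process_mimic.py". The helper name `to_three_digit_icd9` is hypothetical, and the sketch assumes the codes are stored without a decimal point and that E-codes keep four characters ("E" plus three digits) while all other codes keep three.

```python
def to_three_digit_icd9(code):
    """Truncate an undotted ICD9 diagnosis code to its category.

    Assumption: E-codes have a four-character category ('E' + 3 digits);
    numeric codes and V-codes have a three-character category.
    """
    code = code.strip().upper()
    width = 4 if code.startswith("E") else 3
    return code[:width]


# Made-up example inputs in the assumed undotted format.
examples = ["4019", "25000", "V4581", "E8490", "038"]
print([to_three_digit_icd9(c) for c in examples])
```

Truncating merges related diagnoses into one column, which shrinks the matrix but loses the finer detail of the full 5-digit codes.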