# Individual Task Examples

In this folder, you can test individual tasks including ICD9 code classification, Naive Bayes summarization, and query extraction.

Task Details:
- [ICD9 code classification](https://github.com/Yale-LILY/EHRKit/tree/master/mimic_icd9_coding): predict ICD9 codes using TF-IDF representations
- [Naive Bayes Summarization](https://github.com/Yale-LILY/EHRKit/tree/master/summarization/pubmed_summarization): summarize text using a Naive Bayes model trained on the PubMed corpus
- [Query Extraction](https://github.com/Yale-LILY/EHRKit/tree/master/QueryExtraction): search for specific queries in medical text using different methods

## Virtual Environment
Create a virtual environment in the parent directory using the following commands.
```sh
cd ..
python3 -m venv ehrvir/
source ehrvir/bin/activate
```
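To confirm that the activation took effect in the current shell, one option is to inspect the `VIRTUAL_ENV` variable, which the `activate` script exports. This is a minimal sketch; the helper name `venv_status` is hypothetical and not part of EHRKit.

```sh
# Hypothetical helper: report whether a virtual environment is active in this shell.
# The activate script exports VIRTUAL_ENV; the variable is unset or empty otherwise.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "inactive"
    fi
}

venv_status
```

After `source ehrvir/bin/activate`, this should report the path of the `ehrvir` directory.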
If you are using pip==18.1, comment out line 20 of requirements.txt (```en-core-web-sm```) and run
```sh
pip install -r requirements.txt
```
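If the pip 18.1 workaround applies, the edit to `requirements.txt` can also be scripted before running the install. The sketch below assumes the `en-core-web-sm` entry really is on line 20 of your copy of the file, as stated above; the helper name `comment_out_line` is hypothetical. It writes to a temporary file and moves it into place, which sidesteps the differing `sed -i` syntax between GNU and BSD sed.

```sh
# Hypothetical helper: prefix line N of FILE with "# " so pip treats it as a comment.
# Usage: comment_out_line FILE N
comment_out_line() {
    file=$1
    line=$2
    sed "${line}s/^/# /" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demonstration on a throwaway file rather than the real requirements.txt:
demo=$(mktemp)
printf 'numpy\nen-core-web-sm\nspacy\n' > "$demo"
comment_out_line "$demo" 2
cat "$demo"
rm -f "$demo"
```

For the real file, the call would be `comment_out_line requirements.txt 20`; it is worth checking the line first with `sed -n '20p' requirements.txt`.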
Install ```en-core-web-sm``` by running
```sh
python -m spacy download en_core_web_sm
```
This is needed for the entity extraction task.
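To check that the model can be loaded from the active environment, a quick test is to call `spacy.load` on it. This is a minimal sketch; the function name `check_spacy_model` is hypothetical.

```sh
# Hypothetical helper: print whether en_core_web_sm can be loaded by the current python.
# Any failure (python missing, spaCy missing, model missing) is reported as "not available".
check_spacy_model() {
    if python -c "import spacy; spacy.load('en_core_web_sm')" >/dev/null 2>&1; then
        echo "en_core_web_sm: ok"
    else
        echo "en_core_web_sm: not available"
    fi
}

check_spacy_model
```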

## Preparation
Create the MIMIC data and output data directories using the following commands.
```sh
mkdir data
mkdir data/mimic_data
mkdir data/output_data
```
You will need to put all MIMIC data under ```data/mimic_data```, which is the folder containing all CSV files downloaded from the original MIMIC dataset. If you are on tangra, you can use the following command to copy everything from ```/lada2/lily/zl379/Year4/EHRTest/EHRKit/tutorials/data/mimic_data```:
```sh
cp -r /lada2/lily/zl379/Year4/EHRTest/EHRKit/tutorials/data/mimic_data data/.
```
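Before running any task, it may help to confirm that the CSV files landed where expected. The sketch below assumes the files keep a plain `.csv` extension (compressed downloads would need unpacking first); the helper name `count_csv` is hypothetical and not part of EHRKit.

```sh
# Hypothetical helper: count the .csv files directly under a directory.
# Prints 0 if the directory does not exist.
count_csv() {
    if [ ! -d "$1" ]; then
        echo 0
        return 0
    fi
    find "$1" -maxdepth 1 -type f -name '*.csv' | wc -l | tr -d ' '
}

n=$(count_csv data/mimic_data)
if [ "$n" -eq 0 ]; then
    echo "no CSV files found in data/mimic_data"
else
    echo "found $n CSV file(s) in data/mimic_data"
fi
```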

## Run mimic_classifier.py
```sh
cp mimic_classifier.py ../.
cd ..
python mimic_classifier.py
```
Note: you need to run ```mimic_classifier.py``` from the root directory of EHRKit.

## Run a notebook
You can test Naive Bayes summarization and query extraction using the notebooks naiveBayes.ipynb and query_extractor, respectively. ICD9 code classification is also documented in mimic\_classifier.ipynb.

To run a notebook, first install ipykernel using ```python -m pip install ipykernel```. Set the kernel name to your virtual environment name using ```ipython kernel install --user --name=<name of virtual environment>```. Run ```jupyter notebook --no-browser . &``` on the server and note the server port number. Open a local terminal and run ```ssh -o ServerAliveInterval=30 -L <any local port>:localhost:<server port number> <username>@tangra.cs.yale.edu```. Go to ```http://localhost:<local port>``` in your browser and follow the instructions in the notebook.
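
As a worked example of the port forwarding step, suppose the notebook server reports port 8888 and you pick local port 8000 (both numbers, and the username `yourname`, are placeholders). The hypothetical helper below only prints the resulting command so the substitution is easy to see; it does not open a connection.

```sh
# Hypothetical helper: print the ssh tunnel command for the given ports and user.
# Usage: tunnel_cmd LOCAL_PORT SERVER_PORT USERNAME
tunnel_cmd() {
    echo "ssh -o ServerAliveInterval=30 -L $1:localhost:$2 $3@tangra.cs.yale.edu"
}

tunnel_cmd 8000 8888 yourname
# prints: ssh -o ServerAliveInterval=30 -L 8000:localhost:8888 yourname@tangra.cs.yale.edu
```

With that tunnel open, the notebook would be reachable at ```http://localhost:8000```.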

## Collated tasks
Instructions to run other tasks are located in the [collated_tasks](https://github.com/Yale-LILY/EHRKit/tree/master/collated_tasks) directory.