# Individual Task Examples

In this folder, you can test individual tasks, including ICD9 code classification, Naive Bayes summarization, and query extraction.

Task Details:
- [ICD9 code classification](https://github.com/Yale-LILY/EHRKit/tree/master/mimic_icd9_coding): predict ICD9 codes using TF-IDF representations
- [Naive Bayes Summarization](https://github.com/Yale-LILY/EHRKit/tree/master/summarization/pubmed_summarization): summarize text using a Naive Bayes model trained on the PubMed corpus
- [Query Extraction](https://github.com/Yale-LILY/EHRKit/tree/master/QueryExtraction): search for specific queries in medical text using different methods

## Virtual Environment
Create a virtual environment in the parent directory using the following commands.
```sh
cd ..
python3 -m venv ehrvir/
source ehrvir/bin/activate
```
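
As a quick sanity check before installing packages, the sketch below reports whether a virtual environment is active in the current shell. It relies on the `VIRTUAL_ENV` variable that the `activate` script exports; the function name `venv_status` is a hypothetical helper introduced here for illustration and is not part of EHRKit.

```sh
# Report whether a virtual environment is active in the current shell.
# The activate script exports VIRTUAL_ENV, so an empty value means none is active.
# venv_status is a hypothetical helper name, not part of EHRKit.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "inactive"
    fi
}

venv_status
```

If this prints `inactive`, run `source ehrvir/bin/activate` again from the directory that contains `ehrvir/`.
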
If you are using pip==18.1, comment out line 20 of requirements.txt (```en-core-web-sm```), then run
```sh
pip install -r requirements.txt
```
Then install ```en-core-web-sm``` by running
```sh
python -m spacy download en_core_web_sm
```
This is needed for the entity extraction task.

## Preparation
Create the MIMIC data and output data directories using the following commands.
```sh
mkdir data
mkdir data/mimic_data
mkdir data/output_data
```
You will need to put all MIMIC data under ```data/mimic_data```, which is the folder containing all csv files downloaded from the original MIMIC dataset. If you are on tangra, you can use the following command to copy everything from ```/lada2/lily/zl379/Year4/EHRTest/EHRKit/tutorials/data/mimic_data```:
```sh
cp -r /lada2/lily/zl379/Year4/EHRTest/EHRKit/tutorials/data/mimic_data data/.
```
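
The directory setup above can also be sketched as a single step that is safe to re-run, followed by a check that the data has actually been copied. This is a minimal sketch that assumes the MIMIC tables sit directly under ```data/mimic_data``` as uncompressed ```.csv``` files; adjust the pattern if your download is laid out differently.

```sh
# Create both directories in one step; -p creates missing parents and
# does not fail if the directories already exist.
mkdir -p data/mimic_data data/output_data

# Count the CSV files directly under data/mimic_data
# (assumption: uncompressed files with a .csv extension).
csv_count=$(find data/mimic_data -maxdepth 1 -type f -name '*.csv' | wc -l | tr -d ' ')

if [ "$csv_count" -eq 0 ]; then
    echo "No CSV files found in data/mimic_data; copy the MIMIC tables there first."
else
    echo "Found $csv_count CSV file(s) in data/mimic_data."
fi
```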

## Run mimic_classifier.py
```sh
cp mimic_classifier.py ../.
cd ..
python mimic_classifier.py
```
Note: you need to run ```mimic_classifier.py``` from the ROOT path of EHRKit.

## Run a notebook
You can test Naive Bayes summarization and query extraction using the notebooks naiveBayes.ipynb and query_extractor respectively. ICD9 code classification is also documented in mimic\_classifier.ipynb.

To run a notebook, you first need to install ipykernel using ```python -m pip install ipykernel```. Set the kernel name to your virtual environment name using ```ipython kernel install --user --name=<name of virtual environment>```. Run ```jupyter notebook --no-browser . &``` and note the server port number. Open a local terminal and run ```ssh -o ServerAliveInterval=30 -L <any local port>:localhost:<server port number> <username>@tangra.cs.yale.edu```. Go to ```http://localhost:<local port>``` and follow the instructions in the notebook.
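
To make the port mapping concrete, the sketch below fills in the tunnel command with example values and prints it together with the URL to open, rather than running ```ssh``` directly. The port numbers are assumptions for illustration: 8888 is the Jupyter default, but the server may report a different port if that one is taken, and 8889 stands in for any free local port. The variable names are hypothetical.

```sh
# Example values only; substitute your own.
LOCAL_PORT=8889          # any free port on your local machine (assumed)
SERVER_PORT=8888         # port reported by `jupyter notebook` on the server (assumed)
REMOTE_USER="<username>" # your account name on the server

# Build the tunnel command: local LOCAL_PORT forwards to SERVER_PORT on the server.
tunnel_cmd="ssh -o ServerAliveInterval=30 -L ${LOCAL_PORT}:localhost:${SERVER_PORT} ${REMOTE_USER}@tangra.cs.yale.edu"

# Print the command to run in a local terminal and the URL to open afterwards.
echo "$tunnel_cmd"
echo "http://localhost:${LOCAL_PORT}"
```

The browser connects to the local port, so the URL uses ```LOCAL_PORT```, not the port printed by Jupyter on the server.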

## Collated tasks
Instructions to run other tasks are located in the [collated_tasks](https://github.com/Yale-LILY/EHRKit/tree/master/collated_tasks) directory.