In this folder, you can test individual tasks including ICD9 code classification, Naive Bayes summarization, and query extraction.
Task Details:
- ICD9 code classification: predict ICD9 codes from TF-IDF representations of the text
- Naive Bayes Summarization: summarize text using a Naive Bayes model trained on the PubMed corpus
- Query Extraction: search for specific queries in medical text using different methods
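As a rough illustration of the first task, the sketch below builds TF-IDF vectors and assigns the ICD9 code of the most similar training note. It is a standard-library toy, not the code in mimic_classifier.py: the notes are invented, the helper names (`fit_idf`, `tfidf`, `predict`) are hypothetical, and nearest-neighbour matching is an assumption chosen only to keep the example short.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercased whitespace tokenization; real clinical text needs more care.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency for every term in the training notes.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(doc, idf):
    # L2-normalized TF-IDF vector, stored as a sparse dict; unseen terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def dot(a, b):
    # Dot product of two sparse vectors (cosine similarity when both are normalized).
    return sum(v * b.get(t, 0.0) for t, v in a.items())

# Invented toy notes paired with ICD9 codes (428.0 heart failure, 250.00 diabetes).
TRAIN = [
    ("patient with congestive heart failure and edema", "428.0"),
    ("shortness of breath heart failure exacerbation", "428.0"),
    ("type 2 diabetes mellitus elevated glucose", "250.00"),
    ("diabetes follow up insulin glucose control", "250.00"),
]

IDF = fit_idf([text for text, _ in TRAIN])
TRAIN_VECS = [(tfidf(text, IDF), code) for text, code in TRAIN]

def predict(note):
    # Return the code of the training note with the highest cosine similarity.
    query = tfidf(note, IDF)
    _, code = max(TRAIN_VECS, key=lambda pair: dot(query, pair[0]))
    return code

print(predict("heart failure with leg edema"))
```

A trained classifier on top of the same vectors would normally replace the nearest-neighbour step; the TF-IDF representation is the part this sketch is meant to show.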
Create a virtual environment in the parent directory using the following commands.
cd ..
python3 -m venv ehrvir/
source ehrvir/bin/activate
If you are using pip==18.1, comment out line 20 of requirements.txt (en-core-web-sm), then run
pip install -r requirements.txt
Install en-core-web-sm by running
python -m spacy download en_core_web_sm
This is needed for the entity extraction task.
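To confirm that the download worked before starting a task, a quick check like the one below may help. It only asks Python whether the packages are importable inside the active virtual environment; the helper name `is_installed` is hypothetical and not part of EHRKit.

```python
import importlib.util

def is_installed(module_name):
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None

# The en-core-web-sm package is imported under the name en_core_web_sm.
for name in ("spacy", "en_core_web_sm"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either name is reported as missing, make sure the virtual environment is activated and rerun the install commands above.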
Create the MIMIC data and output data directories using the following commands.
mkdir data
mkdir data/mimic_data
mkdir data/output_data
You will need to put all MIMIC data under data/mimic_data, the folder that holds all CSV files downloaded from the original MIMIC dataset. If you are on tangra, you can use the following command to copy everything from /lada2/lily/zl379/Year4/EHRTest/EHRKit/tutorials/data/mimic_data:
cp -r /lada2/lily/zl379/Year4/EHRTest/EHRKit/tutorials/data/mimic_data data/.
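After copying, it can be worth checking that the CSV files actually landed in data/mimic_data. The snippet below is a minimal sketch with a hypothetical helper, `list_csv_files`; it assumes the files sit directly in that folder and end in .csv, and it should be run from the directory that contains data/.

```python
from pathlib import Path

def list_csv_files(data_dir):
    """Return the sorted names of CSV files directly inside data_dir."""
    path = Path(data_dir)
    if not path.is_dir():
        return []
    return sorted(
        p.name for p in path.iterdir()
        if p.is_file() and p.suffix.lower() == ".csv"
    )

if __name__ == "__main__":
    found = list_csv_files("data/mimic_data")
    if found:
        print(f"Found {len(found)} CSV file(s):")
        for name in found:
            print(f"  {name}")
    else:
        print("No CSV files found under data/mimic_data")
```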
To test ICD9 code classification, copy mimic_classifier.py one directory up and run it from there.
cp mimic_classifier.py ../.
cd ..
python mimic_classifier.py
Note: you need to run mimic_classifier.py from the ROOT path of EHRKit.
You can test Naive Bayes summarization and query extraction using the notebooks naiveBayes.ipynb and query_extractor, respectively. ICD9 code classification is also documented in mimic_classifier.ipynb.
To run a notebook, first install ipykernel:
python -m pip install ipykernel
Set the kernel name to your virtual environment name:
ipython kernel install --user --name=<name of virtual environment>
Start the notebook server and note the server port number:
jupyter notebook --no-browser . &
Open a local terminal and run
ssh -o ServerAliveInterval=30 -L <any local port>:localhost:<server port number> <username>@tangra.cs.yale.edu
Go to http://localhost:<local port> and follow the instructions in the notebook.
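The port-forwarding command has three values to fill in, and mixing up the local and server ports is an easy mistake. The sketch below assembles the command from those values so the pairing is explicit. The function name `tunnel_command` and the example ports and username are placeholders, not values from EHRKit.

```python
import shlex

def tunnel_command(local_port, server_port, username, host="tangra.cs.yale.edu"):
    # -L forwards local_port on this machine to server_port on the remote host.
    return [
        "ssh", "-o", "ServerAliveInterval=30",
        "-L", f"{local_port}:localhost:{server_port}",
        f"{username}@{host}",
    ]

# Placeholder values: local port 8889, server port 8888, user "netid".
cmd = tunnel_command(8889, 8888, "netid")
print(shlex.join(cmd))                  # command to paste into a local terminal
print("http://localhost:8889")          # address to open once the tunnel is up
```

The list form can also be passed directly to `subprocess.run` if you prefer to open the tunnel from a script.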
Instructions to run other tasks are located in the collated_tasks directory.