In this tutorial, we use scispaCy for several extraction tasks on MIMIC notes. Named entity extraction serves as the detailed example.
get_named_entities.py
Arguments:
- --mimic_dir: directory containing the MIMIC data, including NOTEEVENTS.csv; defaults to data/mimic_data in the tutorials directory
- --model: spaCy model to use; defaults to en_core_sci_sm
- --row_id: the row_id of the row whose NOTE field will be processed; defaults to 178
- --output_file: output file in which to save the identified named entities; defaults to ./output_named_entities.txt
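For orientation, the argument list above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual source: the function name build_parser, the help strings, and the reading of "data/mimic_data in the tutorials directory" as a relative path are assumptions; only the flag names and default values come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Extract named entities from a single MIMIC note."
    )
    # Assumption: the default is a path relative to the tutorials directory.
    parser.add_argument("--mimic_dir", default="data/mimic_data",
                        help="directory that contains NOTEEVENTS.csv")
    parser.add_argument("--model", default="en_core_sci_sm",
                        help="name of the spaCy model to load")
    parser.add_argument("--row_id", type=int, default=178,
                        help="row_id of the note to process")
    parser.add_argument("--output_file", default="./output_named_entities.txt",
                        help="file that receives the named entities")
    return parser


# Passing an explicit list makes the example independent of sys.argv.
args = build_parser().parse_args(["--model", "en_core_sci_scibert", "--row_id", "174"])
print(args.model, args.row_id)  # prints: en_core_sci_scibert 174
```

Declaring --row_id with type=int means a non-numeric value is rejected when the arguments are parsed, before any data is read.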
Commands:
python get_named_entities.py
python get_named_entities.py --model en_core_sci_scibert --row_id 174
*Remark: to run with a model other than the default, first download that model with pip install <model url>, where model URLs can be found here.
A list of named entities will be written to the file specified by --output_file.
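The flow described above (pick one note by row_id, run a spaCy model over it, write the entities to a file) can be sketched as below. This is an illustration under stated assumptions, not the script itself: the helper names are hypothetical, the column names ROW_ID and TEXT are taken from the MIMIC-III NOTEEVENTS schema and may differ from what the script reads, and the one-entity-per-line output format is a guess.

```python
import csv
import os
from typing import List


def read_note(mimic_dir: str, row_id: int, text_column: str = "TEXT") -> str:
    """Return the note text of the row whose ROW_ID equals row_id.

    Assumption: NOTEEVENTS.csv has ROW_ID and TEXT columns (MIMIC-III schema).
    """
    # Clinical notes can be longer than the csv module's default field limit.
    csv.field_size_limit(10**9)
    path = os.path.join(mimic_dir, "NOTEEVENTS.csv")
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if int(row["ROW_ID"]) == row_id:
                return row[text_column]
    raise KeyError(f"ROW_ID {row_id} not found in {path}")


def extract_entities(text: str, model: str = "en_core_sci_sm") -> List[str]:
    """Run a spaCy/scispaCy model over the text and return the entity strings."""
    import spacy  # imported lazily so read_note works without spaCy installed

    nlp = spacy.load(model)  # the model package must already be installed
    return [ent.text for ent in nlp(text).ents]


def write_entities(entities: List[str], output_file: str) -> None:
    """Write one entity per line (assumed output format)."""
    with open(output_file, "w", encoding="utf-8") as handle:
        handle.write("\n".join(entities) + "\n")
```

With the default arguments, the three helpers would be chained as write_entities(extract_entities(read_note("data/mimic_data", 178)), "./output_named_entities.txt"). Scanning the CSV row by row avoids loading the whole file into memory.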
get_abbreviations.py
get_hyponyms.py
get_linked_entities.py
The arguments for these three scripts are the same as for get_named_entities.py. Use the following commands to run them with default values.
python get_abbreviations.py
python get_hyponyms.py
python get_linked_entities.py
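These three tasks correspond to pipeline components that scispaCy provides on top of a loaded model. The sketch below shows one way to attach them; it is not taken from the scripts. The names PIPE_FACTORIES, build_pipeline and run_task are hypothetical, and the choice of the UMLS knowledge base for linking is an assumption (scispaCy downloads the linker data on first use, which is large).

```python
import importlib
from typing import List

# task -> (module that registers the factory, factory name, config)
# Assumption: linking uses the UMLS knowledge base.
PIPE_FACTORIES = {
    "abbreviations": ("scispacy.abbreviation", "abbreviation_detector", {}),
    "hyponyms": ("scispacy.hyponym_detector", "hyponym_detector", {"extended": False}),
    "linked_entities": ("scispacy.linking", "scispacy_linker", {"linker_name": "umls"}),
}


def build_pipeline(task: str, model: str = "en_core_sci_sm"):
    """Load a spaCy model and append the scispaCy component for the task."""
    if task not in PIPE_FACTORIES:
        raise ValueError(f"unknown task: {task!r}")
    import spacy  # imported lazily; requires spaCy, scispaCy and the model

    module_name, factory, config = PIPE_FACTORIES[task]
    importlib.import_module(module_name)  # importing registers the factory
    nlp = spacy.load(model)
    nlp.add_pipe(factory, config=config)
    return nlp


def run_task(task: str, text: str, model: str = "en_core_sci_sm") -> List[str]:
    """Return one tab-separated line per result (assumed output format)."""
    doc = build_pipeline(task, model)(text)
    if task == "abbreviations":
        # Each abbreviation span carries its long form as an extension.
        return [f"{abrv.text}\t{abrv._.long_form}" for abrv in doc._.abbreviations]
    if task == "hyponyms":
        # Hearst patterns are (predicate, general term, specific term) triples.
        return [f"{pred}\t{general}\t{specific}"
                for pred, general, specific in doc._.hearst_patterns]
    # Linked entities: each entity has (concept id, score) candidates.
    return [f"{ent.text}\t{cui}\t{score:.3f}"
            for ent in doc.ents for cui, score in ent._.kb_ents]
```

For example, run_task("abbreviations", note_text) would return each detected short form together with its long form, where note_text is the note selected by --row_id.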
We use MarianMT to translate clinical notes to another language.
get_translation.py
Arguments:
- --mimic_dir: directory containing the MIMIC data, including NOTEEVENTS.csv; defaults to data/mimic_data in the tutorials directory
- --target_language: one of {Spanish, French, Portuguese, Italian, Romanian, Malay_written_with_Latin, Mauritian_Creole, Haitian, Papiamento, Asturian, Catalan, Indonesian, Galician, Walloon, Occitan, Aragonese, Minangkabau}; defaults to Spanish
- --row_id: the row_id of the row whose NOTE field will be processed; defaults to 178
- --output_file: output file in which to save the original and translated notes; defaults to ./output_translation.txt
Commands:
python get_translation.py
python get_translation.py --target_language French --row_id 2000 --output_file ./output_French.txt
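As a rough illustration of the translation step, the sketch below uses the MarianMT classes from Hugging Face transformers. It is not the script's code: the checkpoint name Helsinki-NLP/opus-mt-en-roa, the partial LANGUAGE_CODES mapping, and the helper names are assumptions. Multilingual OPUS-MT checkpoints of this kind select the target language with a >>code<< token at the start of each input sentence.

```python
from typing import List

# Assumed mapping from --target_language names to target tokens (partial).
LANGUAGE_CODES = {
    "Spanish": "spa",
    "French": "fra",
    "Portuguese": "por",
    "Italian": "ita",
    "Romanian": "ron",
}


def add_target_prefix(sentences: List[str], target_language: str) -> List[str]:
    """Prepend the >>code<< token that tells the model which language to emit."""
    code = LANGUAGE_CODES[target_language]
    return [f">>{code}<< {sentence}" for sentence in sentences]


def translate(sentences: List[str], target_language: str = "Spanish",
              model_name: str = "Helsinki-NLP/opus-mt-en-roa") -> List[str]:
    """Translate English sentences; the checkpoint name is an assumption."""
    # Imported lazily; requires transformers, torch and sentencepiece.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(add_target_prefix(sentences, target_language),
                      return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Because truncation=True cuts any input that exceeds the model's maximum length, a long clinical note should be split into sentences or short passages before it is passed to translate.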