--- a +++ b/collated_tasks/tasks/README.md @@ -0,0 +1,52 @@ +## SciSpacy + +In this tutorial, we will use [scispacy](https://allenai.github.io/scispacy/) for various extraction tasks on MIMIC notes. Here we use named entity extraction as a detailed example. +##### Named Entities +```get_named_entities.py``` + +Arguments: +- ```--mimic_dir```: directory to mimic data that includes NOTEEVENTS.csv, default to data/mimic\_data in the tutorials directory +- ```--model```: spaCy model to use, default to en_core_sci_sm +- ```--row_id```: the row_id of the row whose NOTE field will be processed, default to 178 +- ```--output_file```: output file to save identified named entities default to ./output_named_entities.txt + +Commands: +```sh +python get_named_entities.py +python get_named_entities.py --model en_core_sci_scibert --row_id 174 +``` +*Remark: if running with custom model, need to download the corresponding model using ```pip install <model url>``` where model urls can be found [here](https://allenai.github.io/scispacy/). + +A list of named entities will be written to the file specified by --ouput_file. + +##### Abbreviations +```get_abbreviations.py``` +##### Hyponyms +```get_hyponyms.py``` +##### Linked Entities +```get_linked_entities.py ``` + +The arguments for these 3 tasks are the same as get_named_entities.py. Use the following commands to run in default values. + +```sh +python get_abbreviations.py +python get_hyponyms.py +python get_linked_entities.py +``` + +## Translation with MarianMT +We use [MarianMT](https://huggingface.co/transformers/model_doc/marian.html) to translate clinical notes to another language. + +```get_translation.py``` + +Arguments: +- ```--mimic_dir```: directory to mimic data that includes NOTEEVENTS.csv, default to data/mimic\_data in the tutorials directory +- ```--target_language```: supported languages are {Spanish, French, Portuguese, Italian, Romanian, Malay_written_with_Latin, Mauritian_Creole, Haitian, Papiamento, Asturian, Catalan, Indonesian, Galician, Walloon, Occitan, Aragonese, Minangkabau}, default to Spanish +- ```--row_id```: the row_id of the row whose NOTE field will be processed, default to 178 +- ```--output_file```: output file to save original and translated notes, default to ./output_translation.txt + +Commands: +```sh +python get_translation.py +python get_translation.py --target_language French --row_id 2000 --output_file ./output_French.txt +```