EHRKit-2022 / Git / [2d4573] /tests/README.md

Models:

philipB/

EHRKit-2022

Downloads: 1

[2d4573]: / tests / README.md

History

Download this file

70 lines (51 with data), 3.5 kB

What's new?

The UMN Clinical Abbreviation Sense Inventory: Clinical Abbreviations and Acronyms is indexed to solr core ehr_abbreviations (440 most frequently used abbreviations, 30000+ docs in total on solr)

The following two are temporarily placed in sample_script.py only, yet to incorporate into test.py
2. Function to output discharge summaries for patients who have more then 10 of them
3. Function to initialize word2vec model using outputted discharge summary above

Running Unit Tests on MIMIC-III:

Running test.py without any arguments would run all tests ones but skips ones that take a while to run. (Those tests must be called explicitly.)

> python tests.py

To run specific tests, follow the format python tests.py CLASS_NAME1.TEST_NAME1 CLASS_NAME2.TEST_NAME2 …
For example, you can run

> python tests.py t7.test7_1_naive_bayes

> python tests.py t3.test3_4_doc_sentences t7.test7_1_naive_bayes

Unittest prints "." when a test runs successfully, "E" when it encounters an error, and "s" when it skips a test.

No tests updates the database; all of them are just processing information from it. Not all tests currently work.

Tests available to run

The tests marked in bold are skipped and must be run explicitly.

T1.
1. Count the total number of patients. test1_1_count_patients
2. Count the total number of patient records (fails). test1_2_count_docs
3. Count the number of sentences. test1_3_note_info
4. Print the record with the most sentences. test1_4_longest_note

T2.

Display a full document given its document ID. test2_1_print_note
Count how many documents are associated with a given patient, given the patient ID, e.g., 23224 - show also the list of document IDs. test2_2_patient_info
List all document IDs. test2_3_doc_ids
List all patient IDs. test2_4_patient_ids
List all document IDs for a given admission date, e.g., 2188-11-1. test2_5_docs_on_date

T3.

Extract all abbreviations from a document, given the document ID. For now, let’s assume that an abbreviation is a sequence of two or more capital letters, e.g., GERD, PEERL, AMI. test3_1_extract_abbrevations
List all document IDs that include keyword "meningitis". test3_2_docs_with_query
List all document IDs that include keywords "Service: SURGERY”. test3_3_query_docs
Given a document ID, show a numbered list of all sentences in that document. test3_4_doc_sentences
Count the number of prescriptions for each unique medication. test3_5_medications

Use https://github.com/kavgan/phrase-at-scale to extract phrases from a document, given its ID. test5_1_extract_phrases
Count how many patients are labeled as “male” or “female”. test5_2_count_gender

T6
1. Classifies the sentiment of a document as positive or negative using AllenNLLP. test6_1_sentiment_classification
2. Performs named entity recognition on a document using AllenNLLP. test6_2_ner
3. Tokenizes the words of a document using Huggingface. test6_3_tokenize

T7.
1. Creates extractive summary of an EHR with Naive Bayes Algorithm trained on PubMed articles. test7_1_naive_bayes
2. Generates abstractive summary of an EHR with pre-trained Distilbart model from Huggingface (works poorly). test7_2_distilbart_summary
3. Generates abstractive summary of an EHR with pre-trained T5 model from Huggingface (works poorly). test7_3_t5_summary