Diff of /tests/README.md [000000] .. [2d4573]

## What's new?

1. The UMN Clinical Abbreviation Sense Inventory (Clinical Abbreviations and Acronyms) is indexed in the Solr core ehr_abbreviations (the 440 most frequently used abbreviations, 30,000+ docs in total on Solr).

The following two are temporarily located in sample_script.py only and have yet to be incorporated into tests.py:

2. A function that outputs discharge summaries for patients who have more than 10 of them.
3. A function that initializes a word2vec model using the discharge summaries output above.

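
As a rough illustration of the discharge-summary selection described in item 2, the filtering step could look like the sketch below. The record layout (a list of `(patient_id, category, text)` tuples), the category label `"Discharge summary"`, and the function name `summaries_for_frequent_patients` are assumptions made for this example; the actual function in sample_script.py may read from Solr and look quite different.

```python
from collections import defaultdict


def summaries_for_frequent_patients(notes, min_count=10, category="Discharge summary"):
    """Group notes of one category by patient; keep patients with more than min_count."""
    by_patient = defaultdict(list)
    for patient_id, note_category, text in notes:
        if note_category == category:
            by_patient[patient_id].append(text)
    # "More than 10" is a strict comparison, so exactly 10 summaries is not enough.
    return {pid: texts for pid, texts in by_patient.items() if len(texts) > min_count}


# Hypothetical toy data: patient 1 has 11 summaries, patient 2 has only 2.
notes = [(1, "Discharge summary", f"summary {i}") for i in range(11)]
notes += [(2, "Discharge summary", "a"), (2, "Discharge summary", "b"), (1, "Nursing", "n")]

selected = summaries_for_frequent_patients(notes)
print(sorted(selected))  # patient IDs that pass the threshold
```

The returned texts could then be written to a file or passed on to a word2vec training step.
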
## Running Unit Tests on MIMIC-III:

Running tests.py without any arguments runs all tests but **skips** the ones that take a while to run. (Those tests must be called explicitly.)
```
> python tests.py
```

To run specific tests, follow the format python tests.py CLASS_NAME1.TEST_NAME1 CLASS_NAME2.TEST_NAME2 …

For example, you can run
```
> python tests.py t7.test7_1_naive_bayes
```
or
```
> python tests.py t3.test3_4_doc_sentences t7.test7_1_naive_bayes
```
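
One way the "skipped unless named explicitly" behavior could be implemented is sketched below. This is not necessarily how tests.py does it: the `slow` decorator, the `run_t7` helper, and the `test7_0_fast` test are hypothetical names invented for this example, and the test bodies are placeholders.

```python
import functools
import sys
import unittest


def slow(test_func):
    """Skip the wrapped test unless its name was given on the command line."""
    @functools.wraps(test_func)
    def wrapper(self, *args, **kwargs):
        # "t7.test7_1_naive_bayes" -> "test7_1_naive_bayes"
        requested = {arg.split(".")[-1] for arg in sys.argv[1:]}
        if test_func.__name__ not in requested:
            self.skipTest("slow test; name it explicitly to run it")
        return test_func(self, *args, **kwargs)
    return wrapper


class t7(unittest.TestCase):
    def test7_0_fast(self):  # hypothetical quick test
        self.assertEqual(1 + 1, 2)

    @slow
    def test7_1_naive_bayes(self):
        self.assertTrue(True)  # stand-in for the real, slow work


def run_t7():
    """Run the t7 class programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(t7)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_t7()
print(result.testsRun, len(result.skipped))
```

In a real script, ending the file with `unittest.main()` lets unittest resolve names such as t7.test7_1_naive_bayes from the command line, so only the named tests are loaded and the decorator no longer skips them.
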

Unittest prints "." when a test passes, "F" when an assertion fails, "E" when a test raises an unexpected error, and "s" when it skips a test.
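
To see these status characters in isolation, the small sketch below runs four throwaway tests and captures the progress line that unittest writes. It also includes "F", which unittest prints when an assertion fails. The `Demo` class and the `progress_characters` function are made up for this illustration and are not part of tests.py.

```python
import io
import unittest


def progress_characters():
    """Run four tiny tests and return the progress line printed by unittest."""
    class Demo(unittest.TestCase):
        def test_a_passes(self):
            self.assertEqual(2 + 2, 4)  # reported as "."

        def test_b_fails(self):
            self.assertEqual(2 + 2, 5)  # reported as "F"

        def test_c_errors(self):
            raise RuntimeError("unexpected problem")  # reported as "E"

        @unittest.skip("takes a while")
        def test_d_skipped(self):
            pass  # reported as "s"

    stream = io.StringIO()
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
    unittest.TextTestRunner(stream=stream, verbosity=1).run(suite)
    # The first line of the report holds one character per test.
    return stream.getvalue().splitlines()[0]


print(progress_characters())
```

Tests in a class run in alphabetical order of their method names, which is why the characters appear in the order the methods are listed here.
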

None of the tests update the database; they only read and process information from it. Not all tests currently work.

### Tests available to run

The tests marked in bold are skipped and must be run explicitly.

T1.

1. Count the total number of patients. test1_1_count_patients
2. **Count the total number of patient records** (fails). test1_2_count_docs
3. **Count the number of sentences**. test1_3_note_info
4. **Print the record with the most sentences**. test1_4_longest_note

T2.

1. Display a full document given its document ID. test2_1_print_note
2. **Count how many documents are associated with a given patient, given the patient ID (e.g., 23224), and show the list of document IDs**. test2_2_patient_info
3. List all document IDs. test2_3_doc_ids
4. List all patient IDs. test2_4_patient_ids
5. List all document IDs for a given admission date, e.g., 2188-11-1. test2_5_docs_on_date

T3.

1. Extract all abbreviations from a document, given the document ID. For now, let’s assume that an abbreviation is a sequence of two or more capital letters, e.g., GERD, PEERL, AMI. test3_1_extract_abbrevations
2. **List all document IDs that include the keyword "meningitis"**. test3_2_docs_with_query
3. **List all document IDs that include the keywords "Service: SURGERY"**. test3_3_query_docs
4. Given a document ID, show a numbered list of all sentences in that document. test3_4_doc_sentences
5. **Count the number of prescriptions for each unique medication**. test3_5_medications

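
The working definition of an abbreviation used in T3 (a sequence of two or more capital letters) can be sketched as a regular expression. The function name `extract_abbreviations` and the sample note are made up for illustration; the actual test3_1_extract_abbrevations may be implemented differently.

```python
import re

# Two or more consecutive capital letters, bounded on both sides by non-word characters.
ABBREVIATION = re.compile(r"\b[A-Z]{2,}\b")


def extract_abbreviations(text):
    """Return the unique abbreviations in text, in order of first appearance."""
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(ABBREVIATION.findall(text)))


note = "Pt with GERD and prior AMI. PEERL. GERD is stable on PPI."
print(extract_abbreviations(note))
```

Because of the word boundaries, mixed-case tokens such as "Pt" and tokens with digits attached are not matched, so this simple rule will miss some real clinical abbreviations.
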
T5.

1. **Use https://github.com/kavgan/phrase-at-scale to extract phrases from a document, given its ID**. test5_1_extract_phrases
2. Count how many patients are labeled as “male” or “female”. test5_2_count_gender

T6.

1. **Classify the sentiment of a document as positive or negative using AllenNLP**. test6_1_sentiment_classification
2. **Perform named entity recognition on a document using AllenNLP**. test6_2_ner
3. **Tokenize the words of a document using Hugging Face**. test6_3_tokenize

T7.

1. **Create an extractive summary of an EHR with a Naive Bayes algorithm trained on PubMed articles**. test7_1_naive_bayes
2. **Generate an abstractive summary of an EHR with a pre-trained DistilBART model from Hugging Face (works poorly)**. test7_2_distilbart_summary
3. **Generate an abstractive summary of an EHR with a pre-trained T5 model from Hugging Face (works poorly)**. test7_3_t5_summary