# 📔 Documentation

<img src="https://github.com/karenacorn99/LILY-EHRKit/blob/main/EHRLogo.png" alt="drawing" width="140"/>

A Python Natural Language Processing Toolkit for Electronic Health Record Texts

## Key Modules and Functions

### ✨multi_doc_functions.py

`get_similar_documents(bert_model, query_note, candidate_notes, candidates, top_k)`: find documents/records similar to the query note among the candidate notes, returning the `top_k` most similar results.

**Parameters**:
* `bert_model`: the name of the BERT model
* `query_note`: the query note, a string
* `candidate_notes`: the candidate notes, a list of strings
* `candidates`: the candidate note IDs, a list of ints
* `top_k`: the number of results to return, default is 2
* returns a `DataFrame` with columns `candidate_note_id`, `similarity_score`, and `candidate_text`
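
The ranking step behind this kind of similarity search can be sketched with plain cosine similarity over embedding vectors. This is a minimal illustration with made-up 3-dimensional vectors standing in for BERT sentence embeddings, not the toolkit's actual implementation:

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def rank_candidates(query_vec, candidate_vecs, candidate_ids, top_k=2):
    # Score every candidate against the query, then keep the top_k best.
    scored = [(cid, cosine_similarity(query_vec, vec))
              for cid, vec in zip(candidate_ids, candidate_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings: candidates 101 and 103 point roughly the same way as the query.
query = [1.0, 0.0, 1.0]
candidates = [[1.0, 0.1, 0.9], [0.0, 1.0, 0.0], [0.9, 0.0, 1.1]]
ids = [101, 102, 103]
print(rank_candidates(query, candidates, ids, top_k=2))
```

The real function additionally runs the BERT encoder to produce the vectors and packages the result as a `DataFrame`.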

`get_clusters(bert_model, notes, k=2)`: cluster the records into `k` clusters with K-means, using pretrained BERT encodings.

**Parameters**:
* `bert_model`: the name of the BERT model
* `notes`: the notes to cluster, a list of strings
* `k`: the number of clusters, default is 2
* returns a `DataFrame` object...

### ✨scispacy_functions.py

`get_abbreviations(model, text)`: get abbreviations and their meanings from the input text.

**Parameters**:
* `model`: model name, supports spaCy models
* `text`: input text, a string
* returns a list of tuples in the form (abbreviation, expanded form), each element being a str
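
Abbreviation detectors of this kind typically pair a parenthesized short form with the long form that precedes it. A much-simplified regex sketch of that idea (illustrative only, not scispacy's algorithm):

```python
import re

def find_abbreviations(text):
    # Look for "long form (SF)" patterns where the short form's letters
    # match the initials of the words immediately before the parentheses.
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        short = match.group(1)
        words = text[:match.start()].split()
        if len(words) >= len(short):
            candidate = words[-len(short):]
            if all(w[0].upper() == ch for w, ch in zip(candidate, short)):
                pairs.append((short, " ".join(candidate)))
    return pairs

print(find_abbreviations(
    "The patient has chronic obstructive pulmonary disease (COPD)."))
```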

`get_hyponyms(model, text)`: get hyponyms of the recognized entities in the input text.

**Parameters**:
* `model`: model name, supports spaCy models
* `text`: input text, a string
* returns a list of tuples in the form (hearst_pattern, entity_1, entity_2, ...), each element being a str

`get_linked_entities(model, text)`: get linked entities in the input text.

**Parameters**:
* `model`: model name, supports spaCy models
* `text`: input text, a string
* returns a dictionary in the form {named entity: list of strings, each describing one piece of linked information}

`get_named_entities(model, text)`: get named entities in the input text.

**Parameters**:
* `model`: model name, supports spaCy models
* `text`: input text, a string
* returns a list of strings, each an identified named entity

### ✨transformer_functions.py

`get_supported_translation_languages()`: returns a list of supported target language names as strings.

`get_translation(text, model_name, target_language)`: translate the input text into the target language.

**Parameters**:
* `text`: input text, a string
* `model_name`: BERT model name, a string
* `target_language`: target language name from the supported language list
* returns a string, the translated version of the input text

`get_bert_embeddings(pretrained_model, texts)`: encode the input texts with a pretrained BERT model.

**Parameters**:
* `pretrained_model`: BERT model name, a string
* `texts`: input texts, a list of strings
* returns a list of lists of sentences, each list made up of sentences from the same document

### ✨stanza_functions.py

`get_denpendencies(text)`: dependency parsing result for the input `text` (a string); this is a wrapper of the stanza library.

### ✨summarization_functions.py

`get_single_summary(text, model_name="t5-small", min_length=50, max_length=200)`: single-document summarization.

**Parameters**:
* `text`: the input text, a string
* `model_name`: model name, a string; supported models: `bart-large-cnn`, `t5-small`, `t5-base`, `t5-large`, `t5-3b`, `t5-11b`
* `min_length`: minimum summary length
* `max_length`: maximum summary length
* returns a list of summaries, each a string

`get_multi_summary_joint(text, model_name="osama7/t5-summarization-multinews", min_length=50, max_length=200)`: multi-document summarization. Joins all input documents into one long document, then performs single-document summarization.

**Parameters**:
* `text`: a list of documents, each a string
* `model_name`: model name, a string; supported models: `bart-large-cnn`, `t5-small`, `t5-base`, `t5-large`, `t5-3b`, `t5-11b`
* `min_length`: minimum summary length
* `max_length`: maximum summary length
* returns a list of summaries, each a string
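
The joint strategy described above is just concatenation followed by single-document summarization. A sketch of that control flow, where `summarize` is any single-document summarizer callable (a stand-in, not the toolkit's API):

```python
def multi_summary_joint(docs, summarize):
    # Join all documents into one long document, then apply any
    # single-document summarizer to the result.
    joined = " ".join(docs)
    return [summarize(joined)]

# Stand-in summarizer: keep the first five words.
stub = lambda text: " ".join(text.split()[:5])
print(multi_summary_joint(["First report text.", "Second report text."], stub))
```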

`get_multi_summary_extractive_textRank(text, ratio=-0.1, words=0)`: TextRank method for multi-document summarization.

**Parameters**:
* `text`: a list of strings
* `ratio`: the proportion of the text to keep in the summary (0-1.0)
* `words`: the number of words in the summary, default is 50
* returns a string, the final summary
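
TextRank builds a graph of sentences weighted by lexical overlap and ranks them by graph centrality. A much-simplified sketch that scores each sentence by its total word overlap with the others (a crude stand-in for the real algorithm, for illustration only):

```python
import re

def extractive_summary(sentences, top_n=1):
    # Score each sentence by its total word overlap with the other
    # sentences -- a rough proxy for TextRank's graph centrality.
    token_sets = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    scores = []
    for i, ti in enumerate(token_sets):
        overlap = sum(len(ti & tj) for j, tj in enumerate(token_sets) if i != j)
        scores.append((overlap, i))
    best = sorted(scores, reverse=True)[:top_n]
    # Emit the chosen sentences in their original order.
    return " ".join(sentences[i] for _, i in sorted(best, key=lambda x: x[1]))

docs = [
    "The patient reports severe chest pain.",
    "Chest pain began two days ago.",
    "The weather was sunny.",
]
print(extractive_summary(docs, top_n=1))
```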

### ✨medspacy_functions.py

`get_word_tokenization(text)`: word tokenization using the medspaCy package.

**Parameters**:
* `text`: input text, a string
* returns a list of tokens (words), each a string

`get_section_detection(text, rules)`: given a string as input, extract sections such as medical history, allergies, and comments.

**Parameters**:
* `text`: input text, a string
* `rules`: personalized rules, a dictionary of strings, e.g. {"category": "allergies"}, default is None
* returns a list of spaCy `Section` objects
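
Section detection of this kind is rule-based: known section headers are matched and the text between them becomes the section body. A minimal regex sketch of that idea (the header names and dictionary return shape are illustrative, not medspaCy's actual rule format or `Section` type):

```python
import re

def split_sections(text):
    # Split a note on known "Header:" lines and return {category: body}.
    pattern = re.compile(r"^(History|Allergies|Comments):\s*",
                         re.IGNORECASE | re.MULTILINE)
    sections = {}
    matches = list(pattern.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1).lower()] = text[m.end():end].strip()
    return sections

note = "History: hypertension.\nAllergies: penicillin.\nComments: stable."
print(split_sections(note))
```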

`get_UMLS_match(text)`: match UMLS concepts for the input text.

**Parameters**:
* `text`: input text, a string
* returns a list of tuples in the form (entity_text, label, similarity, semtypes)