# EHRKit-WF: Off-the-shelf methods for processing medical text
[![Python 3.6.13](https://img.shields.io/badge/python-3.6.13-green.svg)](https://www.python.org/downloads/release/python-360/)
[![Python 3.8.8](https://img.shields.io/badge/python-3.8.8-green.svg)](https://www.python.org/downloads/release/python-380/)
## Overview
We integrate various text-processing tools, with a focus on the medical domain, into a single, user-friendly toolkit.

## Installation
### Create virtual environment

```bash
python3 -m venv virtenv/
source virtenv/bin/activate
```
### Install packages
```bash
pip install pip==21.3.1
pip install -U spacy
pip install scispacy
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_sm-0.4.0.tar.gz
pip install pandas==1.1.5
pip install numpy==1.19.5
pip install transformers==4.12.3
pip install torch==1.10.0
pip install torchvision==0.11.1
pip install sentencepiece==0.1.96
pip install scikit-learn
pip install PyRuSH
pip install stanza==1.3.0
pip install ipywidgets
pip install ipykernel
pip install summa==1.2.0
pip install negspacy==1.0.2
pip install medspacy==0.2.0.0
wget https://github.com/jianlins/PyRuSH/raw/master/conf/rush_rules.tsv -P conf
```

## Quick Start
Here, we show examples of running a single-document task and a multi-document task.

For a complete run-through of all tasks, run the demo script with `python demo.py`.

For a comprehensive tutorial, check the [tutorial notebook](https://github.com/karenacorn99/LILY-EHRKit/blob/main/EHRKit_tutorials.ipynb).

### Single-document Task Example
Single-document tasks operate on a single free-text record.
```python
from EHRKit import EHRKit

# create kit
kit = EHRKit()

main_record = "Spinal and bulbar muscular atrophy (SBMA) is an \
inherited motor neuron disease caused by the expansion \
of a polyglutamine tract within the androgen receptor (AR). \
SBMA can be caused by this easily."

# add main_record
kit.update_and_delete_main_record(main_record)

# call single-document tasks on main_record
kit.get_abbreviations()
# >> [('SBMA', 'Spinal and bulbar muscular atrophy'),
#     ('SBMA', 'Spinal and bulbar muscular atrophy'),
#     ('AR', 'androgen receptor')]
```
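
To give a feel for the kind of pairs an abbreviation detector returns, here is a minimal, standard-library-only sketch of a parenthesis-based heuristic. It is an illustration under simplifying assumptions, not EHRKit's implementation: `find_abbreviations` and `STOPWORDS` are hypothetical names, and the heuristic only handles upper-case short forms whose letters match the initials of the preceding words.

```python
import re

# Function words skipped when matching initials (an assumption of this sketch).
STOPWORDS = {"and", "of", "the", "in", "for"}


def find_abbreviations(text):
    """Return (short_form, long_form) pairs for 'long form (SF)' patterns."""
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        short = match.group(1)
        words = text[:match.start()].split()
        # Try progressively longer word windows that end right before the parenthesis.
        for size in range(len(short), min(len(words), 2 * len(short)) + 1):
            candidate = words[-size:]
            initials = "".join(
                w[0].upper() for w in candidate if w.lower() not in STOPWORDS
            )
            if initials == short:
                pairs.append((short, " ".join(candidate)))
                break
    return pairs


record = (
    "Spinal and bulbar muscular atrophy (SBMA) is an inherited motor neuron "
    "disease caused by the expansion of a polyglutamine tract within the "
    "androgen receptor (AR). SBMA can be caused by this easily."
)
print(find_abbreviations(record))
# [('SBMA', 'Spinal and bulbar muscular atrophy'), ('AR', 'androgen receptor')]
```

Unlike the toolkit output shown earlier, this sketch reports each definition once and does not tag later occurrences of the short form.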

### Multi-document Task Example
Multi-document tasks operate on several free-text records.
```python
from EHRKit import EHRKit

# create kit
kit = EHRKit()

# A document about neurons.
record = "Neurons (also called neurones or nerve cells) are the fundamental units of the brain and nervous system, " \
         "the cells responsible for receiving sensory input from the external world, for sending motor commands to " \
         "our muscles, and for transforming and relaying the electrical signals at every step in between. More than " \
         "that, their interactions define who we are as people. Having said that, our roughly 100 billion neurons do" \
         " interact closely with other cell types, broadly classified as glia (these may actually outnumber neurons, " \
         "although it’s not really known)."

# A document about neural networks.
cand1 = "A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of " \
        "data through a process that mimics the way the human brain operates. In this sense, neural networks refer to " \
        "systems of neurons, either organic or artificial in nature."

# A document about aspirin.
cand2 = "Prescription aspirin is used to relieve the symptoms of rheumatoid arthritis (arthritis caused by swelling " \
        "of the lining of the joints), osteoarthritis (arthritis caused by breakdown of the lining of the joints), " \
        "systemic lupus erythematosus (condition in which the immune system attacks the joints and organs and causes " \
        "pain and swelling) and certain other rheumatologic conditions (conditions in which the immune system " \
        "attacks parts of the body)."

# Another document about aspirin.
cand3 = "People can buy aspirin over the counter without a prescription. Everyday uses include relieving headache, " \
        "reducing swelling, and reducing a fever. Taken daily, aspirin can lower the risk of cardiovascular events, " \
        "such as a heart attack or stroke, in people with a high risk. Doctors may administer aspirin immediately" \
        " after a heart attack to prevent further clots and heart tissue death."

# add main_record
kit.update_and_delete_main_record(record)

# add supporting_records
kit.replace_supporting_records([cand1, cand2, cand3])

# perform k-means clustering on the 4 documents
kit.get_clusters(k=2)

# >>                                     note   cluster
#    0  Neurons (also called neurones or ...    0
#    1  A neural network is a series of ...     0
#    2  Prescription aspirin is used to ...     1
#    3  People can buy aspirin over the ...     1
```
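
For intuition about why documents on the same topic end up in the same cluster, the sketch below compares bag-of-words vectors with cosine similarity. It is a standard-library illustration only; the features and algorithm behind `get_clusters` may differ. The shortened texts and the helper names (`bag_of_words`, `cosine_similarity`, `most_similar`) are assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(name, vectors):
    """Return the name of the other document closest to `name`."""
    others = (other for other in vectors if other != name)
    return max(others, key=lambda other: cosine_similarity(vectors[name], vectors[other]))


# Shortened, hypothetical stand-ins for the four records used earlier.
docs = {
    "neuron": "neurons are cells of the brain and nervous system",
    "network": "a neural network mimics neurons in the brain",
    "aspirin_rx": "prescription aspirin relieves arthritis pain and swelling",
    "aspirin_otc": "people buy aspirin to reduce swelling and fever",
}
vectors = {name: bag_of_words(text) for name, text in docs.items()}

for name in docs:
    print(name, "->", most_similar(name, vectors))
```

Raw counts let common words such as "and" contribute to the score; weighting schemes like TF-IDF are a common way to reduce that effect.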

### Key Functions
- Abbreviation Detection & Expansion
- Hyponym Detection
- Entity Linking
- Named Entity Recognition
- Translation
- Sentencizer
- Document Clustering
- Similar Document Retrieval
- Word Tokenization
- Negation Detection
- Section Detection
- UMLS Concept Extraction

### New Release Models for Machine Translation - May, 2023
We fine-tuned Transformer models on the [UFAL data](https://ufal.mff.cuni.cz/ufal_medical_corpus) to support more languages. Feel free to download the [MT5-based](https://huggingface.co/qcz) models; more models can be found under [SciFive-based](https://huggingface.co/irenelizihui/scifive_ufal_MT_en_es/).

## Troubleshooting 🔧

### `ModuleNotFoundError: No module named 'click._bashcomplete'`

A dependency conflict may have left an incompatible version of `click` installed. Try `pip install click==7.1.1`.

### `demo.py` outputs "Killed" with no error message

Your machine does not have enough resources (CPU/GPU/RAM) to run the model, so the operating system's kernel killed the process when it was starved for resources.

### `TypeError: 'module' object is not callable`

The PyRuSH module does not behave the same way on all machines. Try replacing the line `rush = RuSH('conf/rush_rules.tsv')` with `rush = RuSH.RuSH('conf/rush_rules.tsv')` in the `utils.py` file.
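
One way to avoid editing `utils.py` per machine is to resolve the class at runtime. The sketch below assumes the two behaviors described here are the only ones: the imported `RuSH` name is either the callable class itself or a module that exposes a `RuSH` attribute. `resolve_callable` is a hypothetical helper, not part of PyRuSH or EHRKit, and the demo uses standard-library objects so it runs without PyRuSH installed.

```python
import collections
import inspect


def resolve_callable(obj, attr_name):
    """Return obj if it is not a module; otherwise return the module's attribute attr_name."""
    if inspect.ismodule(obj):
        return getattr(obj, attr_name)
    return obj


# Stand-in demo: a module resolves to its attribute, a class resolves to itself.
print(resolve_callable(collections, "OrderedDict") is collections.OrderedDict)  # True
print(resolve_callable(dict, "dict") is dict)  # True

# Hypothetical use in utils.py, assuming `RuSH` is imported as in the existing code:
# rush = resolve_callable(RuSH, "RuSH")("conf/rush_rules.tsv")
```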

### `AttributeError: 'IntervalTree' object has no attribute 'search'`

This is another dependency conflict. Try `pip install intervaltree==2.1.0`.
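
To check whether the pinned versions suggested in this section are the ones actually installed, a short script like the following may help. It is a sketch that assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library; `EXPECTED_VERSIONS` and `check_pins` are hypothetical names introduced for this example.

```python
from importlib import metadata

# Pins taken from the troubleshooting entries in this section.
EXPECTED_VERSIONS = {"click": "7.1.1", "intervaltree": "2.1.0"}


def check_pins(expected):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for package, pinned in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, installed == pinned)
    return report


for package, (installed, ok) in check_pins(EXPECTED_VERSIONS).items():
    status = "ok" if ok else "mismatch"
    print(f"{package}: installed={installed} expected={EXPECTED_VERSIONS[package]} ({status})")
```

Run it inside the activated virtual environment so that it inspects the same packages the toolkit will import.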