# Automatic Assignment of ICD codes

## Introduction

This repo contains code for assigning ICD codes to medical/clinical text. The data used here is the MIMIC-III dataset. Several models have been tried, from linear machine learning models to the state-of-the-art pretrained NLP model BERT.

## Structure of the project

At the root of the project, you will have:

- **main.py**: used for training and testing different models
- **requirements.txt**: contains the minimum dependencies for running the project
- **w2vmodel.model**: gensim word2vec model trained on MIMIC-III discharge summaries
- **src**: a folder that contains:
  - **bert**: contains utilities and files for the pretrained BERT model
  - **cnn**: contains utilities and files for the CNN model
  - **hybrid**: contains utilities and files for the hybrid (LSTM+CNN) model
  - **rnn**: contains utilities and files for the LSTM and GRU models
  - **ovr**: contains utilities and files for different machine learning models (like LR, SVM, Naive Bayes)
  - **fit.py**: training code for both LSTM and CNN models
  - **test_results.py**: inference code for trained models, used for both LSTM and CNN models
  - **utils.py**: general utility code used by all the models

## Dependencies

The dependencies are listed in the `requirements.txt` file.
They can be installed with:

```bash
pip install -r requirements.txt
```

## How to use the code

Launch main.py with the following arguments:

- `train_path`: path of the training data
- `test_path`: path of the test data
- `model_name`: one of the 6 models implemented ['bert', 'hybrid', 'lstm', 'gru', 'cnn', 'ovr']. Defaults to 'bert'
- `icd_type`: type of ICD labels to train on, one of ['icd9cat', 'icd9code', 'icd10cat', 'icd10code']. Defaults to 'icd9cat'
- `epochs`: number of epochs
- `batch_size`: batch size, defaults to 16 (for the BERT model)
- `val_split`: validation split of the training data, defaults to 2/7 (train:val:test = 5:2:3)
- `learning_rate`: defaults to 2e-5 (for the BERT model)
- `w2vmodel`: path to the pretrained gensim word2vec model

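
The default `val_split` of 2/7 can be checked against the 5:2:3 ratio with a little arithmetic. The sketch below assumes (the README does not state this explicitly) that the training file holds the train and validation shares together, so the split is taken relative to 5 + 2 = 7 parts. The variable names are illustrative only.

```python
from fractions import Fraction

# Ratio quoted for val_split: train : val : test = 5 : 2 : 3
train, val, test = 5, 2, 3
total = train + val + test

# Assumption: the training file contains the train and val shares together,
# so val_split is a fraction of (train + val), not of the whole dataset.
val_split = Fraction(val, train + val)
train_file_share = Fraction(train + val, total)
val_share_of_all = val_split * train_file_share

print(val_split)          # 2/7
print(train_file_share)   # 7/10
print(val_share_of_all)   # 1/5
```

Under that assumption, holding out 2/7 of the training file leaves a validation set equal to 1/5 of all the data.
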
***Example***

```bash
python main.py --train_path train.csv --test_path test.csv --model_name cnn
```

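
For orientation, the documented arguments could be declared with `argparse` roughly as follows. This is a hypothetical reconstruction from the argument list and not the repo's actual parser: `build_parser`, the `required=True` flags, the placeholder default for `epochs`, and the `None` default for `w2vmodel` are all assumptions.

```python
import argparse

MODEL_NAMES = ["bert", "hybrid", "lstm", "gru", "cnn", "ovr"]
ICD_TYPES = ["icd9cat", "icd9code", "icd10cat", "icd10code"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical sketch of the CLI described in this README."""
    parser = argparse.ArgumentParser(description="Train and test ICD code assignment models")
    # Assumption: both data paths are mandatory.
    parser.add_argument("--train_path", required=True, help="path of the training data")
    parser.add_argument("--test_path", required=True, help="path of the test data")
    parser.add_argument("--model_name", choices=MODEL_NAMES, default="bert")
    parser.add_argument("--icd_type", choices=ICD_TYPES, default="icd9cat")
    # Assumption: the README documents no default for epochs; 1 is a placeholder.
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--val_split", type=float, default=2 / 7)
    parser.add_argument("--learning_rate", type=float, default=2e-5)
    # Assumption: no default path is documented for the word2vec model.
    parser.add_argument("--w2vmodel", default=None, help="path to a gensim word2vec model")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args(
    ["--train_path", "train.csv", "--test_path", "test.csv", "--model_name", "cnn"]
)
print(args.model_name, args.icd_type, args.batch_size)  # cnn icd9cat 16
```

Restricting `model_name` and `icd_type` with `choices` makes a mistyped value fail at startup with a usage message instead of partway through training.
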
## Data

The data used for training and testing can be downloaded from:
- [train data](https://drive.google.com/file/d/1--ZVpt614neHN9erxmsg6s6aGInThJ22/view?usp=sharing)
- [test data](https://drive.google.com/file/d/1-4tp0og0I7KyNMoqF2_t1smu0_GqQCVf/view?usp=sharing)