# Source code

## Directory structure

- `./constants/`: Contains constants used in the code, such as paths to datasets, glossaries, and name mappings.
- `./evaluation/`: Contains code for evaluation, including generation of tables and plots, and computation of statistical tests.
- `./experiments/`: Contains experiments for each method:
  - `./experiments/rf.py`: Random Forest
  - `./experiments/bilstm.py`: BiLSTM
  - `./experiments/bert.py`: ClinicalBERT architectures
- `./extensions/`: Contains extensions to the Baal and Transformers Python libraries.
- `./features/`: Contains implementations of input features for RF and BiLSTM methods, and input representations for Clinical BERT and Paired Clinical BERT.
- `./ml_models/`: Contains implementations of the machine learning (ML) methods.
- `./models/`: Contains data models for preprocessing and feature generation.
- `./preprocessing/`: Contains preprocessing scripts.
- `./re_datasets/`: Contains the dataset factory for the BiLSTM and BERT models, which creates a Hugging Face (HF) `Dataset`.
- `./scripts/`: Bash scripts for running experiments on the GPU cluster.
- `./training/`: Contains trainers for each ML method and common training resources.
- `./config/`: Configuration for logging results to Neptune.ai.
- `./nlp_pipeline.py`: spaCy NLP pipeline.
- `./utils.py`: Helper functions used throughout the code.
- `./vocabulary.py`: Vocabulary module, representing the mapping between tokens and indices.

Most of the subdirectories contain a more detailed `README` file.
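
For orientation, the token-to-index mapping that `./vocabulary.py` is described as providing can be pictured with the minimal sketch below. It is only an illustration: the class name `Vocabulary`, its methods, and the `<pad>`/`<unk>` special tokens are hypothetical and are not taken from the actual module, which may be organized differently.

```python
from typing import Dict, Iterable, List


class Vocabulary:
    """Hypothetical token <-> index mapping (illustrative, not the real module)."""

    def __init__(self, pad_token: str = "<pad>", unk_token: str = "<unk>") -> None:
        self.unk_token = unk_token
        self.token_to_index: Dict[str, int] = {}
        self.index_to_token: List[str] = []
        # Reserve the first two indices for the (assumed) special tokens.
        for token in (pad_token, unk_token):
            self.add(token)

    def add(self, token: str) -> int:
        """Register a token if it is new and return its index."""
        if token not in self.token_to_index:
            self.token_to_index[token] = len(self.index_to_token)
            self.index_to_token.append(token)
        return self.token_to_index[token]

    def encode(self, tokens: Iterable[str]) -> List[int]:
        """Map tokens to indices; unseen tokens fall back to the unknown index."""
        unk_index = self.token_to_index[self.unk_token]
        return [self.token_to_index.get(token, unk_index) for token in tokens]

    def decode(self, indices: Iterable[int]) -> List[str]:
        """Map indices back to their tokens."""
        return [self.index_to_token[i] for i in indices]


# Build a small vocabulary from a made-up token sequence.
vocab = Vocabulary()
for token in ["patient", "denies", "chest", "pain"]:
    vocab.add(token)

# "reports" was never added, so it is encoded with the unknown-token index.
encoded = vocab.encode(["patient", "reports", "pain"])
decoded = vocab.decode(encoded)
```

Keeping both a dictionary and a list gives constant-time lookup in each direction, which is the usual reason for pairing the two structures in a vocabulary of this kind.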