# Description
|
|
<p align="center">
<img src="./assets/cs.png" alt="cs" width="100"/>
<img src="./assets/illuin.png" alt="illuin" width="100"/>
</p>
|
|
This repository contains our code for a competition organised by CentraleSupélec and Illuin Technology. To learn more about the tasks, see the contents of the `Explication dataset` folder or the final presentation in `presentation`.
|
|
The competition had two parts: the first consists of three NLP tasks (NER, NLI, text classification, among others), and the second is the creation of a search engine capable of finding patients based on filters and a search query.
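The filter-plus-query idea behind the second part can be sketched as follows. This is a toy illustration, not the repository's implementation: the patient fields (`diagnosis`, `record`) and the term-overlap scoring are hypothetical.

```python
# Hypothetical sketch of filter-then-rank patient retrieval.
# Field names ("diagnosis", "record") and the scoring are illustrative only.

def search_patients(patients, filters, query):
    """Keep patients matching all structured filters, then rank the
    remaining ones by naive query-term overlap with their record text."""
    candidates = [p for p in patients
                  if all(p.get(k) == v for k, v in filters.items())]
    terms = query.lower().split()

    def score(p):
        text = p.get("record", "").lower()
        return sum(text.count(t) for t in terms)

    return sorted(candidates, key=score, reverse=True)

patients = [
    {"id": 1, "diagnosis": "asthma", "record": "patient reports wheezing and cough"},
    {"id": 2, "diagnosis": "asthma", "record": "routine follow-up, no complaints"},
    {"id": 3, "diagnosis": "diabetes", "record": "wheezing noted"},
]
results = search_patients(patients, {"diagnosis": "asthma"}, "wheezing cough")
print([p["id"] for p in results])  # → [1, 2]
```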
|
|
# Mainly used technologies
|
|
- Transformers library by HuggingFace
- SciBERT
- BioBERT
- ELECTRAMed
- MiniLM-L6
- Streamlit
- Flask
- Annoy
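For context on how the last two model/index pieces fit together: MiniLM produces sentence embeddings, and Annoy indexes such vectors for approximate nearest-neighbour search. Below is a minimal sketch of the underlying idea, using toy 3-dimensional vectors in place of real MiniLM embeddings (which are 384-dimensional) and exact cosine similarity in place of Annoy's approximation.

```python
import math

# Toy stand-ins for sentence embeddings; Annoy approximates this
# exact cosine-similarity search when the corpus is large.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

corpus = {
    "doc_a": [1.0, 0.0, 0.2],
    "doc_b": [0.9, 0.1, 0.3],
    "doc_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.25]

# Exact nearest neighbour of the query under cosine similarity.
best = max(corpus, key=lambda k: cosine(corpus[k], query))
print(best)  # → doc_a
```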
|
|
# How to use
|
|
## Evaluation
|
|
First, initialize and fetch the evaluation submodule:

```bash
$ git submodule init
$ git submodule update
```
|
|
## Build dataset

You can find the data here: https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/
|
|
First, the initial data must be laid out as follows:
|
|
36 |
|
|
|
37 |
```bash |
|
|
38 |
medical_txt_parser |
|
|
39 |
├── Explication dataset/ |
|
|
40 |
├── train_data/ |
|
|
41 |
├── beth/ |
|
|
42 |
├── ast/ |
|
|
43 |
... |
|
|
44 |
└── record-13.ast |
|
|
45 |
├── concept |
|
|
46 |
... |
|
|
47 |
└── record-13.con |
|
|
48 |
├── rel |
|
|
49 |
... |
|
|
50 |
└── record-13.rel |
|
|
51 |
└── txt |
|
|
52 |
... |
|
|
53 |
└── record-13.txt |
|
|
54 |
└── partners/ |
|
|
55 |
├── ast/ |
|
|
56 |
... |
|
|
57 |
└── record-10.ast |
|
|
58 |
├── concept |
|
|
59 |
... |
|
|
60 |
└── record-10.con |
|
|
61 |
├── rel |
|
|
62 |
... |
|
|
63 |
└── record-10.rel |
|
|
64 |
└── txt |
|
|
65 |
... |
|
|
66 |
└── record-10.txt |
|
|
67 |
|
|
|
68 |
└── src/ |
|
|
69 |
``` |
|
|
Then, from the root of the project, execute the following command to build the dataset:

```bash
$ ./src/data_merger.sh
```
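As a rough illustration of what a merge step like this does — an assumption based only on the directory layout above; the real `data_merger.sh` may do more — here is a Python sketch that flattens the per-hospital subfolders (`beth/`, `partners/`) into one merged dataset:

```python
import shutil
import tempfile
from pathlib import Path

def merge_records(train_dir: Path, out_dir: Path) -> int:
    """Copy every annotation/text file from each hospital subfolder
    into one flat merged directory per file kind (ast, concept, ...).
    Hypothetical stand-in for data_merger.sh; returns files copied."""
    copied = 0
    for hospital in sorted(p for p in train_dir.iterdir() if p.is_dir()):
        for kind_dir in sorted(p for p in hospital.iterdir() if p.is_dir()):
            dest = out_dir / kind_dir.name
            dest.mkdir(parents=True, exist_ok=True)
            for f in kind_dir.iterdir():
                shutil.copy(f, dest / f.name)
                copied += 1
    return copied

# Minimal demo on a synthetic layout mirroring the tree above:
root = Path(tempfile.mkdtemp())
(root / "train_data" / "beth" / "txt").mkdir(parents=True)
(root / "train_data" / "beth" / "txt" / "record-13.txt").write_text("note")
(root / "train_data" / "partners" / "txt").mkdir(parents=True)
(root / "train_data" / "partners" / "txt" / "record-10.txt").write_text("note")
n = merge_records(root / "train_data", root / "merged")
print(n)  # → 2
```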
|
|
To prepare the embeddings and clusters for the search API:

```bash
$ cd src
$ python -m clustering.prepare_embeddings
```
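To illustrate what "preparing embeddings and clusters" involves — grouping embedding vectors so a search can be restricted to a nearby cluster — here is a toy k-means pass on 2-D points. The `clustering.prepare_embeddings` module name comes from this repo, but this implementation is only an illustrative stand-in and almost certainly differs from the real one.

```python
import math

def kmeans(points, k, iters=10):
    """Tiny k-means: returns (centroids, label per point).
    Centroids start at the first k points (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, labels

# Two well-separated groups of toy "embeddings":
points = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
          (5.0, 5.1), (5.1, 5.0), (4.9, 5.05)]
_, labels = kmeans(points, k=2)
print(labels[0] != labels[3])  # → True (the two groups get distinct clusters)
```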
|
|
To launch the app, run the following from the root directory of the project:

```bash
$ python src/api.py
$ streamlit run app/search_engine.py
```