# Description

<p align="center">
<img src="./assets/cs.png" alt="cs" width="100"/>
<img src="./assets/illuin.png" alt="illuin" width="100"/>
</p>

This repository contains our code for a competition organised by CentraleSupélec and Illuin Technology. You can learn more about the tasks in the `Explication dataset` folder or in the final presentation under `presentation`.

The competition had two parts: the first comprises three NLP tasks (NER, NLI, and text classification), and the second is the creation of a search engine able to find patients matching a set of filters and a search query.
![simple demo](./assets/simple_demo.gif)
# Mainly used technologies
 - Transformers library by HuggingFace
 - SciBERT
 - BioBERT
 - ELECTRAMed
 - MiniLM-L6
 - Streamlit
 - Flask
 - Annoy

# How to use 
## Evaluation
First, download the submodule used for evaluation:

```bash
$ git submodule init
$ git submodule update
```

## Build dataset
You can find the data here: https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/

First, the initial data must be laid out as follows:

```bash
medical_txt_parser
├── Explication dataset/
├── train_data/
│   ├── beth/
│   │   ├── ast/
│   │   │   ├── ...
│   │   │   └── record-13.ast
│   │   ├── concept/
│   │   │   ├── ...
│   │   │   └── record-13.con
│   │   ├── rel/
│   │   │   ├── ...
│   │   │   └── record-13.rel
│   │   └── txt/
│   │       ├── ...
│   │       └── record-13.txt
│   └── partners/
│       ├── ast/
│       │   ├── ...
│       │   └── record-10.ast
│       ├── concept/
│       │   ├── ...
│       │   └── record-10.con
│       ├── rel/
│       │   ├── ...
│       │   └── record-10.rel
│       └── txt/
│           ├── ...
│           └── record-10.txt
└── src/
```
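The `.con` files follow the i2b2/n2c2 concept annotation format: one concept per line, with the concept text, its start/end positions as `line:token` pairs, and a type tag. A minimal stdlib parser sketch, assuming lines shaped like the illustrative example in the test comment:

```python
import re

# One concept per line, e.g.:
#   c="his home regimen of insulin" 102:4 102:8||t="treatment"
CONCEPT_RE = re.compile(
    r'c="(?P<text>.*)" (?P<sl>\d+):(?P<st>\d+) (?P<el>\d+):(?P<et>\d+)'
    r'\|\|t="(?P<type>[^"]+)"'
)

def parse_concept_line(line: str) -> dict:
    """Parse one .con annotation line into its fields."""
    m = CONCEPT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised concept line: {line!r}")
    d = m.groupdict()
    return {
        "text": d["text"],
        "start": (int(d["sl"]), int(d["st"])),  # (line, token) of first token
        "end": (int(d["el"]), int(d["et"])),    # (line, token) of last token
        "type": d["type"],
    }
```

The `.ast` and `.rel` files extend this pattern with an assertion field (`a="..."`) and a relation field (`r="..."`) respectively, and can be parsed the same way.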

Then, from the root of the project, execute the following command to build the dataset:

```bash
$ ./src/data_merger.sh
```

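The shell script's internals are not shown here; as an illustration of the layout it consumes, here is a hypothetical helper (the function name and return shape are our own, not the script's code) that pairs each record's four annotation files across the `beth` and `partners` sources:

```python
from pathlib import Path

# Directory name -> file extension, per the tree above.
EXPECTED = {"txt": ".txt", "concept": ".con", "ast": ".ast", "rel": ".rel"}

def collect_records(train_dir):
    """Map "source/record-id" -> {kind: path} for beth/ and partners/ records."""
    records = {}
    root = Path(train_dir)
    for source in ("beth", "partners"):
        for kind, ext in EXPECTED.items():
            for path in sorted((root / source / kind).glob(f"*{ext}")):
                # key includes the source so same-named records never collide
                records.setdefault(f"{source}/{path.stem}", {})[kind] = path
    return records
```

A record missing one of the four kinds simply has fewer keys in its dict, which makes incomplete records easy to detect before merging.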
To prepare the embeddings and clusters for the search API:
```bash
$ cd src
$ python -m clustering.prepare_embeddings
```

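We don't reproduce the internals of `prepare_embeddings` here; conceptually, it embeds each record and groups the embeddings into clusters for the search API. As a toy illustration of the clustering step only, a numpy-only k-means sketch (the function and its parameters are illustrative, not the script's actual code):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Toy k-means: return (centroids, labels) for an embedding matrix X."""
    rng = np.random.default_rng(seed)
    # initialise centroids from k distinct rows of X
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # assign every embedding to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels
```

Precomputing cluster assignments like this lets the API restrict a query to the most relevant clusters instead of scanning every record.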
Finally, to launch the app, run the following from the root of the project (the API and the Streamlit front end each block, so use two separate terminals):

```bash
$ python src/api.py
$ streamlit run app/search_engine.py
```