a/README.md b/README.md
# Intelligent data extraction from medical reports

From an image of a French-language medical report that follows no fixed template, the system extracts the following information using named-entity recognition (NER) and image segmentation:

- the patient's name and date of birth
- the date of the medical intervention
- the type of medical intervention (for example: radiology)
- the name of the doctor who performed the medical intervention
- the address of the intervention
- the referring doctor

The system works as follows:

<section align='center'>
    <img src='https://github.com/IKetchup/Intelligent_data_extraction_from_medical_reports/blob/main/images/schema.PNG' height="400"/>
</section>

## Requirements

- Tesseract 5.0.0
- pytesseract 0.3.8
- NumPy 1.19.5
- OpenCV python 4.5.1.48
- SpaCy 3.2.0

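As a convenience, the pinned versions above can be compared with what is installed. The snippet below is a minimal sketch, not part of the repository: the distribution names (`opencv-python`, `spacy`, and so on) are the usual PyPI names and are an assumption, and the helper names `installed_version` and `check_requirements` are hypothetical. Tesseract itself is a system binary and is not covered here.

```python
from importlib import metadata

# Pinned versions copied from the Requirements list; the PyPI distribution
# names are an assumption (the README does not spell them out).
PINNED = {
    "pytesseract": "0.3.8",
    "numpy": "1.19.5",
    "opencv-python": "4.5.1.48",
    "spacy": "3.2.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_requirements(pinned=PINNED, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_requirements().items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed Python package matches its pinned version.
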
## Image segmentation and text extraction

Image segmentation and text extraction from a medical report image are performed by the ACABS algorithm (Automatic Cropper and Block Segmenter). For more details about ACABS, see the [report](Intelligent_data_extraction_from_medical_reports.pdf).

### ACABS

ACABS first detects the blocks of text in the image and segments the image accordingly. It then selects the relevant text blocks and removes the report's header and footer. Finally, the text is extracted from the medical report image using OCR.

```python
import os

from acabs import ACABS

# Usage on a single image
text = ACABS(path_to_image)

# Usage on a folder of images
texts = ''
_, _, filenames = next(os.walk(path_to_folder))
os.chdir(path_to_folder)
for file in filenames:
    text = ACABS(file)

    # Save the extracted text of each image in its own .txt file
    stem = os.path.splitext(file)[0]
    with open(os.path.join('path_to_save_text', stem + '.txt'), 'w', encoding='utf-8') as f:
        f.write(text)

    # Also accumulate all reports in one string, separated by a banner line
    texts = texts + '\n=================== New report: ' + file + ' ===================\n' + text
```

Visual output of ACABS segmentation:

<section align='center'>
    <img src='https://github.com/IKetchup/Intelligent_data_extraction_from_medical_reports/blob/main/images/acabs_result_fancy.png' height="500"/>
</section>

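The header and footer removal step can be pictured with a small filter over block bounding boxes. This is only an illustrative sketch and not the ACABS implementation: the `Block` type, the `drop_header_footer` helper, and the 10% band ratios are all assumptions chosen for the example. The actual selection rule is described in the report.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Bounding box of a detected text block, in pixels (hypothetical type)."""
    x: int
    y: int
    w: int
    h: int


def drop_header_footer(blocks, page_height, header_ratio=0.10, footer_ratio=0.10):
    """Keep blocks lying fully between the header and footer bands.

    The band ratios are illustrative defaults, not values taken from ACABS.
    """
    top = page_height * header_ratio
    bottom = page_height * (1.0 - footer_ratio)
    kept = [b for b in blocks if b.y >= top and b.y + b.h <= bottom]
    # Return blocks in reading order: top to bottom, then left to right
    return sorted(kept, key=lambda b: (b.y, b.x))
```

For example, on a page 1000 pixels high, a block starting at `y=20` falls in the header band and is dropped, while a block spanning `y=300` to `y=500` is kept and would be passed on to OCR.
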
## Extraction of key information

After ACABS has extracted the text, the data must be annotated (see [annotated_text.json](annotated_text.json) for an example). To speed up annotation, use a tool such as [ner-annotator](https://github.com/tecoholic/ner-annotator).

Transform the data into the spaCy format using [transform_data.py](transform_data.py).

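To give an idea of what such a transformation involves, here is a minimal sketch. It is not the content of `transform_data.py`. It assumes the annotation file uses the ner-annotator export layout (`{"classes": [...], "annotations": [[text, {"entities": [[start, end, label], ...]}], ...]}`), which has not been checked against `annotated_text.json`. The helper names and the `fr` language code are likewise assumptions.

```python
import json


def load_annotations(json_text):
    """Parse an annotation export into (text, [(start, end, label), ...]) pairs.

    Assumes the ner-annotator layout described in the lead-in.
    """
    data = json.loads(json_text)
    examples = []
    for item in data["annotations"]:
        if not item:  # skip empty (null) entries
            continue
        text, meta = item
        spans = sorted((int(s), int(e), str(label)) for s, e, label in meta["entities"])
        examples.append((text, spans))
    return examples


def drop_overlaps(spans):
    """Keep the first span of each overlapping group; spaCy entities may not overlap."""
    kept, last_end = [], -1
    for start, end, label in spans:
        if start >= last_end:
            kept.append((start, end, label))
            last_end = end
    return kept


def to_docbin(examples, lang="fr"):
    """Build a spaCy DocBin (spaCy is imported here so the parsing works without it)."""
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank(lang)
    doc_bin = DocBin()
    for text, spans in examples:
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in drop_overlaps(spans):
            span = doc.char_span(start, end, label=label, alignment_mode="contract")
            if span is not None:  # offsets that cover no whole token are skipped
                ents.append(span)
        doc.ents = ents
        doc_bin.add(doc)
    return doc_bin
```

The result of `to_docbin(...)` can be written with `to_disk("./train.spacy")`, which is the kind of file the training commands expect.
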
### Train a model

Use [config.cfg](config.cfg) to customize the NER model.

- Verify the data:
```bash
python -m spacy debug data ./config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
```

- Train the model:
```bash
python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy --gpu-id 1
```

- Evaluate the model:
```bash
python -m spacy evaluate ./output/model-best ./dev.spacy
```

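For NER, the evaluation command reports entity-level precision, recall, and F-score, where a predicted entity counts as correct only if its boundaries and label both match the gold annotation. The sketch below illustrates that computation on a single document. It is not spaCy's scorer, and the function name `entity_prf` is hypothetical.

```python
def entity_prf(gold, predicted):
    """Entity-level precision, recall and F1 over exact (start, end, label) matches."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A prediction whose offsets are off by one character counts as both a false positive and a missed entity, which is why clean annotation boundaries matter for the reported scores.
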
### Use a model

See [predictions.ipynb](predictions.ipynb).

<section align='center'>
    <img src='https://github.com/IKetchup/Intelligent_data_extraction_from_medical_reports/blob/main/images/pred.png'/>
</section>
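
Outside the notebook, a trained pipeline can be applied to OCR text along these lines. This is a minimal sketch rather than the notebook's code: it assumes the model was saved to `./output/model-best` as in the training command, and the helper names `group_entities` and `extract_fields` are hypothetical.

```python
from collections import defaultdict


def group_entities(doc):
    """Group entity texts by label for any object exposing spaCy-style `.ents`."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


def extract_fields(text, model_path="./output/model-best"):
    """Run the trained pipeline on OCR text and return its entities by label."""
    import spacy  # imported lazily so group_entities can be used on its own

    nlp = spacy.load(model_path)
    return group_entities(nlp(text))
```

Calling `extract_fields(text)` on the string returned by ACABS yields a dictionary keyed by the labels used during annotation.
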