|
a/README.md |
|
b/README.md |
# Intelligent data extraction from medical reports

From an image of a French medical report that follows no fixed template, the system extracts the following information using NER and image segmentation:

- the patient’s name and date of birth
- the date of the medical intervention
- the type of medical intervention (for example: radiology)
- the name of the doctor who performed the medical intervention
- the address of the intervention
- the referring doctor

The system works as follows:

<section align='center'>
<img src='https://github.com/IKetchup/Intelligent_data_extraction_from_medical_reports/blob/main/images/schema.PNG' height="400"/>
</section>

## Requirements

- Tesseract 5.0.0
- pytesseract 0.3.8
- NumPy 1.19.5
- OpenCV python 4.5.1.48
- spaCy 3.2.0

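To compare an environment against the versions listed above, a small check along these lines may help. It is only a sketch: the PyPI distribution names in `PINNED` (in particular `opencv-python`) are assumptions, and `check_requirements` is a hypothetical helper that is not part of this repository.

```python
import shutil
from importlib import metadata

# Versions listed in this README; the PyPI distribution names are assumptions.
PINNED = {
    "pytesseract": "0.3.8",
    "numpy": "1.19.5",
    "opencv-python": "4.5.1.48",
    "spacy": "3.2.0",
}


def check_requirements(pinned=PINNED):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_requirements().items():
        status = "ok" if installed == expected else "mismatch"
        print(f"{name}: expected {expected}, found {installed} ({status})")
    # Tesseract is a separate binary, so only its presence on PATH is checked.
    print("tesseract on PATH:", shutil.which("tesseract") is not None)
```

Tesseract also needs its French language data for French reports; this check does not cover that.
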
## Image segmentation and text extraction

Image segmentation and text extraction from a medical report image are performed by the ACABS algorithm (Automatic Cropper and Block Segmenter). For more details about ACABS, see the [report](Intelligent_data_extraction_from_medical_reports.pdf).

### ACABS

ACABS first detects the blocks of text in the image and segments the image accordingly. It then selects the relevant text blocks and removes the report's header and footer. Finally, the text is extracted from the report image using OCR.

```python
import os

from acabs import ACABS

# Usage on a single image
text = ACABS(path_to_image)

# Usage on a folder of images
texts = ''
_, _, filenames = next(os.walk(path_to_folder))
for file in filenames:
    text = ACABS(os.path.join(path_to_folder, file))
    # Save the text of each report to its own .txt file
    output_file = os.path.join(path_to_save_text, os.path.splitext(file)[0] + '.txt')
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(text)

    # Also accumulate all reports in one string, with a separator per report
    texts += '\n=================== New report: ' + file + ' ===================\n' + text
```

Visual output of ACABS segmentation:

<section align='center'>
<img src='https://github.com/IKetchup/Intelligent_data_extraction_from_medical_reports/blob/main/images/acabs_result_fancy.png' height="500"/>
</section>

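The three stages ACABS goes through (block detection, header/footer removal, OCR) can be illustrated with a generic OpenCV and pytesseract pipeline. This is not the ACABS implementation, whose actual criteria are described in the report. The function names, the dilation kernel, and the `header_frac` / `footer_frac` thresholds below are all assumptions chosen for illustration.

```python
def find_text_blocks(image_path, kernel_size=(25, 15)):
    """Return (grayscale image, list of (x, y, w, h) boxes of candidate text blocks)."""
    import cv2  # imported lazily so the pure-Python helper below has no dependency

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu threshold, inverted: dark text becomes white foreground.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Dilation merges neighbouring characters and lines into solid blocks.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernel_size)
    dilated = cv2.dilate(binary, kernel, iterations=2)
    contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return gray, [cv2.boundingRect(c) for c in contours]


def drop_header_footer(blocks, page_height, header_frac=0.15, footer_frac=0.10):
    """Keep blocks lying entirely between the header and footer bands.

    Hypothetical heuristic: anything touching the top `header_frac` or the
    bottom `footer_frac` of the page is treated as header or footer.
    Blocks are returned in reading order (top to bottom, then left to right).
    """
    top = page_height * header_frac
    bottom = page_height * (1 - footer_frac)
    kept = [b for b in blocks if b[1] >= top and b[1] + b[3] <= bottom]
    return sorted(kept, key=lambda b: (b[1], b[0]))


def extract_text(image_path, lang="fra"):
    """Run OCR on each remaining block and join the results."""
    import pytesseract

    gray, blocks = find_text_blocks(image_path)
    body = drop_header_footer(blocks, page_height=gray.shape[0])
    return "\n\n".join(
        pytesseract.image_to_string(gray[y:y + h, x:x + w], lang=lang)
        for x, y, w, h in body
    )
```

A fixed-fraction band is the simplest possible way to discard headers and footers; reports with unusually tall letterheads would need a different rule.
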
## Extraction of key information

After ACABS has extracted the text, the data needs to be annotated (see [annotated_text.json](annotated_text.json) for an example). To speed up annotation, use a tool such as [ner-annotator](https://github.com/tecoholic/ner-annotator).

Transform the data into a spaCy-like format using [transform_data.py](transform_data.py).

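As a rough idea of what such a transformation involves, the sketch below reads an annotation file, splits it into training and development sets, and writes `.spacy` files. It is not the content of `transform_data.py`. It assumes the annotation file has the layout `{"annotations": [[text, {"entities": [[start, end, label], ...]}], ...]}`, and the function names and the 80/20 split are illustrative.

```python
import json
import random


def load_annotations(path):
    """Read (text, [(start, end, label), ...]) pairs from an annotation file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    examples = []
    for item in data["annotations"]:
        if not item:  # skip empty entries
            continue
        text, meta = item
        examples.append((text, [tuple(e) for e in meta["entities"]]))
    return examples


def split_train_dev(examples, dev_ratio=0.2, seed=0):
    """Shuffle deterministically and return (train, dev)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_ratio))
    return shuffled[n_dev:], shuffled[:n_dev]


def save_docbin(examples, output_path, lang="fr"):
    """Write examples to a .spacy file (requires spaCy 3)."""
    import spacy
    from spacy.tokens import DocBin
    from spacy.util import filter_spans

    nlp = spacy.blank(lang)
    doc_bin = DocBin()
    for text, entities in examples:
        doc = nlp.make_doc(text)
        spans = [
            doc.char_span(start, end, label=label, alignment_mode="contract")
            for start, end, label in entities
        ]
        # char_span returns None when offsets do not align with tokens;
        # drop those, then drop overlapping spans before assigning doc.ents.
        doc.ents = filter_spans([s for s in spans if s is not None])
        doc_bin.add(doc)
    doc_bin.to_disk(output_path)
```

With these helpers, `train, dev = split_train_dev(load_annotations("annotated_text.json"))` followed by `save_docbin(train, "train.spacy")` and `save_docbin(dev, "dev.spacy")` would produce the two files that the spaCy commands expect.
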
### Train a model

Use [config.cfg](config.cfg) to customize the NER model.

- Verify the data:
```bash
python -m spacy debug data ./config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy
```

- Train the model:
```bash
python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy --gpu-id 1
```

- Evaluate the model:
```bash
python -m spacy evaluate ./output/model-best ./dev.spacy
```

### Use a model

See [predictions.ipynb](predictions.ipynb).

<section align='center'>
<img src='https://github.com/IKetchup/Intelligent_data_extraction_from_medical_reports/blob/main/images/pred.png'/>
</section>
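
Outside the notebook, a trained pipeline can be loaded and applied in a few lines. The sketch below assumes the model was saved to `./output/model-best` by the training command; `group_entities` and `extract_key_information` are hypothetical helpers, and the entity labels returned depend on the labels used during annotation.

```python
from collections import defaultdict


def group_entities(entities):
    """Group (label, text) pairs into {label: [text, ...]}."""
    grouped = defaultdict(list)
    for label, text in entities:
        grouped[label].append(text.strip())
    return dict(grouped)


def extract_key_information(report_text, model_path="./output/model-best"):
    """Run the trained NER pipeline on OCR text and group entities by label."""
    import spacy  # imported lazily; requires the trained model on disk

    nlp = spacy.load(model_path)
    doc = nlp(report_text)
    return group_entities((ent.label_, ent.text) for ent in doc.ents)
```

Calling `extract_key_information(text)` on the output of ACABS returns a dictionary keyed by entity label, which is convenient for exporting the extracted fields.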