# BrixIA COVID-19 Project

## What you will find here

Info, code (BS-Net), a link to the data (the BrixIA COVID-19 Dataset annotated with the Brixia score), and additional material related to the [BrixIA COVID-19 Project](https://brixia.github.io/).

## Defs

BrixIA COVID-19 Project: [go to the webpage](https://brixia.github.io/)

Brixia score: a multi-regional score for Chest X-ray (CXR) conveying the degree of lung compromise in COVID-19 patients

BS-Net: an end-to-end multi-network learning architecture for semiquantitative rating of COVID-19 severity on Chest X-rays

BrixIA COVID-19 Dataset: 4703 CXRs of COVID-19 patients (anonymized) in DICOM format with manually annotated Brixia scores

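For clarity, the Brixia score rates six lung regions (three per lung, labelled A to F) from 0 to 3 each; the global score is their sum, ranging from 0 to 18. A minimal sketch of that aggregation (the helper function is illustrative, not part of this repository):

```python
def brixia_global_score(regional_scores):
    """Sum six regional Brixia scores (each 0-3) into the global score (0-18)."""
    if len(regional_scores) != 6:
        raise ValueError("expected exactly 6 regional scores (regions A-F)")
    if any(s not in (0, 1, 2, 3) for s in regional_scores):
        raise ValueError("each regional score must be 0, 1, 2, or 3")
    return sum(regional_scores)

print(brixia_global_score([1, 2, 0, 3, 1, 0]))  # 7
```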
## Project paper

Preprint available [here](https://arxiv.org/abs/2006.04603)

```
@article{SIGNORONI2021102046,
  title = {BS-Net: learning COVID-19 pneumonia severity on a large Chest X-Ray dataset},
  journal = {Medical Image Analysis},
  pages = {102046},
  year = {2021},
  issn = {1361-8415},
  doi = {https://doi.org/10.1016/j.media.2021.102046},
  url = {https://www.sciencedirect.com/science/article/pii/S136184152100092X},
  author = {Alberto Signoroni and Mattia Savardi and Sergio Benini and Nicola Adami and Riccardo Leonardi and Paolo Gibellini and Filippo Vaccher and Marco Ravanelli and Andrea Borghesi and Roberto Maroldi and Davide Farina}
}
```

## Overall Scheme

![](images/global_schema.png)

Table of Contents
=================

* [Data](#datasets)
* [Getting Started](#getting-started)
* [License](#license-and-attribution)
* [Citation](#citations)

## Datasets

### BrixIA COVID-19 Dataset

Access to and use of the annotated BrixIA COVID-19 CXR Dataset, for research purposes only, have been granted by the Ethical Committee of Brescia (Italy), NP4121 (last update 08/07/2020).

[**The data can be downloaded from the website https://brixia.github.io/.**](https://brixia.github.io/#get-the-data)

To unpack all the zipped archives on a Unix-like system:

1. Download all the files
2. From the command line run: `cat *.tar.gz.* | tar -xzv`
3. A folder called `dicom_clean` will be created with all the unpacked files

On MS Windows, replace step 2 with: `type *.tar.gz.* | tar xvfz -`

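The same reassembly can also be done without a shell; a minimal Python sketch (the helper name and part naming are illustrative, not part of this repository):

```python
import io
import tarfile
from pathlib import Path


def extract_split_archive(parts, dest):
    """Concatenate split archive parts (e.g. a.tar.gz.00, a.tar.gz.01, ...)
    in order and extract the joined .tar.gz, mirroring `cat *.tar.gz.* | tar -xzv`."""
    buf = io.BytesIO()
    for part in sorted(parts):
        buf.write(Path(part).read_bytes())
    buf.seek(0)
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(fileobj=buf, mode="r:gz") as tar:
        tar.extractall(dest)
```

Note that this sketch buffers the joined archive in memory, so for the multi-gigabyte dataset the shell commands above remain preferable.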
**[Update]** We revised the dataset and removed the DICOM files found to have acquisition problems (low quality). The total is now 4695.

### Annotation and CXR from Cohen's dataset

We exploit the public repository by [Cohen et al.](https://github.com/ieee8023/covid-chestxray-dataset), which contains CXR images (we downloaded a copy on May 11th, 2020).

To contribute to this public dataset, two expert radiologists, a board-certified staff member and a trainee with 22 and 2 years of experience respectively, produced the related Brixia score annotations for the CXRs in this collection, using [Labelbox](https://labelbox.com), an online labelling solution. After discarding problematic cases (e.g., images with a significant portion missing, resolution that is too small, or the impossibility of scoring for external reasons), the final dataset is composed of 192 CXRs, completely annotated according to the Brixia score system.

*Below is a list of each field in the [annotation csv](data/public-annotations.csv), with explanations where relevant.*
<details>
<summary>Scheme</summary>

| Attribute | Description |
|------|-----|
| filename | filename from the Cohen dataset |
| S-A to S-F | the six regions annotated by a senior radiologist (20+ years of experience), each from 0 to 3 |
| S-Global | global score by the senior radiologist (sum of S-A to S-F), from 0 to 18 |
| J-A to J-F | the six regions annotated by a junior radiologist (2 years of experience), each from 0 to 3 |
| J-Global | global score by the junior radiologist (sum of J-A to J-F), from 0 to 18 |
</details>

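As a usage sketch of this schema, the following reads annotation rows and compares the two raters' global scores (the sample rows are invented for illustration; the real file is `data/public-annotations.csv`):

```python
import csv
import io

# Invented sample rows following the schema above; in practice read the real
# file instead: rows = list(csv.DictReader(open("data/public-annotations.csv")))
sample = """filename,S-A,S-B,S-C,S-D,S-E,S-F,S-Global,J-A,J-B,J-C,J-D,J-E,J-F,J-Global
img1.jpg,1,2,0,1,3,2,9,1,1,0,1,3,2,8
img2.jpg,0,0,1,0,2,1,4,0,1,1,0,2,1,5
"""

rows = list(csv.DictReader(io.StringIO(sample)))
# Mean absolute difference between the senior and junior global scores.
mad = sum(abs(int(r["S-Global"]) - int(r["J-Global"])) for r in rows) / len(rows)
print(mad)  # 1.0
```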
### Segmentation Dataset

We provide the script to prepare the dataset as described in the project paper.

We exploit different segmentation datasets to pre-train the extended-Unet module of the proposed architecture. We used the original training/test split when available (as in the case of the JSRT database); otherwise we took the first 50 images as the test set and the remaining ones as the training set (see the table below).

<details>
<summary>Table</summary>

| | Training-set | Test-set | Split |
|------|-----|-----|-----|
| [Montgomery County](https://ceb.nlm.nih.gov/repositories/tuberculosis-chest-x-ray-image-data-sets/) | 88 | 50 | first 50 |
| [Shenzhen Hospital](https://arxiv.org/abs/1803.01199) | 516 | 50 | first 50 |
| [JSRT database](http://db.jsrt.or.jp/eng.php) | 124 | 123 | original |
| **Total** | **728** | **223** | |
</details>

The data can be downloaded from their respective sites.

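The "first 50" convention can be sketched as a deterministic split (the helper is illustrative and assumes files are taken in sorted order, which is our reading of "first"; it is not a repository function):

```python
def split_first_n_test(filenames, n_test=50):
    """Deterministic split for datasets without an official one: the first
    n_test files (in sorted order) form the test set, the rest the train set."""
    ordered = sorted(filenames)
    return ordered[n_test:], ordered[:n_test]

# E.g. Shenzhen Hospital: 566 images -> 516 train / 50 test, as in the table.
train, test = split_first_n_test([f"img_{i:03d}.png" for i in range(566)])
print(len(train), len(test))  # 516 50
```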
### Alignment synthetic dataset

To avoid including anatomical parts that do not belong to the lungs in the AI pipeline, which would increase task complexity or introduce unwanted biases, we integrated an alignment block into the pipeline. This block exploits a synthetic dataset (used for on-line augmentation) composed of artificially transformed images from the segmentation dataset (see the table below), including random rotations, shifts, and zooms, and is used in the pre-training phase.

The parameters refer to the implementation in Albumentations. The last column gives the probability of each transformation being applied.

<details>
<summary>Additional details</summary>

| | Parameters (up to) | Probability |
|----|-----|-----|
| Rotation | 25 degrees | 0.8 |
| Scale | 10% | 0.8 |
| Shift | 10% | 0.8 |
| Elastic transformation | alpha=60, sigma=12 | 0.2 |
| Grid distortion | steps=5, limit=0.3 | 0.2 |
| Optical distortion | distort=0.2, shift=0.05 | 0.2 |
</details>

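The probability-gated composition described in the table can be sketched in plain Python (the transforms below are string-tagging placeholders, not real image operations; the actual augmentation uses Albumentations):

```python
import random


def make_pipeline(transforms):
    """Compose (transform, probability) pairs: each transform is applied
    independently with its own probability, as in the table above."""
    def apply(image, rng=random):
        for fn, p in transforms:
            if rng.random() < p:
                image = fn(image)
        return image
    return apply


# Placeholders standing in for the real Albumentations operations.
pipeline = make_pipeline([
    (lambda img: img + ["rotate<=25deg"], 0.8),
    (lambda img: img + ["scale<=10%"], 0.8),
    (lambda img: img + ["shift<=10%"], 0.8),
    (lambda img: img + ["elastic(alpha=60,sigma=12)"], 0.2),
])

random.seed(0)
print(pipeline([]))
```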
## Getting Started

### Install Dependencies

The provided code is written for Python 3.x. To install the required dependencies, run:

```
pip install -r requirements.txt
```

For the sake of performance, we suggest installing `tensorflow-gpu` in place of the standard CPU version.

Include the `src` folder in your Python library path, or launch Python from that folder.

### Load Cohen dataset with BrixiaScore annotations

```python
from datasets import brixiascore_cohen as bsc

# Check the docstring for additional info
X_train, X_test, y_train, y_test = bsc.get_data()
```

### Prepare and load the segmentation dataset

To prepare the segmentation dataset, the `Montgomery County`, `Shenzhen Hospital`, and `JSRT` datasets must be downloaded from their websites and unpacked into a folder (for instance `data/sources/`). Then execute:

```bash
python3 -m datasets.lung_segmentation --input_folder data/sources/ --target_size 512
```

or just import the module (the first time it is executed, it will create the segmentation dataset):

```python
from datasets import lung_segmentation as ls

# Check the docstring for additional info. The train set is provided as a
# generator, while the validation set is preloaded in memory.
# `get_data` accepts a configuration dictionary where you can specify every
# parameter. See `ls.default_config`
train_gen, (val_imgs, val_masks) = ls.get_data()
```

### Prepare and load the alignment dataset

To prepare the alignment dataset, the segmentation one must already be built (see the previous point).

```python
from datasets import synthetic_alignment as sa

# Check the docstring for additional info. The train set and validation set
# are provided as generators.
# `get_data` accepts a configuration dictionary where you can specify every
# parameter. See `sa.default_config`
train_gen, val_gen = sa.get_data()
```

### Model weights

The model weights and a demo notebook can be found [here](https://drive.google.com/drive/folders/18PF0xpYd4q_M8CJn7TiO4QXCny1PRgJZ?usp=sharing)

### Other steps

Instructions for preparing and loading the BrixIA COVID-19 Dataset and BS-Net will follow (see the specific sections for more info).

## License and Attribution

**Disclaimer**

The BS-Net model and source code, the BrixIA COVID-19 Dataset, and the Brixia score annotations are provided "as is" without any guarantee of correct functionality or of quality. No formal support for this software will be given to users; however, issues can be reported on GitHub. This repository and any other part of the BrixIA COVID-19 Project must not be used for medical purposes. In particular, this software must not be used to make, support, gain evidence on, or aid medical decisions, interventions, or diagnoses. Specific terms of use are indicated for each part of the project.

### Data

- BrixIA COVID-19 Dataset: access conditions and terms of use are reported on the [dataset website](https://brixia.github.io/#get-the-data).
- Public Cohen dataset: each image has a license specified in [Cohen's repository](https://github.com/ieee8023/covid-chestxray-dataset), including Apache 2.0, CC BY-NC-SA 4.0, and CC BY 4.0. There are an additional 7 images from Brescia under a CC BY-NC-SA 4.0 license.
- Brixia score annotations for the public Cohen dataset are released under a CC BY-NC-SA 4.0 license.

### Code

- Released under an open-source license.

## Contacts

Alberto Signoroni alberto.signoroni@unibs.it

Mattia Savardi m.savardi001@unibs.it

## Citations

For any use of or reference to this project, please cite the following paper.

**[NEWS]** This work has been accepted at Medical Image Analysis and is available [here](https://doi.org/10.1016/j.media.2021.102046).

```
@article{SIGNORONI2021102046,
  title = {BS-Net: learning COVID-19 pneumonia severity on a large Chest X-Ray dataset},
  journal = {Medical Image Analysis},
  pages = {102046},
  year = {2021},
  issn = {1361-8415},
  doi = {https://doi.org/10.1016/j.media.2021.102046},
  url = {https://www.sciencedirect.com/science/article/pii/S136184152100092X},
  author = {Alberto Signoroni and Mattia Savardi and Sergio Benini and Nicola Adami and Riccardo Leonardi and Paolo Gibellini and Filippo Vaccher and Marco Ravanelli and Andrea Borghesi and Roberto Maroldi and Davide Farina}
}
```