# BrixIA COVID-19 Project

## What do you find here
Info, code (BS-Net), link to data (BrixIA COVID-19 Dataset annotated with Brixia-score), and additional material related to the [BrixIA COVID-19 Project](https://brixia.github.io/)

## Defs

BrixIA COVID-19 Project: [go to the webpage](https://brixia.github.io/)

Brixia score: a multi-regional score for Chest X-ray (CXR) conveying the degree of lung compromise in COVID-19 patients

BS-Net: an end-to-end multi-network learning architecture for semiquantitative rating of COVID-19 severity on Chest X-rays

BrixIA COVID-19 Dataset: 4703 CXRs of COVID-19 patients (anonymized) in DICOM format with manually annotated Brixia score
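The Brixia score rates six lung zones (three per lung, A to F) on a 0 to 3 scale, and the global score is their sum, from 0 to 18. A minimal sketch of this aggregation (illustrative only, not part of the repository code):

```python
def brixia_global_score(regional_scores):
    """Aggregate the six regional Brixia scores (zones A-F, each 0-3)
    into the global score (0-18) by simple summation."""
    if len(regional_scores) != 6:
        raise ValueError("Brixia score expects exactly 6 lung zones (A-F)")
    if any(not 0 <= s <= 3 for s in regional_scores):
        raise ValueError("each regional score must be in [0, 3]")
    return sum(regional_scores)

# Example: mild involvement in the upper zones, severe in one lower zone
print(brixia_global_score([0, 1, 2, 0, 1, 3]))  # 7
```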

## Project paper
Preprint available [here](https://arxiv.org/abs/2006.04603)
```
@article{SIGNORONI2021102046,
title = {BS-Net: learning COVID-19 pneumonia severity on a large Chest X-Ray dataset},
journal = {Medical Image Analysis},
pages = {102046},
year = {2021},
issn = {1361-8415},
doi = {https://doi.org/10.1016/j.media.2021.102046},
url = {https://www.sciencedirect.com/science/article/pii/S136184152100092X},
author = {Alberto Signoroni and Mattia Savardi and Sergio Benini and Nicola Adami and Riccardo Leonardi and Paolo Gibellini and Filippo Vaccher and Marco Ravanelli and Andrea Borghesi and Roberto Maroldi and Davide Farina},
}
```

## Overall Scheme

![Global flowchart](figures/global-flowchart.png "Global flowchart")

Table of Contents
=================

  * [Data](#datasets)
  * [Getting Started](#getting-started)
  * [License](#license-and-attribution)
  * [Citation](#citations)

## Datasets

### BrixIA COVID-19 Dataset
Access to and use of the annotated BrixIA COVID-19 CXR Dataset, for research purposes only, have been granted by the Ethical Committee of Brescia (Italy) NP4121 (last update 08/07/2020).

[**The data can be downloaded from the website https://brixia.github.io/.**](https://brixia.github.io/#get-the-data)

To unpack all the zipped archives on a Unix-like system:
1. Download all the files
2. From the command line call: `cat *.tar.gz.* | tar -xzv`
3. A folder called `dicom_clean` will be created with all the unpacked files

On MS Windows, replace step 2 with:
2. `type *.tar.gz.* | tar xvfz -`

**[Update]** We revised the dataset and removed the DICOM files found to have acquisition problems (low quality). The total is now 4695.
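After unpacking, a quick sanity check of the download can be done in Python. This is an illustrative sketch, assuming the extracted files carry a `.dcm` extension inside the `dicom_clean` folder created above:

```python
from pathlib import Path

def count_dicoms(folder="dicom_clean"):
    """Recursively count DICOM files under the extraction folder
    (assumes a .dcm extension, which may not hold for every export)."""
    return sum(1 for _ in Path(folder).rglob("*.dcm"))

# After a complete download, this should report 4695 files.
```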

### Annotation and CXR from Cohen's dataset

We exploit the public repository by [Cohen et al.](https://github.com/ieee8023/covid-chestxray-dataset), which contains CXR images (we downloaded a copy on May 11th, 2020).

To contribute to this public dataset, two expert radiologists, a board-certified staff member and a trainee with 22 and 2 years of experience respectively, produced the related Brixia score annotations for the CXRs in this collection using [Labelbox](https://labelbox.com), an online labelling solution. After discarding problematic cases (e.g., images with a significant portion missing, too small a resolution, or the impossibility of scoring for external reasons), the final dataset is composed of 192 CXRs, completely annotated according to the Brixia score system.

*Below is a list of each field in the [annotation csv](data/public-annotations.csv), with explanations where relevant.*
<details>
 <summary>Scheme</summary>

| Attribute | Description |
|------|-----|
| filename | filename from Cohen dataset |
| from S-A to S-F | the 6 regions annotated by a senior radiologist (20+ years of experience), each from 0 to 3 |
| S-Global | global score by the senior radiologist (sum of S-A to S-F), from 0 to 18 |
| from J-A to J-F | the 6 regions annotated by a junior radiologist (2+ years of experience), each from 0 to 3 |
| J-Global | global score by the junior radiologist (sum of J-A to J-F), from 0 to 18 |
</details>
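To illustrate the scheme, the global scores can be recomputed from the six regional columns using only the standard library. Column names are taken from the table above; the sample row is fabricated for illustration:

```python
import csv
from io import StringIO

ZONES = list("ABCDEF")

def check_globals(csv_text):
    """Verify that S-Global and J-Global equal the sum of the six
    regional scores (S-A..S-F and J-A..J-F) for every row."""
    checked = []
    for row in csv.DictReader(StringIO(csv_text)):
        for who in ("S", "J"):
            total = sum(int(row[f"{who}-{z}"]) for z in ZONES)
            assert total == int(row[f"{who}-Global"]), row["filename"]
        checked.append(row["filename"])
    return checked

# Fabricated sample row, for illustration only:
sample = (
    "filename,S-A,S-B,S-C,S-D,S-E,S-F,S-Global,"
    "J-A,J-B,J-C,J-D,J-E,J-F,J-Global\n"
    "example.png,1,2,0,1,3,2,9,1,1,0,1,3,2,8\n"
)
print(check_globals(sample))  # ['example.png']
```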

### Segmentation Dataset
We provide the script to prepare the dataset as described in the Project paper.

We exploit different segmentation datasets to pre-train the extended-Unet module of the proposed architecture. We used the original training/test split when available (as in the case of the JSRT database); otherwise we took the first 50 images as the test set and the remaining ones as the training set (see the Table below).

<details>
 <summary>Table</summary>

|  | Training-set | Test-set | Split |
|------|-----|-----|-----|
| [Montgomery County](https://ceb.nlm.nih.gov/repositories/tuberculosis-chest-x-ray-image-data-sets/) | 88 | 50 | first 50 |
| [Shenzhen Hospital](https://arxiv.org/abs/1803.01199) | 516 | 50 | first 50 |
| [JSRT database](http://db.jsrt.or.jp/eng.php) | 124 | 123 | original |
| Total | 728 | 223 | |
</details>

The data can be downloaded from the respective sites.
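The "first 50" split described above can be sketched as a generic helper (illustrative, not the repository's actual code):

```python
def first_n_split(images, n_test=50):
    """Deterministic split used when no official split exists:
    the first `n_test` images form the test set, the rest the training set."""
    if len(images) <= n_test:
        raise ValueError("not enough images for the requested test-set size")
    return images[n_test:], images[:n_test]  # (train, test)

# e.g. Shenzhen Hospital: 566 images -> 516 train / 50 test, as in the table
train, test = first_n_split([f"img_{i:04d}.png" for i in range(566)])
print(len(train), len(test))  # 516 50
```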

### Alignment synthetic dataset
To avoid including anatomical parts that do not belong to the lungs in the AI pipeline, which would increase the task complexity or introduce unwanted biases, we integrated an alignment block into the pipeline. It exploits a synthetic dataset (used for on-line augmentation), composed of artificially transformed images from the segmentation dataset (see the Table below), including random rotations, shifts, and zooms, and is used in the pre-training phase.

The parameters refer to the implementation in Albumentations. The last column gives the probability of each transformation being applied.

<details>
 <summary>Additional details</summary>

|    | Parameters (up to) | Probability |
|----|-----|-----|
| Rotation | 25 degrees | 0.8 |
| Scale | 10% | 0.8 |
| Shift | 10% | 0.8 |
| Elastic transformation | alpha=60, sigma=12 | 0.2 |
| Grid distortion | steps=5, limit=0.3 | 0.2 |
| Optical distortion | distort=0.2, shift=0.05 | 0.2 |
</details>
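Under these settings, the augmentation pipeline could be declared in Albumentations roughly as below. This is a sketch, not the repository's actual code: rotation, scale, and shift are combined here into a single `ShiftScaleRotate`, and argument names may differ across Albumentations versions:

```python
import albumentations as A

# Each transform mirrors one row of the table above;
# `p` is the application probability from the last column.
alignment_augmentations = A.Compose([
    A.ShiftScaleRotate(shift_limit=0.10,   # shift up to 10%
                       scale_limit=0.10,   # scale up to 10%
                       rotate_limit=25,    # rotate up to 25 degrees
                       p=0.8),
    A.ElasticTransform(alpha=60, sigma=12, p=0.2),
    A.GridDistortion(num_steps=5, distort_limit=0.3, p=0.2),
    A.OpticalDistortion(distort_limit=0.2, shift_limit=0.05, p=0.2),
])
# usage: augmented = alignment_augmentations(image=img, mask=mask)
```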

## Getting Started

### Install Dependencies

The provided code is written for Python 3.x. To install the required packages, run:
```
pip install -r requirements.txt
```
For better performance, we suggest installing `tensorflow-gpu` in place of the standard CPU version.

Include the `src` folder in your Python library path, or launch Python from that folder.

### Load Cohen dataset with BrixiaScore annotations
```python
from datasets import brixiascore_cohen as bsc

# Check the docstring for additional info
X_train, X_test, y_train, y_test = bsc.get_data()
```
### Prepare and load the segmentation dataset

To prepare the segmentation dataset, the `Montgomery County`, `Shenzhen Hospital`, and `JSRT` datasets must be downloaded from their websites and unpacked into a folder (for instance `data/sources/`). Then execute:
```bash
python3 -m datasets.lung_segmentation --input_folder data/sources/ --target_size 512
```
or just import the module (the first time it is executed, it will create the segmentation dataset):

```python
from datasets import lung_segmentation as ls

# Check the docstring for additional info. The train set is provided as a generator,
# while the validation set is preloaded in memory.
# `get_data` accepts a configuration dictionary where you can specify every parameter.
# See `ls.default_config`.
train_gen, (val_imgs, val_masks) = ls.get_data()
```

### Prepare and load the alignment dataset
To prepare the alignment dataset, the segmentation one must already have been built (see the previous point).

```python
from datasets import synthetic_alignment as sa

# Check the docstring for additional info. The train set and validation set are provided as generators.
# `get_data` accepts a configuration dictionary where you can specify every parameter.
# See `sa.default_config`.
train_gen, val_gen = sa.get_data()
```

### Model weights

The model weights and a demo notebook can be found [here](https://drive.google.com/drive/folders/18PF0xpYd4q_M8CJn7TiO4QXCny1PRgJZ?usp=sharing).

### Other steps

Instructions for preparing and loading the BrixIA COVID-19 Dataset and BS-Net will follow (see the specific sections for more info).

## License and Attribution

**Disclaimer**

The BS-Net model and source code, the BrixIA COVID-19 Dataset, and the Brixia score annotations are provided "as is", without any guarantee of correct functionality or quality. No formal support for this software will be given to users, though it is possible to report issues on GitHub. This repository and any other part of the BrixIA COVID-19 Project must not be used for medical purposes. In particular, this software must not be used to make, support, gain evidence on, or aid medical decisions, interventions, or diagnoses. Specific terms of use are indicated for each part of the project.

### Data

   - BrixIA COVID-19 Dataset: access conditions and terms of use are reported on the [dataset website](https://brixia.github.io/#get-the-data).
   - Public Cohen dataset: each image has its license specified in the original file in [Cohen's repository](https://github.com/ieee8023/covid-chestxray-dataset), including Apache 2.0, CC BY-NC-SA 4.0, and CC BY 4.0. There are an additional 7 images from Brescia under a CC BY-NC-SA 4.0 license.
   - Brixia score annotations for the public Cohen dataset are released under a CC BY-NC-SA 4.0 license.

### Code

  - Released under an open-source license.

## Contacts

Alberto Signoroni alberto.signoroni@unibs.it

Mattia Savardi m.savardi001@unibs.it

## Citations

For any use or reference to this project, please cite the following paper.

**[News]** This work has been accepted at Medical Image Analysis. Available [here](https://doi.org/10.1016/j.media.2021.102046).
```
@article{SIGNORONI2021102046,
title = {BS-Net: learning COVID-19 pneumonia severity on a large Chest X-Ray dataset},
journal = {Medical Image Analysis},
pages = {102046},
year = {2021},
issn = {1361-8415},
doi = {https://doi.org/10.1016/j.media.2021.102046},
url = {https://www.sciencedirect.com/science/article/pii/S136184152100092X},
author = {Alberto Signoroni and Mattia Savardi and Sergio Benini and Nicola Adami and Riccardo Leonardi and Paolo Gibellini and Filippo Vaccher and Marco Ravanelli and Andrea Borghesi and Roberto Maroldi and Davide Farina},
}
```