a/README.md b/README.md
# COVID-19 xray dataset

<img src="https://github.com/v7labs/covid-19-xray-dataset/blob/master/media/covid-chest-xray-cover.jpg"/>

[BROWSE & DOWNLOAD THE DATASET ON V7 DARWIN HERE](https://darwin.v7labs.com/v7-labs/covid-19-chest-x-ray-dataset)

Alternatively, run the following commands with [Darwin-py](https://v7labs.github.io/darwin-py/) to download the latest version:

```bash
# Install the Darwin command-line client
pip install darwin-py
# Pull the all-images release of the dataset
darwin dataset pull v7-labs/covid-19-chest-x-ray-dataset:all-images
```

This dataset contains 6500 images of AP/PA chest x-rays with pixel-level polygonal lung segmentations. Among these are [517 cases](https://github.com/ieee8023/covid-chestxray-dataset) of COVID-19.
Use the command below to download only the images presenting COVID-19.

```sh
darwin dataset pull v7-labs/covid-19-chest-x-ray-dataset:covid-only
```

**WARNING:** This dataset is not intended for use in clinical diagnostics.

Each image contains:

- Two "Lung" segmentation masks (rendered as polygons, including the posterior region behind the heart).
- A tag for the type of pneumonia (viral, bacterial, fungal, healthy/none).
- If the patient has COVID-19, additional tags stating age, sex, temperature, location, intubation status, ICU admission, and patient outcome.

Lung annotations are polygons following pixel-level boundaries. These can be exported in `COCO`, `VOC`, or `Darwin JSON` format. Each annotation file contains a URL to the original full-resolution image, as well as to a reduced-size thumbnail.

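As a rough illustration of working with an exported annotation file, the sketch below reads one `Darwin JSON` document and separates polygons from tags. The field names (`image`, `annotations`, `polygon.path`, `tag`, `thumbnail_url`), the class name `Viral Pneumonia`, and the sample values are assumptions made for this example and should be checked against a real export.

```python
import json

# Hypothetical annotation file content. The field names are an assumption
# about the Darwin JSON layout; verify them against a real export.
SAMPLE = """
{
  "image": {
    "filename": "example.jpg",
    "width": 1024,
    "height": 1024,
    "url": "https://example.com/example.jpg",
    "thumbnail_url": "https://example.com/example_thumb.jpg"
  },
  "annotations": [
    {"name": "Lung", "polygon": {"path": [
      {"x": 10, "y": 10}, {"x": 200, "y": 10}, {"x": 200, "y": 400}]}},
    {"name": "Lung", "polygon": {"path": [
      {"x": 300, "y": 10}, {"x": 500, "y": 10}, {"x": 500, "y": 400}]}},
    {"name": "Viral Pneumonia", "tag": {}}
  ]
}
"""


def parse_annotation(text):
    """Split one annotation document into image info, polygons, and tags."""
    doc = json.loads(text)
    polygons = {}  # class name -> list of polygons, each a list of (x, y)
    tags = []      # names of tag-type annotations
    for ann in doc.get("annotations", []):
        if "polygon" in ann:
            points = [(p["x"], p["y"]) for p in ann["polygon"]["path"]]
            polygons.setdefault(ann["name"], []).append(points)
        elif "tag" in ann:
            tags.append(ann["name"])
    return {"image": doc["image"], "polygons": polygons, "tags": tags}


record = parse_annotation(SAMPLE)
print(record["image"]["url"])
print(len(record["polygons"].get("Lung", [])), "lung polygons")
print(record["tags"])
```

Keeping polygons grouped by class name makes it straightforward to handle the two "Lung" masks separately from any other annotation class.
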
<img src="https://github.com/v7labs/covid-19-xray-dataset/blob/master/media/example_1.png" />

**LUNG SEGMENTATION NOTES**: Lung segmentations in this dataset include most of the heart, revealing lung opacities behind the heart which may be relevant for assessing the severity of viral pneumonia. Uniformly shaped lungs also decouple the shape and content of the left lung from the size of the heart.

The lower-most part of the lungs is defined by the extent of the diaphragm, where visible. If the back of the lungs is clearly visible through the diaphragm, it is also included, up to the lower-most visible part of the lungs.

![media/Jun-17-2020_15-21-23.gif](media/Jun-17-2020_15-21-23.gif)

Lung segmentations were created by human annotators using [Auto-Annotate](https://www.v7labs.com/automated-annotation), then manually adjusted and reviewed.

Other important notes:

- Image resolutions, sources, and orientations vary across the dataset, with the largest image being 5600x4700 and the smallest being 156x156. You may sort images by dimensions on [Darwin](https://darwin.v7labs.com/v7-labs/covid-19-chest-x-ray-dataset) to exclude those below a threshold.
- Lateral x-rays do not contain lung segmentations. They have classification tags, but should be ignored if you are working with detection-based networks.
- 63 axial CT scan slices are left without masks (although they have tags) to preserve the integrity of one of the source datasets. We encourage discarding these when performing x-ray analysis.
- Portable x-ray images are of significantly lower quality than others, and they correlate highly with severe conditions. Classification models may therefore learn to associate portable x-ray images with diseases like COVID-19.
- Medical instruments such as pacemakers, and markup that overlaps the lungs, are masked with an "Ignore" class. We encourage masking these out when performing lung analysis, as they correlate strongly with sick patients. Intubation instruments are not masked out if smaller or thinner than 1 cm.

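The first three notes above can be applied as a simple pre-filter. The sketch below is one possible approach, assuming each image has already been summarised into a small record; the file names, the `classes` field, and the 512-pixel threshold are hypothetical choices for illustration, not values taken from the dataset.

```python
# Hypothetical per-image summaries. In practice these would be built from
# the exported annotation files; names and sizes here are made up.
records = [
    {"filename": "frontal.png", "width": 2000, "height": 2000,
     "classes": ["Lung", "Lung", "Viral Pneumonia"]},
    {"filename": "tiny.png", "width": 156, "height": 156,
     "classes": ["Lung", "Lung"]},
    {"filename": "lateral.png", "width": 1500, "height": 1800,
     "classes": ["Viral Pneumonia"]},  # tags only, no lung polygons
    {"filename": "ct_slice.png", "width": 512, "height": 512,
     "classes": ["Viral Pneumonia"]},  # tags only, no lung polygons
]

MIN_SIDE = 512  # assumed threshold; pick one that suits your model


def usable_for_lung_analysis(record, min_side=MIN_SIDE):
    """Keep images that have lung polygons and are large enough."""
    has_lungs = "Lung" in record["classes"]
    large_enough = min(record["width"], record["height"]) >= min_side
    return has_lungs and large_enough


usable = [r["filename"] for r in records if usable_for_lung_analysis(r)]
print(usable)
```

Requiring at least one "Lung" annotation drops lateral x-rays and CT slices without needing a separate tag for either, since neither carries lung segmentations.
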
![media/example_2.png](media/example_2.png)

![media/example_3.png](media/example_3.png)

![media/example_4.png](media/example_4.png)

You may also use the `Ignore` class to filter out images with occluding markups or large medical instruments.

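One way to mask out `Ignore` regions at the pixel level is sketched below: a pixel is kept only if it lies inside a lung polygon and outside every `Ignore` polygon. This is a minimal pure-Python illustration using an even-odd ray-casting test. The square polygons are hypothetical, and a real pipeline would more likely rasterise the polygons with an imaging library.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test for a polygon given as (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross
        # the ray cast from (x, y) towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def keep_pixel(x, y, lung_polygons, ignore_polygons):
    """True if the pixel is inside a lung and not inside any Ignore region."""
    in_lung = any(point_in_polygon(x, y, p) for p in lung_polygons)
    in_ignore = any(point_in_polygon(x, y, p) for p in ignore_polygons)
    return in_lung and not in_ignore


# Hypothetical geometry: one square "lung" with a smaller "Ignore" square
# (for example, a pacemaker) inside it.
lungs = [[(0, 0), (100, 0), (100, 100), (0, 100)]]
ignored = [[(40, 40), (60, 40), (60, 60), (40, 60)]]

print(keep_pixel(10, 10, lungs, ignored))   # inside lung, outside Ignore
print(keep_pixel(50, 50, lungs, ignored))   # inside the Ignore region
print(keep_pixel(150, 50, lungs, ignored))  # outside the lung
```

To filter whole images instead, it is enough to check whether the list of `Ignore` polygons for an image is non-empty.
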
You can browse the available images and filter them by tag or annotation class using the right sidebar, as seen below. Follow this [link](https://darwin.v7labs.com/v7-labs/covid-19-chest-x-ray-dataset) to access the interactive dataset.

![media/Screen_Shot_2020-06-17_at_4.21.57_PM.png](media/Screen_Shot_2020-06-17_at_4.21.57_PM.png)

Below are the 23 most commonly represented classes by image count and instances:

![media/class_list.png](media/class_list.png)

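The difference between image count and instance count can be made concrete with a short sketch. It assumes the annotation class names for each image have already been collected into a dictionary; the file names and class names below are hypothetical.

```python
from collections import Counter

# Hypothetical mapping of image file name -> annotation class names,
# with one entry per annotation instance.
annotations_by_image = {
    "a.png": ["Lung", "Lung", "Viral Pneumonia"],
    "b.png": ["Lung", "Lung", "Ignore", "Ignore", "Ignore"],
    "c.png": ["Viral Pneumonia"],
}

instance_counts = Counter()  # total annotations per class
image_counts = Counter()     # number of images containing each class

for classes in annotations_by_image.values():
    instance_counts.update(classes)
    image_counts.update(set(classes))  # count each class once per image

for name, n_images in image_counts.most_common():
    print(f"{name}: {n_images} images, {instance_counts[name]} instances")
```

A class such as "Ignore" can have many instances concentrated in few images, which is why the two counts are reported separately.
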
## Data sources and licenses

**Annotations**

Special thanks to [CloudFactory](https://cloudfactory.com) for providing the human workforce for this research project. Each image was viewed and labelled by a human, and reviewed by [V7](https://v7labs.com).

**License:** [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

**Source 1:**

There are 517 cases of COVID-19 from the collaborative efforts of [https://github.com/ieee8023/covid-chestxray-dataset](https://github.com/ieee8023/covid-chestxray-dataset), where exported versions of this dataset are also available.

*Joseph Paul Cohen and Paul Morrison and Lan Dao
COVID-19 image data collection, arXiv:2003.11597, 2020
https://github.com/ieee8023/covid-chestxray-dataset*

Licenses for the images of the dataset above are included in the **metadata.csv** file sourced from the repository above.

**Source 2:**

5863 images are sourced from [https://data.mendeley.com/datasets/rscbjbr9sj/2](https://data.mendeley.com/datasets/rscbjbr9sj/2) (also available as, and commonly referred to by, this Kaggle dataset: [https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia/data](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia/data)).

**License:** [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

*Kermany, Daniel; Zhang, Kang; Goldbaum, Michael (2018), “Labeled Optical Coherence Tomography (OCT) and Chest X-Ray Images for Classification”, Mendeley Data, v2, http://dx.doi.org/10.17632/rscbjbr9sj.2*

**Special thanks**

CloudFactory for providing human annotation labor.
The following radiologists for providing their time, knowledge, connections, and dedication to make our research possible:

Prof. Lorenzo Preda

Prof. Nicola Sverzellati

Prof. Luca Richeldi

Prof. Alessandro Venturi

Prof. Francesco Vaia

Prof. Paolo Spagnolo