# Robust Chest CT Image Segmentation of COVID-19 Lung Infection based on limited data

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3902293.svg)](https://doi.org/10.5281/zenodo.3902293)

In this paper, we proposed and evaluated an approach for the automated segmentation of COVID-19 infected regions in CT volumes. Our method focused on the on-the-fly generation of unique and random image patches for training by applying heavy preprocessing and extensive data augmentation. This makes it possible to handle limited dataset sizes, with the small dataset acting as a variation database. Instead of new and complex neural network architectures, we utilized the standard 3D U-Net. We showed that our medical image segmentation pipeline is able to train accurate and robust models on limited data without overfitting.
Furthermore, we were able to outperform current state-of-the-art semantic segmentation approaches for lungs and COVID-19 infection. Our work has great potential to be applied as a clinical decision support system for COVID-19 quantitative assessment and disease monitoring in the clinical environment. Nevertheless, further clinical studies on COVID-19 semantic segmentation are needed to evaluate clinical performance and robustness.

The models, predictions, visualizations and evaluation (scores, figures) are available at the following link: https://doi.org/10.5281/zenodo.3902293

**This work does NOT claim clinical performance by any means and serves purely educational purposes.**

![segmentation](docs/pdVSgt.png)

## Reproducibility

**Requirements:**
- Ubuntu 18.04
- Python 3.6
- NVIDIA QUADRO RTX 6000 or a GPU with equivalent performance

**Step-by-Step workflow:**

Download the code repository via git clone to your disk. Afterwards, install all required dependencies, download the dataset and set up the file structure.

```sh
# Clone the repository and enter it
git clone https://github.com/muellerdo/covid19.MIScnn.git
cd covid19.MIScnn/

# Install the dependencies and download the dataset
pip3 install -r requirements.txt
python3 scripts/download_data.py
```

Optionally, you can run the data exploration, which gives some interesting information about the dataset.

```sh
python3 scripts/data_exploration.py
```

Before training and inference, initialize the cross-validation folds by running the preprocessing. This sets up a validation file structure and randomly samples the folds.

The most important step is running the training & inference process for each fold. This can be done either sequentially or in parallel on multiple GPUs.

```sh
# Initialize the cross-validation folds
python3 scripts/run_preprocessing.py

# Run training & inference for each fold
python3 scripts/run_miscnn.py --fold 0
python3 scripts/run_miscnn.py --fold 1
python3 scripts/run_miscnn.py --fold 2
python3 scripts/run_miscnn.py --fold 3
python3 scripts/run_miscnn.py --fold 4
```

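As an illustration of what randomly sampling the folds means for 20 volumes, here is a minimal sketch in plain Python. It is not the code of `scripts/run_preprocessing.py`; the function name `sample_folds`, the sample identifiers and the `seed` parameter are assumptions made for this example. With 20 samples and 5 folds, each fold ends up with 16 training and 4 validation volumes.

```python
import random


def sample_folds(sample_ids, k=5, seed=None):
    """Randomly split sample IDs into k (training, validation) pairs.

    Hypothetical helper for illustration only.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # random order, reproducible if seed is set
    folds = []
    for i in range(k):
        validation = ids[i::k]  # every k-th sample, starting at offset i
        training = [s for s in ids if s not in validation]
        folds.append((training, validation))
    return folds


# Hypothetical identifiers for the 20 CT volumes
volumes = ["volume_%02d" % i for i in range(20)]
folds = sample_folds(volumes, k=5, seed=42)
for fold_id, (training, validation) in enumerate(folds):
    print(fold_id, len(training), len(validation))
```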
Finally, the evaluation script computes all scores, visualizations and figures.

```sh
python3 scripts/run_evaluation.py
```

## Materials / Dataset

We used the public dataset from Ma et al., which consists of 20 annotated COVID-19 chest CT volumes. At the time of writing, this dataset was the only publicly available 3D volume set with annotated COVID-19 infection segmentation. Each CT volume was first labeled by junior annotators, then refined by two radiologists with 5 years of experience, and finally verified by senior radiologists with more than 10 years of experience. The CT images were labeled into four classes: background, left lung, right lung and COVID-19 infection.

Reference: https://zenodo.org/record/3757476#.XqhRp_lS-5D

## Methods

The implemented medical image segmentation pipeline can be summarized in the following core steps:
- Dataset: 20x COVID-19 CT volumes
- Limited dataset → Utilization as variation database
- Heavy preprocessing methods
- Extensive data augmentation
- Patchwise analysis of high-resolution images
- Utilization of the standard 3D U-Net
- Model fitting based on Tversky index & cross-entropy
- Model predictions on overlapping patches
- 5-fold cross-validation via Dice similarity coefficient

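To make the "Tversky index & cross-entropy" step from the list above more concrete, here is a minimal sketch in plain Python. It assumes the common definition TI = TP / (TP + α·FP + β·FN) and combines it with cross-entropy as `(1 - mean TI) + CE`; the exact weighting and formulation used by MIScnn may differ. The function names and the defaults `alpha=0.5`, `beta=0.5` (for which the Tversky index equals the Dice coefficient) are assumptions made for this example.

```python
import math


def tversky_index(truth, prob, alpha=0.5, beta=0.5, eps=1e-7):
    """Soft Tversky index for one class.

    truth: list of 0/1 labels, prob: list of predicted probabilities.
    """
    tp = sum(t * p for t, p in zip(truth, prob))
    fp = sum((1 - t) * p for t, p in zip(truth, prob))
    fn = sum(t * (1 - p) for t, p in zip(truth, prob))
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def cross_entropy(truth_onehot, probs, eps=1e-7):
    """Mean categorical cross-entropy over all voxels."""
    total = 0.0
    for t_vec, p_vec in zip(truth_onehot, probs):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_vec, p_vec))
    return total / len(truth_onehot)


def tversky_crossentropy(truth_onehot, probs, alpha=0.5, beta=0.5):
    """Assumed combination: (1 - mean class-wise Tversky index) + cross-entropy."""
    n_classes = len(truth_onehot[0])
    tversky_sum = 0.0
    for c in range(n_classes):
        truth_c = [v[c] for v in truth_onehot]
        prob_c = [v[c] for v in probs]
        tversky_sum += tversky_index(truth_c, prob_c, alpha, beta)
    return (1.0 - tversky_sum / n_classes) + cross_entropy(truth_onehot, probs)


# Two voxels, two classes: a perfect and a completely uncertain prediction
truth = [[1, 0], [0, 1]]
perfect_loss = tversky_crossentropy(truth, [[1.0, 0.0], [0.0, 1.0]])
uncertain_loss = tversky_crossentropy(truth, [[0.5, 0.5], [0.5, 0.5]])
print(perfect_loss, uncertain_loss)
```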
![architecture](docs/COVID19_MISCNN.architecture.png)

This pipeline was based on MIScnn, which is an in-house developed open-source framework to set up complete medical image segmentation pipelines with convolutional neural networks and deep learning models on top of TensorFlow/Keras. The framework supports extensive preprocessing, data augmentation, state-of-the-art deep learning models and diverse evaluation techniques. The experiment was performed on an Nvidia Quadro P6000.

MIScnn: https://github.com/frankkramer-lab/MIScnn

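Predicting on overlapping patches requires a grid of patch positions that covers the whole volume. The following plain-Python sketch shows one possible way to compute such a grid; it is not taken from MIScnn, and the function names as well as the volume shape, patch shape and overlap values are hypothetical. Merging the overlapping predictions (for example by averaging) is not covered here.

```python
from itertools import product


def patch_starts(length, patch, overlap):
    """Start indices of overlapping patches covering one axis."""
    if patch >= length:
        return [0]
    step = patch - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the patch size")
    starts = list(range(0, length - patch + 1, step))
    if starts[-1] + patch < length:
        # Shift the final patch so it ends exactly at the volume border
        starts.append(length - patch)
    return starts


def patch_grid(shape, patch_shape, overlap):
    """All patch origins for a volume, one start index per axis."""
    axes = [patch_starts(n, p, o) for n, p, o in zip(shape, patch_shape, overlap)]
    return list(product(*axes))


# Hypothetical volume and patch sizes, chosen only for this example
grid = patch_grid(shape=(10, 11, 4), patch_shape=(4, 4, 4), overlap=(2, 2, 2))
print(len(grid), grid[0], grid[-1])
```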
## Results & Discussion

Validation monitoring during training revealed no overfitting. The training and validation loss curves showed no significant difference from each other. During fitting, the performance settled at a loss of around 0.383, which corresponds to a generalized DSC (average of all class-wise DSCs) of around 0.919. Because of this robust training process without any signs of overfitting, we concluded that fitting on randomly generated patches, obtained via extensive data augmentation and random cropping from a variation database, is highly efficient for limited imaging data.

![fitting_and_boxplot](docs/fitting_and_boxplot.png)

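The random cropping mentioned above can be illustrated with a small plain-Python sketch. It cuts one randomly positioned patch out of a 3D volume stored as nested lists; the function name `random_crop` and the toy volume and patch sizes are assumptions for this example, and the actual pipeline additionally applies data augmentation before cropping.

```python
import random


def random_crop(volume, patch_shape, rng=random):
    """Cut one random patch out of a 3D volume given as nested lists [x][y][z]."""
    shape = (len(volume), len(volume[0]), len(volume[0][0]))
    # randint is inclusive on both ends, so the patch always fits in the volume
    x0, y0, z0 = [rng.randint(0, s - p) for s, p in zip(shape, patch_shape)]
    px, py, pz = patch_shape
    patch = [[row[z0:z0 + pz] for row in plane[y0:y0 + py]]
             for plane in volume[x0:x0 + px]]
    return patch, (x0, y0, z0)


# Toy volume of shape (4, 5, 6); each value encodes its own coordinates
volume = [[[x * 100 + y * 10 + z for z in range(6)]
           for y in range(5)]
          for x in range(4)]
patch, (x0, y0, z0) = random_crop(volume, (2, 3, 4), rng=random.Random(0))
print((x0, y0, z0), len(patch), len(patch[0]), len(patch[0][0]))
```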
The inference revealed a strong segmentation performance for lungs as well as COVID-19 infected regions. Overall, the cross-validation models achieved a DSC of around 0.956 for lung and 0.761 for COVID-19 infection segmentation. Furthermore, the models achieved a sensitivity and specificity of 0.956 and 0.998 for lungs, as well as 0.730 and 0.999 for infection, respectively.

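For readers unfamiliar with the reported metrics, the following plain-Python sketch computes the Dice similarity coefficient, sensitivity and specificity for a single class from flattened label lists. It is not the code of `scripts/run_evaluation.py`; the function names and the toy labels are assumptions for this example, and returning NaN for an undefined score is just one possible convention.

```python
def confusion_counts(truth, pred, label):
    """Count TP, FP, FN, TN for one class over flattened label lists."""
    tp = fp = fn = tn = 0
    for t, p in zip(truth, pred):
        if p == label and t == label:
            tp += 1
        elif p == label:
            fp += 1
        elif t == label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Return NaN instead of raising when a score is undefined."""
    return numerator / denominator if denominator else float("nan")


def dice(truth, pred, label):
    tp, fp, fn, _ = confusion_counts(truth, pred, label)
    return safe_ratio(2 * tp, 2 * tp + fp + fn)


def sensitivity(truth, pred, label):
    tp, _, fn, _ = confusion_counts(truth, pred, label)
    return safe_ratio(tp, tp + fn)


def specificity(truth, pred, label):
    _, fp, _, tn = confusion_counts(truth, pred, label)
    return safe_ratio(tn, tn + fp)


# Toy example with three classes (0, 1, 2) over eight voxels
truth = [0, 0, 1, 1, 2, 2, 2, 0]
pred = [0, 1, 1, 1, 2, 2, 0, 0]
print(dice(truth, pred, 2), sensitivity(truth, pred, 2), specificity(truth, pred, 2))
```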
Despite the limited training data, our medical image segmentation pipeline allowed fitting a model which is able to segment COVID-19 infection with state-of-the-art accuracy that is comparable to models trained on large datasets.

## Author

Dominik Müller  
Email: dominik.mueller@informatik.uni-augsburg.de  
IT-Infrastructure for Translational Medical Research  
University Augsburg  
Bavaria, Germany

## How to cite / More information

Dominik Müller, Iñaki Soto-Rey and Frank Kramer.  
Robust chest CT image segmentation of COVID-19 lung infection based on limited data.  
Informatics in Medicine Unlocked. Volume 25, 2021.  
DOI: https://doi.org/10.1016/j.imu.2021.100681

```
@article{MULLER2021100681,
title = {Robust chest CT image segmentation of COVID-19 lung infection based on limited data},
journal = {Informatics in Medicine Unlocked},
volume = {25},
pages = {100681},
year = {2021},
issn = {2352-9148},
doi = {https://doi.org/10.1016/j.imu.2021.100681},
url = {https://www.sciencedirect.com/science/article/pii/S2352914821001660},
author = {Dominik Müller and Iñaki Soto-Rey and Frank Kramer},
keywords = {COVID-19, Segmentation, Limited data, Computed tomography, Deep learning, Artificial intelligence},
eprint={2007.04774},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
```

Thank you for citing our work.

## License

This project is licensed under the GNU GENERAL PUBLIC LICENSE Version 3.  
See the LICENSE.md file for license rights and limitations.