# Brain Hemorrhage Extended (BHX): Bounding Box Extrapolation from Thick to Thin Slice CT Images

**Creators**:  
Eduardo Pontes Reis, Felipe Nascimento, Mateus Aranha, Fernando Mainetti Secol, Birajara Machado, Marcelo Felix, Anouk Stein, Edson Amaro

**Published**: July 29, 2020  
**Version**: 1.1

---

## 📄 Citation

Reis, E. P., Nascimento, F., Aranha, M., Mainetti Secol, F., Machado, B., Felix, M., Stein, A., & Amaro, E. (2020). *Brain Hemorrhage Extended (BHX): Bounding box extrapolation from thick to thin slice CT images* (version 1.1). PhysioNet. https://doi.org/10.13026/9cft-hg92

### PhysioNet Standard Citation:

Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). *PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals*. Circulation [Online]. 101 (23), pp. e215–e220.

---

## 🧠 Abstract

BHX is a publicly available dataset of bounding box annotations for five types of acute hemorrhage, built as an extension of the [qure.ai CQ500 dataset](http://headctstudy.qure.ai/dataset). It is intended to support the development of machine learning solutions for hemorrhage detection and localization.

---

## 📌 Key Points

- **39,668** bounding boxes in **23,409** images annotated for hemorrhage.
- Built from the ~170k image **CQ500 dataset**.
- Bounding boxes extrapolated from sparse thick-slice labeling to thin-slice images.
- Supports machine learning applications for hemorrhage localization and diagnosis.

---

## 🧬 Background

- Intracranial hemorrhage is a serious condition with a **40% one-month mortality rate**.
- Head CT is the primary imaging modality for diagnosing it.
- Manual bounding box annotation is time-consuming, so sparse labels are extrapolated rather than drawn by hand on every slice.
- Before BHX, no public dataset included **localization data** in the form of bounding boxes.

---

## 🧪 Methods

- Based on the **CQ500 dataset** (491 scans, 205 hemorrhage-positive).
- Labeled by three neuroradiologists with varying levels of experience.
- Thick and thin series were matched using the DICOM tag `Image Position (Patient)` (0020,0032).
- Six hemorrhage labels, covering the five acute types plus chronic subdural:
  - **Intraparenchymal**
  - **Subarachnoid**
  - **Intraventricular**
  - **Epidural**
  - **Acute Subdural**
  - **Chronic Subdural**
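The thick-to-thin matching step can be illustrated with a minimal sketch. It assumes that each slice's position along the scan axis (the third component of `Image Position (Patient)`) has already been read from the DICOM header, and that a thin slice inherits the boxes of the nearest annotated thick slice when that slice lies within a tolerance. The nearest-neighbour rule, the function names, and the `max_gap_mm` tolerance are illustrative assumptions, not the authors' published procedure. Boxes are copied rather than shared so that later edits to one slice do not leak into another.

```python
from bisect import bisect_left


def nearest_index(sorted_zs, z):
    """Index of the value in sorted_zs closest to z (sorted_zs must be non-empty)."""
    i = bisect_left(sorted_zs, z)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_zs)]
    return min(candidates, key=lambda j: abs(sorted_zs[j] - z))


def extrapolate_boxes(thick_slices, thin_slices, max_gap_mm=2.5):
    """Copy boxes from annotated thick slices onto nearby thin slices.

    thick_slices: list of (z_mm, boxes) for annotated thick slices.
    thin_slices:  list of (sop_instance_uid, z_mm) for the thin series.
    max_gap_mm:   hypothetical tolerance; thin slices farther than this
                  from every annotated thick slice receive no boxes.
    """
    thick_sorted = sorted(thick_slices, key=lambda item: item[0])
    zs = [z for z, _ in thick_sorted]
    result = {}
    if not zs:
        return result
    for uid, z in thin_slices:
        j = nearest_index(zs, z)
        if abs(zs[j] - z) <= max_gap_mm:
            result[uid] = list(thick_sorted[j][1])  # copy, do not alias
    return result


# Illustrative values only (not taken from the dataset).
thick = [(0.0, ["box_A"]), (5.0, ["box_B"])]
thin = [("uid-1", 0.6), ("uid-2", 4.4), ("uid-3", 20.0)]
matched = extrapolate_boxes(thick, thin)
```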

---

## 🗃️ Data Description

### Annotation Stats:

- **6,283** manually labeled bounding boxes in **3,558** images.
- **39,668** extrapolated bounding boxes in **23,409** images.

### Dataset Versions:

1. `1_Initial_Manual_Labeling.csv`: Hand-drawn annotations on thick slices.
2. `2_Extrapolation_to_All_Series.csv`: Extrapolated to all corresponding series.
3. `3_Extrapolation_to_Selected_Series.csv`: Extrapolated only for selected soft-tissue thin-slice series.

### Columns:

- `SOPInstanceUID`: Unique DICOM image ID
- `SeriesInstanceUID`: DICOM series ID
- `StudyInstanceUID`: DICOM study ID
- `data`: Bounding box coordinates (X, Y, width, height)
- `labelName`: Hemorrhage type
- `labelType`: Source of image (thick-slices, thin-slices, or other)
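As a rough illustration of how these columns might be consumed, the sketch below groups boxes by image and converts each one to corner coordinates. It assumes the `data` field is a dict-like string with the keys `x`, `y`, `width`, and `height`; the column description only states which four values are stored, so check the encoding against the actual files. The sample rows, UIDs, and `labelType` values are invented for the example. With a real file, something like `open("1_Initial_Manual_Labeling.csv", newline="")` could be passed in place of the in-memory sample.

```python
import ast
import csv
import io


def parse_box(data_field):
    """Convert the `data` field into (x_min, y_min, x_max, y_max).

    Assumes the field is a dict-like string with keys x, y, width, height;
    adjust this function if the real files use another encoding.
    """
    box = ast.literal_eval(data_field)
    x, y = float(box["x"]), float(box["y"])
    return (x, y, x + float(box["width"]), y + float(box["height"]))


def load_annotations(csv_file):
    """Group (labelName, box) pairs by SOPInstanceUID."""
    by_image = {}
    for row in csv.DictReader(csv_file):
        entry = (row["labelName"], parse_box(row["data"]))
        by_image.setdefault(row["SOPInstanceUID"], []).append(entry)
    return by_image


# Made-up rows that only mimic the documented column layout.
SAMPLE = (
    "SOPInstanceUID,SeriesInstanceUID,StudyInstanceUID,data,labelName,labelType\n"
    "1.2.3,1.2,1,\"{'x': 10, 'y': 20, 'width': 30, 'height': 40}\",Subarachnoid,thin\n"
    "1.2.3,1.2,1,\"{'x': 5, 'y': 5, 'width': 10, 'height': 10}\",Epidural,thin\n"
)
annotations = load_annotations(io.StringIO(SAMPLE))
```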

---

## 🔗 DICOM UID Mapping

- Annotations are linked to images via the DICOM tag `(0008,0018)` SOP Instance UID, which is stored in the `SOPInstanceUID` column.
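A minimal sketch of this lookup follows, assuming the CQ500 DICOM files are available locally. The header reader is left as a caller-supplied function (`read_uid`, a hypothetical name) so the example has no third-party dependency; with pydicom installed, one option would be `lambda p: pydicom.dcmread(p, stop_before_pixels=True).SOPInstanceUID`. The file names and UIDs below are invented. Reporting unmatched UIDs separately makes it easy to spot annotations whose images were not downloaded.

```python
def build_uid_index(paths, read_uid):
    """Map SOP Instance UID -> file path.

    read_uid is a caller-supplied function (hypothetical here) that
    returns the SOP Instance UID stored in a DICOM file's header.
    """
    index = {}
    for path in paths:
        index[str(read_uid(path))] = path
    return index


def locate_images(annotation_uids, uid_index):
    """Split annotated UIDs into those with a known file and those without."""
    found = {uid: uid_index[uid] for uid in annotation_uids if uid in uid_index}
    missing = sorted(set(annotation_uids) - set(found))
    return found, missing


# Stand-in header reader backed by a dict; paths and UIDs are invented.
fake_headers = {"a.dcm": "1.2.3", "b.dcm": "1.2.4"}
index = build_uid_index(fake_headers, fake_headers.get)
found, missing = locate_images(["1.2.3", "9.9.9"], index)
```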

---

## 📂 Original Images

- Hosted at: [http://headctstudy.qure.ai/dataset](http://headctstudy.qure.ai/dataset)

---

## 🔍 Usage Notes

- Unique resource for **bounding-box annotated hemorrhage images**.
- Enables **benchmarking** and **development** of deep learning algorithms.
- Includes extrapolated labels, some of which may have minor inaccuracies.
- Future work should consider **interpolating bounding boxes** between slices for smoother transitions.
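One way the suggested interpolation could look is sketched below. It assumes boxes are given as `(x, y, width, height)` tuples, that the two annotated slices show the same lesion, and that a plain linear blend along the scan axis is acceptable. None of this is part of the released dataset; the function names are illustrative. Positions outside the two annotated slices are skipped, so the sketch interpolates but never extrapolates.

```python
def interpolate_box(box_a, box_b, t):
    """Linearly blend two (x, y, width, height) boxes; t=0 gives box_a, t=1 gives box_b."""
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))


def boxes_between(z_a, box_a, z_b, box_b, z_targets):
    """Interpolated boxes for slice positions lying between two annotated slices."""
    if z_a == z_b:
        raise ValueError("annotated slices must be at different positions")
    lo, hi = min(z_a, z_b), max(z_a, z_b)
    out = {}
    for z in z_targets:
        if lo <= z <= hi:  # interpolate only; never extrapolate past the key slices
            out[z] = interpolate_box(box_a, box_b, (z - z_a) / (z_b - z_a))
    return out


# Illustrative numbers only.
mid = interpolate_box((0, 0, 10, 10), (10, 20, 30, 50), 0.5)
series = boxes_between(0.0, (0, 0, 10, 10), 5.0, (10, 20, 30, 50), [1.25, 2.5, 7.0])
```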

---

## 🖼️ Visual Inspection

- Explore annotated images at:  
  [https://public.md.ai/annotator/project/Y2qr6vqv/workspace](https://public.md.ai/annotator/project/Y2qr6vqv/workspace)

---

## 🙏 Acknowledgements

- **qure.ai** – for publishing the CQ500 dataset.
- **MD.ai** – for providing the annotation platform.

---

## ⚠️ Conflicts of Interest

- A.S. is employed by MD.ai, which provided the annotation platform.

---