# Head CT Data Release

Description of the images shared:

The data consist of non-contrast CT scans from 35 different subjects.
Two subjects have 2 scans each, and every other subject has one scan.
One subject additionally has 2 scans from the same time point,
reconstructed with different convolution kernels. Each scan has at least
one reader’s manual segmentation of the image, delineating the mask of
the brain areas (including cerebrospinal fluid (CSF)).

The data are from the Minimally-Invasive Surgery plus rt-PA for
Intracerebral hemorrhage Evacuation (MISTIE) Phases II and III clinical
trials and the Clot Lysis: Evaluating Accelerated Resolution of
Intraventricular Hemorrhage (CLEAR) Phase III clinical trial. Each
subject’s scans and DICOM (Digital Imaging and Communications in
Medicine) header information are contained in a tarball (`.tar.xz`),
with the name of the tarball being the `id` of the subject (`01.tar.gz`
is for `id` `01`).

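As a minimal sketch of working with the tarballs, the Python snippet below
recovers the subject `id` from a tarball name and lists the files inside it.
The function names `subject_id` and `list_members` are hypothetical helpers,
not part of this release. Because both `.tar.xz` and `.tar.gz` appear in the
description above, the sketch accepts either suffix and lets the standard
`tarfile` module detect the compression.

```python
import tarfile
from pathlib import Path


def subject_id(tar_path):
    """Return the subject id encoded in a tarball name, e.g. '01.tar.gz' -> '01'."""
    name = Path(tar_path).name
    # Both suffixes are mentioned in the README, so accept either one.
    for suffix in (".tar.xz", ".tar.gz"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    raise ValueError(f"not a recognised tarball name: {name}")


def list_members(tar_path):
    """List the regular files stored in one subject's tarball."""
    # mode "r:*" asks tarfile to detect the compression (xz, gz, ...) itself.
    with tarfile.open(tar_path, mode="r:*") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

For example, `subject_id("01.tar.gz")` returns `"01"`, and
`list_members("01.tar.gz")` returns the sorted file names in that archive.
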
## Each Tarball

### Image Data

Each tarball contains a series of NIfTI (Neuroimaging Informatics
Technology Initiative) files. These files can be read using common tools
such as ITK-SNAP (<http://www.itksnap.org/pmwiki/pmwiki.php>), Mango
(<http://ric.uthscsa.edu/mango/>), or Papaya
(<http://ric.uthscsa.edu/mango/papaya.html>). Each image has manual
segmentations from 2 different readers, each a binary image delineating
the brain, and an automated estimate of the brain from the method
described in Muschelli et al. (2015) (see References).

The NIfTI images are as follows:

1. BRAIN\_1\_Anonymized.nii.gz - CT image that has been
   de-identified/anonymized: the face and the ears have been removed
   from the image. Values should range from -1024 to 3071 and be in
   Hounsfield Units (HU).
2. BRAIN\_1\_Anonymized\_Mask.nii.gz - a binary image denoting the
   brain areas using the method described in Muschelli et al. (2015).
   This method is also implemented in the `CT_Skull_Strip` function in
   the `ichseg` R package (<https://github.com/muschellij2/ichseg>).
3. Manual\_Mask\_1\_Reader\_1.nii.gz - binary brain mask from Reader 1
4. Manual\_Mask\_1\_Reader\_2.nii.gz - binary brain mask from Reader 2

Reader 1 and Reader 2 are consistent across subjects (i.e. Reader 1 is
always the same reader).

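The sketch below shows two checks one might run on these images, written in
plain Python so that it does not depend on a particular NIfTI reader. It
assumes the voxel values have already been loaded and flattened into
sequences, which is not shown. `hu_range_ok` and `dice` are hypothetical
helper names, and the Dice coefficient is used here only as one common way to
compare two binary masks; it is not prescribed by this release.

```python
def hu_range_ok(values, low=-1024, high=3071):
    """True if every intensity lies in the HU range stated for the CT images."""
    return all(low <= v <= high for v in values)


def dice(mask_a, mask_b):
    """Dice overlap 2|A and B| / (|A| + |B|) of two flattened binary masks."""
    a = [bool(v) for v in mask_a]
    b = [bool(v) for v in mask_b]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    total = sum(a) + sum(b)
    if total == 0:
        # Convention chosen for this sketch: two empty masks agree perfectly.
        return 1.0
    overlap = sum(x and y for x, y in zip(a, b))
    return 2.0 * overlap / total


# Toy masks standing in for a reader's mask and the automated mask.
reader_1 = [0, 1, 1, 1, 0, 0]
automated = [0, 1, 1, 0, 0, 0]
print(dice(reader_1, automated))  # 2 * 2 / (3 + 2) = 0.8
```
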
### Header information

In the tarballs, there are also `CSV` (comma-separated values) files,
such as:

- BRAIN\_1\_header.csv
- Manual\_Mask\_1\_Reader\_1\_header.csv
- Manual\_Mask\_1\_Reader\_2\_header.csv

which contain information from the DICOM header. In most cases the only
files of interest are those with `BRAIN` or `BONE` in the name, as the
manual mask images are derived from those images. The header files for
the manual masks are included for completeness.

Only a specific subset of the DICOM tags is given with the data:

- `0008-0070-MANUFACTURER`
- `0018-0050-SLICETHICKNESS`
- `0018-0060-KVP`
- `0018-1120-GANTRYDETECTORTILT`
- `0018-1151-XRAYTUBECURRENT`
- `0018-1152-EXPOSURE`
- `0018-1160-FILTERTYPE`
- `0018-1170-GENERATORPOWER`
- `0018-1210-CONVOLUTIONKERNEL`
- `0018-5100-PATIENTPOSITION`
- `0020-0032-IMAGEPOSITIONPATIENT`
- `0020-1041-SLICELOCATION`
- `0028-0030-PIXELSPACING`
- `0008-0070-Manufacturer`
- `0018-0050-SliceThickness`
- `0020-0032-ImagePositionPatient`
- `0020-1041-SliceLocation`
- `0028-0030-PixelSpacing`

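The tag labels above appear to follow a `group-element-keyword` pattern, where
the first two parts are the hexadecimal DICOM group and element numbers. The
following sketch splits a label into those parts and groups labels that refer
to the same tag regardless of keyword casing. It makes no assumption about how
the CSV files themselves are laid out, and `parse_tag` and `group_by_tag` are
hypothetical helper names.

```python
import re

# Four hex digits, four hex digits, then the keyword.
TAG_PATTERN = re.compile(r"^([0-9A-Fa-f]{4})-([0-9A-Fa-f]{4})-(\w+)$")


def parse_tag(label):
    """Split '0018-0050-SLICETHICKNESS' into ('0018', '0050', 'SLICETHICKNESS')."""
    match = TAG_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unexpected tag label: {label!r}")
    group, element, keyword = match.groups()
    return group.upper(), element.upper(), keyword


def group_by_tag(labels):
    """Collect labels that share a (group, element) pair, whatever their casing."""
    grouped = {}
    for label in labels:
        group, element, _ = parse_tag(label)
        grouped.setdefault((group, element), []).append(label)
    return grouped
```

For example, `group_by_tag` places `0008-0070-MANUFACTURER` and
`0008-0070-Manufacturer` under the same key, `("0008", "0070")`.
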
### Different files in some tarballs

Subject 13 has a `BONE` scan as well as a `BRAIN` scan. The `BONE` scan
is a non-contrast CT reconstructed with a different convolution kernel,
which gives the scan different properties. The manual segmentations are
in the same space as both scans (i.e. no registration was done).

Subjects 06 and 16 have both a `BRAIN_1` and a `BRAIN_2` scan. These
images were taken at 2 different time points. If a reader segmented a
given scan, the index of that scan appears in the mask file name: for
example, `Manual_Mask_2_Reader_1.nii.gz` is Reader 1’s mask for
`BRAIN_2`.

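The naming convention described above can be sketched as a small parser that
recovers the scan index and reader number from a manual mask file name. The
helper names `parse_mask_name` and `brain_scan_label` are hypothetical, and
the regular expression assumes every mask follows the
`Manual_Mask_<scan>_Reader_<reader>.nii.gz` pattern seen in the examples.

```python
import re

MASK_PATTERN = re.compile(r"^Manual_Mask_(\d+)_Reader_(\d+)\.nii\.gz$")


def parse_mask_name(filename):
    """Return (scan_index, reader) for a manual mask file name."""
    match = MASK_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a manual mask file name: {filename!r}")
    return int(match.group(1)), int(match.group(2))


def brain_scan_label(filename):
    """Label of the scan a mask belongs to, e.g. 'BRAIN_2'."""
    scan_index, _ = parse_mask_name(filename)
    return f"BRAIN_{scan_index}"


print(parse_mask_name("Manual_Mask_2_Reader_1.nii.gz"))   # (2, 1)
print(brain_scan_label("Manual_Mask_2_Reader_1.nii.gz"))  # BRAIN_2
```

For subject 13 the same mask applies to both the `BRAIN` and the `BONE` scan,
so this sketch returns only the `BRAIN` label.
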
## Demographics

The demographic information includes an `id` column, which maps back to
our original de-identified ID. That ID carries no information about the
patient other than site and enrollment number, and the mapping is not
provided or released. Additional columns give age in years (range 40 to
78, rounded to the nearest year), sex, race (as measured in the trial),
a Hispanic indicator, site ID (which again maps back to the original
data, with no mapping provided), and diagnosis (`dx`). The diagnosis was
made by the reading center and indicates either intracerebral hemorrhage
(ICH) or ICH with intraventricular hemorrhage (IVH), the latter recorded
as ICH with IVH in the `dx` column.

The site ID is included so that researchers can check for a site effect,
although detecting one is unlikely given how few subjects there are.

## The de-identification mechanism

A mask of the face and the ears was outlined on a template image. The
face and ear masking of the data was done by registering the template
image to each individual image separately. The method is similar to
`pydeface` (<https://github.com/poldracklab/pydeface>) and was
implemented in the `fslr` (<https://github.com/muschellij2/fslr>)
package. The template data were adapted from `pydeface`: the template
used for registration was
<https://github.com/muschellij2/pydeface/raw/master/pydeface/data/mean_reg2mean.nii.gz>
and the template mask used was
<https://github.com/muschellij2/pydeface/raw/master/pydeface/data/facemask_no_ears.nii.gz>.

## References

Muschelli, John, et al. “Validated automatic brain extraction of head CT
images.” Neuroimage 114 (2015): 379-385.
<https://doi.org/10.1016/j.neuroimage.2015.03.074>