# Head CT Data Release

Description of the images shared:

The data consist of non-contrast head CT scans from 35 different
subjects. Two subjects have 2 scans each, taken at different time
points; every other subject has one scan. One other subject has 2 scans
from the same time point, reconstructed with different convolution
kernels. Each scan has at least one reader’s manual segmentation
delineating the brain, including cerebrospinal fluid (CSF).

The data are from the Minimally-Invasive Surgery plus rt-PA for
Intracerebral Hemorrhage Evacuation (MISTIE) Phase II and III clinical
trials and the Clot Lysis: Evaluating Accelerated Resolution of
Intraventricular Hemorrhage (CLEAR) Phase III clinical trial. Each
subject’s scans and DICOM (Digital Imaging and Communications in
Medicine) header information are contained in a tarball (`.tar.xz`)
named after the subject’s `id` (`01.tar.xz` is for `id` `01`).

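A minimal Python sketch of locating and unpacking one subject’s tarball by `id`. The function name `extract_subject` and the output layout (`out_dir/<id>/`) are illustrative choices, not part of the release. As an assumption about the archive extension, the sketch accepts either `<id>.tar.xz` or `<id>.tar.gz`; the standard-library `tarfile` module detects the compression itself with mode `"r:*"`.

```python
import tarfile
from pathlib import Path


def extract_subject(data_dir, subject_id, out_dir):
    """Extract one subject's tarball (e.g. subject_id="01") into out_dir/<id>.

    Assumption: the archive is named "<id>.tar.xz" or "<id>.tar.gz";
    both extensions are tried in that order.
    """
    data_dir = Path(data_dir)
    for ext in (".tar.xz", ".tar.gz"):
        archive = data_dir / f"{subject_id}{ext}"
        if archive.exists():
            break
    else:
        raise FileNotFoundError(f"no tarball for id {subject_id!r} in {data_dir}")

    target = Path(out_dir) / subject_id
    target.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (xz or gzip) on its own.
    with tarfile.open(archive, "r:*") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(target, filter="data")
        else:
            tar.extractall(target)
    return target, names
```

For example, `extract_subject("downloads", "01", "data")` would unpack subject `01` into `data/01/`. Keeping the `id` as a string preserves the leading zero that the tarball name relies on.
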
## Each Tarball

### Image Data

Each tarball contains a series of NIfTI (Neuroimaging Informatics
Technology Initiative) files. These files can be read using common tools
such as ITK-SNAP (<http://www.itksnap.org/pmwiki/pmwiki.php>), Mango
(<http://ric.uthscsa.edu/mango/>), or Papaya
(<http://ric.uthscsa.edu/mango/papaya.html>). Each image comes with
manual segmentations from up to 2 different readers, each a binary image
delineating the brain, and an automated estimate of the brain mask from
the method proposed in Muschelli et al. (2015), listed under References.

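For scripted access, a library such as `nibabel` (`nibabel.load`) is the usual route. To show what a NIfTI file carries without extra dependencies, here is a minimal standard-library sketch that reads a few fields of a (gzipped) NIfTI-1 header, using the byte offsets of the 348-byte NIfTI-1 header layout. It is an illustration only: it assumes the files are NIfTI-1 (not NIfTI-2) and does not read the voxel data.

```python
import gzip
import struct


def read_nifti1_header(path):
    """Read selected fields of a NIfTI-1 header (optionally gzipped).

    Offsets follow the NIfTI-1 header layout (348 bytes).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        hdr = f.read(348)
    if len(hdr) < 348:
        raise ValueError("file too short for a NIfTI-1 header")

    # sizeof_hdr must equal 348; trying both byte orders reveals endianness.
    if struct.unpack("<i", hdr[:4])[0] == 348:
        e = "<"
    elif struct.unpack(">i", hdr[:4])[0] == 348:
        e = ">"
    else:
        raise ValueError("not a NIfTI-1 header")

    dim = struct.unpack(e + "8h", hdr[40:56])       # dim[0] = number of dimensions
    datatype, bitpix = struct.unpack(e + "2h", hdr[70:74])
    pixdim = struct.unpack(e + "8f", hdr[76:108])   # voxel sizes, usually in mm
    vox_offset, scl_slope, scl_inter = struct.unpack(e + "3f", hdr[108:120])
    n = dim[0]
    return {
        "shape": dim[1:1 + n],
        "voxel_size": pixdim[1:1 + n],
        "datatype": datatype,
        "bitpix": bitpix,
        "vox_offset": vox_offset,
        "scl_slope": scl_slope,
        "scl_inter": scl_inter,
        "magic": hdr[344:348],   # b"n+1\x00" for single-file NIfTI-1
    }
```

For example, `read_nifti1_header("01/BRAIN_1_Anonymized.nii.gz")["voxel_size"]` would give the in-plane pixel spacing and slice spacing of that scan.
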
The NIfTI images are as follows:

1.  `BRAIN_1_Anonymized.nii.gz` - CT image that has been
    de-identified/anonymized by removing the face and the ears from
    the image. Values should range from -1024 to 3071 and be in
    Hounsfield Units (HU).
2.  `BRAIN_1_Anonymized_Mask.nii.gz` - a binary image denoting the
    brain areas, estimated with the method described in Muschelli et al.
    (2015). This method is also implemented in the `CT_Skull_Strip`
    function of the `ichseg` R package
    (<https://github.com/muschellij2/ichseg>).
3.  `Manual_Mask_1_Reader_1.nii.gz` - binary brain mask from Reader 1
4.  `Manual_Mask_1_Reader_2.nii.gz` - binary brain mask from Reader 2

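The value conventions above (CT values in -1024 to 3071 HU, masks that are binary and share the CT image’s voxel grid) can be checked with a few small helpers. This is a sketch that works on flat sequences of voxel values, for example a flattened array obtained with `nibabel` (`img.get_fdata().ravel()`); the helper names are hypothetical.

```python
HU_MIN, HU_MAX = -1024, 3071  # range stated for the anonymized CT images


def summarize_ct(values):
    """Return min/max of CT voxel values and whether they lie in [HU_MIN, HU_MAX]."""
    vals = list(values)
    vmin, vmax = min(vals), max(vals)
    return {"min": vmin, "max": vmax, "in_range": HU_MIN <= vmin and vmax <= HU_MAX}


def is_binary_mask(values):
    """True if every voxel of a mask is 0 or 1 (floats 0.0/1.0 also pass)."""
    return set(values) <= {0, 1}


def brain_voxels(image, mask):
    """CT values inside the brain mask; image and mask must share one voxel grid."""
    image, mask = list(image), list(mask)
    if len(image) != len(mask):
        raise ValueError("image and mask have different numbers of voxels")
    return [v for v, m in zip(image, mask) if m]
```

Because the masks are in the same space as the CT image, applying one is a voxel-by-voxel selection with no resampling; the length check only guards against pairing a mask with the wrong scan.
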
Reader labels are consistent across subjects (e.g. Reader 1 is the same
reader for every subject).

### Header information

The tarballs also contain CSV (comma-separated values) files, such as:

  - `BRAIN_1_header.csv`
  - `Manual_Mask_1_Reader_1_header.csv`
  - `Manual_Mask_1_Reader_2_header.csv`

These contain information from the DICOM headers. Usually only the
files with `BRAIN` or `BONE` in the name are of interest, because the
manual mask images are derived from those images; the mask header files
are included for completeness.

The following subset of DICOM tags is provided with the data, each
labeled as group-element-keyword (some tags appear in both an
upper-case and a mixed-case form):

  - `0008-0070-MANUFACTURER`
  - `0018-0050-SLICETHICKNESS`
  - `0018-0060-KVP`
  - `0018-1120-GANTRYDETECTORTILT`
  - `0018-1151-XRAYTUBECURRENT`
  - `0018-1152-EXPOSURE`
  - `0018-1160-FILTERTYPE`
  - `0018-1170-GENERATORPOWER`
  - `0018-1210-CONVOLUTIONKERNEL`
  - `0018-5100-PATIENTPOSITION`
  - `0020-0032-IMAGEPOSITIONPATIENT`
  - `0020-1041-SLICELOCATION`
  - `0028-0030-PIXELSPACING`
  - `0008-0070-Manufacturer`
  - `0018-0050-SliceThickness`
  - `0020-0032-ImagePositionPatient`
  - `0020-1041-SliceLocation`
  - `0028-0030-PixelSpacing`

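Each tag label combines the DICOM group and element numbers with a keyword, e.g. `0018-0050` is (0018,0050) Slice Thickness. A small sketch for splitting these labels and merging the upper-case and mixed-case spellings of the same tag; whether the labels appear verbatim as column names in the header CSV files is an assumption, and the function names are hypothetical.

```python
import re

# group-element-keyword, e.g. "0018-0050-SliceThickness"
TAG_RE = re.compile(r"^([0-9A-Fa-f]{4})-([0-9A-Fa-f]{4})-([A-Za-z]+)$")


def parse_tag(label):
    """Split a tag label into ((group, element), keyword)."""
    m = TAG_RE.match(label)
    if m is None:
        raise ValueError(f"unexpected tag label: {label!r}")
    group, element, keyword = m.groups()
    return (group.upper(), element.upper()), keyword


def tags_by_number(labels):
    """Group labels by (group, element), merging upper/mixed-case duplicates."""
    out = {}
    for label in labels:
        key, keyword = parse_tag(label)
        out.setdefault(key, set()).add(keyword)
    return out
```

Applied to the 18 labels listed above, this yields 13 distinct (group, element) tags, with the 5 mixed-case labels folding into their upper-case counterparts.
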
### Different files in some tarballs

Subject 13 has a `BONE` scan as well as a `BRAIN` scan. The `BONE` scan
is a non-contrast CT reconstructed with a different convolution kernel,
which gives the image different properties (e.g. edge sharpness and
noise). The manual segmentations are in the same space as both scans
(i.e. no registration was done).

Subjects 06 and 16 each have both a `BRAIN_1` and a `BRAIN_2` scan,
taken at 2 different time points. Where a reader segmented a scan, the
index in the mask file name identifies that scan: for example,
`Manual_Mask_2_Reader_1.nii.gz` is Reader 1’s segmentation of `BRAIN_2`.

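The naming convention for multiple scans can be sketched as a grouping step that pairs each manual mask with its scan via the shared index. This is illustrative only: the `BRAIN_<index>_Anonymized.nii.gz` pattern follows the file list above, while the `BONE_<index>_Anonymized.nii.gz` name for subject 13 is a hypothetical assumption, and the function name is invented for the example.

```python
import re

MASK_RE = re.compile(r"^Manual_Mask_(\d+)_Reader_(\d+)\.nii\.gz$")
# Assumption: scans are named <KIND>_<index>_Anonymized.nii.gz, KIND = BRAIN or BONE.
SCAN_RE = re.compile(r"^(BRAIN|BONE)_(\d+)_Anonymized\.nii\.gz$")


def group_by_scan(filenames):
    """Map scan index -> {"images": [...], "masks": {reader: filename}}."""
    groups = {}
    for name in sorted(filenames):
        mask = MASK_RE.match(name)
        scan = SCAN_RE.match(name)
        if mask:
            idx, reader = int(mask.group(1)), int(mask.group(2))
            groups.setdefault(idx, {"images": [], "masks": {}})["masks"][reader] = name
        elif scan:
            idx = int(scan.group(2))
            groups.setdefault(idx, {"images": [], "masks": {}})["images"].append(name)
        # Anything else (automated masks, header CSVs) is ignored here.
    return groups
```

For a subject like 06, index 2 would collect `BRAIN_2_Anonymized.nii.gz` and whichever `Manual_Mask_2_Reader_*` files exist; a missing reader simply leaves no entry for that reader. Under the naming assumption, a `BONE` scan lands under the same index as its `BRAIN` scan and shares its masks, matching the note that no registration is needed.
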
## Demographics

The demographic information includes an `id` column, which maps back to
our original de-identified ID. That ID has no information about the
patient other than site and enrollment number, and the mapping is not
provided or released. Additional columns give age in years (range 40 to
78, rounded to the nearest year), sex, race (as measured in the trial),
a Hispanic indicator, site ID (which also maps back to the original
data, with no mapping provided), and diagnosis (`dx`). The diagnosis was
made by the reading center and is either intracerebral hemorrhage (ICH)
or ICH with intraventricular hemorrhage (IVH), the latter recorded as
ICH with IVH in the `dx` column.

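A sketch of loading the demographics table with the standard-library `csv` module and splitting subjects by diagnosis. Only the `id` and `dx` column names come from this description; the file name `demographics.csv` is hypothetical, and matching on the substring `"IVH"` assumes the `dx` values read as described (e.g. `ICH` vs `ICH with IVH`).

```python
import csv
from collections import Counter


def load_demographics(path):
    """Read the demographics table, keeping `id` as a string (e.g. "01")."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def dx_counts(rows):
    """Count subjects per diagnosis value in the `dx` column."""
    return Counter(row["dx"] for row in rows)


def ivh_ids(rows):
    """IDs of subjects whose diagnosis mentions IVH (e.g. "ICH with IVH")."""
    return [row["id"] for row in rows if "IVH" in row["dx"]]
```

`csv.DictReader` leaves every field as a string, so an `id` such as `01` keeps its leading zero and still matches the tarball name `01.tar.xz` (or `.tar.gz`).
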
The site ID is provided so researchers can check for a (small) site
effect, although with so few subjects such an effect is unlikely to be
detectable.

## The de-identification mechanism

A mask covering the face and the ears was outlined on a template image.
The face and ears were then masked in each image by registering the
template image to each individual image separately. The method is
similar to `pydeface` (<https://github.com/poldracklab/pydeface>) and
was implemented in the `fslr` (<https://github.com/muschellij2/fslr>)
package. The template data were adapted from `pydeface`: the template
used for registration was
<https://github.com/muschellij2/pydeface/raw/master/pydeface/data/mean_reg2mean.nii.gz>
and the template mask used was
<https://github.com/muschellij2/pydeface/raw/master/pydeface/data/facemask_no_ears.nii.gz>.

## References

Muschelli, John, et al. “Validated automatic brain extraction of head CT
images.” NeuroImage 114 (2015): 379-385.
<https://doi.org/10.1016/j.neuroimage.2015.03.074>