--- a
+++ b/README.md
@@ -0,0 +1,121 @@
+
+# Head CT Data Release
+
+Description of the images shared:
+
+The data consist of non-contrast head CT scans from 35 subjects. Two
+subjects have 2 scans each, taken at different time points; every other
+subject has one. One subject additionally has 2 scans from the same time
+point, but with different convolution kernels. Each scan has at least
+one reader’s manual segmentation of the image, delineating the mask of
+the brain areas (including cerebrospinal fluid (CSF)).
+
+The data are from the Minimally-Invasive Surgery plus rt-PA for
+Intracerebral hemorrhage Evacuation (MISTIE) Phases II and III clinical
+trials and the Clot Lysis: Evaluating Accelerated Resolution of
+Intraventricular Hemorrhage (CLEAR) Phase III clinical trial. Each
+subject’s scans and DICOM (Digital Imaging and Communications in
+Medicine) header information are contained in a tarball (`.tar.xz`),
+with the name of the tarball being the `id` of the subject (e.g.
+`01.tar.xz` is for `id` `01`).
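+
+As a minimal sketch of unpacking one subject, the Python snippet below
+opens a tarball with the standard-library `tarfile` module and lets it
+detect the compression on its own. The function name `extract_subject`
+and the output directory are illustrative choices, not part of the
+release, and nothing is assumed about the layout inside the tarball.
+
+```python
+import tarfile
+from pathlib import Path
+
+
+def extract_subject(tarball, out_dir):
+    """Extract one subject's tarball and return the member names."""
+    out_dir = Path(out_dir)
+    out_dir.mkdir(parents=True, exist_ok=True)
+    # "r:*" lets tarfile detect the compression (xz, gz, ...) itself.
+    with tarfile.open(tarball, mode="r:*") as tar:
+        names = tar.getnames()
+        if hasattr(tarfile, "data_filter"):
+            # Newer Python versions: reject unsafe paths in the archive.
+            tar.extractall(out_dir, filter="data")
+        else:
+            tar.extractall(out_dir)
+    return names
+```
+
+For example, `extract_subject("01.tar.xz", "data/01")` would unpack the
+tarball for `id` `01` into a (hypothetical) `data/01` directory and
+return the list of files it contains.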
+
+## Each Tarball
+
+### Image Data
+
+Each tarball contains a series of NIfTI (Neuroimaging Informatics
+Technology Initiative) files. These files can be read using common tools
+such as ITK-SNAP (<http://www.itksnap.org/pmwiki/pmwiki.php>), Mango
+(<http://ric.uthscsa.edu/mango/>), or Papaya
+(<http://ric.uthscsa.edu/mango/papaya.html>). Each image comes with
+manual segmentations from up to 2 different readers, each a binary image
+delineating the brain, and an automated estimate of the brain mask from
+the method of Muschelli et al. (2015), listed under References.
+
+The NIfTI images are as follows:
+
+1.  BRAIN\_1\_Anonymized.nii.gz - CT image that has been
+    de-identified/anonymized where the face and the ears have been
+    removed from the image. These should have a range of -1024 to 3071
+    and be in Hounsfield Units (HU).
+2.  BRAIN\_1\_Anonymized\_Mask.nii.gz - a binary image denoting the
+    brain areas, estimated using the method described in Muschelli et
+    al. (2015). This method is also implemented in the `CT_Skull_Strip`
+    function in the `ichseg` R package
+    (<https://github.com/muschellij2/ichseg>).
+3.  Manual\_Mask\_1\_Reader\_1.nii.gz - binary brain mask from Reader 1
+4.  Manual\_Mask\_1\_Reader\_2.nii.gz - binary brain mask from Reader 2
+
+Reader 1 and Reader 2 are consistent across subjects (i.e. Reader 1 is
+the same person for every subject, and likewise for Reader 2).
+
+### Header information
+
+In the tarballs, there are also `CSV` (comma-separated values) files,
+such as:
+
+  - BRAIN\_1\_header.csv
+  - Manual\_Mask\_1\_Reader\_1\_header.csv
+  - Manual\_Mask\_1\_Reader\_2\_header.csv
+
+which contain information from the DICOM header. Usually the only header
+files of interest are those with `BRAIN` or `BONE` in the name, since
+the manual mask images are derived from those images. The headers for
+the manual masks are included for completeness.
+
+Specific subsets of the DICOM data/tags are given with the data:
+
+  - 0008-0070-MANUFACTURER
+  - 0018-0050-SLICETHICKNESS
+  - 0018-0060-KVP
+  - 0018-1120-GANTRYDETECTORTILT
+  - 0018-1151-XRAYTUBECURRENT
+  - 0018-1152-EXPOSURE
+  - 0018-1160-FILTERTYPE
+  - 0018-1170-GENERATORPOWER
+  - 0018-1210-CONVOLUTIONKERNEL
+  - 0018-5100-PATIENTPOSITION
+  - 0020-0032-IMAGEPOSITIONPATIENT
+  - 0020-1041-SLICELOCATION
+  - 0028-0030-PIXELSPACING
+  - 0008-0070-Manufacturer
+  - 0018-0050-SliceThickness
+  - 0020-0032-ImagePositionPatient
+  - 0020-1041-SliceLocation
+  - 0028-0030-PixelSpacing
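+
+To get a quick overview of one header file, the sketch below collects
+the distinct values in each column using Python's standard `csv`
+module. It assumes, without confirmation from the release itself, that
+the first row of each CSV holds column names such as the tags listed
+above; `summarize_header` is a hypothetical helper name.
+
+```python
+import csv
+
+
+def summarize_header(csv_path):
+    """Map each column name to the sorted distinct values found in it."""
+    with open(csv_path, newline="") as handle:
+        # Assumption: the first row contains the column names.
+        reader = csv.DictReader(handle)
+        values = {name: set() for name in (reader.fieldnames or [])}
+        for row in reader:
+            for name, value in row.items():
+                # Skip empty cells and cells beyond the named columns.
+                if name in values and value not in (None, ""):
+                    values[name].add(value)
+    return {name: sorted(found) for name, found in values.items()}
+```
+
+Calling `summarize_header("BRAIN_1_header.csv")` would then show, for
+instance, whether a column such as slice thickness is constant across
+the rows of that file.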
+
+### Different files in some tarballs
+
+Subject 13 has a `BONE` scan as well as a `BRAIN` scan. The `BONE` scan
+is a non-contrast CT from the same time point, done with a different
+convolution kernel, which gives the scan different properties. The
+manual segmentations are in the same space as both scans (i.e. no
+registration was done).
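+
+One way to sanity-check that a mask and a scan share the same voxel
+grid is to compare the dimensions and voxel sizes stored in their
+headers. The sketch below reads those two fields directly from a
+gzipped NIfTI-1 header with the Python standard library. It assumes the
+files are NIfTI-1 (not NIfTI-2) and does not compare orientation
+information, so it is a partial check only; `read_grid` and `same_grid`
+are hypothetical helper names.
+
+```python
+import gzip
+import struct
+
+
+def read_grid(nifti_gz_path):
+    """Return (dims, voxel_sizes) from a gzipped NIfTI-1 header."""
+    with gzip.open(nifti_gz_path, "rb") as handle:
+        header = handle.read(348)  # a NIfTI-1 header is 348 bytes long
+    if len(header) < 348:
+        raise ValueError("file too short for a NIfTI-1 header")
+    # sizeof_hdr (first 4 bytes) must be 348; use it to find byte order.
+    for order in ("<", ">"):
+        if struct.unpack(order + "i", header[:4])[0] == 348:
+            break
+    else:
+        raise ValueError("not a NIfTI-1 header")
+    # dim[0] is the number of axes; dim[1:] are the axis lengths.
+    dim = struct.unpack(order + "8h", header[40:56])
+    # pixdim[1:] are the voxel sizes along each axis.
+    pixdim = struct.unpack(order + "8f", header[76:108])
+    ndim = dim[0]
+    return dim[1:ndim + 1], pixdim[1:ndim + 1]
+
+
+def same_grid(path_a, path_b, tol=1e-4):
+    """True if both files have equal dimensions and voxel sizes."""
+    dims_a, vox_a = read_grid(path_a)
+    dims_b, vox_b = read_grid(path_b)
+    return dims_a == dims_b and all(
+        abs(a - b) <= tol for a, b in zip(vox_a, vox_b)
+    )
+```
+
+For example, `same_grid("BRAIN_1_Anonymized.nii.gz",
+"Manual_Mask_1_Reader_1.nii.gz")` would be expected to return `True` if
+the mask was drawn on that scan.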
+
+Subjects 06 and 16 have both a `BRAIN_1` and a `BRAIN_2` scan. These
+images were taken at 2 different time points. The number in a manual
+mask file name is the index of the scan that the reader segmented: a
+file such as `Manual_Mask_2_Reader_1.nii.gz` is Reader 1’s segmentation
+of the second scan (i.e. `Manual_Mask_2` corresponds to `BRAIN_2`).
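+
+The naming convention can be turned into a small parser that pairs
+each mask with its scan. The sketch below is illustrative only: the
+`describe` helper and its dictionary keys are hypothetical, and the
+assumption that `BONE` files follow the same layout as `BRAIN` files
+(e.g. `BONE_1_Anonymized.nii.gz`) is not stated in this README.
+
+```python
+import re
+
+# Scans and automated masks, e.g. BRAIN_1_Anonymized.nii.gz and
+# BRAIN_1_Anonymized_Mask.nii.gz. The BONE prefix is an assumption.
+SCAN = re.compile(r"^(BRAIN|BONE)_(\d+)_Anonymized(_Mask)?\.nii\.gz$")
+# Manual masks, e.g. Manual_Mask_2_Reader_1.nii.gz.
+MANUAL = re.compile(r"^Manual_Mask_(\d+)_Reader_(\d+)\.nii\.gz$")
+
+
+def describe(filename):
+    """Return role, series, scan index and reader, or None if unknown."""
+    match = SCAN.match(filename)
+    if match:
+        role = "automated_mask" if match.group(3) else "ct"
+        return {"role": role, "series": match.group(1),
+                "scan": int(match.group(2)), "reader": None}
+    match = MANUAL.match(filename)
+    if match:
+        return {"role": "manual_mask", "series": None,
+                "scan": int(match.group(1)),
+                "reader": int(match.group(2))}
+    return None
+```
+
+Files whose `scan` value is 2 then belong together, so
+`Manual_Mask_2_Reader_1.nii.gz` is matched with the `BRAIN_2` scan.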
+
+## Demographics
+
+The demographic information contains an `id` column, which maps back to
+our original de-identified ID. That ID holds no information about the
+patient other than site and enrollment number, and the mapping is not
+provided or released. Additional columns give age in years (range
+40 – 78, rounded to the nearest year), sex, race (as measured in the
+trial), a Hispanic indicator, site ID (which again maps back to the
+original data, but no mapping is provided), and diagnosis (`dx`). The
+diagnosis was made by the reading center and indicates either
+intracerebral hemorrhage (ICH) or ICH with intraventricular hemorrhage
+(IVH), the latter indicated as ICH with IVH in the `dx` column.
+
+The site ID is included so that researchers can check for a small site
+effect, although detecting one is unlikely given how few subjects there
+are.
+
+## The de-identification mechanism
+
+A mask of the face and the ears was outlined on a template image. The
+face and ear masking of the data was done by registering the template
+image to each individual image separately. The method is
+similar to `pydeface` (<https://github.com/poldracklab/pydeface>) and
+was implemented in the `fslr` (<https://github.com/muschellij2/fslr>)
+package. The template data was adapted from `pydeface`. The template
+used for registration was
+<https://github.com/muschellij2/pydeface/raw/master/pydeface/data/mean_reg2mean.nii.gz>
+and the template mask used was
+<https://github.com/muschellij2/pydeface/raw/master/pydeface/data/facemask_no_ears.nii.gz>.
+
+## References
+
+Muschelli, John, et al. “Validated automatic brain extraction of head CT
+images.” Neuroimage 114 (2015): 379-385.
+<https://doi.org/10.1016/j.neuroimage.2015.03.074>