
About Dataset

The dataset was inspired by O. Gozes and H. Greenspan, "Bone structures extraction and enhancement in chest radiographs via CNN trained on synthetic data", 2020.

The main idea is to convert 3D chest CT scans into 2D chest X-ray-like images, known as digitally reconstructed radiographs (DRRs), while extracting additional information (e.g. the bone structure) from the 3D image.

Five CT scan datasets were gathered from https://www.cancerimagingarchive.net/collections/:

  • LIDC-IDRI
  • CPTAC-LSCC
  • CPTAC-LUAD
  • TCGA-LUSC
  • TCGA-LUAD

Only CT scans with a slice thickness of 1.25 mm or less were selected. This threshold was chosen to ensure higher quality of the DRR images in the dataset.
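The selection rule is a single threshold on slice thickness. A minimal sketch of it is shown below. It assumes the thickness of each series has already been read in millimetres (for example from the DICOM SliceThickness attribute). The `(series_id, thickness)` input format, the function names, and the decision to reject series with no recorded thickness are all hypothetical and do not come from the card.

```python
from typing import Iterable, List, Optional, Tuple

MAX_SLICE_THICKNESS_MM = 1.25  # threshold stated in the dataset card


def is_selected(slice_thickness_mm: Optional[float],
                max_mm: float = MAX_SLICE_THICKNESS_MM) -> bool:
    """Return True if a series is thin enough to be kept."""
    # Assumption: a series without a recorded thickness cannot be verified,
    # so it is rejected rather than kept.
    if slice_thickness_mm is None:
        return False
    return slice_thickness_mm <= max_mm


def select_series(series: Iterable[Tuple[str, Optional[float]]]) -> List[str]:
    """Filter (series_id, slice_thickness_mm) pairs (hypothetical format)."""
    return [series_id for series_id, thickness in series if is_selected(thickness)]
```

The comparison is inclusive (`<=`), matching "1.25 mm or less".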

Each CT scan was digitally reconstructed onto a 2D plane using the method proposed in the paper, with the same parameters. The bone layer of the CT scan was extracted by keeping only voxels with HU values in the [300, 700] range. This bone layer was then digitally reconstructed with the same method.
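The card defers to the paper for the exact projection, so what follows is only a simplified sketch and not the paper's method. It makes several assumptions:

  • The volume is stored as a (z, y, x) array in HU.
  • The projection is parallel-beam along the anterior-posterior (y) axis.
  • HU values are converted to linear attenuation with an assumed water coefficient of 0.2 per cm.
  • The projection follows the Beer-Lambert law.

The bone layer is formed by setting every voxel outside [300, 700] HU to air (-1000 HU) before projecting. The function names and constants are illustrative.

```python
import numpy as np

MU_WATER_PER_CM = 0.2  # assumed attenuation of water, not taken from the card


def hu_to_mu(volume_hu: np.ndarray, mu_water: float = MU_WATER_PER_CM) -> np.ndarray:
    """Convert HU to linear attenuation (1/cm): mu = mu_water * (1 + HU / 1000)."""
    return np.clip(mu_water * (1.0 + volume_hu / 1000.0), 0.0, None)


def bone_layer(volume_hu: np.ndarray, low: float = 300.0, high: float = 700.0) -> np.ndarray:
    """Keep voxels whose HU lies in [low, high]; replace the rest with air (-1000 HU)."""
    in_range = (volume_hu >= low) & (volume_hu <= high)
    return np.where(in_range, volume_hu, -1000.0)


def simple_drr(volume_hu: np.ndarray, axis: int = 1, voxel_size_mm: float = 1.0) -> np.ndarray:
    """Parallel-beam DRR: integrate attenuation along `axis` (assumed AP axis)."""
    mu = hu_to_mu(volume_hu)
    # Line integral of mu along each ray; voxel size converted from mm to cm.
    line_integral = mu.sum(axis=axis) * (voxel_size_mm / 10.0)
    # Beer-Lambert: transmitted fraction is exp(-integral). It is inverted so that
    # dense structures appear bright, as on a radiograph. Values lie in [0, 1).
    return 1.0 - np.exp(-line_integral)
```

Usage: `drr = simple_drr(volume_hu)` gives the chest image, and `bone_drr = simple_drr(bone_layer(volume_hu))` gives the matching bone-only image.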

In addition, the DRR images were inspected manually to exclude those of poor quality. Poor quality was defined as:

  • Noise artifacts from the CT device, mostly a 'white stripe' appearance in the upper and lower parts of the DRR;
  • Contrast-enhanced CT, where the heart and vascular structures were visible in the selected HU range;
  • Prominent features of non-bone material.

In the end, roughly 10% of the downloaded CT scans met the criteria described above.

Each image was resized to 512x512 pixels and saved.
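The card does not state the interpolation method or the bit depth of the saved images. The sketch below is therefore one possible way to produce 512x512 images and should be read as an assumption. It uses bilinear interpolation with corners aligned, followed by min-max scaling to 8-bit. Both function names are hypothetical.

```python
import numpy as np


def resize_bilinear(img: np.ndarray, out_h: int = 512, out_w: int = 512) -> np.ndarray:
    """Bilinear resize of a 2D array; output corners map exactly to input corners."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal interpolation weights, shape (1, out_w)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def to_uint8(img: np.ndarray) -> np.ndarray:
    """Min-max scale to 0..255 (assumed 8-bit storage); a constant image maps to 0."""
    lo, hi = float(img.min()), float(img.max())
    if hi == lo:
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round((img - lo) / (hi - lo) * 255).astype(np.uint8)
```

Corner alignment keeps the image borders in place. A different interpolation (e.g. area averaging when downsampling large DRRs) could be substituted without changing the rest of the pipeline.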