About Dataset

This dataset is a preprocessed version of the original UniToChest dataset, which was created by Chaudhry et al. [2] and made available through their research work. The following preprocessing steps have been applied:

  1. Hounsfield Units: The raw CT scan values have been converted into Hounsfield Units (HU).

  2. Windowing: The dataset includes images with windowing applied, a technique commonly used to enhance specific ranges of Hounsfield Units, thereby improving the visualization of certain tissues.

  3. Lung Segmentation: This step isolates the lung regions within the CT scans using the U-Net R231 model [3] combined with thresholding, allowing for focused analysis of lung tissue.

  4. CLAHE (Contrast Limited Adaptive Histogram Equalization): CLAHE has been applied to improve the contrast of the images, particularly in areas with low contrast.
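
As an illustration of steps 1 and 2, the sketch below shows how HU conversion and windowing are commonly done with NumPy. It is not the exact code used to build this dataset. The function names are hypothetical, the slope and intercept stand for the rescale values normally read from the scan metadata, and the default window (center -600 HU, width 1500 HU, a typical lung window) is an assumption, since the window actually used here is not stated.

```python
import numpy as np


def to_hounsfield(raw: np.ndarray, slope: float, intercept: float) -> np.ndarray:
    """Linear rescale of stored CT values to Hounsfield Units (HU)."""
    # slope and intercept normally come from the scan metadata
    # (for DICOM files: RescaleSlope and RescaleIntercept).
    return raw.astype(np.float32) * slope + intercept


def apply_window(hu: np.ndarray, center: float = -600.0, width: float = 1500.0) -> np.ndarray:
    """Clip HU values to a window and map the result to 8-bit grayscale."""
    # Assumed default: a typical lung window, not necessarily the one used here.
    low = center - width / 2.0
    high = center + width / 2.0
    clipped = np.clip(hu, low, high)
    # Values at or below `low` become 0, values at or above `high` become 255.
    scaled = (clipped - low) / (high - low) * 255.0
    return np.round(scaled).astype(np.uint8)


if __name__ == "__main__":
    raw_slice = np.array([[0, 1024, 2048]], dtype=np.int16)  # toy stored values
    hu_slice = to_hounsfield(raw_slice, slope=1.0, intercept=-1024.0)
    print(hu_slice)
    print(apply_window(hu_slice))
```

Narrowing the window increases contrast within the chosen HU range at the cost of saturating everything outside it.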

Nodule Subsets

The dataset includes two specific subsets based on the size of the lung nodules:

  • Large Nodules: Nodules greater than 10 mm in diameter.
  • Small Nodules: Nodules smaller than 10 mm in diameter.

These subsets allow for targeted analysis of different nodule sizes, which can be critical for research focused on early detection and characterization of lung conditions.

File Naming Convention

The files in this dataset follow a naming convention designed for compatibility with the nnU-Net framework. Each file name is structured as follows:

  • Images: datasetname_patient_study_slice_modality.png
  • Masks: datasetname_patient_study_slice.png
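
The naming pattern above can be parsed with a few lines of Python. This is a minimal sketch, assuming that no individual field contains an underscore. The helper names (`SliceName`, `parse_name`, `mask_name_for`) and the example file name are hypothetical and are not taken from the dataset.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class SliceName(NamedTuple):
    dataset: str
    patient: str
    study: str
    slice_id: str
    modality: Optional[str]  # None for mask files, which have no modality field


def parse_name(filename: str) -> SliceName:
    """Split an image or mask file name into its underscore-separated fields."""
    parts = Path(filename).stem.split("_")
    if len(parts) == 5:  # image: datasetname_patient_study_slice_modality
        return SliceName(*parts)
    if len(parts) == 4:  # mask: datasetname_patient_study_slice
        return SliceName(*parts, None)
    raise ValueError(f"Unexpected file name layout: {filename}")


def mask_name_for(image_filename: str) -> str:
    """Derive the mask file name by dropping the trailing modality field."""
    path = Path(image_filename)
    return path.stem.rsplit("_", 1)[0] + path.suffix


if __name__ == "__main__":
    example = "unitochest_P001_S01_0042_0000.png"  # hypothetical file name
    print(parse_name(example))
    print(mask_name_for(example))
```

Pairing each image with its mask this way avoids relying on directory listing order.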

References

[1] D. Perlo, "UniToChest," Zenodo, Dec. 22, 2021. doi: 10.5281/zenodo.5797912.
[2] H. A. H. Chaudhry et al., "UniToChest: A Lung Image Dataset for Segmentation of Cancerous Nodules on CT Scans," in Image Analysis and Processing – ICIAP 2022, S. Sclaroff, C. Distante, M. Leo, G. M. Farinella, and F. Tombari, Eds., Cham: Springer International Publishing, 2022, pp. 185-196. doi: 10.1007/978-3-031-06427-2_16.
[3] J. Hofmanninger, F. Prayer, J. Pan, et al., "Automatic lung segmentation in routine imaging is primarily a data diversity problem, not a methodology problem," Eur Radiol Exp, vol. 4, p. 50, 2020. doi: 10.1186/s41747-020-00173-2.