About Dataset

Context

The Data Science Bowl covered the whole process of diagnosing lung cancer, and I aim to make the individual steps clearer. After segmenting the lungs and identifying suspicious nodules, it is important to classify them as malignant or benign.

Content

This dataset consists of several thousand examples formatted in multipage TIFF (for use with tools like ImageJ and KNIME) and HDF5 (for Python and R).
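As a starting point on the Python side, the sketch below lists what an HDF5 file contains before any array is loaded. It assumes the `h5py` package is installed. The helper name `describe_hdf5` is hypothetical, and the file path in the usage note is a placeholder: this card does not state the file names or the dataset keys inside the HDF5 file, so the code discovers them rather than assuming them.

```python
import h5py


def describe_hdf5(path):
    """Return a sorted list of (name, shape, dtype) for every dataset in the file."""
    found = []

    def visit(name, obj):
        # visititems calls this for every group and dataset; keep datasets only.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return sorted(found)
```

Calling `describe_hdf5("path/to/file.h5")` (placeholder path) returns the dataset names, shapes, and dtypes, which is usually enough to decide which arrays hold the images and which hold the labels.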

Acknowledgements

The data were preprocessed and partially extracted from the LUNA16 competition (https://luna16.grand-challenge.org/description/) and are subject to the same usage policy as the LUNA16 data.

Inspiration

The dataset is intended mainly for practice with medical images and CNNs, but it would be interesting to see how the best hand-crafted features (HOG, SIFT, …) perform against various deep learning approaches. It would also be quite interesting to visualize exactly which parts of an image led the algorithm to predict malignant or benign.
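One simple way to approach the visualization question is occlusion sensitivity: mask one patch of the image at a time and record how much the classifier's score drops. The sketch below is a minimal NumPy version under stated assumptions, not a method tied to this dataset. The function `occlusion_map` and its parameters `patch`, `stride`, and `fill` are hypothetical names, and `score_fn` stands in for any trained model that maps a 2D image to a single score (for example, the predicted probability of malignancy).

```python
import numpy as np


def occlusion_map(image, score_fn, patch=8, stride=4, fill=0.0):
    """Average score drop per pixel when the patches covering it are masked.

    image    : 2D array (height, width)
    score_fn : callable taking a 2D array and returning a float score
    Larger values mark regions the score depends on more strongly.
    """
    h, w = image.shape
    base = score_fn(image)
    heat = np.zeros((h, w), dtype=float)
    counts = np.zeros((h, w), dtype=float)
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            drop = base - score_fn(occluded)
            # Spread the drop over every pixel the patch covered.
            heat[top:top + patch, left:left + patch] += drop
            counts[top:top + patch, left:left + patch] += 1
    # Pixels never covered by a patch keep a value of zero.
    return heat / np.maximum(counts, 1)


# Toy usage: a stand-in "model" that only looks at the image center.
toy_image = np.zeros((32, 32))
toy_image[12:20, 12:20] = 1.0


def toy_score(img):
    return float(img[12:20, 12:20].mean())


toy_heat = occlusion_map(toy_image, toy_score)
```

In the toy case the map is positive around the center and zero in the corners, since the stand-in score ignores everything else. With a real classifier, the patch size and fill value are choices that can change the map noticeably, so it is worth trying a few settings.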