The Data Science Bowl covered the whole process of diagnosing lung cancer, and I aim to make the individual steps clearer. After segmenting the lungs and identifying suspicious nodules, it is important to classify them as malignant or benign.
This dataset consists of several thousand examples formatted in multipage TIFF (for use with tools like ImageJ and KNIME) and HDF5 (for Python and R).
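For the HDF5 version, a reasonable first step in Python is to list what the file actually contains before loading anything. The sketch below assumes the `h5py` package is installed; `list_datasets` is a hypothetical helper name, and no dataset names inside the file are assumed, since they are not stated here.

```python
import h5py


def list_datasets(path):
    """Return {dataset_name: (shape, dtype)} for every dataset in an HDF5 file."""
    found = {}

    def visit(name, obj):
        # visititems walks groups and datasets; keep only the datasets
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found
```

Calling `list_datasets` with the path to the downloaded HDF5 file (whatever it is named locally) shows the array names, shapes, and dtypes, which is usually enough to decide how to slice the images and labels.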
The data were preprocessed and partially extracted from the LUNA16 competition (https://luna16.grand-challenge.org/description/) and should be used under the same policy as that data.
The dataset is intended mainly for practice with medical images and CNNs, but it would be interesting to see how the best hand-crafted features (HoG, SIFT, …) perform against various deep learning approaches. It would also be interesting to visualize exactly which parts of an image led the algorithm to predict malignant or benign.
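As a starting point for the hand-crafted side of that comparison, here is a minimal sketch of a HoG-style descriptor using only NumPy. It is a deliberate simplification, not full HoG: it builds a single magnitude-weighted histogram of unsigned gradient orientations over the whole patch, with no cells or block normalization. The function name `orientation_histogram` and the default of 9 bins are assumptions made for illustration.

```python
import numpy as np


def orientation_histogram(patch, n_bins=9):
    """Simplified HoG-style descriptor for one 2D image patch.

    Builds one histogram of unsigned gradient orientations over the whole
    patch, weighted by gradient magnitude and L2-normalized.
    """
    patch = np.asarray(patch, dtype=np.float64)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x)
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    # Fold signed angles into the unsigned range [0, 180) degrees
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, _ = np.histogram(
        angle, bins=n_bins, range=(0.0, 180.0), weights=magnitude
    )
    norm = np.linalg.norm(hist)
    # A flat patch has no gradients; return the all-zero histogram unchanged
    return hist / norm if norm > 0 else hist
```

The resulting fixed-length vector can be passed to any standard classifier to get a baseline to compare against a CNN. A library implementation of full HoG would likely do better, so this is only meant to make the idea concrete.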