# Automated Detection and Classification of Nodules in Lung CT scans
## Description
Lung cancer is the second most common cancer in both men and women, afflicting 225,500 people a year in the United States. Nearly 1 in 4 cancer deaths is from lung cancer, more than from colon, breast, and prostate cancers combined. Early detection allows for early treatment, which significantly increases the chances of survival.

This project implements an algorithm that automatically detects candidate nodules and predicts the probability that the patient will be diagnosed with lung cancer within 1 year of the CT scan.

The algorithm is summarized by the following framework:
![Lung nodule detection and classification](https://github.com/mikejhuang/LungNoduleDetectionClassification/blob/master/framework.gif?raw=true)

## Installation
### Required packages
* Anaconda3
* Python 3.4
* TensorFlow
* Keras
* dicom: `$ sudo pip install dicom`
* cell_magic_wand.py (included); it must sit in the root directory alongside the notebooks. Source: https://github.com/NoahApthorpe/CellMagicWand
* h5py: `$ sudo pip install h5py`
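
Before opening the notebooks, it can help to confirm that the packages listed above are importable. The following is a minimal sketch and not part of the repository: the helper name `missing_packages` is hypothetical, and the import names (`tensorflow`, `keras`, `dicom`, `h5py`) are assumed to match the packages in the list.

```python
import importlib.util

# Import names assumed to correspond to the packages listed above.
REQUIRED = ["tensorflow", "keras", "dicom", "h5py"]


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages such as TensorFlow.
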

### Required Data
**LIDC-IDRI dataset**
https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI
* Images (DICOM, 124 GB)
* DICOM Metadata Digest (CSV)
* Nodule Size List http://www.via.cornell.edu/lidc/list3.2.csv

**Kaggle Data Science Bowl 2017 Dataset** https://www.kaggle.com/c/data-science-bowl-2017/data
* stage1.7z (DICOM, 67 GB)
* stage1_labels.csv.zip
* stage1_solution.csv.zip
* data_password.txt.zip

## The pipeline
1.  **1ProcessNoduleDataset.ipynb**
    * **Inputs:** LIDC dataset (DOI folder), list3_2.csv, LIDC-IDRI_MetaData.csv
    * **Outputs:** noduleimages.npy, nodulemasks.npy
2.  **2TrainUnet.ipynb**
    * **Inputs:** noduleimages.npy, nodulemasks.npy
    * **Outputs:** unet-weights-improvement.hdf5
3.  **3ClassifyNodulesLIDC.ipynb**
    * **Inputs:** LIDC dataset (DOI folder), list3_2.csv, LIDC-IDRI_MetaData.csv, unet-weights-improvement.hdf5
    * **Outputs:** truenodule-cnn-weights-improvement.hdf5
4.  **4DetectNodules.ipynb**
    * **Inputs:** unet-weights-improvement.hdf5, truenodule-cnn-weights-improvement.hdf5, Kaggle DSB2017 dataset (stage1 folder)
    * **Outputs:** DSBNoduleImages\*.npy, DSBNoduleMasks\*.npy, DSBPatientNoduleIndex\*.csv
5.  **5CancerPredictionClassifiers.ipynb**
    * **Inputs:** DSBPatientNoduleIndex\*.csv
6.  **6CancerPredictionCNN.ipynb**
    * **Inputs:** DSBNoduleImages\*.npy, DSBNoduleMasks\*.npy, DSBPatientNoduleIndex\*.csv

\*Split into a series of files due to large memory requirements.
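
Because the outputs marked with \* are split across several files, downstream code has to work through them one file at a time instead of loading everything at once. The following is a minimal sketch under stated assumptions: the files are assumed to sit in a single directory, the part matched by `*` is assumed to possibly contain a number, and the helper names `split_files` and `iter_arrays` are hypothetical and do not come from the notebooks.

```python
import glob
import os
import re


def natural_key(path):
    """Sort key that orders embedded numbers numerically (2 before 10)."""
    name = os.path.basename(path)
    return [int(tok) if tok.isdecimal() else tok
            for tok in re.split(r"(\d+)", name)]


def split_files(directory, prefix, ext=".npy"):
    """List files named <prefix>*<ext> in `directory`, in natural order."""
    pattern = os.path.join(glob.escape(directory), prefix + "*" + ext)
    return sorted(glob.glob(pattern), key=natural_key)


def iter_arrays(directory, prefix):
    """Yield (path, array) one file at a time to keep peak memory low."""
    import numpy as np  # imported lazily so listing files works without NumPy
    for path in split_files(directory, prefix):
        yield path, np.load(path)


# Example usage (hypothetical directory layout):
# for path, images in iter_arrays(".", "DSBNoduleImages"):
#     print(path, images.shape)
```

Using a generator means only one chunk is held in memory at a time, which matches the reason the files were split in the first place.
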
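
The inputs and outputs listed for each notebook imply a run order: every notebook consumes either downloaded data or files produced by an earlier notebook. As a rough sketch, the dependency chain can be checked as follows. The `PIPELINE` table is transcribed from the list in this README, while the `RAW_DATA` labels and the `unmet_inputs` helper are hypothetical names introduced only for illustration.

```python
# (notebook, inputs, outputs), transcribed from the pipeline list.
PIPELINE = [
    ("1ProcessNoduleDataset.ipynb",
     ["LIDC dataset", "list3_2.csv", "LIDC-IDRI_MetaData.csv"],
     ["noduleimages.npy", "nodulemasks.npy"]),
    ("2TrainUnet.ipynb",
     ["noduleimages.npy", "nodulemasks.npy"],
     ["unet-weights-improvement.hdf5"]),
    ("3ClassifyNodulesLIDC.ipynb",
     ["LIDC dataset", "list3_2.csv", "LIDC-IDRI_MetaData.csv",
      "unet-weights-improvement.hdf5"],
     ["truenodule-cnn-weights-improvement.hdf5"]),
    ("4DetectNodules.ipynb",
     ["unet-weights-improvement.hdf5",
      "truenodule-cnn-weights-improvement.hdf5",
      "Kaggle DSB2017 dataset"],
     ["DSBNoduleImages*.npy", "DSBNoduleMasks*.npy",
      "DSBPatientNoduleIndex*.csv"]),
    ("5CancerPredictionClassifiers.ipynb",
     ["DSBPatientNoduleIndex*.csv"], []),
    ("6CancerPredictionCNN.ipynb",
     ["DSBNoduleImages*.npy", "DSBNoduleMasks*.npy",
      "DSBPatientNoduleIndex*.csv"], []),
]

# Downloaded data that no notebook produces (labels are illustrative).
RAW_DATA = {"LIDC dataset", "list3_2.csv", "LIDC-IDRI_MetaData.csv",
            "Kaggle DSB2017 dataset"}


def unmet_inputs(pipeline, raw_data):
    """Return {notebook: [inputs not yet available when it runs]}."""
    available = set(raw_data)
    problems = {}
    for notebook, inputs, outputs in pipeline:
        missing = [item for item in inputs if item not in available]
        if missing:
            problems[notebook] = missing
        available.update(outputs)
    return problems


print(unmet_inputs(PIPELINE, RAW_DATA))  # prints {}
```

An empty result means that running the notebooks in numeric order satisfies every listed input.
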
## About
Mike Huang, huangjmike@gmail.com