[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/pgmikhael/Sybil/blob/main/LICENSE.txt) ![version](https://img.shields.io/badge/version-1.2.0-success)

# Sybil

Lung Cancer Risk Prediction.

Additional documentation can be found on the [GitHub Wiki](https://github.com/reginabarzilaygroup/Sybil/wiki).

# Run a regression test

```shell
python tests/regression_test.py
```

This will download the `sybil_ensemble` model and sample data, and compare the results with previously calculated results.

# Run the model

You can load our model pretrained on the NLST dataset and score a given DICOM series as follows:

```python
from sybil import Serie, Sybil

# Load a trained model
model = Sybil("sybil_ensemble")

# Get risk scores
# dicom_path_1, dicom_path_2, ... are the file paths of one DICOM series
serie = Serie([dicom_path_1, dicom_path_2, ...])
scores = model.predict([serie])

# You can also evaluate by providing labels
serie = Serie([dicom_path_1, dicom_path_2, ...], label=1)
results = model.evaluate([serie])
```

Models available include: `sybil_1`, `sybil_2`, `sybil_3`, `sybil_4`, `sybil_5` and `sybil_ensemble`.

All model files are available on [GitHub releases](https://github.com/reginabarzilaygroup/Sybil/releases) as well as [here](https://drive.google.com/drive/folders/1nBp05VV9mf5CfEO6W5RY4ZpcpxmPDEeR?usp=sharing).

# Replicating results

You can replicate the results from our model using our training script:

```sh
python train.py
```

See our [documentation](docs/readme.md) for a full description of Sybil's training parameters. Additional information on the training process can be found on the [train](https://github.com/reginabarzilaygroup/Sybil/tree/train) branch of this repository.

# LDCT Orientation

The model expects the input to be an axial LDCT in which the first frame covers the abdominal region and the last frame is at the level of the clavicles.

When the input is in `dicom` format, the frames are sorted automatically. For `png` inputs, however, the PNG file paths must be provided in the correct anatomical order.
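
If the PNG slices were exported with a slice number in each file name, a small helper can put the paths in order before they are passed to the model. The sketch below is illustrative only: `sort_png_slices` is a hypothetical helper, not part of `sybil`, and it assumes that the last integer in each file name is a slice index that increases from the abdomen toward the clavicles. Pass `reverse=True` if your export numbers slices from the top down.

```python
import re
from pathlib import Path
from typing import List


def _slice_index(path: str) -> int:
    """Return the last integer in the file name (hypothetical naming convention)."""
    numbers = re.findall(r"\d+", Path(path).stem)
    if not numbers:
        raise ValueError(f"No slice index found in {path!r}")
    return int(numbers[-1])


def sort_png_slices(paths: List[str], reverse: bool = False) -> List[str]:
    """Order PNG paths by slice index, abdomen first and clavicles last.

    Assumes the index increases from the abdomen toward the clavicles;
    set reverse=True when slices are numbered from the top down.
    """
    return sorted(paths, key=_slice_index, reverse=reverse)
```

Sorting on the parsed integer rather than on the raw string avoids lexicographic ordering, which would place `slice_10.png` before `slice_2.png`.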

# Annotations

To help train the model, two fellowship-trained thoracic radiologists jointly annotated suspicious lesions on NLST LDCTs using [MD.AI](https://md.ai) software for all participants who developed cancer within 1 year after an LDCT. Each lesion’s volume was marked with bounding boxes on contiguous thin-cut axial images. The “ground truth” annotations were informed by the imaging appearance and the clinical data provided by the NLST, i.e., the series and image number of cancerous nodules and the anatomical location of biopsy-confirmed lung cancers. For these participants, lesions in the location of subsequently diagnosed cancers were also annotated, even if the precursor lesion lacked imaging features specific for cancer.

Annotations are available to download in JSON format [here](https://drive.google.com/file/d/19aa5yIHPWu3NtjqvXDc8NYB2Ub9V-4WM/view?usp=share_link). The JSON file is structured as shown below, where `(x,y)` is the top-left corner of the bounding box and all values are normalized to the image size (512,512).

```
{
  series1_id: {   # Series Instance UID
    image1_id: [  # SOP Instance UID / file name
      {"x": x_axis_value, "y": y_axis_value, "height": bounding_box_height, "width": bounding_box_width}, # bounding box 1
      {"x": x_axis_value, "y": y_axis_value, "height": bounding_box_height, "width": bounding_box_width}, # bounding box 2
      ...
      ],
    image2_id: [],
    ...
  },
  series2_id: {},
  ...
}
```
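
As a minimal sketch, the normalized boxes can be converted back to pixel coordinates by scaling with the (512,512) image size stated above. The helper names `load_annotations` and `boxes_in_pixels`, and the `(x_min, y_min, x_max, y_max)` output format, are illustrative choices; they are not part of the released data or of the `sybil` package.

```python
import json
from typing import Dict, List, Tuple

# Image size the annotation values were normalized by: (width, height).
IMAGE_SIZE = (512, 512)

Box = Tuple[float, float, float, float]
Annotations = Dict[str, Dict[str, List[Dict[str, float]]]]


def load_annotations(path: str) -> Annotations:
    """Read the downloaded annotation JSON file."""
    with open(path) as f:
        return json.load(f)


def boxes_in_pixels(
    annotations: Annotations, image_size: Tuple[int, int] = IMAGE_SIZE
) -> Dict[str, Dict[str, List[Box]]]:
    """Map series UID -> image ID -> list of (x_min, y_min, x_max, y_max) in pixels.

    (x, y) is the top-left corner of each box; width extends right and
    height extends down. Images without boxes are kept as empty lists.
    """
    width, height = image_size
    result: Dict[str, Dict[str, List[Box]]] = {}
    for series_id, images in annotations.items():
        result[series_id] = {}
        for image_id, boxes in images.items():
            pixel_boxes = []
            for box in boxes:
                x_min = box["x"] * width
                y_min = box["y"] * height
                x_max = x_min + box["width"] * width
                y_max = y_min + box["height"] * height
                pixel_boxes.append((x_min, y_min, x_max, y_max))
            result[series_id][image_id] = pixel_boxes
    return result
```

For example, `boxes_in_pixels(load_annotations("annotations.json"))` (placeholder file name) returns the same nested structure with pixel-space corner coordinates.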

# Attention Scores

The multi-attention pooling layer aims to learn the importance of each slice in the 3D volume and of each pixel within each 2D slice. During training, these attentions are supervised with bounding boxes of the cancerous nodules. This is a soft attention mechanism, and the model's primary task remains predicting lung cancer risk. However, the attention scores can be extracted and used to visualize where the model focuses in the 3D volume and within each 2D slice.

To extract the attention scores, use the `return_attentions` argument of `predict`:

```python
results = model.predict([serie], return_attentions=True)

attentions = results.attentions
```

`attentions` is a list with one entry per series. Each entry is a dictionary with the following keys:

- `image_attention_1`: attention scores (as logits) over the pixels of each 2D slice. This is a list with one entry per model in the ensemble.
- `volume_attention_1`: attention scores (as logits) over each slice in the 3D volume. This is a list with one entry per model in the ensemble.
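
Since both keys hold logits, applying a softmax turns them into weights that sum to one, which makes slices easier to compare. The NumPy sketch below does this for `volume_attention_1` and averages the resulting weights over the ensemble members. It assumes, hypothetically, that each member's entry is an array holding one logit per slice (it is flattened before the softmax). Averaging the members' weights is an illustrative choice, not a description of how `sybil` combines them, and `softmax`, `slice_weights`, and `top_slices` are hypothetical helpers.

```python
from typing import Dict, List

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def slice_weights(series_attention: Dict[str, List[np.ndarray]]) -> np.ndarray:
    """Per-slice attention weights averaged over ensemble members.

    Assumes (hypothetically) that every entry of
    series_attention["volume_attention_1"] holds one logit per slice.
    """
    per_member = [
        softmax(np.asarray(logits, dtype=np.float64).ravel())
        for logits in series_attention["volume_attention_1"]
    ]
    return np.mean(per_member, axis=0)


def top_slices(series_attention: Dict[str, List[np.ndarray]], k: int = 3) -> List[int]:
    """Indices of the k slices with the highest average attention weight."""
    weights = slice_weights(series_attention)
    return np.argsort(weights)[::-1][:k].tolist()
```

For example, under the shape assumption above, `top_slices(attentions[0], k=5)` would list the five slices weighted most heavily for the first series.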

To visualize the attention scores, use `visualize_attentions` as shown below. It returns a list of 2D images with the attention scores overlaid on the original slices. If you provide a `save_directory`, the images will be saved as a GIF. If multiple series are provided, the function returns a list of lists, one per series.

```python
from sybil import visualize_attentions

# `series` is the list of Serie objects that was passed to model.predict
series = [serie]

series_with_attention = visualize_attentions(
    series,
    attentions=attentions,
    save_directory="path_to_save_directory",
    gain=3,
)
```

# Training Data

The Sybil model was trained using the National Lung Screening Trial (NLST) dataset:

National Lung Screening Trial Research Team. (2013). Data from the National Lung Screening Trial (NLST) [Data set]. The Cancer Imaging Archive. https://doi.org/10.7937/TCIA.HMQ8-J677

# Cite

```
@article{mikhael2023sybil,
  title={Sybil: a validated deep learning model to predict future lung cancer risk from a single low-dose chest computed tomography},
  author={Mikhael, Peter G and Wohlwend, Jeremy and Yala, Adam and Karstens, Ludvig and Xiang, Justin and Takigami, Angelo K and Bourgouin, Patrick P and Chan, PuiYee and Mrah, Sofiane and Amayri, Wael and Juan, Yu-Hsiang and Yang, Cheng-Ta and Wan, Yung-Liang and Lin, Gigin and Sequist, Lecia V and Fintelmann, Florian J. and Barzilay, Regina},
  journal={Journal of Clinical Oncology},
  pages={JCO--22},
  year={2023},
  publisher={Wolters Kluwer Health}
}
```