<p align="center"><img src="https://raw.githubusercontent.com/jamesdolezal/biscuit/master/images/banner_v2.png" width="800px" alt="Main banner"/></p>

# Uncertainty-Informed Deep Learning Models Enable High-Confidence Predictions for Digital Histopathology

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7117683.svg)](https://doi.org/10.5281/zenodo.7117683)

[Journal](https://www.nature.com/articles/s41467-022-34025-x) | [ArXiv](https://arxiv.org/abs/2204.04516)

_**What does BISCUIT do?** Bayesian Inference of Slide-level Confidence via Uncertainty Index Thresholding (BISCUIT) is an uncertainty quantification and thresholding schema used to separate deep learning classification predictions on whole-slide images (WSIs) into low- and high-confidence. Uncertainty is estimated through dropout, which approximates sampling of the Bayesian posterior, and thresholds are determined on training data to mitigate data leakage during testing._
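
For illustration, uncertainty for a single image tile can be estimated with Monte Carlo dropout by keeping dropout active at inference time and measuring the spread of repeated predictions. The sketch below shows the general technique only, not the exact BISCUIT implementation; `model` and `tile` are hypothetical placeholders for a `tf.keras.Model` containing dropout layers and a preprocessed image tile batch.

```python
import numpy as np

def mc_dropout_prediction(model, tile, n=30):
    """Estimate a prediction and its uncertainty via Monte Carlo dropout."""
    # Calling the model with training=True keeps dropout active at inference time
    preds = np.stack([model(tile, training=True).numpy() for _ in range(n)])
    return preds.mean(axis=0), preds.std(axis=0)  # mean prediction, uncertainty
```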

## Requirements
- Python >= 3.7
- [Tensorflow](https://tensorflow.org) >= 2.7.0 (and associated pre-requisites)
- [Slideflow](https://github.com/jamesdolezal/slideflow) >= 1.1.0 (and associated pre-requisites)
- Whole-slide images for training and validation

Please refer to our [Installation instructions](https://slideflow.dev/installation) for a guide to installing Slideflow and its prerequisites.
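
If needed, Slideflow itself can typically be installed from PyPI (shown for illustration; see the installation guide above for GPU, CUDA, and backend-specific prerequisites):

```
pip install slideflow
```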

## Pretrained model
The final uncertainty-enabled model, trained on the full TCGA dataset to predict lung adenocarcinoma vs. squamous cell carcinoma, is available on [Hugging Face](https://huggingface.co/jamesdolezal/lung-adeno-squam-v1).
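
As a minimal sketch (assuming the `huggingface_hub` package is installed; the exact loading workflow will depend on your Slideflow version), the model files can be downloaded locally with:

```python
from huggingface_hub import snapshot_download

# Download the pretrained model files to a local cache directory
model_dir = snapshot_download(repo_id="jamesdolezal/lung-adeno-squam-v1")
print(model_dir)
```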

## Summary
This README contains instructions for the following:

1. [Reproducing experimental results](#reproducing-experimental-results)
2. [Custom projects](#custom-projects-full-experiment)
3. [UQ thresholding algorithm](#uq-thresholding-algorithm-direct-use)

# Reproducing Experimental Results

## Data preparation
The first step to reproducing results described in our manuscript is downloading whole-slide images (\*.svs files) from [The Cancer Genome Atlas (TCGA) data portal](https://portal.gdc.cancer.gov/), projects TCGA-LUAD and TCGA-LUSC, and slides from the [Clinical Proteomics Tumor Analysis Consortium (CPTAC)](https://proteomics.cancer.gov/data-portal) data portal, projects CPTAC-LUAD and CPTAC-LSCC.

We use Slideflow for deep learning model training, which organizes data and annotations into [Projects](https://slideflow.dev/project_setup). The provided `configure.py` script automatically sets up the TCGA training and CPTAC evaluation projects, using specified paths to the training slides (TCGA) and evaluation slides (CPTAC). This step will also segment the whole-slide images into individual tiles, storing them as `*.tfrecords` for later use.

```
python3 configure.py --train_slides=/path/to/TCGA --val_slides=/path/to/CPTAC
```

Pathologist-annotated regions of interest (ROI) can optionally be used for the training dataset, as described in the [Slideflow documentation](https://slideflow.dev/overview). To use ROIs, specify the path to the ROI CSV files with the `--roi` argument.
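
For example (paths are illustrative):

```
python3 configure.py --train_slides=/path/to/TCGA --val_slides=/path/to/CPTAC --roi=/path/to/ROI
```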

## GAN Training
The next step is training the class-conditional GAN (StyleGAN2) used for generating GAN-Intermediate images. Clone the [StyleGAN2-slideflow](https://github.com/jamesdolezal/stylegan2-slideflow) repository, which has been modified to interface with the `*.tfrecords` storage format Slideflow uses. The GAN will be trained on 512 x 512 pixel images at 400 x 400 micron magnification. Synthetic images will be resized down to the target project size of 299 x 299 pixels and 302 x 302 microns during generation.

Use the `train.py` script **in the StyleGAN2 repository** to train the GAN. Pass the `gan_config.json` file that the `configure.py` script generated earlier to the `--slideflow` flag.

```
python3 train.py --outdir=/path/ --slideflow=/path/to/gan_config.json --mirror=1 --cond=1 --augpipe=bgcfnc --metrics=none
```

## Generating GAN images

To create GAN-Intermediate images with latent space embedding interpolation, use the `generate_tfrecords.py` script **in the StyleGAN2-slideflow** repository. Relevant flags include:

- `--network`: Path to network PKL file (saved GAN model).
- `--tiles`: Number of tiles per tfrecord to generate (manuscript uses 1000).
- `--tfrecords`: Number of tfrecords to generate.
- `--embed`: Generate intermediate images with class embedding interpolation.
- `--name`: Name format for tfrecords.
- `--class`: Class index, if not using embedding interpolation.
- `--outdir`: Directory in which to save tfrecords.

For example, to create tfrecords containing synthetic images of class 0 (LUAD / adenocarcinoma):

```
python3 generate_tfrecords.py --network=/path/network.pkl --tiles=1000 --tfrecords=10 --name=gan_luad --class=0 --outdir=gan/
```

To create embedding-interpolated intermediate images:

```
python3 generate_tfrecords.py --network=/path/network.pkl --tiles=1000 --tfrecords=10 --name=gan --embed=1 --outdir=gan/
```

Subsequent steps will assume that the GAN tfrecords are in the folder `gan/`.

## Cross-validation & evaluation

Next, models are trained with `train.py`. Experiments are organized by dataset size, each with a corresponding label. The experimental labels for this project are:

| ID | n_slides |
|----|----------|
| AA | full     |
| U  | 800      |
| T  | 700      |
| S  | 600      |
| R  | 500      |
| A  | 400      |
| L  | 350      |
| M  | 300      |
| N  | 250      |
| D  | 200      |
| O  | 176      |
| P  | 150      |
| Q  | 126      |
| G  | 100      |
| V  | 90       |
| W  | 80       |
| X  | 70       |
| Y  | 60       |
| Z  | 50       |
| ZA | 40       |
| ZB | 30       |
| ZC | 20       |
| ZD | 10       |

Experiments are performed in 6 steps for each dataset size:

1. Train cross-validation (CV) models for up to 10 epochs.
2. Train CV models at the optimal epoch (epoch 1).
3. Train UQ models in CV, saving predictions and uncertainty.
4. Train nested-UQ models, saving predictions, for uncertainty threshold determination.
5. Train models at the full dataset size without validation.
6. Perform external evaluation of the fully trained models.

We perform three types of experiments:

- `reg`: Regular experiments with balanced outcomes (LUAD:LUSC).
- `ratio`: Experiments testing varying degrees of class imbalance.
- `gan`: Cross-validation experiments using varying proportions of GAN-generated slides in the training/validation sets.

Specify which category of experiment should be run by setting its flag to `True`, and specify the steps to run with the `--steps` flag. For example, to run steps 2-6 for the ratio experiments:

```
python3 train.py --steps=2-6 --ratio=True
```

## Viewing results

Once all models have finished training (the published experiment included results from approximately 1000 models, so this may take a while), results can be viewed with the `results.py` script. The same experimental category flags, `--reg`, `--ratio`, and `--gan`, are used to determine which results should be viewed. Two additional categories of results can also be displayed:

- `--heatmap`: Generate the heatmap shown in Figure 4.
- `--umaps`: Generate the UMAPs shown in Figure 5.

Figures and output will then be saved in the `results/` folder. For example:

```
python3 results.py --ratio=True --umaps=True
```

# Custom projects: full experiment
You can also use BISCUIT to supervise custom experiments, including training, evaluation, and UQ thresholding.

## Setting up a project
Start by creating a new project, following the [Project Setup](https://slideflow.dev/project_setup) instructions in the Slideflow documentation. Briefly, projects are initialized by creating an instance of the `slideflow.Project` class and require a pre-configured set of patient-level annotations in CSV format:

```python
import slideflow as sf

project = sf.Project(
    '/project/path',
    annotations='/patient/annotations.csv'
)
```

Once the project is configured, add a new dataset source with paths to whole-slide images, optional tumor Regions of Interest (ROI) files, and destination paths for extracted tiles/tfrecords:

```python
project.add_source(
    name="TCGA_LUNG",
    slides="/path/to/slides",
    roi="/path/to/ROI",
    tiles="/tiles/destination",
    tfrecords="/tfrecords/destination"
)
```

This step will attempt to automatically associate slide names with the patient identifiers in your annotations CSV file. Afterwards, double-check that the annotations file has a `"slide"` column and that each entry matches the filename (without extension) of the corresponding slide. You should also ensure that the outcome labels you plan to train on are correctly represented in this file.
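
As a quick sanity check (a minimal sketch; the file path and outcome header are illustrative), the annotations file can be inspected with pandas:

```python
import pandas as pd

ann = pd.read_csv('/patient/annotations.csv')
assert 'slide' in ann.columns, "Annotations file is missing a 'slide' column"
print(ann['some_header'].value_counts())  # Confirm the outcome labels look correct
```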

## Extract tiles from slides
The next step is to [extract tiles](https://slideflow.dev/slide_processing) from whole-slide images, using the `sf.Project.extract_tiles()` function. This will save image tiles in the binary `*.tfrecord` format in the destination folder you previously configured.

```python
project.extract_tiles(
    tile_px=299,  # Tile size in pixels
    tile_um=302   # Tile size in microns
)
```

A PDF report summarizing the tile extraction phase will be saved in the TFRecords directory.
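
To confirm that tiles were extracted, you can inspect the resulting TFRecords through the project's dataset interface (a minimal sketch; exact attributes may vary between Slideflow versions):

```python
dataset = project.dataset(tile_px=299, tile_um=302)
print(f"Found {len(dataset.tfrecords())} TFRecords")
```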

## Train models in cross-validation
Next, set up a BISCUIT Experiment, configuring your class labels and training project.

```python
from biscuit import Experiment

experiment = Experiment(
    train_project=project, # Slideflow training project
    outcome="some_header", # Annotations header with labels
    outcome1="class1",     # First class
    outcome2="class2"      # Second class
)
```

Next, train models in cross-validation using uncertainty quantification (UQ), which estimates uncertainty via dropout. Model hyperparameters can be manually configured with `sf.model.ModelParams`. Alternatively, the hyperparameters we used in the above manuscript can be accessed via `biscuit.hp.nature2022`. The `uq` parameter should be set to `True` to enable UQ.

```python
import biscuit

# Set the hyperparameters
hp = biscuit.hp.nature2022
hp.uq = True

# Train in cross-validation
experiment.train(
    hp=hp,                   # Hyperparameters
    label="EXPERIMENT",      # Experiment label/ID
    save_predictions='csv'   # Save predictions in CSV format
)
```

## Train nested cross-validation models for UQ thresholds
After the outer cross-validation models have been trained, the inner cross-validation models are trained so that optimal UQ thresholds can be found. Initialize the nested cross-validation training with the following:

```python
experiment.train_nested_cv(hp=hp, label="EXPERIMENT")
```

The experimental results for each cross-fold can either be viewed manually by opening `results_log.csv` in each model directory, or programmatically with the following functions:

```python
cv_models = biscuit.find_cv(
    project=project,
    label="EXPERIMENT",
    outcome="some_header"
)
# Print patient-level AUROC for each model
for m in cv_models:
    results = biscuit.get_model_results(
        m,
        outcome="some_header",
        epoch=1
    )
    print(m, results['pt_auc'])
```

## Calculate UQ thresholds and show results
Finally, UQ thresholds are determined from the previously trained nested cross-validation models. Use `Experiment.thresholds_from_nested_cv()` to calculate optimal thresholds, and then apply these thresholds to the outer cross-validation data, rendering high-confidence predictions.

```python
df, thresh = experiment.thresholds_from_nested_cv(
    label="EXPERIMENT"
)
```

`thresh` will be a dictionary of tile- and slide-level UQ thresholds, and the slide-level prediction threshold. `df` is a pandas DataFrame containing the thresholded, high-confidence UQ predictions from outer cross-validation.

```python
>>> print(df)
     id  n_slides  fold       uq  patient_auc  patient_uq_perc  slide_auc  slide_uq_perc
0  TEST     359.0   1.0  include     0.974119         0.909091   0.974119       0.909091
1  TEST     359.0   2.0  include     0.972060         0.840336   0.972060       0.840336
2  TEST     359.0   3.0  include     0.901786         0.873950   0.901786       0.873950
>>> print(thresh)
{'tile_uq': 0.008116906, 'slide_uq': 0.0023400568179163194, 'slide_pred': 0.17693227693333335}
```

## Visualize uncertainty calibration
Plots can be generated showing the relationship between predictions and uncertainty, as shown in Figure 3 of the manuscript. The `Experiment.plot_uq_calibration()` function generates these plots, which can then be shown with `plt.show()`:

```python
import matplotlib.pyplot as plt

experiment.plot_uq_calibration(
    label="EXPERIMENT",
    **thresh  # Pass the thresholds from the prior step
)
plt.show()
```

![calibration](images/example_calibration.png)

## Full example
For reference, the full script to accomplish the above custom UQ experiment would look like:

```python
import matplotlib.pyplot as plt
import slideflow as sf
import biscuit
from biscuit import Experiment

# Set up a project
project = sf.Project(
    '/project/path',
    annotations='/patient/annotations.csv'
)
project.add_source(
    name="TCGA_LUNG",
    slides="/path/to/slides",
    roi="/path/to/ROI",
    tiles="/tiles/destination",
    tfrecords="/tfrecords/destination"
)

# Extract tiles from slides into TFRecords
project.extract_tiles(
    tile_px=299,  # Tile size in pixels
    tile_um=302   # Tile size in microns
)

# Set up the experiment
experiment = Experiment(
    train_project=project, # Slideflow training project
    outcome="some_header", # Annotations header with labels
    outcome1="class1",     # First class
    outcome2="class2"      # Second class
)

# Train cross-validation (CV) UQ models
hp = biscuit.hp.nature2022
hp.uq = True
experiment.train(
    hp=hp,                   # Hyperparameters
    label="EXPERIMENT",      # Experiment label/ID
    save_predictions='csv'   # Save predictions in CSV format
)

# Train the nested CV models (for thresholds)
experiment.train_nested_cv(hp=hp, label="EXPERIMENT")

# Show the non-thresholded model results
cv_models = biscuit.find_cv(
    project=project,
    label="EXPERIMENT",
    outcome="some_header"
)
for m in cv_models:
    results = biscuit.get_model_results(
        m,
        outcome="some_header",
        epoch=1)
    print(m, results['pt_auc'])  # Prints patient-level AUC for each model

# Calculate thresholds from the nested CV models
df, thresh = experiment.thresholds_from_nested_cv(
    label="EXPERIMENT"
)

# Plot predictions vs. uncertainty
experiment.plot_uq_calibration(
    label="EXPERIMENT",
    **thresh  # Pass the thresholds from the prior step
)
plt.show()
```

# UQ thresholding algorithm: direct use

Alternatively, you can use the uncertainty thresholding algorithm directly on existing data, outside the context of a Slideflow project (e.g. data generated with another framework). You will need tile-level predictions from a collection of models, such as from nested cross-validation, to calculate the thresholds. The thresholds are then applied to a set of tile-level predictions from a different model. Organize predictions from each model into separate DataFrames, each with the columns:

- **y_pred**: Tile-level predictions.
- **y_true**: Tile-level ground-truth labels.
- **uncertainty**: Tile-level uncertainty.
- **slide**: Slide labels.
- **patient**: Patient labels (*optional*).

```python
>>> import pandas as pd
>>> dfs = [pd.DataFrame(...), ...]
>>> target_df = pd.DataFrame(...)
```
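
For illustration only, a toy DataFrame with the expected schema might look like the following (all values are placeholder data, not real predictions):

```python
import numpy as np
import pandas as pd

n = 8
example_df = pd.DataFrame({
    'y_pred': np.random.rand(n),              # tile-level predictions
    'y_true': np.random.randint(0, 2, n),     # tile-level ground-truth labels
    'uncertainty': np.random.rand(n) * 0.1,   # tile-level uncertainty
    'slide': ['slide1'] * 4 + ['slide2'] * 4, # slide labels
    'patient': ['pt1'] * 4 + ['pt2'] * 4,     # patient labels (optional)
})
```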

Calculate UQ thresholds from your cross-validation predictions with ``biscuit.threshold.from_cv()``. This will return a dictionary with tile- and slide-level UQ and prediction thresholds.

```python
>>> from biscuit import threshold
>>> from pprint import pprint
>>> thresholds = threshold.from_cv(dfs)
>>> pprint(thresholds)
{'tile_uq': 0.02726791,
 'slide_uq': 0.0147878695,
 'tile_pred': 0.41621968,
 'slide_pred': 0.4756707}
```

Then, apply these thresholds to your target dataframe with ``biscuit.threshold.apply()``. This will return a dictionary of slide-level (or patient-level) prediction metrics and a dataframe of the slide-level (or patient-level) predictions. You can specify slide- or patient-level predictions by passing ``level`` (defaults to ``'slide'``):

```python
>>> metrics, thresh_df = threshold.apply(
...     target_df,
...     **thresholds,
...     level='slide')
>>> pprint(metrics)
{'auc': 0.9703296703296704,
 'percent_incl': 0.907051282051282,
 'acc': 0.9222614840989399,
 'sensitivity': 0.9230769230769231,
 'specificity': 0.9214285714285714}
>>> pprint(thresh_df.columns)
Index(['slide', 'error', 'uncertainty', 'correct', 'incorrect', 'y_true',
       'y_pred', 'y_pred_bin'],
      dtype='object')
```

# Reference
If you find our work useful for your research, or if you use parts of this code, please consider citing as follows:

Dolezal, J.M., Srisuwananukorn, A., Karpeyev, D. _et al_. Uncertainty-informed deep learning models enable high-confidence predictions for digital histopathology. _Nat Commun_ 13, 6572 (2022). https://doi.org/10.1038/s41467-022-34025-x

```
@ARTICLE{Dolezal2022-qa,
  title    = "Uncertainty-informed deep learning models enable high-confidence
              predictions for digital histopathology",
  author   = "Dolezal, James M and Srisuwananukorn, Andrew and Karpeyev, Dmitry
              and Ramesh, Siddhi and Kochanny, Sara and Cody, Brittany and
              Mansfield, Aaron S and Rakshit, Sagar and Bansal, Radhika and
              Bois, Melanie C and Bungum, Aaron O and Schulte, Jefree J and
              Vokes, Everett E and Garassino, Marina Chiara and Husain, Aliya N
              and Pearson, Alexander T",
  journal  = "Nature Communications",
  volume   =  13,
  number   =  1,
  pages    = "6572",
  month    =  nov,
  year     =  2022
}
```