Quickstart
==========

This section provides an example of using Slideflow to build a deep learning classifier from digital pathology slides. Follow the links in each section for more information.

Preparing a project
*******************

Slideflow experiments are organized using :class:`slideflow.Project`, which supervises storage of data, saved models, and results. The ``slideflow.project`` module has three preconfigured projects with associated slides and clinical annotations: ``LungAdenoSquam``, ``ThyroidBRS``, and ``BreastER``.
For this example, we will use the ``LungAdenoSquam`` project to train a classifier to predict lung adenocarcinoma (Adeno) vs. squamous cell carcinoma (Squam).

.. code-block:: python

    import slideflow as sf

    # Download preconfigured project, with slides and annotations.
    project = sf.create_project(
        root='data',
        cfg=sf.project.LungAdenoSquam(),
        download=True
    )

Read more about :ref:`setting up a project on your own data <project_setup>`.
Data preparation
****************

The core imaging data used in Slideflow are image tiles :ref:`extracted from slides <filtering>` at a specific magnification and pixel resolution. Tile extraction and downstream image processing are handled through the core primitive, :ref:`slideflow.Dataset <datasets_and_validation>`. We can request a ``Dataset`` at a given tile size from our project using :meth:`slideflow.Project.dataset`. Tile magnification can be specified in microns (as an ``int``) or as optical magnification (e.g. ``'40x'``).
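
The two ``tile_um`` conventions are related through the scanner's resolution. As a rough rule of thumb (about 0.25 µm per pixel at 40x; real values come from each slide's metadata), the physical width of a tile can be approximated with a small helper. This helper is purely illustrative and is not part of the Slideflow API:

.. code-block:: python

    # Rule-of-thumb microns-per-pixel for common objective magnifications.
    # Real values come from each slide's metadata; these are approximations.
    MPP_AT_MAGNIFICATION = {'40x': 0.25, '20x': 0.5, '10x': 1.0, '5x': 2.0}

    def tile_width_um(tile_px, magnification):
        """Approximate physical tile width, in microns."""
        return tile_px * MPP_AT_MAGNIFICATION[magnification]

    # A 299 px tile at 10x spans roughly 299 um of tissue.
    width = tile_width_um(299, '10x')
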
.. code-block:: python

    # Prepare a dataset of image tiles.
    dataset = project.dataset(
        tile_px=299,   # Tile size, in pixels.
        tile_um='10x'  # Tile size, in microns or magnification.
    )
    dataset.summary()

.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Overview:
    ╒===============================================╕
    │ Configuration file: │ /mnt/data/datasets.json │
    │ Tile size (px):     │ 299                     │
    │ Tile size (um):     │ 10x                     │
    │ Slides:             │ 941                     │
    │ Patients:           │ 941                     │
    │ Slides with ROIs:   │ 941                     │
    │ Patients with ROIs: │ 941                     │
    ╘===============================================╛

    Filters:
    ╒====================╕
    │ Filters:      │ {} │
    ├--------------------┤
    │ Filter Blank: │ [] │
    ├--------------------┤
    │ Min Tiles:    │ 0  │
    ╘====================╛

    Sources:

    TCGA_LUNG
    ╒==============================================╕
    │ slides    │ /mnt/raid/SLIDES/TCGA_LUNG       │
    │ roi       │ /mnt/raid/SLIDES/TCGA_LUNG       │
    │ tiles     │ /mnt/rocket/tiles/TCGA_LUNG      │
    │ tfrecords │ /mnt/rocket/tfrecords/TCGA_LUNG/ │
    │ label     │ 299px_10x                        │
    ╘==============================================╛

    Number of tiles in TFRecords: 0
    Annotation columns:
    Index(['patient', 'subtype', 'site', 'slide'],
        dtype='object')

Tile extraction
---------------

We prepare imaging data for training by extracting tiles from slides. Background areas of slides will be filtered out with Otsu's thresholding.
.. code-block:: python

    # Extract tiles from all slides in the dataset.
    dataset.extract_tiles(qc='otsu')

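
Otsu's method chooses the grayscale threshold that maximizes the between-class variance, separating dark tissue from bright background. The following minimal, self-contained sketch illustrates the idea only; Slideflow's actual QC implementation operates on slide thumbnails and differs in detail:

.. code-block:: python

    def otsu_threshold(pixels):
        """Return the grayscale threshold (0-255) maximizing between-class variance."""
        hist = [0] * 256
        for p in pixels:
            hist[p] += 1
        total = len(pixels)
        sum_all = sum(i * hist[i] for i in range(256))
        best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
        for t in range(256):
            w0 += hist[t]          # pixels at or below threshold t
            if w0 == 0:
                continue
            w1 = total - w0        # pixels above threshold t
            if w1 == 0:
                break
            sum0 += t * hist[t]
            mu0 = sum0 / w0
            mu1 = (sum_all - sum0) / w1
            var_between = w0 * w1 * (mu0 - mu1) ** 2
            if var_between > best_var:
                best_var, best_t = var_between, t
        return best_t

    # Toy bimodal "image": dark tissue pixels vs. bright background pixels.
    threshold = otsu_threshold([30] * 500 + [40] * 500 + [220] * 1000)
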
Read more about tile extraction and :ref:`slide processing in Slideflow <filtering>`.
Held-out test sets
------------------

Now that we have our dataset and we've completed the initial tile image processing, we'll split the dataset into a training cohort and a held-out test cohort with :meth:`slideflow.Dataset.split`. We'll split while balancing the outcome ``'subtype'`` equally in the training and test dataset, with 30% of the data retained in the held-out set.
.. code-block:: python

    # Split our dataset into a training and held-out test set.
    train_dataset, test_dataset = dataset.split(
        model_type='classification',
        labels='subtype',
        val_fraction=0.3
    )

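
Conceptually, a label-balanced split keeps each outcome's class proportions similar across cohorts (stratified sampling). The toy function below illustrates the idea only and is not how Slideflow implements :meth:`slideflow.Dataset.split`:

.. code-block:: python

    import random

    def stratified_split(slides, labels, val_fraction, seed=42):
        """Split slide IDs so each label is represented proportionally."""
        rng = random.Random(seed)
        by_label = {}
        for slide, label in zip(slides, labels):
            by_label.setdefault(label, []).append(slide)
        train, test = [], []
        for group in by_label.values():
            rng.shuffle(group)
            n_test = round(len(group) * val_fraction)
            test.extend(group[:n_test])
            train.extend(group[n_test:])
        return train, test

    slides = ['slide%d' % i for i in range(100)]
    labels = ['adeno'] * 60 + ['squam'] * 40
    train, test = stratified_split(slides, labels, val_fraction=0.3)
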
Read more about :ref:`Dataset management <datasets_and_validation>`.
Configuring models
******************

Neural network models are prepared for training with :class:`slideflow.ModelParams`, through which we define the model architecture, loss, and hyperparameters. Dozens of architectures are available in both the Tensorflow and PyTorch backends, and both neural network :ref:`architectures <tutorial3>` and :ref:`loss <custom_loss>` functions can be customized. In this example, we will use the included Xception network.
.. code-block:: python

    # Prepare a model and hyperparameters.
    params = sf.ModelParams(
        tile_px=299,
        tile_um='10x',
        model='xception',
        batch_size=64,
        learning_rate=0.0001
    )

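
A hyperparameter search typically enumerates candidate configurations. A simple grid can be generated with plain Python; pairing each resulting dictionary with its own ``sf.ModelParams`` is left as a sketch here (see the hyperparameter docs for Slideflow's built-in sweep utilities):

.. code-block:: python

    from itertools import product

    learning_rates = [1e-4, 1e-3]
    batch_sizes = [32, 64]

    # One configuration per (learning_rate, batch_size) pair.
    grid = [
        {'learning_rate': lr, 'batch_size': bs}
        for lr, bs in product(learning_rates, batch_sizes)
    ]
    # Each dict could then seed its own sf.ModelParams(...).
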
Read more about :ref:`hyperparameter optimization in Slideflow <training>`.
Training a model
****************

Models are trained from these hyperparameter configurations using :meth:`Project.train`. Training can target categorical, multi-categorical, continuous, or time-series outcomes, and the training process is :ref:`highly configurable <training>`. In this case, we are training a binary categorization model to predict the outcome ``'subtype'``, and we will distribute training across multiple GPUs.

By default, Slideflow will train/validate on the full dataset using k-fold cross-validation, but validation settings :ref:`can be customized <validation_planning>`. If you would like to restrict training to only a subset of your data - for example, to leave a held-out test set untouched - you can manually specify a dataset for training. In this case, we will train on ``train_dataset``, and allow Slideflow to further split this into training and validation using three-fold cross-validation.
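
Schematically, three-fold cross-validation rotates which third of the training data serves as validation, so every slide is validated exactly once. A minimal sketch of the partitioning idea only (Slideflow's own validation splits are handled internally):

.. code-block:: python

    def k_fold_splits(items, k=3):
        """Yield (train, val) lists for k rotating folds."""
        folds = [items[i::k] for i in range(k)]
        for i in range(k):
            val = folds[i]
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            yield train, val

    # Three (train, val) pairs; each item appears in validation exactly once.
    slides = list(range(9))
    splits = list(k_fold_splits(slides, k=3))
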
.. code-block:: python

    # Train a model from a set of hyperparameters.
    results = project.train(
        'subtype',
        dataset=train_dataset,
        params=params,
        val_strategy='k-fold',
        val_k_fold=3,
        multi_gpu=True,
    )

Models and training results will be saved in the project ``models/`` folder.
Read more about :ref:`training a model <training>`.
Evaluating a trained model
**************************

After training, you can test model performance on a held-out test dataset with :meth:`Project.evaluate`, or generate predictions without evaluation (when ground-truth labels are not available) with :meth:`Project.predict`. As with :meth:`Project.train`, we can specify a :class:`slideflow.Dataset` to evaluate.
153
154
.. code-block:: python

    # Evaluate the trained model on the held-out test set.
    test_results = project.evaluate(
        model='/path/to/trained_model_epoch1',
        outcomes='subtype',
        dataset=test_dataset
    )

Read more about :ref:`model evaluation <evaluation>`.
Post-hoc analysis
*****************

Slideflow includes a number of analytical tools for working with trained models. Read more about :ref:`heatmaps <evaluation>`, :ref:`model explainability <stylegan>`, :ref:`analysis of layer activations <activations>`, and real-time inference in an interactive :ref:`whole-slide image reader <studio>`.