[78ef36]: / docs-source / source / quickstart.rst

Download this file

168 lines (120 with data), 7.3 kB

Quickstart

This section provides an example of using Slideflow to build a deep learning classifier from digital pathology slides. Follow the links in each section for more information.

Preparing a project

Slideflow experiments are organized using :class:`slideflow.Project`, which supervises storage of data, saved models, and results. The slideflow.project module has three preconfigured projects with associated slides and clinical annotations: LungAdenoSquam, ThyroidBRS, and BreastER.

For this example, we will the LungAdenoSquam project to train a classifier to predict lung adenocarcinoma (Adeno) vs. squamous cell carcinoma (Squam).

import slideflow as sf

# Download preconfigured project, with slides and annotations.
project = sf.create_project(
    root='data',
    cfg=sf.project.LungAdenoSquam(),
    download=True
)

Read more about :ref:`setting up a project on your own data <project_setup>`.

Data preparation

The core imaging data used in Slideflow are image tiles :ref:`extracted from slides <filtering>` at a specific magnification and pixel resolution. Tile extraction and downstream image processing is handled through the primitive :ref:`slideflow.Dataset <datasets_and_validation>`. We can request a Dataset at a given tile size from our project using :meth:`slideflow.Project.dataset`. Tile magnification can be specified in microns (as an int) or as optical magnification (e.g. '40x').

# Prepare a dataset of image tiles.
dataset = project.dataset(
    tile_px=299,   # Tile size, in pixels.
    tile_um='10x'  # Tile size, in microns or magnification.
)
dataset.summary()

Tile extraction

We prepare imaging data for training by extracting tiles from slides. Background areas of slides will be filtered out with Otsu's thresholding.

# Extract tiles from all slides in the dataset.
dataset.extract_tiles(qc='otsu')

Read more about tile extraction and :ref:`slide processing in Slideflow <filtering>`.

Held-out test sets

Now that we have our dataset and we've completed the initial tile image processing, we'll split the dataset into a training cohort and a held-out test cohort with :meth:`slideflow.Dataset.split`. We'll split while balancing the outcome 'subtype' equally in the training and test dataset, with 30% of the data retained in the held-out set.

# Split our dataset into a training and held-out test set.
train_dataset, test_dataset = dataset.split(
    model_type='classification',
    labels='subtype',
    val_fraction=0.3
)

Read more about :ref:`Dataset management <datasets_and_validation>`.

Configuring models

Neural network models are prepared for training with :class:`slideflow.ModelParams`, through which we define the model architecture, loss, and hyperparameters. Dozens of architectures are available in both the Tensorflow and PyTorch backends, and both neural network :ref:`architectures <tutorial3>` and :ref:`loss <custom_loss>` functions can be customized. In this example, we will use the included Xception network.

# Prepare a model and hyperparameters.
params = sf.ModelParams(
    tile_px=299,
    tile_um='10x',
    model='xception',
    batch_size=64,
    learning_rate=0.0001
)

Read more about :ref:`hyperparameter optimization in Slideflow <training>`.

Training a model

Models can be trained from these hyperparameter configurations using :meth:`Project.train`. Models can be trained to categorical, multi-categorical, continuous, or time-series outcomes, and the training process is :ref:`highly configurable <training>`. In this case, we are training a binary categorization model to predict the outcome 'subtype', and we will distribute training across multiple GPUs.

By default, Slideflow will train/validate on the full dataset using k-fold cross-validation, but validation settings :ref:`can be customized <validation_planning>`. If you would like to restrict training to only a subset of your data - for example, to leave a held-out test set untouched - you can manually specify a dataset for training. In this case, we will train on train_dataset, and allow Slideflow to further split this into training and validation using three-fold cross-validation.

# Train a model from a set of hyperparameters.
results = P.train(
    'subtype',
    dataset=train_dataset,
    params=params,
    val_strategy='k-fold',
    val_k_fold=3,
    multi_gpu=True,
)

Models and training results will be saved in the project models/ folder.

Read more about :ref:`training a model <training>`.

Evaluating a trained model

After training, you can test model performance on a held-out test dataset with :meth:`Project.evaluate`, or generate predictions without evaluation (when ground-truth labels are not available) with :meth:`Project.predict`. As with :meth:`Project.train`, we can specify a :class:`slideflow.Dataset` to evaluate.

# Train a model from a set of hyperparameters.
test_results = P.evaluate(
    model='/path/to/trained_model_epoch1'
    outcomes='subtype',
    dataset=test_dataset
)

Read more about :ref:`model evaluation <evaluation>`.

Post-hoc analysis

Slideflow includes a number of analytical tools for working with trained models. Read more about :ref:`heatmaps <evaluation>`, :ref:`model explainability <stylegan>`, :ref:`analysis of layer activations <activations>`, and real-time inference in an interactive :ref:`whole-slide image reader <studio>`.