[78ef36]: / docs / _sources / posthoc.rst.txt

Download this file

259 lines (160 with data), 11.2 kB

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
.. currentmodule:: slideflow.model
.. _activations:
Layer Activations
=================
Investigating the latent space of a neural network can provide useful insights into the structure of your data and what models have learned during training. Slideflow provides several tools for post-hoc latent space analysis of trained neural networks, primarily by calculating activations at one or more neural network layers for all images in a dataset. In the next sections, we will take a look at how these layer activations can be calculated for downstream analysis and provide examples of analyses that can be performed.
Calculating Layer Activations
*****************************
Activations at one or more layers of a trained network can be calculated with :class:`slideflow.model.Features` and :class:`slideflow.DatasetFeatures`. The former provides an interface for calculating layer activations for a batch of images, and the latter supervises calculations across an entire dataset.
Batch of images
---------------
:class:`Features` provides an interface for calculating layer activations and predictions on a batch of images. The following arguments are available:
- ``path``: Path to model, from which layer activations are calculated. Required.
- ``layers``: Layer(s) at which to calculate activations.
- ``include_preds``: Also return the final network output (predictions)
- ``pooling``: Apply pooling to layer activations, to reduce dimensionality to one dimension.
If ``layers`` is not supplied, activations at the post-convolutional layer will be calculated by default.
Once initialized, the resulting object can be called on a batch of images and will return the layer activations for all images in the batch. For example, to calculate activations at the ``sep_conv_3`` layer of a model while looping through a dataset:
.. code-block:: python
import slideflow as sf
sepconv3 = sf.model.Features('model/path', layer='sep_conv_3')
for img_batch in dataset:
postconv_activations = sepconv3(img_batch)
If ``layer`` is a list of layer names, activations at each layer will be calculated and concatenated. If ``include_preds`` is ``True``, the interface will also return the final predictions:
.. code-block:: python
sepconv3_and_preds = sf.model.Features(..., include_preds=True)
layer_activations, preds = sepconv3_and_preds(img_batch)
.. note::
:class:`Features` assumes that image batches already have any necessary preprocessing already applied, including standardization and stain normalization.
See the API documentation for :class:`Features` for more information.
Single slide
------------
Layer activations can also be calculated across an entire slide using the same :class:`Features` interface. Calling the object on a :class:`slideflow.WSI` object will generate a grid of activations of size ``(slide.grid.shape[0], slide.grid.shape[1], num_features)``:
.. code-block:: python
import slideflow as sf
slide = sf.WSI(...)
postconv = sf.model.Features('/model/path', layers='postconv')
feature_grid = postconv(slide)
print(feature_grid.shape)
.. rst-class:: sphx-glr-script-out
.. code-block:: none
(50, 45, 2048)
.. _dataset_features:
Entire dataset
--------------
Finally, layer activations can also be calculated for an entire dataset using :class:`slideflow.DatasetFeatures`. Instancing the class supervises the calculation and caching of layer activations, which can then be used for downstream analysis. The project function :func:`slideflow.Project.generate_features` creates and returns an instance of this class.
.. code-block:: python
dts_ftrs = P.generate_features('/path/to/trained_model')
Alternatively, you can create an instance of this class directly:
.. code-block:: python
import slideflow as sf
dataset = P.dataset(tile_px=299, tile_um=302)
dts_ftrs = sf.DatasetFeatures(
model='/path/to/trained_model',
dataset=dataset,
)
Tile-level feature activations for each slide can be accessed directly from ``DatasetFeatures.activations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_features)``. Predictions are stored in ``DatasetFeatures.predictions``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, num_classes)``. Tile-level location data (coordinates from which the tiles were taken from their respective source slides) is stored in ``DatasetFeatures.locations``, a dict mapping slide names to numpy arrays of shape ``(num_tiles, 2)`` (``x``, ``y``).
Activations can be exported to a Pandas DataFrame with :meth:`slideflow.DatasetFeatures.to_df` or exported into PyTorch format with :meth:`slideflow.DatasetFeatures.to_torch`. See :ref:`features` for more information about generating and exporting features for MIL models.
Read the API documentation for :class:`slideflow.DatasetFeatures` for more information.
.. _slidemap:
Mapping Activations
*******************
Layer activations across a dataset can be dimensionality reduced with UMAP and plotted for visualization using :meth:`slideflow.DatasetFeatures.map_activations`. This function returns an instance of :class:`slideflow.SlideMap`, a class that provides easy access to labeling and plotting.
The below example calculates layer activations at the neural network layer ``sep_conv_3`` for an entire dataset, and then reduces the activations into two dimensions for easy visualization using UMAP. Any valid `UMAP parameters <https://umap-learn.readthedocs.io/en/latest/parameters.html>`_ can be passed via keyword argument.
.. code-block:: python
dts_ftrs = P.generate_features(
model='/path/to/trained_model',
layers='sep_conv_3'
)
slide_map = dts_ftrs.map_activations(
n_neighbors=10, # UMAP parameter
min_dist=0.2 # UMAP parameter
)
We can then plot the activations with :meth:`slideflow.SlideMap.plot`. All keyword arguments are passed to the `matplotlib scatter <https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html>`_ function.
.. code-block:: python
import matplotlib.pyplot as plt
slide_map.plot(s=10)
plt.show()
We can add labels to our plot by first passing a dictionary with slide labels to the function :meth:`slideflow.SlideMap.label_by_slide`.
.. code-block:: python
# Get a dictionary mapping slide names to category labels
dataset = P.dataset(tile_px=299, tile_um='10x')
labels, unique_labels = dataset.labels('subtype', format='name')
# Assign the labels to the slide map, then plot
slide_map.label_by_slide(labels)
slide_map.plot()
.. image:: umap_example.png
|
Finally, we can use :meth:`SlideMap.umap_transform` to project new data into two dimensions using the previously fit UMAP.
.. code-block:: python
import slideflow as sf
import numpy as np
# Create a SlideMap using layer activations reduced with UMAP
dts_ftrs = P.generate_features(
model='/path/to/trained_model',
layers='sep_conv_3'
)
slide_map = dts_ftrs.map_activations()
# Load some dummy data.
# Second dimension must match size of activation vector.
dummy = np.random.random((100, 1024))
# Transform the data using the already-fit UMAP.
transformed = slide_map.umap_transform(dummy)
print(transformed.shape)
.. rst-class:: sphx-glr-script-out
.. code-block:: none
(100, 2)
Read more about additional :class:`slideflow.SlideMap` functions, including saving, loading, and clustering, in the linked API documentation.
.. _mosaic_map:
Mosaic Maps
***********
Mosaic maps provide a tool for visualizing the distribution of histologic image features in a dataset through analysis of neural network layer activations. Similar to `activation atlases <https://distill.pub/2019/activation-atlas/>`_, a mosaic map is generated by first calculating layer activations for a dataset, dimensionality reducing these activations with `UMAP <https://joss.theoj.org/papers/10.21105/joss.00861>`_, and then overlaying corresponding images in a grid-wise fashion.
.. image:: mosaic_example.png
|
In the previous sections, we reviewed how to calculate layer activations across a dataset, and then dimensionality reduce these activations into two dimensions using UMAP. :class:`slideflow.Mosaic` provides a tool for converting these activation maps into a grid of image tiles plotted according to their associated activation vectors.
Quickstart
----------
The fastest way to build a mosaic map is using :class:`slideflow.Project.generate_mosaic`, which requires a ``DatasetFeatures`` object as its only mandatory argument and returns an instance of :class:`slideflow.Mosaic`.
.. code-block:: python
dts_ftrs = P.generate_features('/path/to/trained_model', layers='postconv')
mosaic = P.generate_mosaic(dts_ftrs)
mosaic.save('mosaic.png')
When created with this interface, the underlying :class:`slideflow.SlideMap` object used to create the mosaic map is accessible via ``slideflow.Mosaic.slide_map``. You could, for example, use :func:`slideflow.SlideMap.save` to save the UMAP plot:
.. code-block:: python
mosiac.slide_map.save('umap.png')
From a SlideMap
---------------
Any ``SlideMap`` can be converted to a mosaic map with :meth:`slideflow.SlideMap.generate_mosaic()`.
.. code-block:: python
ftrs = P.generate_features('/path/to/model')
slide_map = ftrs.map_activations()
mosaic = slide_map.generate_mosaic()
mosaic.save('mosaic.png')
Manual creation
---------------
Mosaic maps can be flexibly created with :class:`slideflow.Mosaic`, requiring two components: a set of images and corresponding coordinates. Images and coordinates can either be manually provided, or the mosaic can dynamically read images from TFRecords (as is done with :meth:`Project.generate_mosaic()`).
The first argument of :class:`slideflow.Mosaic` provides the images, and may be either of the following:
- A list or array of images (np.ndarray, HxWxC)
- A list of tuples, containing ``(slide_name, tfrecord_index)``
The second argument provides the coordinates:
- A list or array of (x, y) coordinates for each image
For example, to create a mosaic map from a list of images and coordinates:
.. code-block:: python
# Example data (images are HxWxC, np.ndarray)
images = [np.ndarray(...), ...]
coords = [(0.2, 0.9), ...]
# Generate the mosaic
mosaic = Mosaic(images, coordinates)
mosaic.plot()
You can also generate a mosaic map where the images are tuples of `(tfrecord, tfrecord_index)`. In this case, the mosaic map will dynamically read images from TFRecords during plotting.
.. code-block:: python
# Example data
tfrecords = ['/path/to/tfrecord`.tfrecords', ...]
idx = [253, 112, ...]
coords = [(0.2, 0.9), ...]
# Generate mosaic map
mosaic = sf.Mosaic(
images=[(tfr, idx) for tfr, idx in zip(tfrecords, idx)],
coords=coords
)
There are several additional arguments that can be used to customize the mosaic map plotting. Read the linked API documentation for :class:`slideflow.Mosaic` for more information.