.. _features:

Generating Features
===================

Converting images into feature vectors is a common step for many machine learning tasks, including `feature space analysis <activations>`_ and `multiple-instance learning (MIL) <mil>`_. Slideflow provides a simple API for generating features from image tiles and includes several pretrained feature extractors. You can see a list of all available feature extractors with :func:`slideflow.list_extractors`.
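
For example, to check which extractors are available in your installation (the exact output will vary with which optional packages are installed):

.. code-block:: python

    import slideflow as sf

    # Print the names of all available feature extractors
    print(sf.list_extractors())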

Generating Features
*******************

The first step in generating features from a dataset of images is creating a feature extractor. Many types of feature extractors can be used, including ImageNet-pretrained models, models fine-tuned in Slideflow, histology-specific pretrained feature extractors (i.e., "foundation models"), and fine-tuned SSL models. In all cases, feature extractors are built with :func:`slideflow.build_feature_extractor`, and features are generated for a `Dataset <datasets_and_val>`_ using :meth:`slideflow.Dataset.generate_feature_bags`, as described :ref:`below <bags>`.

.. code-block:: python

    # Build a feature extractor
    ctranspath = sf.build_feature_extractor('ctranspath')

    # Generate features for a dataset
    dataset.generate_feature_bags(ctranspath, outdir='/path/to/features')

Pretrained Extractors
*********************

Slideflow includes several pathology-specific feature extractors, also referred to as foundation models, pretrained on large-scale histology datasets.

.. list-table:: **Pretrained feature extractors.** Note: "histossl" was renamed to "phikon" in Slideflow 3.0.
    :header-rows: 1
    :widths: 14 10 8 8 8 14 28 10

    * - Model
      - Type
      - WSIs
      - Input size
      - Dim
      - Source
      - Package
      - Link
    * - **Virchow**
      - DINOv2
      - 1.5M
      - 224
      - 2560
      - Paige
      - ``slideflow``
      - `Paper <http://arxiv.org/pdf/2309.07778v5>`__
    * - **CTransPath**
      - SRCL
      - 32K
      - 224
      - 768
      - Tencent AI Lab
      - ``slideflow-gpl``
      - `Paper <https://www.sciencedirect.com/science/article/abs/pii/S1361841522002043>`__
    * - **RetCCL**
      - CCL
      - 32K
      - 256
      - 2048
      - Tencent AI Lab
      - ``slideflow-gpl``
      - `Paper <https://www.sciencedirect.com/science/article/abs/pii/S1361841522002730>`__
    * - **Phikon**
      - iBOT
      - 6.1K
      - 224
      - 768
      - Owkin
      - ``slideflow-noncommercial``
      - `Paper <https://www.medrxiv.org/content/10.1101/2023.07.21.23292757v2.full.pdf>`__
    * - **PLIP**
      - CLIP
      - N/A
      - 224
      - 512
      - Zhao Lab
      - ``slideflow-noncommercial``
      - `Paper <https://www.nature.com/articles/s41591-023-02504-3>`__
    * - **UNI**
      - DINOv2
      - 100K
      - 224
      - 1024
      - Mahmood Lab
      - ``slideflow-noncommercial``
      - `Paper <https://www.nature.com/articles/s41591-024-02857-3>`__
    * - **GigaPath**
      - DINOv2
      - 170K
      - 256
      - 1536
      - Microsoft
      - ``slideflow-noncommercial``
      - `Paper <https://aka.ms/gigapath>`__

In order to respect the original licensing agreements, pretrained models are distributed in separate packages. The core ``slideflow`` package provides access to models under the **Apache-2.0** license, while models under **GPL-3.0** are available in the ``slideflow-gpl`` package. Models restricted to non-commercial use are available under the **CC BY-NC 4.0** license through the ``slideflow-noncommercial`` package.

Loading weights
---------------

Pretrained feature extractors will automatically download their weights from Hugging Face upon creation. Some models, such as PLIP, GigaPath, UNI, and Phikon, require approval for access. Request approval on Hugging Face and ensure your local machine has been `authenticated <https://huggingface.co/docs/huggingface_hub/en/quick-start#authentication>`_.
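
For example, you can authenticate in Python with the ``huggingface_hub`` library (the token shown is a placeholder; generate your own in your Hugging Face account settings):

.. code-block:: python

    from huggingface_hub import login

    # Authenticate this machine with your Hugging Face access token
    login(token="hf_...")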

All pretrained models can also be loaded using local weights. Use the ``weights`` argument when creating a feature extractor.

.. code-block:: python

    # Load UNI with local weights
    uni = sf.build_feature_extractor('uni', weights='../pytorch_model.bin')

Image preprocessing
-------------------

Each feature extractor includes a default image preprocessing pipeline that matches the original implementation. However, preprocessing can also be manually adjusted using various keyword arguments when creating a feature extractor.

- **resize**: ``int`` or ``bool``. If an ``int``, resizes images to this size. If ``True``, resizes images to the input size of the feature extractor. Default is ``False``.
- **center_crop**: ``int`` or ``bool``. If an ``int``, crops images to this size. If ``True``, crops images to the input size of the feature extractor. Center-cropping happens after resizing, if both are used. Default is ``False``.
- **interpolation**: ``str``. Interpolation method for resizing images. Default is ``bilinear`` for most models, but ``bicubic`` for GigaPath and Virchow.
- **antialias**: ``bool``. Whether to apply antialiasing to resized images. Default is ``False`` (matching the default behavior of torchvision < 0.17).
- **norm_mean**: ``list``. Mean values for image normalization. Default is ``[0.485, 0.456, 0.406]`` for all models except PLIP.
- **norm_std**: ``list``. Standard deviation values for image normalization. Default is ``[0.229, 0.224, 0.225]`` for all models except PLIP.

Example:

.. code-block:: python

    # Load a feature extractor with custom preprocessing
    extractor = sf.build_feature_extractor(
        'ctranspath',
        resize=224,
        interpolation='bicubic',
        antialias=True
    )

Default values for these preprocessing arguments are determined by the feature extractor. One notable exception to the standard preprocessing algorithm is GigaPath, for which images are first resized (defaulting to 256x256) and then center-cropped (defaulting to 224x224), mirroring the official implementation.

For transparency, you can see the current preprocessing pipeline with ``extractor.transform``:

.. code-block:: python

    >>> import slideflow as sf
    >>> ctranspath = sf.build_feature_extractor(
    ...     'ctranspath',
    ...     resize=256,
    ...     interpolation='bicubic',
    ...     center_crop=224
    ... )
    >>> ctranspath.transform
    Compose(
        CenterCrop(size=(224, 224))
        Resize(size=256, interpolation=bicubic, max_size=None, antialias=False)
        Lambda()
        Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
    )

GigaPath
--------

GigaPath is a DINOv2-based model from Microsoft/Providence trained on 170K whole-slide images and is bundled with ``slideflow-noncommercial``. The GigaPath model includes additional dependencies that are not broadly compatible with all OS distributions and are thus not installed by default. To install the GigaPath dependencies:

.. code-block:: bash

    pip install slideflow-noncommercial[gigapath] git+ssh://git@github.com/prov-gigapath/prov-gigapath

GigaPath has two stages: a tile encoder and a slide-level encoder. The tile encoder (``"gigapath.tile"``) works the same as all other feature extractors in Slideflow. You can build this encoder directly:

.. code-block:: python

    # Build the tile encoder
    gigapath_tile = sf.build_feature_extractor("gigapath.tile")

    # Use the tile encoder
    project.generate_feature_bags(gigapath_tile, ...)

or you can build the combined tile+slide model and then use its ``tile`` attribute:

.. code-block:: python

    # Build the full (tile + slide) GigaPath model
    gigapath = sf.build_feature_extractor("gigapath")

    # Use the tile encoder
    project.generate_feature_bags(gigapath.tile, ...)

As there are two stages to GigaPath, there are also two sets of model weights. As with other pretrained feature extractors, the weights will be auto-downloaded from Hugging Face upon first use if you are logged into Hugging Face and have been granted access to the repository. If you have manually downloaded the weights, they can be supplied as follows:

.. code-block:: python

    # Supply both tile and slide weights
    # for the full GigaPath model
    gigapath = sf.build_feature_extractor(
        'gigapath',
        tile_encoder_weights='../pytorch_model.bin',
        slide_encoder_weights='../slide_encoder.pth'
    )

    # Or, just supply the tile weights
    gigapath_tile = sf.build_feature_extractor(
        'gigapath.tile',
        weights='pytorch_model.bin'
    )

Once feature bags have been generated and saved with the GigaPath tile encoder, you can then generate slide-level embeddings with ``gigapath.slide``:

.. code-block:: python

    # Load GigaPath
    gigapath = sf.build_feature_extractor('gigapath')

    # Generate tile-level features
    project.generate_feature_bags(gigapath.tile, ..., outdir='/gigapath_bags')

    # Generate slide-level embeddings
    gigapath.slide.generate_and_save('/gigapath_bags', outdir='/gigapath_embeddings')

In addition to running the tile and slide encoder steps separately, you can also run the combined pipeline all at once on a whole-slide image, generating a final slide-level embedding.

.. code-block:: python

    # Load GigaPath
    gigapath = sf.build_feature_extractor('gigapath')

    # Load a slide
    wsi = sf.WSI('slide.svs', tile_px=256, tile_um=128)

    # Generate the slide embedding
    embedding = gigapath(wsi)

ImageNet Features
*****************

To calculate features from an ImageNet-pretrained network, first build an ImageNet feature extractor with :func:`slideflow.build_feature_extractor`. The first argument should be the name of an architecture followed by ``_imagenet``, and the expected tile size should be passed to the keyword argument ``tile_px``. You can optionally specify the layer from which to generate features with the ``layers`` argument; if not provided, features are calculated from post-convolutional layer activations. For example, to build a ResNet50 feature extractor for images at 299 x 299 pixels:

.. code-block:: python

    resnet50 = sf.build_feature_extractor(
        'resnet50_imagenet',
        tile_px=299
    )
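
The resulting extractor can be used like any other; for instance, with the same bag-export workflow shown above:

.. code-block:: python

    # Export feature bags using the ImageNet-pretrained extractor
    dataset.generate_feature_bags(resnet50, outdir='/path/to/features')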

This will calculate features using activations from the post-convolutional layer. You can also concatenate activations from multiple neural network layers and apply pooling for layers with 2D output shapes.

.. code-block:: python

    resnet50 = sf.build_feature_extractor(
        'resnet50_imagenet',
        layers=['conv1_relu', 'conv3_block1_2_relu'],
        pooling='avg',
        tile_px=299
    )

If a model architecture is available in both the TensorFlow and PyTorch backends, Slideflow will default to using the active backend. You can manually set the feature extractor backend using ``backend``.

.. code-block:: python

    # Create a PyTorch feature extractor
    extractor = sf.build_feature_extractor(
        'resnet50_imagenet',
        layers=['layer2.0.conv1', 'layer3.1.conv2'],
        pooling='avg',
        tile_px=299,
        backend='torch'
    )

You can view all available feature extractors with :func:`slideflow.model.list_extractors`.

Layer Activations
*****************

You can also calculate features from any model trained in Slideflow. The first argument to ``build_feature_extractor()`` should be the path of the trained model. You can optionally specify the layer at which to calculate activations using the ``layers`` keyword argument. If not specified, activations are calculated at the post-convolutional layer.

.. code-block:: python

    # Build a feature extractor from a trained model.
    extractor = sf.build_feature_extractor(
        '/path/to/model',
        layers='sepconv3_bn'
    )

Self-Supervised Learning
************************

Finally, you can also generate features from a trained :ref:`self-supervised learning <simclr_ssl>` model (either `SimCLR <https://github.com/jamesdolezal/simclr>`_ or `DinoV2 <https://github.com/jamesdolezal/dinov2>`_).

For SimCLR models, use ``'simclr'`` as the first argument to ``build_feature_extractor()``, and pass the path to a saved model (or saved checkpoint file) via the keyword argument ``ckpt``.

.. code-block:: python

    simclr = sf.build_feature_extractor(
        'simclr',
        ckpt='/path/to/simclr.ckpt'
    )

For DinoV2 models, use ``'dinov2'`` as the first argument, and pass the model configuration YAML file to ``cfg`` and the teacher checkpoint weights to ``weights``.

.. code-block:: python

    dinov2 = sf.build_feature_extractor(
        'dinov2',
        weights='/path/to/teacher_checkpoint.pth',
        cfg='/path/to/config.yaml'
    )

Custom Extractors
*****************

Slideflow also provides an API for integrating your own custom, pretrained feature extractor. See :ref:`custom_extractors` for additional information.

.. _bags:

Exporting Features
******************

Feature bags
------------

Once you have prepared a feature extractor, features can be generated for a dataset and exported to disk for later use. Pass a feature extractor as the first argument to :meth:`slideflow.Project.generate_feature_bags`, with a :class:`slideflow.Dataset` as the second argument.

.. code-block:: python

    # Load a project and dataset.
    P = sf.Project(...)
    dataset = P.dataset(tile_px=299, tile_um=302)

    # Create a feature extractor.
    ctranspath = sf.build_feature_extractor('ctranspath', resize=True)

    # Calculate & export feature bags.
    P.generate_feature_bags(ctranspath, dataset)

.. note::

    If you are generating features from a SimCLR model trained with stain normalization,
    you should specify the stain normalizer using the ``normalizer`` argument to
    :meth:`slideflow.Project.generate_feature_bags` or :class:`slideflow.DatasetFeatures`.

Features are calculated for slides in batches, keeping memory usage low. By default, features are saved to disk in a directory named ``pt_files`` within the project directory, but you can override the destination directory using the ``outdir`` argument, as shown below.
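
For example (continuing the snippet above; the destination path is arbitrary):

.. code-block:: python

    # Export feature bags to a custom directory
    P.generate_feature_bags(ctranspath, dataset, outdir='/path/to/bags')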

Alternatively, you can calculate features for a dataset using :class:`slideflow.DatasetFeatures` and the ``.to_torch()`` method. This will calculate features for your entire dataset at once, which may require a large amount of memory. The first argument should be the feature extractor, and the second argument should be a :class:`slideflow.Dataset`.

.. code-block:: python

    # Calculate features for the entire dataset.
    features = sf.DatasetFeatures(ctranspath, dataset)

    # Export feature bags.
    features.to_torch('/path/to/bag_directory/')

.. warning::

    Using :class:`slideflow.DatasetFeatures` directly may result in a large amount of memory usage, particularly for sizable datasets. When generating feature bags for training MIL models, it is recommended to use :meth:`slideflow.Project.generate_feature_bags` instead.
Feature "bags" are PyTorch tensors of features for all images in a slide, saved to disk as ``.pt`` files. These bags are used to train MIL models. Bags can be manually loaded and inspected using :func:`torch.load`.
.. code-block:: python
>>> import torch
>>> bag = torch.load('/path/to/bag.pt')
>>> bag.shape
torch.Size([2310, 768])
>>> bag.dtype
torch.float32

When image features are exported for a dataset, the feature extractor configuration is saved to ``bags_config.json`` in the same directory as the exported features. This configuration file can be used to rebuild the feature extractor. An example file is shown below.

.. code-block:: json

    {
        "extractor": {
            "class": "slideflow.model.extractors.ctranspath.CTransPathFeatures",
            "kwargs": {
                "center_crop": true
            }
        },
        "normalizer": {
            "method": "macenko",
            "fit": {
                "stain_matrix_target": [
                    [0.5062568187713623, 0.22186939418315887],
                    [0.7532230615615845, 0.8652154803276062],
                    [0.4069173336029053, 0.42241501808166504]
                ],
                "target_concentrations": [
                    1.7656903266906738,
                    1.2797492742538452
                ]
            }
        },
        "num_features": 2048,
        "tile_px": 299,
        "tile_um": 302
    }

The feature extractor can be manually rebuilt using :func:`slideflow.model.rebuild_extractor`:

.. code-block:: python

    from slideflow.model import rebuild_extractor

    # Recreate the feature extractor
    # and stain normalizer, if applicable
    extractor, normalizer = rebuild_extractor('/path/to/bags_config.json')
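
The rebuilt extractor and normalizer can then be used to reproduce the original feature pipeline on new data. For example (a sketch; it assumes you have a compatible ``dataset`` prepared as shown earlier):

.. code-block:: python

    # Regenerate features with the same extractor and stain normalizer
    features = sf.DatasetFeatures(extractor, dataset, normalizer=normalizer)
    features.to_torch('/path/to/new_bags')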

From a TFRecord
---------------

In addition to generating and exporting feature bags for a dataset, features can also be generated from a single TFRecord file. This may be useful for debugging or testing purposes.

.. code-block:: python

    import slideflow as sf

    # Create a feature extractor
    ctranspath = sf.build_feature_extractor('ctranspath')

    # Bags is a tensor of shape (n_tiles, n_features).
    # Coords is a tensor of shape (n_tiles, 2), containing x/y tile coordinates.
    bags, coords = ctranspath('file.tfrecords')
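
The returned coordinates make it easy to associate each feature vector with its tile location. For instance (a quick sanity check, assuming both outputs are PyTorch tensors):

.. code-block:: python

    # Print the location and feature dimension of the first three tiles
    for feat, (x, y) in zip(bags[:3], coords[:3]):
        print(f"Tile at ({int(x)}, {int(y)}): {feat.shape[0]} features")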

From a whole-slide image
------------------------

Feature extractors can also generate features from a whole-slide image. This is useful for single-slide analysis, MIL inference, and other tasks where features are needed for the entire slide. Features are returned as a 3D tensor of shape ``(width, height, n_features)``, reflecting the spatial arrangement of features for tiles across the image.

.. code-block:: python

    # Load a feature extractor.
    ctranspath = sf.build_feature_extractor('ctranspath')

    # Load a whole-slide image.
    wsi = sf.WSI('slide.svs', tile_px=256, tile_um=128)

    # Generate features for the whole slide.
    # Shape: (width, height, n_features)
    features = ctranspath(wsi)
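
If you need a flat bag of per-tile features rather than the spatial grid, one approach is to reshape the grid and drop empty positions. This is a sketch only; it assumes grid positions without a corresponding tile are filled with NaN, which you should verify for your extractor and Slideflow version:

.. code-block:: python

    import numpy as np

    grid = np.asarray(features)              # (width, height, n_features)
    flat = grid.reshape(-1, grid.shape[-1])  # (width * height, n_features)

    # Keep only positions with real feature vectors
    # (assumes empty grid positions are all-NaN)
    mask = ~np.isnan(flat).any(axis=1)
    bag = flat[mask]                         # (n_tiles, n_features)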

Mixed precision
---------------

All feature extractors will use mixed precision by default. This can be disabled by setting the ``mixed_precision`` argument to ``False`` when creating the feature extractor.

.. code-block:: python

    # Load a feature extractor without mixed precision
    extractor = sf.build_feature_extractor('ctranspath', mixed_precision=False)
License & Citation
------------------
Licensing and citation information for the pretrained feature extractors is accessible with the ``.license`` and ``.citation`` attributes.
.. code-block:: python
>>> ctranspath.license
'GNU General Public License v3.0'
>>> print(ctranspath.citation)
@{wang2022,
title={Transformer-based Unsupervised Contrastive Learning for Histopathological Image Classification},
author={Wang, Xiyue and Yang, Sen and Zhang, Jun and Wang, Minghui and Zhang, Jing and Yang, Wei and Huang, Junzhou and Han, Xiao},
journal={Medical Image Analysis},
year={2022},
publisher={Elsevier}
}