# Inference Usage Instructions

Merlin can be run by instantiating the model in PyTorch. Merlin weights are also publicly available on [HuggingFace](https://huggingface.co/stanfordmimi/Merlin). Two kinds of embeddings are covered in the sections that follow:
- Image/Text contrastive embeddings
- Image-only embeddings (provides functionality similar to Google CT Foundation)

For a better understanding of the phenotypes and their associated PheWAS attributes, please refer to the [phenotypes](phenotypes.csv) file.

**Please see the [demo](demo.py) for programmatic examples.**
#### Image/Text contrastive embeddings

To obtain the image/text contrastive embeddings at inference time, run the model as follows:

```python
import torch
from merlin import Merlin

device = torch.device("cuda")  # matches model.cuda() below

model = Merlin()
model.eval()
model.cuda()

# `dataloader` is assumed to yield dicts with "image" and "text" entries
for batch in dataloader:
    outputs = model(
        batch["image"].to(device),
        batch["text"]
    )
```

where `outputs` is a tuple:
- `outputs[0]`: the contrastive image embeddings (shape: [1, 512])
- `outputs[1]`: the phenotype predictions (shape: [1, 1692])
- `outputs[2]`: the contrastive text embeddings (shape: [1, 512])
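
As a minimal sketch of how the two contrastive embeddings might be compared, the snippet below computes the cosine similarity between an image embedding and a text embedding. Random tensors stand in for `outputs[0]` and `outputs[2]`, and the helper name `image_text_similarity` is hypothetical; it is not part of Merlin.

```python
import torch
import torch.nn.functional as F


def image_text_similarity(image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Cosine similarity matrix between image and text embeddings.

    image_emb: [N, D], text_emb: [M, D] -> returns [N, M].
    """
    image_emb = F.normalize(image_emb, dim=-1)  # unit-length rows
    text_emb = F.normalize(text_emb, dim=-1)
    return image_emb @ text_emb.T


# Stand-ins for outputs[0] and outputs[2] (shape [1, 512] each, as listed above).
torch.manual_seed(0)
image_emb = torch.randn(1, 512)
text_emb = torch.randn(1, 512)

similarity = image_text_similarity(image_emb, text_emb)
print(similarity.shape)  # torch.Size([1, 1])
```

With a batch of N images and M text prompts, the same function returns an [N, M] matrix, which could be used to rank reports against a scan.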

#### Image-only embeddings

```python
import torch
from merlin import Merlin

device = torch.device("cuda")  # matches model.cuda() below

model = Merlin(ImageEmbedding=True)
model.eval()
model.cuda()

# `dataloader` is assumed to yield dicts with an "image" entry
for batch in dataloader:
    outputs = model(
        batch["image"].to(device),
    )
```

where `outputs` is a tuple:
- `outputs[0]`: the image embeddings (shape: [1, 2048])
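
Image-only embeddings are often gathered into a single matrix for downstream use. The sketch below shows one way this could be done, assuming only the output contract described above. `StandInEncoder` and `collect_embeddings` are hypothetical names, and the tiny dummy volumes are placeholders for real preprocessed CT inputs.

```python
import torch
from torch import nn


class StandInEncoder(nn.Module):
    """Hypothetical placeholder that mimics the output contract above:
    a tuple whose first element has shape [batch, 2048]."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8 * 8 * 8, 2048)

    def forward(self, image):
        return (self.proj(image.flatten(start_dim=1)),)


def collect_embeddings(model, dataloader, device):
    """Run the model over every batch and stack outputs[0] into one matrix."""
    model.eval()
    chunks = []
    with torch.no_grad():  # no gradients needed when only extracting features
        for batch in dataloader:
            outputs = model(batch["image"].to(device))
            chunks.append(outputs[0].cpu())
    return torch.cat(chunks, dim=0)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Three dummy single-volume batches; real inputs would be preprocessed CT volumes.
dataloader = [{"image": torch.randn(1, 1, 8, 8, 8)} for _ in range(3)]
embeddings = collect_embeddings(StandInEncoder().to(device), dataloader, device)
print(embeddings.shape)  # torch.Size([3, 2048])
```

In practice, the stand-in would be replaced by `Merlin(ImageEmbedding=True)`, and the resulting matrix could feed a linear probe or a nearest-neighbour search.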

## 👨‍💻 Merlin Finetuning

Because both Merlin’s model architecture and pretrained weights are provided, Merlin can be finetuned directly in PyTorch, in either vision-language (VLM) or vision-only pipelines. Merlin was trained on a single NVIDIA A6000 GPU (with a vision-language batch size of 18), so finetuning is feasible even in compute-constrained environments.

Merlin supports both Image/Text and Image-only finetuning. To finetune, remove the following lines from the inference code above and train on your data:

- ~~`model.eval()`~~
- ~~`model.cuda()`~~

For compute-efficient finetuning, we recommend using mixed-precision training and gradient accumulation.
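
The recommendation above can be sketched as a generic PyTorch loop. This is a minimal illustration under stated assumptions, not Merlin's training code: a small linear layer stands in for the model, the data are random, and the values of `accum_steps` and the learning rate are arbitrary. Mixed precision is enabled only when a GPU is available, so the loop also runs on CPU.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # autocast and loss scaling only enabled on GPU here

model = nn.Linear(16, 4).to(device)  # hypothetical stand-in for the model being finetuned
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# Newer PyTorch releases also expose this class as torch.amp.GradScaler.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 4  # effective batch size = micro-batch size * accum_steps
# Dummy micro-batches of (features, labels).
batches = [(torch.randn(2, 16), torch.randint(0, 4, (2,))) for _ in range(8)]

optimizer_steps = 0
model.train()
optimizer.zero_grad()
for step, (x, y) in enumerate(batches, start=1):
    with torch.autocast(device_type=device.type, enabled=use_amp):
        loss = loss_fn(model(x.to(device)), y.to(device))
    # Divide so the accumulated gradient matches the mean over the effective batch.
    scaler.scale(loss / accum_steps).backward()
    if step % accum_steps == 0:
        scaler.step(optimizer)  # unscales gradients, then calls optimizer.step()
        scaler.update()
        optimizer.zero_grad()
        optimizer_steps += 1

print(optimizer_steps)  # 2
```

Here eight micro-batches of size 2 with `accum_steps = 4` yield two optimizer updates, each over an effective batch of 8 samples.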