# Inference Usage Instructions

Merlin can be run by instantiating the model in PyTorch. Merlin weights are also publicly available on [HuggingFace](https://huggingface.co/stanfordmimi/Merlin). Merlin provides two types of embeddings:

- Image/Text contrastive embeddings
- Image-only embeddings (similar functionality to Google CT Foundation)

For a better understanding of the phenotypes and their associated PheWAS attributes, please refer to the [phenotypes](phenotypes.csv) file.

**Please see the [demo](demo.py) for programmatic examples.**

#### Image/Text contrastive embeddings

To get the image/text contrastive embeddings for inference, instantiate the model and run a forward pass as follows:
```python
import torch
from merlin import Merlin

device = torch.device("cuda")

model = Merlin()
model.eval()
model.cuda()

# `dataloader` is assumed to yield batches containing an "image" tensor and the
# corresponding raw "text" strings (see demo.py for a complete example)
for batch in dataloader:
    outputs = model(
        batch["image"].to(device),
        batch["text"],
    )
```

where `outputs` is a tuple:
- `outputs[0]` : the contrastive image embeddings (shape: [1, 512])
- `outputs[1]` : the phenotype predictions (shape: [1, 1692])
- `outputs[2]` : the contrastive text embeddings (shape: [1, 512])
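
Since the image and text embeddings live in a shared contrastive space, a natural follow-up is to score image/text agreement with cosine similarity. The snippet below is a minimal sketch that reuses `outputs` from the loop above; scoring with plain cosine similarity (no temperature scaling) is an assumption for illustration, not a prescribed Merlin API.

```python
import torch.nn.functional as F

# Contrastive embeddings from the forward pass above
image_emb = outputs[0]  # shape: [1, 512]
text_emb = outputs[2]   # shape: [1, 512]

# L2-normalize so the dot product equals cosine similarity
image_emb = F.normalize(image_emb, dim=-1)
text_emb = F.normalize(text_emb, dim=-1)

similarity = image_emb @ text_emb.T  # shape: [1, 1]; higher means closer image/text match
print(similarity.item())
```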
#### Image-only embeddings

To get the image-only embeddings, instantiate the model with `ImageEmbedding=True`:

```python
import torch
from merlin import Merlin

device = torch.device("cuda")

model = Merlin(ImageEmbedding=True)
model.eval()
model.cuda()

# `dataloader` is assumed to yield batches containing an "image" tensor (see demo.py)
for batch in dataloader:
    outputs = model(
        batch["image"].to(device),
    )
```

where `outputs` is a tuple:
- `outputs[0]` : the image embeddings (shape: [1, 2048])
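
Because the image-only embeddings are intended as general-purpose features (similar in spirit to Google CT Foundation), one common use is a linear probe on frozen embeddings. The sketch below is illustrative only: the `"label"` key in the batch and the scikit-learn classifier are assumptions for the example, not part of Merlin.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

features, labels = [], []
with torch.no_grad():
    for batch in dataloader:
        emb = model(batch["image"].to(device))[0]  # image embeddings, shape [batch, 2048]
        features.append(emb.cpu().numpy())
        labels.append(batch["label"].numpy())      # hypothetical downstream label

features = np.concatenate(features)
labels = np.concatenate(labels)

# Fit a linear probe on the frozen embeddings (evaluate on a held-out split in practice)
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("Linear-probe training accuracy:", probe.score(features, labels))
```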
## 👨‍💻 Merlin Finetuning

Since both Merlin’s model architecture and pretrained weights are provided, Merlin allows for straightforward finetuning in PyTorch VLM and vision-only pipelines. Additionally, Merlin was trained on a single NVIDIA A6000 GPU (with a Vision-Language batch size of 18), meaning finetuning can be performed even in compute-constrained environments.

Merlin supports both Image/Text and Image-only finetuning. To perform finetuning, simply remove the following lines of code and train on your data:
~~`model.eval()`~~  
~~`model.cuda()`~~  

For compute-efficient finetuning, we recommend using mixed-precision training and gradient accumulation.
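
As a starting point, here is a minimal sketch of such a loop combining `torch.cuda.amp` mixed precision with gradient accumulation. The optimizer, learning rate, `accumulation_steps`, and the `loss_fn` placeholder are assumptions for illustration; substitute your own task-specific loss and data pipeline.

```python
import torch
from merlin import Merlin

model = Merlin()  # or Merlin(ImageEmbedding=True) for image-only finetuning
model.train()
model.cuda()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()
accumulation_steps = 4  # effective batch size = dataloader batch size * 4

optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    with torch.cuda.amp.autocast():
        outputs = model(batch["image"].cuda(), batch["text"])
        loss = loss_fn(outputs, batch) / accumulation_steps  # placeholder loss, scaled for accumulation

    scaler.scale(loss).backward()

    if (step + 1) % accumulation_steps == 0:
        scaler.step(optimizer)  # unscales gradients, then steps the optimizer
        scaler.update()
        optimizer.zero_grad()
```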