# Merlin: Vision Language Foundation Model for 3D Computed Tomography

[arXiv](https://arxiv.org/abs/2406.06512) [Hugging Face](https://huggingface.co/stanfordmimi/Merlin) [PyPI](https://pypi.org/project/merlin-vlm/) [YouTube](https://youtu.be/XWmCkbpXOUw?si=6GggZgj9U4kbLAKx) ![License]()

*Merlin is a 3D VLM for computed tomography that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining.*

![Key Graphic]()

## ⚡️ Installation

To install Merlin (Python 3.9 required), run:

```bash
pip install merlin-vlm
```

For an editable installation, clone this repository and install it into a fresh conda environment:

```bash
conda create --name merlin python==3.9.0  # Merlin requires Python 3.9
conda activate merlin

git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .
```

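After installing, it can help to confirm that the interpreter and the package look as expected. The following is a minimal sketch, not part of Merlin: `is_supported_python` and `installed_version` are hypothetical helper names, and the check assumes that "Python 3.9 required" means any 3.9.x release. The distribution name `merlin-vlm` is taken from the PyPI link above.

```python
import sys
from importlib import metadata


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is a Python 3.9.x release."""
    return tuple(version_info[:2]) == (3, 9)


def installed_version(dist_name="merlin-vlm"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python 3.9 interpreter:", is_supported_python())
    print("merlin-vlm version:", installed_version())
```

If `installed_version()` returns `None`, the package is not visible to the current interpreter, which usually means the wrong environment is active.
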
## 🚀 Inference with Merlin

To create a Merlin model with both image and text embeddings enabled, use the following:

```python
from merlin import Merlin

model = Merlin()
```

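One common way to use paired image and text embeddings is to rank candidate text descriptions by their cosine similarity to an image embedding. The sketch below illustrates only that comparison step under stated assumptions: it does not call Merlin, the vectors are small placeholders rather than real model outputs, and `cosine_similarity` and `rank_texts` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_texts(image_embedding, text_embeddings):
    """Return (label, score) pairs sorted from most to least similar."""
    scores = [
        (label, cosine_similarity(image_embedding, vec))
        for label, vec in text_embeddings.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors standing in for model outputs (not real Merlin embeddings).
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = {
    "finding present": [1.0, 0.0, 0.0],
    "finding absent": [0.0, 1.0, 0.0],
}

ranking = rank_texts(image_embedding, text_embeddings)
print(ranking[0][0])
```

For the actual inputs and outputs of the model, refer to the demo script and inference documentation linked in this README.
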
To initialize the model with **only image embeddings** active, use:

```python
from merlin import Merlin

model = Merlin(ImageEmbedding=True)
```

#### For inference on a demo CT scan, please check out the [demo](documentation/demo.py).

#### For additional information, please read the [documentation](documentation/inference.md).

## 📎 Citation

If you find this repository useful for your work, please cite the [original paper](https://arxiv.org/abs/2406.06512):

```bibtex
@article{blankemeier2024merlin,
  title={Merlin: A vision language foundation model for 3d computed tomography},
  author={Blankemeier, Louis and Cohen, Joseph Paul and Kumar, Ashwin and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Truyts, Cesar and others},
  journal={Research Square},
  pages={rs--3},
  year={2024}
}
```