Merlin is a 3D vision-language model (VLM) for computed tomography that uses both structured electronic health records (EHR) and unstructured radiology reports for pretraining.
To install Merlin (Python 3.9 required), you can simply run:
pip install merlin-vlm
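Since the install step above assumes Python 3.9, it can help to check the interpreter version first. The following is a minimal sketch using only the standard library. The helper name `python_is_supported` is hypothetical and not part of Merlin.

```python
import sys

# Python version stated in the install instructions (major, minor).
REQUIRED = (3, 9)


def python_is_supported(version_info=None, required=REQUIRED):
    """Return True if the major.minor version matches the required one.

    Hypothetical helper for illustration; not part of the Merlin package.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    if python_is_supported():
        print("Python version matches the one Merlin expects.")
    else:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        print(f"Merlin expects Python {REQUIRED[0]}.{REQUIRED[1]}, found {found}.")
```

Running this inside the target environment before `pip install merlin-vlm` gives an early warning if the wrong interpreter is active.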
For an editable installation, use the following commands to clone and install this repository.
conda create --name merlin python==3.9.0 # Merlin requires Python 3.9
conda activate merlin
git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .
To create a Merlin model with both image and text embeddings enabled, use the following:
from merlin import Merlin
model = Merlin()
To initialize the model with only image embeddings active, use:
from merlin import Merlin
model = Merlin(ImageEmbedding=True)
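The two initialization modes above can be wrapped in one small helper so that calling code selects the mode with a flag. This is only a sketch: `merlin_kwargs` and `build_model` are hypothetical names, and the only Merlin API assumed is the constructor and the `ImageEmbedding` argument shown above.

```python
def merlin_kwargs(image_only: bool = False) -> dict:
    """Return constructor keyword arguments for the requested mode.

    Hypothetical helper; it relies only on the `ImageEmbedding` argument
    shown in the snippet above.
    """
    return {"ImageEmbedding": True} if image_only else {}


def build_model(image_only: bool = False):
    """Create a Merlin model in the requested mode.

    The import is done lazily, so this module can be imported even when
    the merlin package is not installed.
    """
    from merlin import Merlin

    return Merlin(**merlin_kwargs(image_only))
```

With this in place, `build_model()` corresponds to the default image-and-text setup and `build_model(image_only=True)` to the image-embedding-only setup.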
If you find this repository useful for your work, please cite the original paper:
@article{blankemeier2024merlin,
title={Merlin: A vision language foundation model for 3d computed tomography},
author={Blankemeier, Louis and Cohen, Joseph Paul and Kumar, Ashwin and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Truyts, Cesar and others},
journal={Research Square},
pages={rs--3},
year={2024}
}