# Pathology Language and Image Pre-Training (PLIP)

Pathology Language and Image Pre-Training (PLIP) is the first vision and language foundation model for Pathology AI. PLIP is a large-scale pre-trained model that can be used to extract visual and language features from pathology images and text descriptions.
The model is a fine-tuned version of the original CLIP model.

## Resources

- [Official Demo](https://huggingface.co/spaces/vinid/webplip)
- [PLIP on HuggingFace](https://huggingface.co/vinid/plip)
- [Paper](https://www.nature.com/articles/s41591-023-02504-3)

### Internal API Usage

```python
from plip.plip import PLIP
import numpy as np

plip = PLIP('vinid/plip')

# `images` is a list of input images (e.g. PIL images) and `texts` a list of strings
# we create image embeddings and text embeddings
image_embeddings = plip.encode_images(images, batch_size=32)
text_embeddings = plip.encode_text(texts, batch_size=32)

# we normalize the embeddings to unit norm
# (so that we can use dot product instead of cosine similarity to do comparisons)
image_embeddings = image_embeddings / np.linalg.norm(image_embeddings, ord=2, axis=-1, keepdims=True)
text_embeddings = text_embeddings / np.linalg.norm(text_embeddings, ord=2, axis=-1, keepdims=True)
```
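
Because both embedding matrices are unit-normalized, cosine similarity reduces to a plain matrix product. A minimal sketch of image-to-text retrieval, continuing from the snippet above (variable names are illustrative):

```python
# cosine similarity between every image and every text (rows: images, columns: texts)
similarity = image_embeddings @ text_embeddings.T

# for each image, the index of its best-matching text
best_text_per_image = similarity.argmax(axis=-1)
```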

### HuggingFace API Usage

```python
from PIL import Image
from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("vinid/plip")
processor = CLIPProcessor.from_pretrained("vinid/plip")

image = Image.open("images/image1.jpg")

inputs = processor(text=["a photo of label 1", "a photo of label 2"],
                   images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)      # label probabilities for the image
print(probs)

# returns a resized copy for inspection only; the result is not used by the model,
# since the processor already resizes inputs to the expected resolution
image.resize((224, 224))
```
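
If you only need embeddings rather than similarity scores, the same objects can return them directly. A minimal sketch reusing `model`, `processor`, and `inputs` from the snippet above (`get_image_features`/`get_text_features` are part of the standard `transformers` CLIP API):

```python
import torch

# extract raw embeddings (not L2-normalized by default)
with torch.no_grad():
    image_features = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_features = model.get_text_features(input_ids=inputs["input_ids"],
                                            attention_mask=inputs["attention_mask"])

# normalize so that dot products give cosine similarities
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
```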

### Citation

If you use PLIP in your research, please cite the following paper:

```bibtex
@article{huang2023visual,
  title={A visual--language foundation model for pathology image analysis using medical Twitter},
  author={Huang, Zhi and Bianchi, Federico and Yuksekgonul, Mert and Montine, Thomas J and Zou, James},
  journal={Nature Medicine},
  pages={1--10},
  year={2023},
  publisher={Nature Publishing Group US New York}
}
```

### Acknowledgements

The internal API has been **copied** from FashionCLIP.