a/README.md b/README.md
# Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases

**Authors**:
- [Emily Alsentzer](https://emilyalsentzer.com) (Equal contribution)
- [Michelle M. Li](http://michellemli.com) (Equal contribution)
- [Shilpa N. Kobren](http://shilpakobren.com)
- [Ayush Noori](https://www.ayushnoori.com)
- [Undiagnosed Diseases Network](https://undiagnosed.hms.harvard.edu)
- [Isaac S. Kohane](http://zaklab.org)
- [Marinka Zitnik](http://zitniklab.hms.harvard.edu)

**Additional resources**:
- [Paper](https://www.medrxiv.org/content/10.1101/2022.12.07.22283238v2)
- [Project Website](https://zitniklab.hms.harvard.edu/projects/SHEPHERD/)
- [HuggingFace Space illustrating SHEPHERD's use for causal gene nomination, patients-like-me identification and disease characterization](https://huggingface.co/spaces/emilyalsentzer/SHEPHERD)

## Overview of SHEPHERD

There are over 7,000 unique rare diseases, some of which affect 3,500 or fewer patients in the US. Due to clinicians' limited experience with such diseases and the considerable heterogeneity of their clinical presentations, many patients with rare genetic diseases remain undiagnosed. While artificial intelligence has demonstrated success in assisting diagnosis, its success is usually contingent on the availability of large annotated datasets. Here, we present SHEPHERD, a deep learning approach for multi-faceted rare disease diagnosis. To overcome the limitations of supervised learning, SHEPHERD performs label-efficient training by (1) training exclusively on simulated rare disease patients without the use of any real labeled data and (2) incorporating external knowledge of known phenotype, gene, and disease associations via knowledge-guided deep learning.

### The Rare Disease Diagnosis Pipeline

After years of failed diagnostic attempts, a patient accepted to the Undiagnosed Diseases Network (UDN) receives a thorough clinical workup and genetic sequencing, and their case is analyzed in an iterative process to identify the candidate genes likely to explain the patient's symptoms. SHEPHERD can be used throughout the pipeline to accelerate diagnosis: after the clinical workup to find similar patients, after the sequencing analysis to identify strong candidate genes, and after case review to further prioritize candidate genes, characterize the patient's disease, and/or validate candidate genes by finding phenotype- and genotype-matched patients.

<p align="center">
<img src="img/rare_diseases_pipeline.png?raw=true" width="600" >
</p>

### The SHEPHERD Algorithm

SHEPHERD is guided by existing knowledge of diseases, phenotypes, and genes to learn novel connections between a patient's clinico-genomic information and phenotype and gene relationships. SHEPHERD takes as input the patient's set of phenotypes as well as a list of candidate genes, patients, or diseases, and leverages an external rare disease knowledge graph to perform multi-faceted rare disease diagnosis: causal gene discovery, patients-like-me identification, and novel disease characterization.

<p align="center">
<img src="img/shepherd_overview.png?raw=true" width="250" >
</p>

## Installation and Setup

### :one: Download the Repo

First, clone the GitHub repository:

```
git clone https://github.com/mims-harvard/SHEPHERD
cd SHEPHERD
```

### :two: Set Up Environment

This codebase uses Python, PyTorch, and PyTorch Geometric, among other packages. To create an environment with all of the required packages, ensure that [conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) is installed and then run the following commands:

```
conda env create -f environment.yml
conda activate shepherd
bash install_pyg.sh
```

### :three: Download Datasets

The data is hosted on [Harvard Dataverse](https://doi.org/10.7910/DVN/TZTPFL). To maintain the directory structure while downloading the files, select all files and download them in the original format. Be sure to also unzip all zipped files in the download (e.g., [this file](https://dataverse.harvard.edu/file.xhtml?fileId=6697676&version=2.0)).

We provide the following datasets for training SHEPHERD:
- Rare disease knowledge graph
- Disease-split train and validation sets for simulated patients
- MyGene2 patients

More details about the simulated rare disease patients can be found [here](https://github.com/EmilyAlsentzer/rare-disease-simulation). We are unfortunately unable to provide the UDN patients due to patient privacy concerns.

The rare disease knowledge graph and patient datasets are provided in the appropriate format for SHEPHERD. If you would like to add your own set of patients, please adhere to the format used in the MyGene2 and simulated rare disease patients' files (see the [README](https://github.com/mims-harvard/SHEPHERD/blob/main/data_prep/README.md) in the `data_prep` folder for more details). The file should be structured as a `jsonlines` file, where each JSON object (i.e., each line in the file) contains information for a single patient. Each JSON object must contain at least the following elements: patient ID ("id"), a list of phenotypes present in the patient as HPO terms ("positive_phenotypes"), and a list of causal genes as Ensembl IDs ("true_genes"). To run causal gene discovery, the JSON object must also include a list of all candidate genes as Ensembl IDs ("all_candidate_genes"). To run novel disease characterization, the JSON object must also include a list of true disease names as MONDO IDs ("true_diseases").
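
As a rough illustration of the patient file format described above, the following sketch writes and reads a `jsonlines` file and reports which of the named keys are missing for a given task. The helper names (`missing_keys`, `write_patients`, `read_patients`) are hypothetical and not part of SHEPHERD, and the identifier values are placeholders rather than a real patient; check the provided MyGene2 and simulated patient files for the exact identifier formatting.

```python
import json

# Keys named in the format description; task-specific keys are only
# needed for the corresponding task.
REQUIRED_KEYS = {"id", "positive_phenotypes", "true_genes"}
TASK_KEYS = {
    "causal_gene_discovery": {"all_candidate_genes"},
    "disease_characterization": {"true_diseases"},
}


def missing_keys(patient, task=None):
    """Return the set of expected keys that are absent from one patient dict."""
    needed = REQUIRED_KEYS | TASK_KEYS.get(task, set())
    return needed - set(patient)


def write_patients(patients, path):
    """Write one JSON object per line (jsonlines)."""
    with open(path, "w") as f:
        for patient in patients:
            f.write(json.dumps(patient) + "\n")


def read_patients(path):
    """Read a jsonlines file back into a list of dicts, skipping blank lines."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]


# Placeholder values for illustration only (not a real patient).
example_patient = {
    "id": "patient_001",
    "positive_phenotypes": ["HP:0001250", "HP:0001263"],
    "true_genes": ["ENSG00000144285"],
    "all_candidate_genes": ["ENSG00000144285", "ENSG00000141510"],
    "true_diseases": ["MONDO:0100135"],
}

print(missing_keys(example_patient, task="causal_gene_discovery"))
```

Running a check like this over every line before preprocessing can catch a missing field early, before any model code is invoked.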

### :four: Set Configuration File

Go to `project_config.py` and set the project directory (`PROJECT_DIR`) to the path to the data folder downloaded in the previous step.

If you would like to use your own data, be sure to:
1. Modify the data variables in `project_config.py` in lines 10-16.
2. Generate the required shortest path length data files for your patients using the code and instructions in `data_prep/shortest_paths`.
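
The scripts in `data_prep/shortest_paths` are the ones to use for generating these files. Purely to illustrate what a shortest path length over a knowledge graph edge list means, here is a minimal breadth-first search sketch. The function name, the toy node names, and the choice to treat edges as undirected are all assumptions made for this example and are not taken from the SHEPHERD code.

```python
from collections import deque


def shortest_path_lengths(edges, source):
    """Unweighted shortest path length from `source` to every reachable node."""
    adjacency = {}
    for u, v in edges:
        # Assumption for this sketch: edges are treated as undirected.
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Toy graph with hypothetical node names.
toy_edges = [
    ("phenotype_A", "gene_1"),
    ("gene_1", "disease_X"),
    ("phenotype_B", "disease_X"),
]
print(shortest_path_lengths(toy_edges, "phenotype_A"))
```

On the toy graph, `phenotype_B` is three hops from `phenotype_A` (via `gene_1` and `disease_X`).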

### :five: (Optional) Download Model Checkpoints

We also provide checkpoints for SHEPHERD after pretraining and after training on the rare disease diagnosis tasks. The checkpoints for SHEPHERD can be found [here](https://figshare.com/articles/software/SHEPHERD/21444873). You'll need to move them to the directory specified by `project_config.PROJECT_DIR / 'checkpoints'` (see the previous step). Make sure all downloaded files are unzipped. You can use these checkpoints directly with the `predict.py` scripts below instead of training the models yourself.
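
A quick sanity check after moving the files might look like the sketch below. The checkpoint file names are the ones this README uses in the prediction commands (`pretrain.ckpt`, `causal_gene_discovery.ckpt`, `patients_like_me.ckpt`, `disease_characterization.ckpt`); that they sit directly inside the `checkpoints` folder is an assumption, and `missing_checkpoints` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# File names as referenced in the prediction commands of this README.
EXPECTED_CHECKPOINTS = [
    "pretrain.ckpt",
    "causal_gene_discovery.ckpt",
    "patients_like_me.ckpt",
    "disease_characterization.ckpt",
]


def missing_checkpoints(project_dir):
    """List expected checkpoint files not found in <project_dir>/checkpoints."""
    checkpoint_dir = Path(project_dir) / "checkpoints"
    return [
        name
        for name in EXPECTED_CHECKPOINTS
        if not (checkpoint_dir / name).is_file()
    ]


# Example usage (assumes project_config.py has been edited as described above):
#   import project_config
#   print(missing_checkpoints(project_config.PROJECT_DIR))
```

An empty list means all four files were found; any names returned usually point to a file that was not unzipped or was moved to a different folder.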

## Usage

### Run SHEPHERD on Your Own Patient Cohort

You can run SHEPHERD on your own patient cohort by using our provided model checkpoints (i.e., no re-training needed). Please review this [README](https://github.com/mims-harvard/SHEPHERD/blob/main/Inference-README.md) to learn how to preprocess and run SHEPHERD on your own patient dataset.

### Pretrain on Rare Disease KG

You can reproduce our pretraining results or pretrain SHEPHERD on your own knowledge graph:
```
cd shepherd
python pretrain.py \
        --edgelist KG_edgelist_mask.txt \
        --node_map KG_node_map.txt \
        --save_dir checkpoints/
```

To see and/or modify the default hyperparameters, please see the `get_pretrain_hparams()` function in `shepherd/hparams.py`.

An example bash script is provided in `shepherd/run_pretrain.sh`.

### Train SHEPHERD

:sparkles: To train SHEPHERD for causal gene discovery:

```
cd shepherd
python train.py \
        --edgelist KG_edgelist_mask.txt \
        --node_map KG_node_map.txt \
        --patient_data disease_simulated \
        --run_type causal_gene_discovery \
        --saved_node_embeddings_path checkpoints/<BEST_PRETRAIN_CHECKPOINT>.ckpt
```

An example bash script is provided in `shepherd/run_causal_gene_discovery.sh`.

:sparkles: To train SHEPHERD for patients-like-me identification:

```
cd shepherd
python train.py \
        --edgelist KG_edgelist_mask.txt \
        --node_map KG_node_map.txt \
        --patient_data disease_simulated \
        --run_type patients_like_me \
        --saved_node_embeddings_path checkpoints/<BEST_PRETRAIN_CHECKPOINT>.ckpt
```

An example bash script is provided in `shepherd/run_patients_like_me.sh`.

:sparkles: To train SHEPHERD for novel disease characterization:

```
cd shepherd
python train.py \
        --edgelist KG_edgelist_mask.txt \
        --node_map KG_node_map.txt \
        --patient_data disease_simulated \
        --run_type disease_characterization \
        --saved_node_embeddings_path checkpoints/<BEST_PRETRAIN_CHECKPOINT>.ckpt
```

An example bash script is provided in `shepherd/run_disease_characterization.sh`.

To see and/or modify the default hyperparameters, please see the `get_train_hparams()` function in `shepherd/hparams.py`.

### Report SHEPHERD Performance Metrics on Test Patient Dataset

After training SHEPHERD, you can calculate SHEPHERD's performance on a test patient dataset. Simply run the same command used to train the model with the additional flags: `--do_inference` and `--best_ckpt <PATH/TO/BEST_MODEL_CHECKPOINT.ckpt>`.
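
To make the combination of flags concrete, the sketch below assembles the training command from this README with the two additional flags appended. `build_inference_command` is a hypothetical helper, the checkpoint paths in the example are placeholders, and it assumes that `--do_inference` is a switch that takes no value.

```python
import shlex


def build_inference_command(run_type, pretrain_ckpt, best_ckpt,
                            patient_data="disease_simulated",
                            edgelist="KG_edgelist_mask.txt",
                            node_map="KG_node_map.txt"):
    """Return the train.py argument list with the inference flags appended."""
    return [
        "python", "train.py",
        "--edgelist", edgelist,
        "--node_map", node_map,
        "--patient_data", patient_data,
        "--run_type", run_type,
        "--saved_node_embeddings_path", pretrain_ckpt,
        # Additional flags for reporting test performance.
        # Assumption: --do_inference is a boolean switch with no value.
        "--do_inference",
        "--best_ckpt", best_ckpt,
    ]


cmd = build_inference_command(
    run_type="causal_gene_discovery",
    pretrain_ckpt="checkpoints/pretrain.ckpt",            # placeholder path
    best_ckpt="checkpoints/causal_gene_discovery.ckpt",   # placeholder path
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="shepherd", check=True)` or simply copied into a shell.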

### Generate Predictions for Patients

After training SHEPHERD (you may also simply use our already-trained models), you can generate predictions for patients (without performance metrics). An example bash script can be found [here](https://github.com/mims-harvard/SHEPHERD/blob/main/shepherd/run_predict.sh).

The results of the `predict.py` script are found in
```
project_config.PROJECT_RESULTS/<TASK>/<RUN_NAME>/<DATASET_NAME>
```
where
- `<TASK>` is `causal_gene_discovery`, `patients_like_me`, or `disease_characterization`
- `<RUN_NAME>` is the name of the run created during training
- `<DATASET_NAME>` is the name of your patient cohort
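
The path pattern above can be assembled programmatically, for instance when collecting results from several runs. This is a minimal sketch: `results_dir` is a hypothetical helper, and the run and dataset names in the example are placeholders.

```python
from pathlib import Path

# The three task names listed above.
VALID_TASKS = (
    "causal_gene_discovery",
    "patients_like_me",
    "disease_characterization",
)


def results_dir(project_results, task, run_name, dataset_name):
    """Build <PROJECT_RESULTS>/<TASK>/<RUN_NAME>/<DATASET_NAME> as a Path."""
    if task not in VALID_TASKS:
        raise ValueError(f"Unknown task {task!r}; expected one of {VALID_TASKS}")
    return Path(project_results) / task / run_name / dataset_name


# Example usage (assumes project_config.py defines PROJECT_RESULTS):
#   import project_config
#   out = results_dir(project_config.PROJECT_RESULTS,
#                     "patients_like_me", "my_run", "my_data")
print(results_dir("results", "patients_like_me", "my_run", "my_data"))
```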

:sparkles: To run causal gene discovery:

```
cd shepherd
python predict.py \
        --run_type causal_gene_discovery \
        --patient_data <TEST_DATA> \
        --edgelist KG_edgelist_mask.txt \
        --node_map KG_node_map.txt \
        --saved_node_embeddings_path checkpoints/<BEST_PRETRAIN_CHECKPOINT>.ckpt \
        --best_ckpt PATH/TO/BEST_MODEL_CHECKPOINT.ckpt
```
To generate predictions on your own dataset, please use `--patient_data my_data`. To generate predictions on simulated test patients, please use `--patient_data test_predict`. If using the provided checkpoint models, `checkpoints/<BEST_PRETRAIN_CHECKPOINT>.ckpt` should be `checkpoints/pretrain.ckpt` and `PATH/TO/BEST_MODEL_CHECKPOINT.ckpt` should be `checkpoints/causal_gene_discovery.ckpt`.

:sparkles: To run patients-like-me identification:

```
cd shepherd
python predict.py \
        --run_type patients_like_me \
        --patient_data <TEST_DATA> \
        --edgelist KG_edgelist_mask.txt \
        --node_map KG_node_map.txt \
        --saved_node_embeddings_path checkpoints/<BEST_PRETRAIN_CHECKPOINT>.ckpt \
        --best_ckpt PATH/TO/BEST_MODEL_CHECKPOINT.ckpt
```
To generate predictions on your own dataset, please use `--patient_data my_data`. To generate predictions on simulated test patients, please use `--patient_data test_predict`. If using the provided checkpoint models, `checkpoints/<BEST_PRETRAIN_CHECKPOINT>.ckpt` should be `checkpoints/pretrain.ckpt` and `PATH/TO/BEST_MODEL_CHECKPOINT.ckpt` should be `checkpoints/patients_like_me.ckpt`.

:sparkles: To run novel disease characterization:

```
cd shepherd
python predict.py \
        --run_type disease_characterization \
        --patient_data <TEST_DATA> \
        --edgelist KG_edgelist_mask.txt \
        --node_map KG_node_map.txt \
        --saved_node_embeddings_path checkpoints/<BEST_PRETRAIN_CHECKPOINT>.ckpt \
        --best_ckpt PATH/TO/BEST_MODEL_CHECKPOINT.ckpt
```
To generate predictions on your own dataset, please use `--patient_data my_data`. To generate predictions on simulated test patients, please use `--patient_data test_predict`. If using the provided checkpoint models, `checkpoints/<BEST_PRETRAIN_CHECKPOINT>.ckpt` should be `checkpoints/pretrain.ckpt` and `PATH/TO/BEST_MODEL_CHECKPOINT.ckpt` should be `checkpoints/disease_characterization.ckpt`.

To see and/or modify the default hyperparameters, please see the `get_predict_hparams()` function in `shepherd/hparams.py`.

## Manuscript

```
@article{shepherd,
  title={Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases},
  author={Alsentzer, Emily and Li, Michelle M. and Kobren, Shilpa and Noori, Ayush and Undiagnosed Diseases Network and Kohane, Isaac S. and Zitnik, Marinka},
  journal={medRxiv},
  year={2024}
}
```

## Questions

Please leave a GitHub issue or contact Emily Alsentzer at ealsentzer@bwh.harvard.edu and Michelle Li at michelleli@g.harvard.edu.