{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic API Usage of KGWAS\n",
"\n",
"KGWAS consists of two main class `KGWAS` and `KGWAS_Data`. `KGWAS` is the main class for the KGWAS model, and `KGWAS_Data` is the class for the data manipulation. In default, to ensure fast user experience, we provide a default fast mode of KGWAS, which uses Enformer embedding for variant feature and ESM embedding for gene features (instead of the baselineLD for variant and PoPS for gene since they are large files). For the fast mode, you do not need to download any data, the KGWAS API will automatically download the relevant files. This mode can be used to apply KGWAS to your own GWAS sumstats. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All required data files are present.\n",
"--loading KG---\n",
"--using enformer SNP embedding--\n",
"--using random go embedding--\n",
"--using ESM gene embedding--\n"
]
}
],
"source": [
"import sys\n",
"sys.path.append('../')\n",
"\n",
"from kgwas import KGWAS, KGWAS_Data\n",
"data = KGWAS_Data(data_path = './data/')\n",
"data.load_kg()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, the data needed for training is downloaded from the server and the knowledge graph is loaded. Next, we load the GWAS file. Here, we are using an example GWAS file, which is also automatically downloaded from the server. But you can also use your own GWAS file. The GWAS file should be in the format of a pandas DataFrame with columns `CHR`/`#CHROM`, `SNP`, `P`, `N`. Note that at the moment, our knowledge graph is UKBioBank directly genotyped variant set so it will automatically takes the overlap with the KG. Current efforts are underway for improving the coverage of the KG."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading example GWAS file...\n",
"Example file already exists locally.\n",
"Loading GWAS file from ./data/biochemistry_Creatinine_fastgwa_full_10000_1.fastGWA...\n",
"Number of SNPs in the KG: 784256\n",
"Number of SNPs in the GWAS: 542758\n",
"Number of SNPs in the KG variant set: 542758\n",
"Using ldsc weight...\n",
"ldsc_weight mean: 0.9999999999999993\n"
]
}
],
"source": [
"data.load_external_gwas(example_file = True)\n",
"data.process_gwas_file()\n",
"data.prepare_split()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>#CHROM</th>\n",
" <th>ID</th>\n",
" <th>POS</th>\n",
" <th>A1</th>\n",
" <th>A2</th>\n",
" <th>N</th>\n",
" <th>AF1</th>\n",
" <th>BETA</th>\n",
" <th>SE</th>\n",
" <th>P</th>\n",
" <th>ld_score</th>\n",
" <th>w_ld_score</th>\n",
" <th>y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>rs3131962</td>\n",
" <td>756604</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>9988</td>\n",
" <td>0.131007</td>\n",
" <td>-0.117134</td>\n",
" <td>0.246231</td>\n",
" <td>0.634282</td>\n",
" <td>72.862240</td>\n",
" <td>4.474788</td>\n",
" <td>0.226298</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>rs12562034</td>\n",
" <td>768448</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>9978</td>\n",
" <td>0.104981</td>\n",
" <td>-0.064894</td>\n",
" <td>0.273746</td>\n",
" <td>0.812611</td>\n",
" <td>34.749233</td>\n",
" <td>1.877341</td>\n",
" <td>0.056197</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>rs4040617</td>\n",
" <td>779322</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>9975</td>\n",
" <td>0.129123</td>\n",
" <td>-0.001462</td>\n",
" <td>0.247254</td>\n",
" <td>0.995281</td>\n",
" <td>72.271390</td>\n",
" <td>4.208873</td>\n",
" <td>0.000035</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>rs79373928</td>\n",
" <td>801536</td>\n",
" <td>G</td>\n",
" <td>T</td>\n",
" <td>9994</td>\n",
" <td>0.014659</td>\n",
" <td>0.081544</td>\n",
" <td>0.688261</td>\n",
" <td>0.905688</td>\n",
" <td>16.740126</td>\n",
" <td>1.949177</td>\n",
" <td>0.014037</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>rs11240779</td>\n",
" <td>808631</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>9919</td>\n",
" <td>0.226737</td>\n",
" <td>-0.184268</td>\n",
" <td>0.198982</td>\n",
" <td>0.354418</td>\n",
" <td>50.215000</td>\n",
" <td>2.825456</td>\n",
" <td>0.857575</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542753</th>\n",
" <td>22</td>\n",
" <td>rs73174435</td>\n",
" <td>51174939</td>\n",
" <td>T</td>\n",
" <td>C</td>\n",
" <td>9979</td>\n",
" <td>0.056118</td>\n",
" <td>-0.158762</td>\n",
" <td>0.362390</td>\n",
" <td>0.661316</td>\n",
" <td>21.981667</td>\n",
" <td>1.363001</td>\n",
" <td>0.191929</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542754</th>\n",
" <td>22</td>\n",
" <td>rs3810648</td>\n",
" <td>51175626</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>9931</td>\n",
" <td>0.058856</td>\n",
" <td>0.272493</td>\n",
" <td>0.352508</td>\n",
" <td>0.439515</td>\n",
" <td>34.619377</td>\n",
" <td>1.804193</td>\n",
" <td>0.597548</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542755</th>\n",
" <td>22</td>\n",
" <td>rs5771002</td>\n",
" <td>51183255</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>9840</td>\n",
" <td>0.333638</td>\n",
" <td>0.116325</td>\n",
" <td>0.175675</td>\n",
" <td>0.507869</td>\n",
" <td>16.231083</td>\n",
" <td>1.273770</td>\n",
" <td>0.438456</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542756</th>\n",
" <td>22</td>\n",
" <td>rs3865764</td>\n",
" <td>51185848</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>9974</td>\n",
" <td>0.051133</td>\n",
" <td>-0.026670</td>\n",
" <td>0.376132</td>\n",
" <td>0.943472</td>\n",
" <td>18.649513</td>\n",
" <td>1.010000</td>\n",
" <td>0.005028</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542757</th>\n",
" <td>22</td>\n",
" <td>rs142680588</td>\n",
" <td>51193629</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>9981</td>\n",
" <td>0.076595</td>\n",
" <td>-0.109532</td>\n",
" <td>0.312971</td>\n",
" <td>0.726358</td>\n",
" <td>52.471287</td>\n",
" <td>1.873861</td>\n",
" <td>0.122482</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>542758 rows × 13 columns</p>\n",
"</div>"
],
"text/plain": [
" #CHROM ID POS A1 A2 N AF1 BETA \\\n",
"0 1 rs3131962 756604 A G 9988 0.131007 -0.117134 \n",
"1 1 rs12562034 768448 A G 9978 0.104981 -0.064894 \n",
"2 1 rs4040617 779322 G A 9975 0.129123 -0.001462 \n",
"3 1 rs79373928 801536 G T 9994 0.014659 0.081544 \n",
"4 1 rs11240779 808631 G A 9919 0.226737 -0.184268 \n",
"... ... ... ... .. .. ... ... ... \n",
"542753 22 rs73174435 51174939 T C 9979 0.056118 -0.158762 \n",
"542754 22 rs3810648 51175626 G A 9931 0.058856 0.272493 \n",
"542755 22 rs5771002 51183255 A G 9840 0.333638 0.116325 \n",
"542756 22 rs3865764 51185848 G A 9974 0.051133 -0.026670 \n",
"542757 22 rs142680588 51193629 G A 9981 0.076595 -0.109532 \n",
"\n",
" SE P ld_score w_ld_score y \n",
"0 0.246231 0.634282 72.862240 4.474788 0.226298 \n",
"1 0.273746 0.812611 34.749233 1.877341 0.056197 \n",
"2 0.247254 0.995281 72.271390 4.208873 0.000035 \n",
"3 0.688261 0.905688 16.740126 1.949177 0.014037 \n",
"4 0.198982 0.354418 50.215000 2.825456 0.857575 \n",
"... ... ... ... ... ... \n",
"542753 0.362390 0.661316 21.981667 1.363001 0.191929 \n",
"542754 0.352508 0.439515 34.619377 1.804193 0.597548 \n",
"542755 0.175675 0.507869 16.231083 1.273770 0.438456 \n",
"542756 0.376132 0.943472 18.649513 1.010000 0.005028 \n",
"542757 0.312971 0.726358 52.471287 1.873861 0.122482 \n",
"\n",
"[542758 rows x 13 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.lr_uni"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we are ready to train the model! Here we are using epoch = 1 for the demo purpose, but in reality, you should use a higher number of epochs for better performance."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Creating data loader...\n",
"Start Training...\n",
"Training Progress Epoch 1/1: 52%|█████▏ | 500/956 [12:56<15:47, 2.08s/it]Epoch 1 Step 501 Train Loss: 1.8115\n",
"Training Progress Epoch 1/1: 100%|██████████| 956/956 [24:26<00:00, 1.53s/it]\n",
"100%|██████████| 50/50 [00:58<00:00, 1.17s/it]\n",
"Epoch 1: Validation MSE: 2.1730 Validation Pearson: 0.0096. \n",
"Saving models to ./data//model/test\n",
"100%|██████████| 54/54 [00:56<00:00, 1.04s/it]\n",
"100%|██████████| 1061/1061 [05:40<00:00, 3.11it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"KGWAS prediction and p-values saved to ./data//model_pred/new_experiments/test_pred.csv\n"
]
}
],
"source": [
"run = KGWAS(data, device = 'cuda:9', exp_name = 'test')\n",
"run.initialize_model()\n",
"run.train(epoch = 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The output of the model is saved to `/model_pred/new_experiments/{exp_name}_pred.csv`. You can also load it via `run.kgwas_res`. The model is also saved to `/model/{exp_name}`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>#CHROM</th>\n",
" <th>ID</th>\n",
" <th>POS</th>\n",
" <th>A1</th>\n",
" <th>A2</th>\n",
" <th>N</th>\n",
" <th>AF1</th>\n",
" <th>BETA</th>\n",
" <th>SE</th>\n",
" <th>P</th>\n",
" <th>ld_score</th>\n",
" <th>w_ld_score</th>\n",
" <th>y</th>\n",
" <th>pred</th>\n",
" <th>P_weighted</th>\n",
" <th>KGWAS_P</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>rs3131962</td>\n",
" <td>756604</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>9988</td>\n",
" <td>0.131007</td>\n",
" <td>-0.117134</td>\n",
" <td>0.246231</td>\n",
" <td>0.634282</td>\n",
" <td>72.862240</td>\n",
" <td>4.474788</td>\n",
" <td>0.226298</td>\n",
" <td>1.082365</td>\n",
" <td>0.234167</td>\n",
" <td>0.346428</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>rs12562034</td>\n",
" <td>768448</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>9978</td>\n",
" <td>0.104981</td>\n",
" <td>-0.064894</td>\n",
" <td>0.273746</td>\n",
" <td>0.812611</td>\n",
" <td>34.749233</td>\n",
" <td>1.877341</td>\n",
" <td>0.056197</td>\n",
" <td>1.087724</td>\n",
" <td>0.382894</td>\n",
" <td>0.566456</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>rs4040617</td>\n",
" <td>779322</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>9975</td>\n",
" <td>0.129123</td>\n",
" <td>-0.001462</td>\n",
" <td>0.247254</td>\n",
" <td>0.995281</td>\n",
" <td>72.271390</td>\n",
" <td>4.208873</td>\n",
" <td>0.000035</td>\n",
" <td>1.058530</td>\n",
" <td>0.995281</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>rs79373928</td>\n",
" <td>801536</td>\n",
" <td>G</td>\n",
" <td>T</td>\n",
" <td>9994</td>\n",
" <td>0.014659</td>\n",
" <td>0.081544</td>\n",
" <td>0.688261</td>\n",
" <td>0.905688</td>\n",
" <td>16.740126</td>\n",
" <td>1.949177</td>\n",
" <td>0.014037</td>\n",
" <td>1.105125</td>\n",
" <td>0.225107</td>\n",
" <td>0.333025</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>rs11240779</td>\n",
" <td>808631</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>9919</td>\n",
" <td>0.226737</td>\n",
" <td>-0.184268</td>\n",
" <td>0.198982</td>\n",
" <td>0.354418</td>\n",
" <td>50.215000</td>\n",
" <td>2.825456</td>\n",
" <td>0.857575</td>\n",
" <td>1.081468</td>\n",
" <td>0.041646</td>\n",
" <td>0.061612</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542753</th>\n",
" <td>22</td>\n",
" <td>rs73174435</td>\n",
" <td>51174939</td>\n",
" <td>T</td>\n",
" <td>C</td>\n",
" <td>9979</td>\n",
" <td>0.056118</td>\n",
" <td>-0.158762</td>\n",
" <td>0.362390</td>\n",
" <td>0.661316</td>\n",
" <td>21.981667</td>\n",
" <td>1.363001</td>\n",
" <td>0.191929</td>\n",
" <td>1.008835</td>\n",
" <td>0.233609</td>\n",
" <td>0.345602</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542754</th>\n",
" <td>22</td>\n",
" <td>rs3810648</td>\n",
" <td>51175626</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>9931</td>\n",
" <td>0.058856</td>\n",
" <td>0.272493</td>\n",
" <td>0.352508</td>\n",
" <td>0.439515</td>\n",
" <td>34.619377</td>\n",
" <td>1.804193</td>\n",
" <td>0.597548</td>\n",
" <td>1.034187</td>\n",
" <td>0.439515</td>\n",
" <td>0.650221</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542755</th>\n",
" <td>22</td>\n",
" <td>rs5771002</td>\n",
" <td>51183255</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>9840</td>\n",
" <td>0.333638</td>\n",
" <td>0.116325</td>\n",
" <td>0.175675</td>\n",
" <td>0.507869</td>\n",
" <td>16.231083</td>\n",
" <td>1.273770</td>\n",
" <td>0.438456</td>\n",
" <td>1.093221</td>\n",
" <td>0.449038</td>\n",
" <td>0.66431</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542756</th>\n",
" <td>22</td>\n",
" <td>rs3865764</td>\n",
" <td>51185848</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>9974</td>\n",
" <td>0.051133</td>\n",
" <td>-0.026670</td>\n",
" <td>0.376132</td>\n",
" <td>0.943472</td>\n",
" <td>18.649513</td>\n",
" <td>1.010000</td>\n",
" <td>0.005028</td>\n",
" <td>0.987747</td>\n",
" <td>0.943472</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542757</th>\n",
" <td>22</td>\n",
" <td>rs142680588</td>\n",
" <td>51193629</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>9981</td>\n",
" <td>0.076595</td>\n",
" <td>-0.109532</td>\n",
" <td>0.312971</td>\n",
" <td>0.726358</td>\n",
" <td>52.471287</td>\n",
" <td>1.873861</td>\n",
" <td>0.122482</td>\n",
" <td>1.082649</td>\n",
" <td>0.26816</td>\n",
" <td>0.396718</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>542758 rows × 16 columns</p>\n",
"</div>"
],
"text/plain": [
" #CHROM ID POS A1 A2 N AF1 BETA \\\n",
"0 1 rs3131962 756604 A G 9988 0.131007 -0.117134 \n",
"1 1 rs12562034 768448 A G 9978 0.104981 -0.064894 \n",
"2 1 rs4040617 779322 G A 9975 0.129123 -0.001462 \n",
"3 1 rs79373928 801536 G T 9994 0.014659 0.081544 \n",
"4 1 rs11240779 808631 G A 9919 0.226737 -0.184268 \n",
"... ... ... ... .. .. ... ... ... \n",
"542753 22 rs73174435 51174939 T C 9979 0.056118 -0.158762 \n",
"542754 22 rs3810648 51175626 G A 9931 0.058856 0.272493 \n",
"542755 22 rs5771002 51183255 A G 9840 0.333638 0.116325 \n",
"542756 22 rs3865764 51185848 G A 9974 0.051133 -0.026670 \n",
"542757 22 rs142680588 51193629 G A 9981 0.076595 -0.109532 \n",
"\n",
" SE P ld_score w_ld_score y pred \\\n",
"0 0.246231 0.634282 72.862240 4.474788 0.226298 1.082365 \n",
"1 0.273746 0.812611 34.749233 1.877341 0.056197 1.087724 \n",
"2 0.247254 0.995281 72.271390 4.208873 0.000035 1.058530 \n",
"3 0.688261 0.905688 16.740126 1.949177 0.014037 1.105125 \n",
"4 0.198982 0.354418 50.215000 2.825456 0.857575 1.081468 \n",
"... ... ... ... ... ... ... \n",
"542753 0.362390 0.661316 21.981667 1.363001 0.191929 1.008835 \n",
"542754 0.352508 0.439515 34.619377 1.804193 0.597548 1.034187 \n",
"542755 0.175675 0.507869 16.231083 1.273770 0.438456 1.093221 \n",
"542756 0.376132 0.943472 18.649513 1.010000 0.005028 0.987747 \n",
"542757 0.312971 0.726358 52.471287 1.873861 0.122482 1.082649 \n",
"\n",
" P_weighted KGWAS_P \n",
"0 0.234167 0.346428 \n",
"1 0.382894 0.566456 \n",
"2 0.995281 1 \n",
"3 0.225107 0.333025 \n",
"4 0.041646 0.061612 \n",
"... ... ... \n",
"542753 0.233609 0.345602 \n",
"542754 0.439515 0.650221 \n",
"542755 0.449038 0.66431 \n",
"542756 0.943472 1 \n",
"542757 0.26816 0.396718 \n",
"\n",
"[542758 rows x 16 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"run.kgwas_res"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If needed, you can load the pre-trained model via `run.load_pretrained()`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"run.load_pretrained('./data/model/test')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to (1) use the full mode of KGWAS (i.e. larger node embeddings) or (2) access the null/causal simulations or (3) access the 21 subsampled GWAS sumstats across various sample sizes or (4) analyze the KGWAS sumstats for subsampled data or (5) analyze the KGWAS sumstats for all UKBB ICD10 diseases, please use [this link](https://drive.google.com/file/d/14UcHzPRIbdMmnLPZCHx_4G-gz2pipeg9/view?usp=sharing). Note that this file is large (around 45GB) and may take a while to download. After unzipping it, you can use that directory as the data directory for the KGWAS API."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All required data files are present.\n"
]
}
],
"source": [
"from kgwas import KGWAS, KGWAS_Data\n",
"data = KGWAS_Data(data_path = '/dfs/project/datasets/20220524-ukbiobank/data/kgwas_data/')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that you can use various variant, gene, and program embeddings. For example, for the result in the paper, we use the baselineLD for variant and PoPS for gene."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--loading KG---\n",
"--using baselineLD SNP embedding--\n",
"--using random go embedding--\n",
"--using PoPs expression+PPI+pathways gene embedding--\n"
]
}
],
"source": [
"data.load_kg(snp_init_emb = 'baselineLD', \n",
" go_init_emb = 'random',\n",
" gene_init_emb = 'pops')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many alternative embeddings as well. \n",
"- For variant: `enformer` (default), `baselineLD`, `SLDSC`, `cadd`, `kg`, `random`\n",
"- For gene: `esm` (default), `pops_expression`, `pops`, `kg`, `random`\n",
"- For program/go: `random` (default), `biogpt`, `kg`\n",
"\n",
"In additional to more embeddings, the full data folder contains summary statistics used in each analysis in the paper. For example, for the simulations, you can load it via:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"All required data files are present.\n",
"Using simulation data....\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>#CHROM</th>\n",
" <th>ID</th>\n",
" <th>POS</th>\n",
" <th>A1</th>\n",
" <th>A2</th>\n",
" <th>N</th>\n",
" <th>AF1</th>\n",
" <th>BETA</th>\n",
" <th>SE</th>\n",
" <th>P</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>rs3131962</td>\n",
" <td>756604</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>4993</td>\n",
" <td>0.129882</td>\n",
" <td>14.559400</td>\n",
" <td>17.1871</td>\n",
" <td>0.396933</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>rs12562034</td>\n",
" <td>768448</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>4994</td>\n",
" <td>0.103124</td>\n",
" <td>-15.034400</td>\n",
" <td>19.0234</td>\n",
" <td>0.429345</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>rs4040617</td>\n",
" <td>779322</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>4979</td>\n",
" <td>0.127435</td>\n",
" <td>15.537200</td>\n",
" <td>17.3933</td>\n",
" <td>0.371704</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>rs79373928</td>\n",
" <td>801536</td>\n",
" <td>G</td>\n",
" <td>T</td>\n",
" <td>4996</td>\n",
" <td>0.015012</td>\n",
" <td>16.142600</td>\n",
" <td>47.7752</td>\n",
" <td>0.735448</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>rs11240779</td>\n",
" <td>808631</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>4961</td>\n",
" <td>0.222233</td>\n",
" <td>0.859838</td>\n",
" <td>13.9158</td>\n",
" <td>0.950731</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542753</th>\n",
" <td>22</td>\n",
" <td>rs73174435</td>\n",
" <td>51174939</td>\n",
" <td>T</td>\n",
" <td>C</td>\n",
" <td>4991</td>\n",
" <td>0.057103</td>\n",
" <td>53.082400</td>\n",
" <td>24.8130</td>\n",
" <td>0.032412</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542754</th>\n",
" <td>22</td>\n",
" <td>rs3810648</td>\n",
" <td>51175626</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>4959</td>\n",
" <td>0.066243</td>\n",
" <td>17.689800</td>\n",
" <td>23.2562</td>\n",
" <td>0.446867</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542755</th>\n",
" <td>22</td>\n",
" <td>rs5771002</td>\n",
" <td>51183255</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>4937</td>\n",
" <td>0.334414</td>\n",
" <td>-12.170400</td>\n",
" <td>12.3314</td>\n",
" <td>0.323670</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542756</th>\n",
" <td>22</td>\n",
" <td>rs3865764</td>\n",
" <td>51185848</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>4984</td>\n",
" <td>0.050662</td>\n",
" <td>-43.871900</td>\n",
" <td>26.3007</td>\n",
" <td>0.095299</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542757</th>\n",
" <td>22</td>\n",
" <td>rs142680588</td>\n",
" <td>51193629</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>4994</td>\n",
" <td>0.073388</td>\n",
" <td>11.338700</td>\n",
" <td>22.2066</td>\n",
" <td>0.609630</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>542758 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" #CHROM ID POS A1 A2 N AF1 BETA \\\n",
"0 1 rs3131962 756604 A G 4993 0.129882 14.559400 \n",
"1 1 rs12562034 768448 A G 4994 0.103124 -15.034400 \n",
"2 1 rs4040617 779322 G A 4979 0.127435 15.537200 \n",
"3 1 rs79373928 801536 G T 4996 0.015012 16.142600 \n",
"4 1 rs11240779 808631 G A 4961 0.222233 0.859838 \n",
"... ... ... ... .. .. ... ... ... \n",
"542753 22 rs73174435 51174939 T C 4991 0.057103 53.082400 \n",
"542754 22 rs3810648 51175626 G A 4959 0.066243 17.689800 \n",
"542755 22 rs5771002 51183255 A G 4937 0.334414 -12.170400 \n",
"542756 22 rs3865764 51185848 G A 4984 0.050662 -43.871900 \n",
"542757 22 rs142680588 51193629 G A 4994 0.073388 11.338700 \n",
"\n",
" SE P \n",
"0 17.1871 0.396933 \n",
"1 19.0234 0.429345 \n",
"2 17.3933 0.371704 \n",
"3 47.7752 0.735448 \n",
"4 13.9158 0.950731 \n",
"... ... ... \n",
"542753 24.8130 0.032412 \n",
"542754 23.2562 0.446867 \n",
"542755 12.3314 0.323670 \n",
"542756 26.3007 0.095299 \n",
"542757 22.2066 0.609630 \n",
"\n",
"[542758 rows x 10 columns]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.load_simulation_gwas('causal', seed = 1) # seed can range from 1-500\n",
"data.lr_uni"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly for null simulations, you can load it via:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using simulation data....\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>#CHROM</th>\n",
" <th>ID</th>\n",
" <th>POS</th>\n",
" <th>A1</th>\n",
" <th>A2</th>\n",
" <th>N</th>\n",
" <th>AF1</th>\n",
" <th>BETA</th>\n",
" <th>SE</th>\n",
" <th>P</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>rs3131962</td>\n",
" <td>756604</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>4993</td>\n",
" <td>0.129882</td>\n",
" <td>-2.960260</td>\n",
" <td>7.66276</td>\n",
" <td>0.699261</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>rs12562034</td>\n",
" <td>768448</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>4994</td>\n",
" <td>0.103124</td>\n",
" <td>-19.335700</td>\n",
" <td>8.47710</td>\n",
" <td>0.022552</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>rs4040617</td>\n",
" <td>779322</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>4979</td>\n",
" <td>0.127435</td>\n",
" <td>-3.287600</td>\n",
" <td>7.75475</td>\n",
" <td>0.671605</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>rs79373928</td>\n",
" <td>801536</td>\n",
" <td>G</td>\n",
" <td>T</td>\n",
" <td>4996</td>\n",
" <td>0.015012</td>\n",
" <td>-12.530000</td>\n",
" <td>21.29860</td>\n",
" <td>0.556329</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>rs11240779</td>\n",
" <td>808631</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>4961</td>\n",
" <td>0.222233</td>\n",
" <td>-8.564830</td>\n",
" <td>6.20273</td>\n",
" <td>0.167335</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542753</th>\n",
" <td>22</td>\n",
" <td>rs73174435</td>\n",
" <td>51174939</td>\n",
" <td>T</td>\n",
" <td>C</td>\n",
" <td>4991</td>\n",
" <td>0.057103</td>\n",
" <td>-24.859400</td>\n",
" <td>11.06160</td>\n",
" <td>0.024617</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542754</th>\n",
" <td>22</td>\n",
" <td>rs3810648</td>\n",
" <td>51175626</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>4959</td>\n",
" <td>0.066243</td>\n",
" <td>-0.725793</td>\n",
" <td>10.36870</td>\n",
" <td>0.944195</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542755</th>\n",
" <td>22</td>\n",
" <td>rs5771002</td>\n",
" <td>51183255</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>4937</td>\n",
" <td>0.334414</td>\n",
" <td>-5.555300</td>\n",
" <td>5.49753</td>\n",
" <td>0.312251</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542756</th>\n",
" <td>22</td>\n",
" <td>rs3865764</td>\n",
" <td>51185848</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>4984</td>\n",
" <td>0.050662</td>\n",
" <td>12.588200</td>\n",
" <td>11.72730</td>\n",
" <td>0.283085</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542757</th>\n",
" <td>22</td>\n",
" <td>rs142680588</td>\n",
" <td>51193629</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>4994</td>\n",
" <td>0.073388</td>\n",
" <td>-13.533700</td>\n",
" <td>9.89851</td>\n",
" <td>0.171548</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>542758 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" #CHROM ID POS A1 A2 N AF1 BETA \\\n",
"0 1 rs3131962 756604 A G 4993 0.129882 -2.960260 \n",
"1 1 rs12562034 768448 A G 4994 0.103124 -19.335700 \n",
"2 1 rs4040617 779322 G A 4979 0.127435 -3.287600 \n",
"3 1 rs79373928 801536 G T 4996 0.015012 -12.530000 \n",
"4 1 rs11240779 808631 G A 4961 0.222233 -8.564830 \n",
"... ... ... ... .. .. ... ... ... \n",
"542753 22 rs73174435 51174939 T C 4991 0.057103 -24.859400 \n",
"542754 22 rs3810648 51175626 G A 4959 0.066243 -0.725793 \n",
"542755 22 rs5771002 51183255 A G 4937 0.334414 -5.555300 \n",
"542756 22 rs3865764 51185848 G A 4984 0.050662 12.588200 \n",
"542757 22 rs142680588 51193629 G A 4994 0.073388 -13.533700 \n",
"\n",
" SE P \n",
"0 7.66276 0.699261 \n",
"1 8.47710 0.022552 \n",
"2 7.75475 0.671605 \n",
"3 21.29860 0.556329 \n",
"4 6.20273 0.167335 \n",
"... ... ... \n",
"542753 11.06160 0.024617 \n",
"542754 10.36870 0.944195 \n",
"542755 5.49753 0.312251 \n",
"542756 11.72730 0.283085 \n",
"542757 9.89851 0.171548 \n",
"\n",
"[542758 rows x 10 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.load_simulation_gwas('null', seed = 1)# seed can range from 1-500\n",
"data.lr_uni"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, for the subsampling analysis, you can load any trait out of the 21 subsampled traits in various sample sizes across 5 replicates. The phenotype list can be accessed via:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['body_BALDING1',\n",
" 'disease_ALLERGY_ECZEMA_DIAGNOSED',\n",
" 'disease_HYPOTHYROIDISM_SELF_REP',\n",
" 'pigment_SUNBURN',\n",
" '21001',\n",
" '50',\n",
" '30080',\n",
" '30070',\n",
" '30010',\n",
" '30000',\n",
" 'biochemistry_AlkalinePhosphatase',\n",
" 'biochemistry_AspartateAminotransferase',\n",
" 'biochemistry_Cholesterol',\n",
" 'biochemistry_Creatinine',\n",
" 'biochemistry_IGF1',\n",
" 'biochemistry_Phosphate',\n",
" 'biochemistry_Testosterone_Male',\n",
" 'biochemistry_TotalBilirubin',\n",
" 'biochemistry_TotalProtein',\n",
" 'biochemistry_VitaminD',\n",
" 'bmd_HEEL_TSCOREz']"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.get_pheno_list()['21_indep_traits']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Usually each trait has the following sample sizes available: 1000, 2500, 5000, 7500, 10000, 50000, 100000, 200000. For example, to load body_BALDING1 at sample size 1000 at replicate 1, you can use:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>#CHROM</th>\n",
" <th>POS</th>\n",
" <th>ID</th>\n",
" <th>REF</th>\n",
" <th>ALT</th>\n",
" <th>A1</th>\n",
" <th>FIRTH?</th>\n",
" <th>TEST</th>\n",
" <th>OBS_CT</th>\n",
" <th>OR</th>\n",
" <th>LOG(OR)_SE</th>\n",
" <th>Z_STAT</th>\n",
" <th>P</th>\n",
" <th>ERRCODE</th>\n",
" <th>SNP</th>\n",
" <th>A2</th>\n",
" <th>N</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>756604</td>\n",
" <td>rs3131962</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>A</td>\n",
" <td>Y</td>\n",
" <td>ADD</td>\n",
" <td>999</td>\n",
" <td>1.241130</td>\n",
" <td>0.209870</td>\n",
" <td>1.029320</td>\n",
" <td>0.303330</td>\n",
" <td>.</td>\n",
" <td>rs3131962</td>\n",
" <td>G</td>\n",
" <td>999</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>768448</td>\n",
" <td>rs12562034</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>A</td>\n",
" <td>Y</td>\n",
" <td>ADD</td>\n",
" <td>996</td>\n",
" <td>0.433894</td>\n",
" <td>0.285912</td>\n",
" <td>-2.920330</td>\n",
" <td>0.003497</td>\n",
" <td>.</td>\n",
" <td>rs12562034</td>\n",
" <td>G</td>\n",
" <td>996</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>779322</td>\n",
" <td>rs4040617</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>G</td>\n",
" <td>Y</td>\n",
" <td>ADD</td>\n",
" <td>996</td>\n",
" <td>1.178310</td>\n",
" <td>0.211892</td>\n",
" <td>0.774379</td>\n",
" <td>0.438707</td>\n",
" <td>.</td>\n",
" <td>rs4040617</td>\n",
" <td>A</td>\n",
" <td>996</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>801536</td>\n",
" <td>rs79373928</td>\n",
" <td>T</td>\n",
" <td>G</td>\n",
" <td>G</td>\n",
" <td>Y</td>\n",
" <td>ADD</td>\n",
" <td>998</td>\n",
" <td>0.989852</td>\n",
" <td>0.479159</td>\n",
" <td>-0.021286</td>\n",
" <td>0.983018</td>\n",
" <td>.</td>\n",
" <td>rs79373928</td>\n",
" <td>T</td>\n",
" <td>998</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>808631</td>\n",
" <td>rs11240779</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>G</td>\n",
" <td>Y</td>\n",
" <td>ADD</td>\n",
" <td>994</td>\n",
" <td>0.880382</td>\n",
" <td>0.173114</td>\n",
" <td>-0.735930</td>\n",
" <td>0.461773</td>\n",
" <td>.</td>\n",
" <td>rs11240779</td>\n",
" <td>A</td>\n",
" <td>994</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542753</th>\n",
" <td>22</td>\n",
" <td>51174939</td>\n",
" <td>rs73174435</td>\n",
" <td>C</td>\n",
" <td>T</td>\n",
" <td>T</td>\n",
" <td>Y</td>\n",
" <td>ADD</td>\n",
" <td>999</td>\n",
" <td>0.642727</td>\n",
" <td>0.362564</td>\n",
" <td>-1.219190</td>\n",
" <td>0.222772</td>\n",
" <td>.</td>\n",
" <td>rs73174435</td>\n",
" <td>C</td>\n",
" <td>999</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542754</th>\n",
" <td>22</td>\n",
" <td>51175626</td>\n",
" <td>rs3810648</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>G</td>\n",
" <td>Y</td>\n",
" <td>ADD</td>\n",
" <td>996</td>\n",
" <td>0.752885</td>\n",
" <td>0.286799</td>\n",
" <td>-0.989690</td>\n",
" <td>0.322326</td>\n",
" <td>.</td>\n",
" <td>rs3810648</td>\n",
" <td>A</td>\n",
" <td>996</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542755</th>\n",
" <td>22</td>\n",
" <td>51183255</td>\n",
" <td>rs5771002</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>A</td>\n",
" <td>Y</td>\n",
" <td>ADD</td>\n",
" <td>981</td>\n",
" <td>0.792577</td>\n",
" <td>0.150356</td>\n",
" <td>-1.546100</td>\n",
" <td>0.122080</td>\n",
" <td>.</td>\n",
" <td>rs5771002</td>\n",
" <td>G</td>\n",
" <td>981</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542756</th>\n",
" <td>22</td>\n",
" <td>51185848</td>\n",
" <td>rs3865764</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>G</td>\n",
" <td>Y</td>\n",
" <td>ADD</td>\n",
" <td>996</td>\n",
" <td>1.004930</td>\n",
" <td>0.386700</td>\n",
" <td>0.012715</td>\n",
" <td>0.989855</td>\n",
" <td>.</td>\n",
" <td>rs3865764</td>\n",
" <td>A</td>\n",
" <td>996</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542757</th>\n",
" <td>22</td>\n",
" <td>51193629</td>\n",
" <td>rs142680588</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>G</td>\n",
" <td>Y</td>\n",
" <td>ADD</td>\n",
" <td>1000</td>\n",
" <td>1.497360</td>\n",
" <td>0.267489</td>\n",
" <td>1.509230</td>\n",
" <td>0.131240</td>\n",
" <td>.</td>\n",
" <td>rs142680588</td>\n",
" <td>A</td>\n",
" <td>1000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>542758 rows × 17 columns</p>\n",
"</div>"
],
"text/plain": [
" #CHROM POS ID REF ALT A1 FIRTH? TEST OBS_CT \\\n",
"0 1 756604 rs3131962 G A A Y ADD 999 \n",
"1 1 768448 rs12562034 G A A Y ADD 996 \n",
"2 1 779322 rs4040617 A G G Y ADD 996 \n",
"3 1 801536 rs79373928 T G G Y ADD 998 \n",
"4 1 808631 rs11240779 A G G Y ADD 994 \n",
"... ... ... ... .. .. .. ... ... ... \n",
"542753 22 51174939 rs73174435 C T T Y ADD 999 \n",
"542754 22 51175626 rs3810648 A G G Y ADD 996 \n",
"542755 22 51183255 rs5771002 G A A Y ADD 981 \n",
"542756 22 51185848 rs3865764 A G G Y ADD 996 \n",
"542757 22 51193629 rs142680588 A G G Y ADD 1000 \n",
"\n",
" OR LOG(OR)_SE Z_STAT P ERRCODE SNP A2 N \n",
"0 1.241130 0.209870 1.029320 0.303330 . rs3131962 G 999 \n",
"1 0.433894 0.285912 -2.920330 0.003497 . rs12562034 G 996 \n",
"2 1.178310 0.211892 0.774379 0.438707 . rs4040617 A 996 \n",
"3 0.989852 0.479159 -0.021286 0.983018 . rs79373928 T 998 \n",
"4 0.880382 0.173114 -0.735930 0.461773 . rs11240779 A 994 \n",
"... ... ... ... ... ... ... .. ... \n",
"542753 0.642727 0.362564 -1.219190 0.222772 . rs73174435 C 999 \n",
"542754 0.752885 0.286799 -0.989690 0.322326 . rs3810648 A 996 \n",
"542755 0.792577 0.150356 -1.546100 0.122080 . rs5771002 G 981 \n",
"542756 1.004930 0.386700 0.012715 0.989855 . rs3865764 A 996 \n",
"542757 1.497360 0.267489 1.509230 0.131240 . rs142680588 A 1000 \n",
"\n",
"[542758 rows x 17 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.load_gwas_subsample(pheno = 'body_BALDING1', sample_size = 1000, seed = 1)\n",
"data.lr_uni"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also load the full-cohort GWAS for any of these 21 traits via:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>#CHROM</th>\n",
" <th>ID</th>\n",
" <th>POS</th>\n",
" <th>A1</th>\n",
" <th>A2</th>\n",
" <th>N</th>\n",
" <th>AF1</th>\n",
" <th>BETA</th>\n",
" <th>SE</th>\n",
" <th>P</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>rs3131962</td>\n",
" <td>756604</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>407023</td>\n",
" <td>0.129655</td>\n",
" <td>0.000286</td>\n",
" <td>0.001048</td>\n",
" <td>0.784760</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>rs12562034</td>\n",
" <td>768448</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>407057</td>\n",
" <td>0.104966</td>\n",
" <td>-0.001491</td>\n",
" <td>0.001147</td>\n",
" <td>0.193592</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1</td>\n",
" <td>rs4040617</td>\n",
" <td>779322</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>406623</td>\n",
" <td>0.127520</td>\n",
" <td>0.000108</td>\n",
" <td>0.001056</td>\n",
" <td>0.918404</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>rs79373928</td>\n",
" <td>801536</td>\n",
" <td>G</td>\n",
" <td>T</td>\n",
" <td>407517</td>\n",
" <td>0.014884</td>\n",
" <td>0.004382</td>\n",
" <td>0.002904</td>\n",
" <td>0.131349</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>rs11240779</td>\n",
" <td>808631</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>404493</td>\n",
" <td>0.224886</td>\n",
" <td>-0.001155</td>\n",
" <td>0.000846</td>\n",
" <td>0.172345</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542753</th>\n",
" <td>22</td>\n",
" <td>rs73174435</td>\n",
" <td>51174939</td>\n",
" <td>T</td>\n",
" <td>C</td>\n",
" <td>407201</td>\n",
" <td>0.053846</td>\n",
" <td>-0.001980</td>\n",
" <td>0.001559</td>\n",
" <td>0.203959</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542754</th>\n",
" <td>22</td>\n",
" <td>rs3810648</td>\n",
" <td>51175626</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>404901</td>\n",
" <td>0.060979</td>\n",
" <td>0.001922</td>\n",
" <td>0.001474</td>\n",
" <td>0.192116</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542755</th>\n",
" <td>22</td>\n",
" <td>rs5771002</td>\n",
" <td>51183255</td>\n",
" <td>A</td>\n",
" <td>G</td>\n",
" <td>401398</td>\n",
" <td>0.333603</td>\n",
" <td>-0.000165</td>\n",
" <td>0.000751</td>\n",
" <td>0.826494</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542756</th>\n",
" <td>22</td>\n",
" <td>rs3865764</td>\n",
" <td>51185848</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>406611</td>\n",
" <td>0.050601</td>\n",
" <td>-0.001311</td>\n",
" <td>0.001605</td>\n",
" <td>0.413994</td>\n",
" </tr>\n",
" <tr>\n",
" <th>542757</th>\n",
" <td>22</td>\n",
" <td>rs142680588</td>\n",
" <td>51193629</td>\n",
" <td>G</td>\n",
" <td>A</td>\n",
" <td>407108</td>\n",
" <td>0.075912</td>\n",
" <td>-0.002861</td>\n",
" <td>0.001329</td>\n",
" <td>0.031362</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>542758 rows × 10 columns</p>\n",
"</div>"
],
"text/plain": [
" #CHROM ID POS A1 A2 N AF1 BETA \\\n",
"0 1 rs3131962 756604 A G 407023 0.129655 0.000286 \n",
"1 1 rs12562034 768448 A G 407057 0.104966 -0.001491 \n",
"2 1 rs4040617 779322 G A 406623 0.127520 0.000108 \n",
"3 1 rs79373928 801536 G T 407517 0.014884 0.004382 \n",
"4 1 rs11240779 808631 G A 404493 0.224886 -0.001155 \n",
"... ... ... ... .. .. ... ... ... \n",
"542753 22 rs73174435 51174939 T C 407201 0.053846 -0.001980 \n",
"542754 22 rs3810648 51175626 G A 404901 0.060979 0.001922 \n",
"542755 22 rs5771002 51183255 A G 401398 0.333603 -0.000165 \n",
"542756 22 rs3865764 51185848 G A 406611 0.050601 -0.001311 \n",
"542757 22 rs142680588 51193629 G A 407108 0.075912 -0.002861 \n",
"\n",
" SE P \n",
"0 0.001048 0.784760 \n",
"1 0.001147 0.193592 \n",
"2 0.001056 0.918404 \n",
"3 0.002904 0.131349 \n",
"4 0.000846 0.172345 \n",
"... ... ... \n",
"542753 0.001559 0.203959 \n",
"542754 0.001474 0.192116 \n",
"542755 0.000751 0.826494 \n",
"542756 0.001605 0.413994 \n",
"542757 0.001329 0.031362 \n",
"\n",
"[542758 rows x 10 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.load_full_gwas(pheno = 'body_BALDING1')\n",
"data.lr_uni"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the basic KGWAS interface! Check out the other notebooks for more of KGWAS's capabilities."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "a100_env",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}