{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic API Usage of KGWAS\n",
    "\n",
    "KGWAS consists of two main class `KGWAS` and `KGWAS_Data`. `KGWAS` is the main class for the KGWAS model, and `KGWAS_Data` is the class for the data manipulation. In default, to ensure fast user experience, we provide a default fast mode of KGWAS, which uses Enformer embedding for variant feature and ESM embedding for gene features (instead of the baselineLD for variant and PoPS for gene since they are large files). For the fast mode, you do not need to download any data, the KGWAS API will automatically download the relevant files. This mode can be used to apply KGWAS to your own GWAS sumstats. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All required data files are present.\n",
      "--loading KG---\n",
      "--using enformer SNP embedding--\n",
      "--using random go embedding--\n",
      "--using ESM gene embedding--\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "sys.path.append('../')\n",
    "\n",
    "from kgwas import KGWAS, KGWAS_Data\n",
    "data = KGWAS_Data(data_path = './data/')\n",
    "data.load_kg()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, the data needed for training is downloaded from the server and the knowledge graph is loaded. Next, we load the GWAS file. Here, we are using an example GWAS file, which is also automatically downloaded from the server. But you can also use your own GWAS file. The GWAS file should be in the format of a pandas DataFrame with columns `CHR`/`#CHROM`, `SNP`, `P`, `N`. Note that at the moment, our knowledge graph is UKBioBank directly genotyped variant set so it will automatically takes the overlap with the KG. Current efforts are underway for improving the coverage of the KG."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loading example GWAS file...\n",
      "Example file already exists locally.\n",
      "Loading GWAS file from ./data/biochemistry_Creatinine_fastgwa_full_10000_1.fastGWA...\n",
      "Number of SNPs in the KG: 784256\n",
      "Number of SNPs in the GWAS: 542758\n",
      "Number of SNPs in the KG variant set: 542758\n",
      "Using ldsc weight...\n",
      "ldsc_weight mean:  0.9999999999999993\n"
     ]
    }
   ],
   "source": [
    "data.load_external_gwas(example_file = True)\n",
    "data.process_gwas_file()\n",
    "data.prepare_split()"
   ]
  },
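  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of what a compatible input looks like, the cell below builds a tiny summary-statistics table with the columns named above and checks that none are missing. The helper `check_gwas_columns` and the toy values are illustrative assumptions, not part of the KGWAS API. How your own table is passed to `load_external_gwas` is not covered here, so check the package documentation for that step."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Required columns as described in the text; the chromosome column may be\n",
    "# named either CHR or #CHROM.\n",
    "REQUIRED = ['SNP', 'P', 'N']\n",
    "CHROM_ALIASES = ['CHR', '#CHROM']\n",
    "\n",
    "def check_gwas_columns(df):\n",
    "    '''Return a list of problems found in a sumstats DataFrame (hypothetical helper).'''\n",
    "    problems = [f'missing column: {c}' for c in REQUIRED if c not in df.columns]\n",
    "    if not any(c in df.columns for c in CHROM_ALIASES):\n",
    "        problems.append('missing chromosome column: CHR or #CHROM')\n",
    "    if 'P' in df.columns and not df['P'].between(0, 1).all():\n",
    "        problems.append('P values outside [0, 1]')\n",
    "    return problems\n",
    "\n",
    "# Toy table with made-up values, for illustration only.\n",
    "toy = pd.DataFrame({\n",
    "    'CHR': [1, 1, 22],\n",
    "    'SNP': ['rs_a', 'rs_b', 'rs_c'],\n",
    "    'P': [0.63, 0.81, 0.03],\n",
    "    'N': [9988, 9978, 9979],\n",
    "})\n",
    "print(check_gwas_columns(toy))                     # complete table\n",
    "print(check_gwas_columns(toy.drop(columns='N')))   # table with N removed"
   ]
  },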
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>#CHROM</th>\n",
       "      <th>ID</th>\n",
       "      <th>POS</th>\n",
       "      <th>A1</th>\n",
       "      <th>A2</th>\n",
       "      <th>N</th>\n",
       "      <th>AF1</th>\n",
       "      <th>BETA</th>\n",
       "      <th>SE</th>\n",
       "      <th>P</th>\n",
       "      <th>ld_score</th>\n",
       "      <th>w_ld_score</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>rs3131962</td>\n",
       "      <td>756604</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>9988</td>\n",
       "      <td>0.131007</td>\n",
       "      <td>-0.117134</td>\n",
       "      <td>0.246231</td>\n",
       "      <td>0.634282</td>\n",
       "      <td>72.862240</td>\n",
       "      <td>4.474788</td>\n",
       "      <td>0.226298</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>rs12562034</td>\n",
       "      <td>768448</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>9978</td>\n",
       "      <td>0.104981</td>\n",
       "      <td>-0.064894</td>\n",
       "      <td>0.273746</td>\n",
       "      <td>0.812611</td>\n",
       "      <td>34.749233</td>\n",
       "      <td>1.877341</td>\n",
       "      <td>0.056197</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>rs4040617</td>\n",
       "      <td>779322</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>9975</td>\n",
       "      <td>0.129123</td>\n",
       "      <td>-0.001462</td>\n",
       "      <td>0.247254</td>\n",
       "      <td>0.995281</td>\n",
       "      <td>72.271390</td>\n",
       "      <td>4.208873</td>\n",
       "      <td>0.000035</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>rs79373928</td>\n",
       "      <td>801536</td>\n",
       "      <td>G</td>\n",
       "      <td>T</td>\n",
       "      <td>9994</td>\n",
       "      <td>0.014659</td>\n",
       "      <td>0.081544</td>\n",
       "      <td>0.688261</td>\n",
       "      <td>0.905688</td>\n",
       "      <td>16.740126</td>\n",
       "      <td>1.949177</td>\n",
       "      <td>0.014037</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>rs11240779</td>\n",
       "      <td>808631</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>9919</td>\n",
       "      <td>0.226737</td>\n",
       "      <td>-0.184268</td>\n",
       "      <td>0.198982</td>\n",
       "      <td>0.354418</td>\n",
       "      <td>50.215000</td>\n",
       "      <td>2.825456</td>\n",
       "      <td>0.857575</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542753</th>\n",
       "      <td>22</td>\n",
       "      <td>rs73174435</td>\n",
       "      <td>51174939</td>\n",
       "      <td>T</td>\n",
       "      <td>C</td>\n",
       "      <td>9979</td>\n",
       "      <td>0.056118</td>\n",
       "      <td>-0.158762</td>\n",
       "      <td>0.362390</td>\n",
       "      <td>0.661316</td>\n",
       "      <td>21.981667</td>\n",
       "      <td>1.363001</td>\n",
       "      <td>0.191929</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542754</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3810648</td>\n",
       "      <td>51175626</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>9931</td>\n",
       "      <td>0.058856</td>\n",
       "      <td>0.272493</td>\n",
       "      <td>0.352508</td>\n",
       "      <td>0.439515</td>\n",
       "      <td>34.619377</td>\n",
       "      <td>1.804193</td>\n",
       "      <td>0.597548</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542755</th>\n",
       "      <td>22</td>\n",
       "      <td>rs5771002</td>\n",
       "      <td>51183255</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>9840</td>\n",
       "      <td>0.333638</td>\n",
       "      <td>0.116325</td>\n",
       "      <td>0.175675</td>\n",
       "      <td>0.507869</td>\n",
       "      <td>16.231083</td>\n",
       "      <td>1.273770</td>\n",
       "      <td>0.438456</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542756</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3865764</td>\n",
       "      <td>51185848</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>9974</td>\n",
       "      <td>0.051133</td>\n",
       "      <td>-0.026670</td>\n",
       "      <td>0.376132</td>\n",
       "      <td>0.943472</td>\n",
       "      <td>18.649513</td>\n",
       "      <td>1.010000</td>\n",
       "      <td>0.005028</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542757</th>\n",
       "      <td>22</td>\n",
       "      <td>rs142680588</td>\n",
       "      <td>51193629</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>9981</td>\n",
       "      <td>0.076595</td>\n",
       "      <td>-0.109532</td>\n",
       "      <td>0.312971</td>\n",
       "      <td>0.726358</td>\n",
       "      <td>52.471287</td>\n",
       "      <td>1.873861</td>\n",
       "      <td>0.122482</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>542758 rows × 13 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        #CHROM           ID       POS A1 A2     N       AF1      BETA  \\\n",
       "0            1    rs3131962    756604  A  G  9988  0.131007 -0.117134   \n",
       "1            1   rs12562034    768448  A  G  9978  0.104981 -0.064894   \n",
       "2            1    rs4040617    779322  G  A  9975  0.129123 -0.001462   \n",
       "3            1   rs79373928    801536  G  T  9994  0.014659  0.081544   \n",
       "4            1   rs11240779    808631  G  A  9919  0.226737 -0.184268   \n",
       "...        ...          ...       ... .. ..   ...       ...       ...   \n",
       "542753      22   rs73174435  51174939  T  C  9979  0.056118 -0.158762   \n",
       "542754      22    rs3810648  51175626  G  A  9931  0.058856  0.272493   \n",
       "542755      22    rs5771002  51183255  A  G  9840  0.333638  0.116325   \n",
       "542756      22    rs3865764  51185848  G  A  9974  0.051133 -0.026670   \n",
       "542757      22  rs142680588  51193629  G  A  9981  0.076595 -0.109532   \n",
       "\n",
       "              SE         P   ld_score  w_ld_score         y  \n",
       "0       0.246231  0.634282  72.862240    4.474788  0.226298  \n",
       "1       0.273746  0.812611  34.749233    1.877341  0.056197  \n",
       "2       0.247254  0.995281  72.271390    4.208873  0.000035  \n",
       "3       0.688261  0.905688  16.740126    1.949177  0.014037  \n",
       "4       0.198982  0.354418  50.215000    2.825456  0.857575  \n",
       "...          ...       ...        ...         ...       ...  \n",
       "542753  0.362390  0.661316  21.981667    1.363001  0.191929  \n",
       "542754  0.352508  0.439515  34.619377    1.804193  0.597548  \n",
       "542755  0.175675  0.507869  16.231083    1.273770  0.438456  \n",
       "542756  0.376132  0.943472  18.649513    1.010000  0.005028  \n",
       "542757  0.312971  0.726358  52.471287    1.873861  0.122482  \n",
       "\n",
       "[542758 rows x 13 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.lr_uni"
   ]
  },
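  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Besides the original GWAS columns, `data.lr_uni` carries `ld_score`, `w_ld_score`, and `y`. For the rows displayed above, `y` is numerically consistent with the chi-square statistic `(BETA / SE)^2`, and `P` with the corresponding two-sided normal p-value. The check below reproduces this from the printed values using only the standard library. It is an observation about these rows, not a statement of how KGWAS computes `y` internally; `chi2_and_p` is a hypothetical helper."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "# (BETA, SE, P, y) copied from rows 0, 1 and 4 of the table above.\n",
    "rows = [\n",
    "    (-0.117134, 0.246231, 0.634282, 0.226298),\n",
    "    (-0.064894, 0.273746, 0.812611, 0.056197),\n",
    "    (-0.184268, 0.198982, 0.354418, 0.857575),\n",
    "]\n",
    "\n",
    "def chi2_and_p(beta, se):\n",
    "    '''Chi-square statistic (1 d.o.f.) and two-sided normal p-value for beta/se.'''\n",
    "    z = beta / se\n",
    "    return z * z, math.erfc(abs(z) / math.sqrt(2))\n",
    "\n",
    "for beta, se, p_reported, y_reported in rows:\n",
    "    chi2, p = chi2_and_p(beta, se)\n",
    "    print(f'chi2={chi2:.6f} (y={y_reported})  p={p:.6f} (P={p_reported})')"
   ]
  },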
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we are ready to train the model! Here we are using epoch = 1 for the demo purpose, but in reality, you should use a higher number of epochs for better performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Creating data loader...\n",
      "Start Training...\n",
      "Training Progress Epoch 1/1:  52%|█████▏    | 500/956 [12:56<15:47,  2.08s/it]Epoch 1 Step 501 Train Loss: 1.8115\n",
      "Training Progress Epoch 1/1: 100%|██████████| 956/956 [24:26<00:00,  1.53s/it]\n",
      "100%|██████████| 50/50 [00:58<00:00,  1.17s/it]\n",
      "Epoch 1: Validation MSE: 2.1730 Validation Pearson: 0.0096. \n",
      "Saving models to ./data//model/test\n",
      "100%|██████████| 54/54 [00:56<00:00,  1.04s/it]\n",
      "100%|██████████| 1061/1061 [05:40<00:00,  3.11it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "KGWAS prediction and p-values saved to ./data//model_pred/new_experiments/test_pred.csv\n"
     ]
    }
   ],
   "source": [
    "run = KGWAS(data, device = 'cuda:9', exp_name = 'test')\n",
    "run.initialize_model()\n",
    "run.train(epoch = 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output of the model is saved to `/model_pred/new_experiments/{exp_name}_pred.csv`. You can also load it via `run.kgwas_res`. The model is also saved to `/model/{exp_name}`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>#CHROM</th>\n",
       "      <th>ID</th>\n",
       "      <th>POS</th>\n",
       "      <th>A1</th>\n",
       "      <th>A2</th>\n",
       "      <th>N</th>\n",
       "      <th>AF1</th>\n",
       "      <th>BETA</th>\n",
       "      <th>SE</th>\n",
       "      <th>P</th>\n",
       "      <th>ld_score</th>\n",
       "      <th>w_ld_score</th>\n",
       "      <th>y</th>\n",
       "      <th>pred</th>\n",
       "      <th>P_weighted</th>\n",
       "      <th>KGWAS_P</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>rs3131962</td>\n",
       "      <td>756604</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>9988</td>\n",
       "      <td>0.131007</td>\n",
       "      <td>-0.117134</td>\n",
       "      <td>0.246231</td>\n",
       "      <td>0.634282</td>\n",
       "      <td>72.862240</td>\n",
       "      <td>4.474788</td>\n",
       "      <td>0.226298</td>\n",
       "      <td>1.082365</td>\n",
       "      <td>0.234167</td>\n",
       "      <td>0.346428</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>rs12562034</td>\n",
       "      <td>768448</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>9978</td>\n",
       "      <td>0.104981</td>\n",
       "      <td>-0.064894</td>\n",
       "      <td>0.273746</td>\n",
       "      <td>0.812611</td>\n",
       "      <td>34.749233</td>\n",
       "      <td>1.877341</td>\n",
       "      <td>0.056197</td>\n",
       "      <td>1.087724</td>\n",
       "      <td>0.382894</td>\n",
       "      <td>0.566456</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>rs4040617</td>\n",
       "      <td>779322</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>9975</td>\n",
       "      <td>0.129123</td>\n",
       "      <td>-0.001462</td>\n",
       "      <td>0.247254</td>\n",
       "      <td>0.995281</td>\n",
       "      <td>72.271390</td>\n",
       "      <td>4.208873</td>\n",
       "      <td>0.000035</td>\n",
       "      <td>1.058530</td>\n",
       "      <td>0.995281</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>rs79373928</td>\n",
       "      <td>801536</td>\n",
       "      <td>G</td>\n",
       "      <td>T</td>\n",
       "      <td>9994</td>\n",
       "      <td>0.014659</td>\n",
       "      <td>0.081544</td>\n",
       "      <td>0.688261</td>\n",
       "      <td>0.905688</td>\n",
       "      <td>16.740126</td>\n",
       "      <td>1.949177</td>\n",
       "      <td>0.014037</td>\n",
       "      <td>1.105125</td>\n",
       "      <td>0.225107</td>\n",
       "      <td>0.333025</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>rs11240779</td>\n",
       "      <td>808631</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>9919</td>\n",
       "      <td>0.226737</td>\n",
       "      <td>-0.184268</td>\n",
       "      <td>0.198982</td>\n",
       "      <td>0.354418</td>\n",
       "      <td>50.215000</td>\n",
       "      <td>2.825456</td>\n",
       "      <td>0.857575</td>\n",
       "      <td>1.081468</td>\n",
       "      <td>0.041646</td>\n",
       "      <td>0.061612</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542753</th>\n",
       "      <td>22</td>\n",
       "      <td>rs73174435</td>\n",
       "      <td>51174939</td>\n",
       "      <td>T</td>\n",
       "      <td>C</td>\n",
       "      <td>9979</td>\n",
       "      <td>0.056118</td>\n",
       "      <td>-0.158762</td>\n",
       "      <td>0.362390</td>\n",
       "      <td>0.661316</td>\n",
       "      <td>21.981667</td>\n",
       "      <td>1.363001</td>\n",
       "      <td>0.191929</td>\n",
       "      <td>1.008835</td>\n",
       "      <td>0.233609</td>\n",
       "      <td>0.345602</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542754</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3810648</td>\n",
       "      <td>51175626</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>9931</td>\n",
       "      <td>0.058856</td>\n",
       "      <td>0.272493</td>\n",
       "      <td>0.352508</td>\n",
       "      <td>0.439515</td>\n",
       "      <td>34.619377</td>\n",
       "      <td>1.804193</td>\n",
       "      <td>0.597548</td>\n",
       "      <td>1.034187</td>\n",
       "      <td>0.439515</td>\n",
       "      <td>0.650221</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542755</th>\n",
       "      <td>22</td>\n",
       "      <td>rs5771002</td>\n",
       "      <td>51183255</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>9840</td>\n",
       "      <td>0.333638</td>\n",
       "      <td>0.116325</td>\n",
       "      <td>0.175675</td>\n",
       "      <td>0.507869</td>\n",
       "      <td>16.231083</td>\n",
       "      <td>1.273770</td>\n",
       "      <td>0.438456</td>\n",
       "      <td>1.093221</td>\n",
       "      <td>0.449038</td>\n",
       "      <td>0.66431</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542756</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3865764</td>\n",
       "      <td>51185848</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>9974</td>\n",
       "      <td>0.051133</td>\n",
       "      <td>-0.026670</td>\n",
       "      <td>0.376132</td>\n",
       "      <td>0.943472</td>\n",
       "      <td>18.649513</td>\n",
       "      <td>1.010000</td>\n",
       "      <td>0.005028</td>\n",
       "      <td>0.987747</td>\n",
       "      <td>0.943472</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542757</th>\n",
       "      <td>22</td>\n",
       "      <td>rs142680588</td>\n",
       "      <td>51193629</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>9981</td>\n",
       "      <td>0.076595</td>\n",
       "      <td>-0.109532</td>\n",
       "      <td>0.312971</td>\n",
       "      <td>0.726358</td>\n",
       "      <td>52.471287</td>\n",
       "      <td>1.873861</td>\n",
       "      <td>0.122482</td>\n",
       "      <td>1.082649</td>\n",
       "      <td>0.26816</td>\n",
       "      <td>0.396718</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>542758 rows × 16 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        #CHROM           ID       POS A1 A2     N       AF1      BETA  \\\n",
       "0            1    rs3131962    756604  A  G  9988  0.131007 -0.117134   \n",
       "1            1   rs12562034    768448  A  G  9978  0.104981 -0.064894   \n",
       "2            1    rs4040617    779322  G  A  9975  0.129123 -0.001462   \n",
       "3            1   rs79373928    801536  G  T  9994  0.014659  0.081544   \n",
       "4            1   rs11240779    808631  G  A  9919  0.226737 -0.184268   \n",
       "...        ...          ...       ... .. ..   ...       ...       ...   \n",
       "542753      22   rs73174435  51174939  T  C  9979  0.056118 -0.158762   \n",
       "542754      22    rs3810648  51175626  G  A  9931  0.058856  0.272493   \n",
       "542755      22    rs5771002  51183255  A  G  9840  0.333638  0.116325   \n",
       "542756      22    rs3865764  51185848  G  A  9974  0.051133 -0.026670   \n",
       "542757      22  rs142680588  51193629  G  A  9981  0.076595 -0.109532   \n",
       "\n",
       "              SE         P   ld_score  w_ld_score         y      pred  \\\n",
       "0       0.246231  0.634282  72.862240    4.474788  0.226298  1.082365   \n",
       "1       0.273746  0.812611  34.749233    1.877341  0.056197  1.087724   \n",
       "2       0.247254  0.995281  72.271390    4.208873  0.000035  1.058530   \n",
       "3       0.688261  0.905688  16.740126    1.949177  0.014037  1.105125   \n",
       "4       0.198982  0.354418  50.215000    2.825456  0.857575  1.081468   \n",
       "...          ...       ...        ...         ...       ...       ...   \n",
       "542753  0.362390  0.661316  21.981667    1.363001  0.191929  1.008835   \n",
       "542754  0.352508  0.439515  34.619377    1.804193  0.597548  1.034187   \n",
       "542755  0.175675  0.507869  16.231083    1.273770  0.438456  1.093221   \n",
       "542756  0.376132  0.943472  18.649513    1.010000  0.005028  0.987747   \n",
       "542757  0.312971  0.726358  52.471287    1.873861  0.122482  1.082649   \n",
       "\n",
       "       P_weighted   KGWAS_P  \n",
       "0        0.234167  0.346428  \n",
       "1        0.382894  0.566456  \n",
       "2        0.995281         1  \n",
       "3        0.225107  0.333025  \n",
       "4        0.041646  0.061612  \n",
       "...           ...       ...  \n",
       "542753   0.233609  0.345602  \n",
       "542754   0.439515  0.650221  \n",
       "542755   0.449038   0.66431  \n",
       "542756   0.943472         1  \n",
       "542757    0.26816  0.396718  \n",
       "\n",
       "[542758 rows x 16 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "run.kgwas_res"
   ]
  },
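  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A common next step is to pull out the variants that pass a significance threshold under the original `P` and under `KGWAS_P`. The sketch below does this on five rows copied from the table above. `hits` is a hypothetical helper, and the threshold of 0.1 is chosen only so that the toy rows show a difference (genome-wide analyses conventionally use 5e-8). The `pd.to_numeric` call is a precaution in case the p-value column is stored as objects rather than floats. On the full table you would pass `run.kgwas_res` instead of `toy`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Five rows copied from the output above (ID, P, KGWAS_P only).\n",
    "toy = pd.DataFrame({\n",
    "    'ID': ['rs3131962', 'rs12562034', 'rs4040617', 'rs79373928', 'rs11240779'],\n",
    "    'P': [0.634282, 0.812611, 0.995281, 0.905688, 0.354418],\n",
    "    'KGWAS_P': [0.346428, 0.566456, 1.0, 0.333025, 0.061612],\n",
    "})\n",
    "\n",
    "def hits(df, p_col, threshold):\n",
    "    '''IDs of variants whose p-value in p_col is below threshold, smallest first.'''\n",
    "    p = pd.to_numeric(df[p_col])\n",
    "    order = p[p < threshold].sort_values().index\n",
    "    return df.loc[order, 'ID'].tolist()\n",
    "\n",
    "print(hits(toy, 'P', 0.1))         # original GWAS p-values\n",
    "print(hits(toy, 'KGWAS_P', 0.1))   # KGWAS p-values"
   ]
  },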
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If needed, you can load the pre-trained model via `run.load_pretrained()`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "run.load_pretrained('./data/model/test')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to (1) use the full mode of KGWAS (i.e. larger node embeddings) or (2) access the null/causal simulations or (3) access the 21 subsampled GWAS sumstats across various sample sizes or (4) analyze the KGWAS sumstats for subsampled data or (5) analyze the KGWAS sumstats for all UKBB ICD10 diseases, please use [this link](https://drive.google.com/file/d/14UcHzPRIbdMmnLPZCHx_4G-gz2pipeg9/view?usp=sharing). Note that this file is large (around 45GB) and may take a while to download. After unzipping it, you can use that directory as the data directory for the KGWAS API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All required data files are present.\n"
     ]
    }
   ],
   "source": [
    "from kgwas import KGWAS, KGWAS_Data\n",
    "data = KGWAS_Data(data_path = '/dfs/project/datasets/20220524-ukbiobank/data/kgwas_data/')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that you can use various variant, gene, and program embeddings. For example, for the result in the paper, we use the baselineLD for variant and PoPS for gene."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--loading KG---\n",
      "--using baselineLD SNP embedding--\n",
      "--using random go embedding--\n",
      "--using PoPs expression+PPI+pathways gene embedding--\n"
     ]
    }
   ],
   "source": [
    "data.load_kg(snp_init_emb = 'baselineLD', \n",
    "             go_init_emb = 'random',\n",
    "             gene_init_emb = 'pops')"
   ]
  },
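  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `load_kg` selects embeddings by name, a typo in one of these strings is easy to make. As a small sketch, the hypothetical helper below checks a set of choices against the option names documented in this notebook before anything is loaded. The `ALLOWED` table is copied by hand from the list in the next cell and is not read from the KGWAS package itself, so it may drift from what the package actually accepts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Option names copied from this notebook's list of supported embeddings.\n",
    "ALLOWED = {\n",
    "    'snp_init_emb': {'enformer', 'baselineLD', 'SLDSC', 'cadd', 'kg', 'random'},\n",
    "    'gene_init_emb': {'esm', 'pops_expression', 'pops', 'kg', 'random'},\n",
    "    'go_init_emb': {'random', 'biogpt', 'kg'},\n",
    "}\n",
    "\n",
    "def invalid_choices(**choices):\n",
    "    '''Return {argument: value} for every choice not listed in ALLOWED (hypothetical helper).'''\n",
    "    return {k: v for k, v in choices.items() if v not in ALLOWED.get(k, set())}\n",
    "\n",
    "kg_args = dict(snp_init_emb='baselineLD', go_init_emb='random', gene_init_emb='pops')\n",
    "print(invalid_choices(**kg_args))                   # the choices used above\n",
    "print(invalid_choices(snp_init_emb='baselineld'))   # wrong capitalisation is flagged"
   ]
  },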
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are many alternative embeddings as well. \n",
    "- For variant: `enformer` (default), `baselineLD`, `SLDSC`, `cadd`, `kg`, `random`\n",
    "- For gene: `esm` (default), `pops_expression`, `pops`, `kg`, `random`\n",
    "- For program/go: `random` (default), `biogpt`, `kg`\n",
    "\n",
    "In additional to more embeddings, the full data folder contains summary statistics used in each analysis in the paper. For example, for the simulations, you can load it via:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "All required data files are present.\n",
      "Using simulation data....\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>#CHROM</th>\n",
       "      <th>ID</th>\n",
       "      <th>POS</th>\n",
       "      <th>A1</th>\n",
       "      <th>A2</th>\n",
       "      <th>N</th>\n",
       "      <th>AF1</th>\n",
       "      <th>BETA</th>\n",
       "      <th>SE</th>\n",
       "      <th>P</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>rs3131962</td>\n",
       "      <td>756604</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>4993</td>\n",
       "      <td>0.129882</td>\n",
       "      <td>14.559400</td>\n",
       "      <td>17.1871</td>\n",
       "      <td>0.396933</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>rs12562034</td>\n",
       "      <td>768448</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>4994</td>\n",
       "      <td>0.103124</td>\n",
       "      <td>-15.034400</td>\n",
       "      <td>19.0234</td>\n",
       "      <td>0.429345</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>rs4040617</td>\n",
       "      <td>779322</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>4979</td>\n",
       "      <td>0.127435</td>\n",
       "      <td>15.537200</td>\n",
       "      <td>17.3933</td>\n",
       "      <td>0.371704</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>rs79373928</td>\n",
       "      <td>801536</td>\n",
       "      <td>G</td>\n",
       "      <td>T</td>\n",
       "      <td>4996</td>\n",
       "      <td>0.015012</td>\n",
       "      <td>16.142600</td>\n",
       "      <td>47.7752</td>\n",
       "      <td>0.735448</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>rs11240779</td>\n",
       "      <td>808631</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>4961</td>\n",
       "      <td>0.222233</td>\n",
       "      <td>0.859838</td>\n",
       "      <td>13.9158</td>\n",
       "      <td>0.950731</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542753</th>\n",
       "      <td>22</td>\n",
       "      <td>rs73174435</td>\n",
       "      <td>51174939</td>\n",
       "      <td>T</td>\n",
       "      <td>C</td>\n",
       "      <td>4991</td>\n",
       "      <td>0.057103</td>\n",
       "      <td>53.082400</td>\n",
       "      <td>24.8130</td>\n",
       "      <td>0.032412</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542754</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3810648</td>\n",
       "      <td>51175626</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>4959</td>\n",
       "      <td>0.066243</td>\n",
       "      <td>17.689800</td>\n",
       "      <td>23.2562</td>\n",
       "      <td>0.446867</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542755</th>\n",
       "      <td>22</td>\n",
       "      <td>rs5771002</td>\n",
       "      <td>51183255</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>4937</td>\n",
       "      <td>0.334414</td>\n",
       "      <td>-12.170400</td>\n",
       "      <td>12.3314</td>\n",
       "      <td>0.323670</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542756</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3865764</td>\n",
       "      <td>51185848</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>4984</td>\n",
       "      <td>0.050662</td>\n",
       "      <td>-43.871900</td>\n",
       "      <td>26.3007</td>\n",
       "      <td>0.095299</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542757</th>\n",
       "      <td>22</td>\n",
       "      <td>rs142680588</td>\n",
       "      <td>51193629</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>4994</td>\n",
       "      <td>0.073388</td>\n",
       "      <td>11.338700</td>\n",
       "      <td>22.2066</td>\n",
       "      <td>0.609630</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>542758 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        #CHROM           ID       POS A1 A2     N       AF1       BETA  \\\n",
       "0            1    rs3131962    756604  A  G  4993  0.129882  14.559400   \n",
       "1            1   rs12562034    768448  A  G  4994  0.103124 -15.034400   \n",
       "2            1    rs4040617    779322  G  A  4979  0.127435  15.537200   \n",
       "3            1   rs79373928    801536  G  T  4996  0.015012  16.142600   \n",
       "4            1   rs11240779    808631  G  A  4961  0.222233   0.859838   \n",
       "...        ...          ...       ... .. ..   ...       ...        ...   \n",
       "542753      22   rs73174435  51174939  T  C  4991  0.057103  53.082400   \n",
       "542754      22    rs3810648  51175626  G  A  4959  0.066243  17.689800   \n",
       "542755      22    rs5771002  51183255  A  G  4937  0.334414 -12.170400   \n",
       "542756      22    rs3865764  51185848  G  A  4984  0.050662 -43.871900   \n",
       "542757      22  rs142680588  51193629  G  A  4994  0.073388  11.338700   \n",
       "\n",
       "             SE         P  \n",
       "0       17.1871  0.396933  \n",
       "1       19.0234  0.429345  \n",
       "2       17.3933  0.371704  \n",
       "3       47.7752  0.735448  \n",
       "4       13.9158  0.950731  \n",
       "...         ...       ...  \n",
       "542753  24.8130  0.032412  \n",
       "542754  23.2562  0.446867  \n",
       "542755  12.3314  0.323670  \n",
       "542756  26.3007  0.095299  \n",
       "542757  22.2066  0.609630  \n",
       "\n",
       "[542758 rows x 10 columns]"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.load_simulation_gwas('causal', seed = 1) # seed can range from 1-500\n",
    "data.lr_uni"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly for null simulations, you can load it via:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using simulation data....\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>#CHROM</th>\n",
       "      <th>ID</th>\n",
       "      <th>POS</th>\n",
       "      <th>A1</th>\n",
       "      <th>A2</th>\n",
       "      <th>N</th>\n",
       "      <th>AF1</th>\n",
       "      <th>BETA</th>\n",
       "      <th>SE</th>\n",
       "      <th>P</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>rs3131962</td>\n",
       "      <td>756604</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>4993</td>\n",
       "      <td>0.129882</td>\n",
       "      <td>-2.960260</td>\n",
       "      <td>7.66276</td>\n",
       "      <td>0.699261</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>rs12562034</td>\n",
       "      <td>768448</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>4994</td>\n",
       "      <td>0.103124</td>\n",
       "      <td>-19.335700</td>\n",
       "      <td>8.47710</td>\n",
       "      <td>0.022552</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>rs4040617</td>\n",
       "      <td>779322</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>4979</td>\n",
       "      <td>0.127435</td>\n",
       "      <td>-3.287600</td>\n",
       "      <td>7.75475</td>\n",
       "      <td>0.671605</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>rs79373928</td>\n",
       "      <td>801536</td>\n",
       "      <td>G</td>\n",
       "      <td>T</td>\n",
       "      <td>4996</td>\n",
       "      <td>0.015012</td>\n",
       "      <td>-12.530000</td>\n",
       "      <td>21.29860</td>\n",
       "      <td>0.556329</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>rs11240779</td>\n",
       "      <td>808631</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>4961</td>\n",
       "      <td>0.222233</td>\n",
       "      <td>-8.564830</td>\n",
       "      <td>6.20273</td>\n",
       "      <td>0.167335</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542753</th>\n",
       "      <td>22</td>\n",
       "      <td>rs73174435</td>\n",
       "      <td>51174939</td>\n",
       "      <td>T</td>\n",
       "      <td>C</td>\n",
       "      <td>4991</td>\n",
       "      <td>0.057103</td>\n",
       "      <td>-24.859400</td>\n",
       "      <td>11.06160</td>\n",
       "      <td>0.024617</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542754</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3810648</td>\n",
       "      <td>51175626</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>4959</td>\n",
       "      <td>0.066243</td>\n",
       "      <td>-0.725793</td>\n",
       "      <td>10.36870</td>\n",
       "      <td>0.944195</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542755</th>\n",
       "      <td>22</td>\n",
       "      <td>rs5771002</td>\n",
       "      <td>51183255</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>4937</td>\n",
       "      <td>0.334414</td>\n",
       "      <td>-5.555300</td>\n",
       "      <td>5.49753</td>\n",
       "      <td>0.312251</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542756</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3865764</td>\n",
       "      <td>51185848</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>4984</td>\n",
       "      <td>0.050662</td>\n",
       "      <td>12.588200</td>\n",
       "      <td>11.72730</td>\n",
       "      <td>0.283085</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542757</th>\n",
       "      <td>22</td>\n",
       "      <td>rs142680588</td>\n",
       "      <td>51193629</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>4994</td>\n",
       "      <td>0.073388</td>\n",
       "      <td>-13.533700</td>\n",
       "      <td>9.89851</td>\n",
       "      <td>0.171548</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>542758 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        #CHROM           ID       POS A1 A2     N       AF1       BETA  \\\n",
       "0            1    rs3131962    756604  A  G  4993  0.129882  -2.960260   \n",
       "1            1   rs12562034    768448  A  G  4994  0.103124 -19.335700   \n",
       "2            1    rs4040617    779322  G  A  4979  0.127435  -3.287600   \n",
       "3            1   rs79373928    801536  G  T  4996  0.015012 -12.530000   \n",
       "4            1   rs11240779    808631  G  A  4961  0.222233  -8.564830   \n",
       "...        ...          ...       ... .. ..   ...       ...        ...   \n",
       "542753      22   rs73174435  51174939  T  C  4991  0.057103 -24.859400   \n",
       "542754      22    rs3810648  51175626  G  A  4959  0.066243  -0.725793   \n",
       "542755      22    rs5771002  51183255  A  G  4937  0.334414  -5.555300   \n",
       "542756      22    rs3865764  51185848  G  A  4984  0.050662  12.588200   \n",
       "542757      22  rs142680588  51193629  G  A  4994  0.073388 -13.533700   \n",
       "\n",
       "              SE         P  \n",
       "0        7.66276  0.699261  \n",
       "1        8.47710  0.022552  \n",
       "2        7.75475  0.671605  \n",
       "3       21.29860  0.556329  \n",
       "4        6.20273  0.167335  \n",
       "...          ...       ...  \n",
       "542753  11.06160  0.024617  \n",
       "542754  10.36870  0.944195  \n",
       "542755   5.49753  0.312251  \n",
       "542756  11.72730  0.283085  \n",
       "542757   9.89851  0.171548  \n",
       "\n",
       "[542758 rows x 10 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.load_simulation_gwas('null', seed = 1)# seed can range from 1-500\n",
    "data.lr_uni"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, for the subsampling analysis, you can load any trait out of the 21 subsampled traits in various sample sizes across 5 replicates. The phenotype list can be accessed via:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['body_BALDING1',\n",
       " 'disease_ALLERGY_ECZEMA_DIAGNOSED',\n",
       " 'disease_HYPOTHYROIDISM_SELF_REP',\n",
       " 'pigment_SUNBURN',\n",
       " '21001',\n",
       " '50',\n",
       " '30080',\n",
       " '30070',\n",
       " '30010',\n",
       " '30000',\n",
       " 'biochemistry_AlkalinePhosphatase',\n",
       " 'biochemistry_AspartateAminotransferase',\n",
       " 'biochemistry_Cholesterol',\n",
       " 'biochemistry_Creatinine',\n",
       " 'biochemistry_IGF1',\n",
       " 'biochemistry_Phosphate',\n",
       " 'biochemistry_Testosterone_Male',\n",
       " 'biochemistry_TotalBilirubin',\n",
       " 'biochemistry_TotalProtein',\n",
       " 'biochemistry_VitaminD',\n",
       " 'bmd_HEEL_TSCOREz']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.get_pheno_list()['21_indep_traits']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Usually each trait has the following sample sizes available: 1000, 2500, 5000, 7500, 10000, 50000, 100000, 200000. For example, to load body_BALDING1 at sample size 1000 at replicate 1, you can use:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>#CHROM</th>\n",
       "      <th>POS</th>\n",
       "      <th>ID</th>\n",
       "      <th>REF</th>\n",
       "      <th>ALT</th>\n",
       "      <th>A1</th>\n",
       "      <th>FIRTH?</th>\n",
       "      <th>TEST</th>\n",
       "      <th>OBS_CT</th>\n",
       "      <th>OR</th>\n",
       "      <th>LOG(OR)_SE</th>\n",
       "      <th>Z_STAT</th>\n",
       "      <th>P</th>\n",
       "      <th>ERRCODE</th>\n",
       "      <th>SNP</th>\n",
       "      <th>A2</th>\n",
       "      <th>N</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>756604</td>\n",
       "      <td>rs3131962</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>A</td>\n",
       "      <td>Y</td>\n",
       "      <td>ADD</td>\n",
       "      <td>999</td>\n",
       "      <td>1.241130</td>\n",
       "      <td>0.209870</td>\n",
       "      <td>1.029320</td>\n",
       "      <td>0.303330</td>\n",
       "      <td>.</td>\n",
       "      <td>rs3131962</td>\n",
       "      <td>G</td>\n",
       "      <td>999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>768448</td>\n",
       "      <td>rs12562034</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>A</td>\n",
       "      <td>Y</td>\n",
       "      <td>ADD</td>\n",
       "      <td>996</td>\n",
       "      <td>0.433894</td>\n",
       "      <td>0.285912</td>\n",
       "      <td>-2.920330</td>\n",
       "      <td>0.003497</td>\n",
       "      <td>.</td>\n",
       "      <td>rs12562034</td>\n",
       "      <td>G</td>\n",
       "      <td>996</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>779322</td>\n",
       "      <td>rs4040617</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>G</td>\n",
       "      <td>Y</td>\n",
       "      <td>ADD</td>\n",
       "      <td>996</td>\n",
       "      <td>1.178310</td>\n",
       "      <td>0.211892</td>\n",
       "      <td>0.774379</td>\n",
       "      <td>0.438707</td>\n",
       "      <td>.</td>\n",
       "      <td>rs4040617</td>\n",
       "      <td>A</td>\n",
       "      <td>996</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>801536</td>\n",
       "      <td>rs79373928</td>\n",
       "      <td>T</td>\n",
       "      <td>G</td>\n",
       "      <td>G</td>\n",
       "      <td>Y</td>\n",
       "      <td>ADD</td>\n",
       "      <td>998</td>\n",
       "      <td>0.989852</td>\n",
       "      <td>0.479159</td>\n",
       "      <td>-0.021286</td>\n",
       "      <td>0.983018</td>\n",
       "      <td>.</td>\n",
       "      <td>rs79373928</td>\n",
       "      <td>T</td>\n",
       "      <td>998</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>808631</td>\n",
       "      <td>rs11240779</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>G</td>\n",
       "      <td>Y</td>\n",
       "      <td>ADD</td>\n",
       "      <td>994</td>\n",
       "      <td>0.880382</td>\n",
       "      <td>0.173114</td>\n",
       "      <td>-0.735930</td>\n",
       "      <td>0.461773</td>\n",
       "      <td>.</td>\n",
       "      <td>rs11240779</td>\n",
       "      <td>A</td>\n",
       "      <td>994</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542753</th>\n",
       "      <td>22</td>\n",
       "      <td>51174939</td>\n",
       "      <td>rs73174435</td>\n",
       "      <td>C</td>\n",
       "      <td>T</td>\n",
       "      <td>T</td>\n",
       "      <td>Y</td>\n",
       "      <td>ADD</td>\n",
       "      <td>999</td>\n",
       "      <td>0.642727</td>\n",
       "      <td>0.362564</td>\n",
       "      <td>-1.219190</td>\n",
       "      <td>0.222772</td>\n",
       "      <td>.</td>\n",
       "      <td>rs73174435</td>\n",
       "      <td>C</td>\n",
       "      <td>999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542754</th>\n",
       "      <td>22</td>\n",
       "      <td>51175626</td>\n",
       "      <td>rs3810648</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>G</td>\n",
       "      <td>Y</td>\n",
       "      <td>ADD</td>\n",
       "      <td>996</td>\n",
       "      <td>0.752885</td>\n",
       "      <td>0.286799</td>\n",
       "      <td>-0.989690</td>\n",
       "      <td>0.322326</td>\n",
       "      <td>.</td>\n",
       "      <td>rs3810648</td>\n",
       "      <td>A</td>\n",
       "      <td>996</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542755</th>\n",
       "      <td>22</td>\n",
       "      <td>51183255</td>\n",
       "      <td>rs5771002</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>A</td>\n",
       "      <td>Y</td>\n",
       "      <td>ADD</td>\n",
       "      <td>981</td>\n",
       "      <td>0.792577</td>\n",
       "      <td>0.150356</td>\n",
       "      <td>-1.546100</td>\n",
       "      <td>0.122080</td>\n",
       "      <td>.</td>\n",
       "      <td>rs5771002</td>\n",
       "      <td>G</td>\n",
       "      <td>981</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542756</th>\n",
       "      <td>22</td>\n",
       "      <td>51185848</td>\n",
       "      <td>rs3865764</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>G</td>\n",
       "      <td>Y</td>\n",
       "      <td>ADD</td>\n",
       "      <td>996</td>\n",
       "      <td>1.004930</td>\n",
       "      <td>0.386700</td>\n",
       "      <td>0.012715</td>\n",
       "      <td>0.989855</td>\n",
       "      <td>.</td>\n",
       "      <td>rs3865764</td>\n",
       "      <td>A</td>\n",
       "      <td>996</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542757</th>\n",
       "      <td>22</td>\n",
       "      <td>51193629</td>\n",
       "      <td>rs142680588</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>G</td>\n",
       "      <td>Y</td>\n",
       "      <td>ADD</td>\n",
       "      <td>1000</td>\n",
       "      <td>1.497360</td>\n",
       "      <td>0.267489</td>\n",
       "      <td>1.509230</td>\n",
       "      <td>0.131240</td>\n",
       "      <td>.</td>\n",
       "      <td>rs142680588</td>\n",
       "      <td>A</td>\n",
       "      <td>1000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>542758 rows × 17 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        #CHROM       POS           ID REF ALT A1 FIRTH? TEST  OBS_CT  \\\n",
       "0            1    756604    rs3131962   G   A  A      Y  ADD     999   \n",
       "1            1    768448   rs12562034   G   A  A      Y  ADD     996   \n",
       "2            1    779322    rs4040617   A   G  G      Y  ADD     996   \n",
       "3            1    801536   rs79373928   T   G  G      Y  ADD     998   \n",
       "4            1    808631   rs11240779   A   G  G      Y  ADD     994   \n",
       "...        ...       ...          ...  ..  .. ..    ...  ...     ...   \n",
       "542753      22  51174939   rs73174435   C   T  T      Y  ADD     999   \n",
       "542754      22  51175626    rs3810648   A   G  G      Y  ADD     996   \n",
       "542755      22  51183255    rs5771002   G   A  A      Y  ADD     981   \n",
       "542756      22  51185848    rs3865764   A   G  G      Y  ADD     996   \n",
       "542757      22  51193629  rs142680588   A   G  G      Y  ADD    1000   \n",
       "\n",
       "              OR  LOG(OR)_SE    Z_STAT         P ERRCODE          SNP A2     N  \n",
       "0       1.241130    0.209870  1.029320  0.303330       .    rs3131962  G   999  \n",
       "1       0.433894    0.285912 -2.920330  0.003497       .   rs12562034  G   996  \n",
       "2       1.178310    0.211892  0.774379  0.438707       .    rs4040617  A   996  \n",
       "3       0.989852    0.479159 -0.021286  0.983018       .   rs79373928  T   998  \n",
       "4       0.880382    0.173114 -0.735930  0.461773       .   rs11240779  A   994  \n",
       "...          ...         ...       ...       ...     ...          ... ..   ...  \n",
       "542753  0.642727    0.362564 -1.219190  0.222772       .   rs73174435  C   999  \n",
       "542754  0.752885    0.286799 -0.989690  0.322326       .    rs3810648  A   996  \n",
       "542755  0.792577    0.150356 -1.546100  0.122080       .    rs5771002  G   981  \n",
       "542756  1.004930    0.386700  0.012715  0.989855       .    rs3865764  A   996  \n",
       "542757  1.497360    0.267489  1.509230  0.131240       .  rs142680588  A  1000  \n",
       "\n",
       "[542758 rows x 17 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.load_gwas_subsample(pheno = 'body_BALDING1', sample_size = 1000, seed = 1)\n",
    "data.lr_uni"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also load the full cohort GWAS for these 21 traits via:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>#CHROM</th>\n",
       "      <th>ID</th>\n",
       "      <th>POS</th>\n",
       "      <th>A1</th>\n",
       "      <th>A2</th>\n",
       "      <th>N</th>\n",
       "      <th>AF1</th>\n",
       "      <th>BETA</th>\n",
       "      <th>SE</th>\n",
       "      <th>P</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>rs3131962</td>\n",
       "      <td>756604</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>407023</td>\n",
       "      <td>0.129655</td>\n",
       "      <td>0.000286</td>\n",
       "      <td>0.001048</td>\n",
       "      <td>0.784760</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>rs12562034</td>\n",
       "      <td>768448</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>407057</td>\n",
       "      <td>0.104966</td>\n",
       "      <td>-0.001491</td>\n",
       "      <td>0.001147</td>\n",
       "      <td>0.193592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>rs4040617</td>\n",
       "      <td>779322</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>406623</td>\n",
       "      <td>0.127520</td>\n",
       "      <td>0.000108</td>\n",
       "      <td>0.001056</td>\n",
       "      <td>0.918404</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>rs79373928</td>\n",
       "      <td>801536</td>\n",
       "      <td>G</td>\n",
       "      <td>T</td>\n",
       "      <td>407517</td>\n",
       "      <td>0.014884</td>\n",
       "      <td>0.004382</td>\n",
       "      <td>0.002904</td>\n",
       "      <td>0.131349</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>rs11240779</td>\n",
       "      <td>808631</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>404493</td>\n",
       "      <td>0.224886</td>\n",
       "      <td>-0.001155</td>\n",
       "      <td>0.000846</td>\n",
       "      <td>0.172345</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542753</th>\n",
       "      <td>22</td>\n",
       "      <td>rs73174435</td>\n",
       "      <td>51174939</td>\n",
       "      <td>T</td>\n",
       "      <td>C</td>\n",
       "      <td>407201</td>\n",
       "      <td>0.053846</td>\n",
       "      <td>-0.001980</td>\n",
       "      <td>0.001559</td>\n",
       "      <td>0.203959</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542754</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3810648</td>\n",
       "      <td>51175626</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>404901</td>\n",
       "      <td>0.060979</td>\n",
       "      <td>0.001922</td>\n",
       "      <td>0.001474</td>\n",
       "      <td>0.192116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542755</th>\n",
       "      <td>22</td>\n",
       "      <td>rs5771002</td>\n",
       "      <td>51183255</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>401398</td>\n",
       "      <td>0.333603</td>\n",
       "      <td>-0.000165</td>\n",
       "      <td>0.000751</td>\n",
       "      <td>0.826494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542756</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3865764</td>\n",
       "      <td>51185848</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>406611</td>\n",
       "      <td>0.050601</td>\n",
       "      <td>-0.001311</td>\n",
       "      <td>0.001605</td>\n",
       "      <td>0.413994</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542757</th>\n",
       "      <td>22</td>\n",
       "      <td>rs142680588</td>\n",
       "      <td>51193629</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>407108</td>\n",
       "      <td>0.075912</td>\n",
       "      <td>-0.002861</td>\n",
       "      <td>0.001329</td>\n",
       "      <td>0.031362</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>542758 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        #CHROM           ID       POS A1 A2       N       AF1      BETA  \\\n",
       "0            1    rs3131962    756604  A  G  407023  0.129655  0.000286   \n",
       "1            1   rs12562034    768448  A  G  407057  0.104966 -0.001491   \n",
       "2            1    rs4040617    779322  G  A  406623  0.127520  0.000108   \n",
       "3            1   rs79373928    801536  G  T  407517  0.014884  0.004382   \n",
       "4            1   rs11240779    808631  G  A  404493  0.224886 -0.001155   \n",
       "...        ...          ...       ... .. ..     ...       ...       ...   \n",
       "542753      22   rs73174435  51174939  T  C  407201  0.053846 -0.001980   \n",
       "542754      22    rs3810648  51175626  G  A  404901  0.060979  0.001922   \n",
       "542755      22    rs5771002  51183255  A  G  401398  0.333603 -0.000165   \n",
       "542756      22    rs3865764  51185848  G  A  406611  0.050601 -0.001311   \n",
       "542757      22  rs142680588  51193629  G  A  407108  0.075912 -0.002861   \n",
       "\n",
       "              SE         P  \n",
       "0       0.001048  0.784760  \n",
       "1       0.001147  0.193592  \n",
       "2       0.001056  0.918404  \n",
       "3       0.002904  0.131349  \n",
       "4       0.000846  0.172345  \n",
       "...          ...       ...  \n",
       "542753  0.001559  0.203959  \n",
       "542754  0.001474  0.192116  \n",
       "542755  0.000751  0.826494  \n",
       "542756  0.001605  0.413994  \n",
       "542757  0.001329  0.031362  \n",
       "\n",
       "[542758 rows x 10 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.load_full_gwas(pheno = 'body_BALDING1')\n",
    "data.lr_uni"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the basic KGWAS interface! Check out the other notebooks for other capabilities of KGWAS!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "a100_env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}