Diff of /demo/kgwas_101.ipynb [000000] .. [8790ab]

a b/demo/kgwas_101.ipynb
1
{
2
 "cells": [
3
  {
4
   "cell_type": "markdown",
5
   "metadata": {},
6
   "source": [
7
    "## Basic API Usage of KGWAS\n",
8
    "\n",
9
    "KGWAS consists of two main class `KGWAS` and `KGWAS_Data`. `KGWAS` is the main class for the KGWAS model, and `KGWAS_Data` is the class for the data manipulation. In default, to ensure fast user experience, we provide a default fast mode of KGWAS, which uses Enformer embedding for variant feature and ESM embedding for gene features (instead of the baselineLD for variant and PoPS for gene since they are large files). For the fast mode, you do not need to download any data, the KGWAS API will automatically download the relevant files. This mode can be used to apply KGWAS to your own GWAS sumstats. "
10
   ]
11
  },
12
  {
13
   "cell_type": "code",
14
   "execution_count": 1,
15
   "metadata": {},
16
   "outputs": [
17
    {
18
     "name": "stdout",
19
     "output_type": "stream",
20
     "text": [
21
      "All required data files are present.\n",
22
      "--loading KG---\n",
23
      "--using enformer SNP embedding--\n",
24
      "--using random go embedding--\n",
25
      "--using ESM gene embedding--\n"
26
     ]
27
    }
28
   ],
29
   "source": [
30
    "import sys\n",
31
    "sys.path.append('../')\n",
32
    "\n",
33
    "from kgwas import KGWAS, KGWAS_Data\n",
34
    "data = KGWAS_Data(data_path = './data/')\n",
35
    "data.load_kg()"
36
   ]
37
  },
38
  {
39
   "cell_type": "markdown",
40
   "metadata": {},
41
   "source": [
42
    "Now, the data needed for training is downloaded from the server and the knowledge graph is loaded. Next, we load the GWAS file. Here, we are using an example GWAS file, which is also automatically downloaded from the server. But you can also use your own GWAS file. The GWAS file should be in the format of a pandas DataFrame with columns `CHR`/`#CHROM`, `SNP`, `P`, `N`. Note that at the moment, our knowledge graph is UKBioBank directly genotyped variant set so it will automatically takes the overlap with the KG. Current efforts are underway for improving the coverage of the KG."
43
   ]
44
  },
45
  {
46
   "cell_type": "code",
47
   "execution_count": 2,
48
   "metadata": {},
49
   "outputs": [
50
    {
51
     "name": "stdout",
52
     "output_type": "stream",
53
     "text": [
54
      "Loading example GWAS file...\n",
55
      "Example file already exists locally.\n",
56
      "Loading GWAS file from ./data/biochemistry_Creatinine_fastgwa_full_10000_1.fastGWA...\n",
57
      "Number of SNPs in the KG: 784256\n",
58
      "Number of SNPs in the GWAS: 542758\n",
59
      "Number of SNPs in the KG variant set: 542758\n",
60
      "Using ldsc weight...\n",
61
      "ldsc_weight mean:  0.9999999999999993\n"
62
     ]
63
    }
64
   ],
65
   "source": [
66
    "data.load_external_gwas(example_file = True)\n",
67
    "data.process_gwas_file()\n",
68
    "data.prepare_split()"
69
   ]
70
  },
71
  {
72
   "cell_type": "code",
73
   "execution_count": 3,
74
   "metadata": {},
75
   "outputs": [
76
    {
77
     "data": {
78
      "text/html": [
79
       "<div>\n",
80
       "<style scoped>\n",
81
       "    .dataframe tbody tr th:only-of-type {\n",
82
       "        vertical-align: middle;\n",
83
       "    }\n",
84
       "\n",
85
       "    .dataframe tbody tr th {\n",
86
       "        vertical-align: top;\n",
87
       "    }\n",
88
       "\n",
89
       "    .dataframe thead th {\n",
90
       "        text-align: right;\n",
91
       "    }\n",
92
       "</style>\n",
93
       "<table border=\"1\" class=\"dataframe\">\n",
94
       "  <thead>\n",
95
       "    <tr style=\"text-align: right;\">\n",
96
       "      <th></th>\n",
97
       "      <th>#CHROM</th>\n",
98
       "      <th>ID</th>\n",
99
       "      <th>POS</th>\n",
100
       "      <th>A1</th>\n",
101
       "      <th>A2</th>\n",
102
       "      <th>N</th>\n",
103
       "      <th>AF1</th>\n",
104
       "      <th>BETA</th>\n",
105
       "      <th>SE</th>\n",
106
       "      <th>P</th>\n",
107
       "      <th>ld_score</th>\n",
108
       "      <th>w_ld_score</th>\n",
109
       "      <th>y</th>\n",
110
       "    </tr>\n",
111
       "  </thead>\n",
112
       "  <tbody>\n",
113
       "    <tr>\n",
114
       "      <th>0</th>\n",
115
       "      <td>1</td>\n",
116
       "      <td>rs3131962</td>\n",
117
       "      <td>756604</td>\n",
118
       "      <td>A</td>\n",
119
       "      <td>G</td>\n",
120
       "      <td>9988</td>\n",
121
       "      <td>0.131007</td>\n",
122
       "      <td>-0.117134</td>\n",
123
       "      <td>0.246231</td>\n",
124
       "      <td>0.634282</td>\n",
125
       "      <td>72.862240</td>\n",
126
       "      <td>4.474788</td>\n",
127
       "      <td>0.226298</td>\n",
128
       "    </tr>\n",
129
       "    <tr>\n",
130
       "      <th>1</th>\n",
131
       "      <td>1</td>\n",
132
       "      <td>rs12562034</td>\n",
133
       "      <td>768448</td>\n",
134
       "      <td>A</td>\n",
135
       "      <td>G</td>\n",
136
       "      <td>9978</td>\n",
137
       "      <td>0.104981</td>\n",
138
       "      <td>-0.064894</td>\n",
139
       "      <td>0.273746</td>\n",
140
       "      <td>0.812611</td>\n",
141
       "      <td>34.749233</td>\n",
142
       "      <td>1.877341</td>\n",
143
       "      <td>0.056197</td>\n",
144
       "    </tr>\n",
145
       "    <tr>\n",
146
       "      <th>2</th>\n",
147
       "      <td>1</td>\n",
148
       "      <td>rs4040617</td>\n",
149
       "      <td>779322</td>\n",
150
       "      <td>G</td>\n",
151
       "      <td>A</td>\n",
152
       "      <td>9975</td>\n",
153
       "      <td>0.129123</td>\n",
154
       "      <td>-0.001462</td>\n",
155
       "      <td>0.247254</td>\n",
156
       "      <td>0.995281</td>\n",
157
       "      <td>72.271390</td>\n",
158
       "      <td>4.208873</td>\n",
159
       "      <td>0.000035</td>\n",
160
       "    </tr>\n",
161
       "    <tr>\n",
162
       "      <th>3</th>\n",
163
       "      <td>1</td>\n",
164
       "      <td>rs79373928</td>\n",
165
       "      <td>801536</td>\n",
166
       "      <td>G</td>\n",
167
       "      <td>T</td>\n",
168
       "      <td>9994</td>\n",
169
       "      <td>0.014659</td>\n",
170
       "      <td>0.081544</td>\n",
171
       "      <td>0.688261</td>\n",
172
       "      <td>0.905688</td>\n",
173
       "      <td>16.740126</td>\n",
174
       "      <td>1.949177</td>\n",
175
       "      <td>0.014037</td>\n",
176
       "    </tr>\n",
177
       "    <tr>\n",
178
       "      <th>4</th>\n",
179
       "      <td>1</td>\n",
180
       "      <td>rs11240779</td>\n",
181
       "      <td>808631</td>\n",
182
       "      <td>G</td>\n",
183
       "      <td>A</td>\n",
184
       "      <td>9919</td>\n",
185
       "      <td>0.226737</td>\n",
186
       "      <td>-0.184268</td>\n",
187
       "      <td>0.198982</td>\n",
188
       "      <td>0.354418</td>\n",
189
       "      <td>50.215000</td>\n",
190
       "      <td>2.825456</td>\n",
191
       "      <td>0.857575</td>\n",
192
       "    </tr>\n",
193
       "    <tr>\n",
194
       "      <th>...</th>\n",
195
       "      <td>...</td>\n",
196
       "      <td>...</td>\n",
197
       "      <td>...</td>\n",
198
       "      <td>...</td>\n",
199
       "      <td>...</td>\n",
200
       "      <td>...</td>\n",
201
       "      <td>...</td>\n",
202
       "      <td>...</td>\n",
203
       "      <td>...</td>\n",
204
       "      <td>...</td>\n",
205
       "      <td>...</td>\n",
206
       "      <td>...</td>\n",
207
       "      <td>...</td>\n",
208
       "    </tr>\n",
209
       "    <tr>\n",
210
       "      <th>542753</th>\n",
211
       "      <td>22</td>\n",
212
       "      <td>rs73174435</td>\n",
213
       "      <td>51174939</td>\n",
214
       "      <td>T</td>\n",
215
       "      <td>C</td>\n",
216
       "      <td>9979</td>\n",
217
       "      <td>0.056118</td>\n",
218
       "      <td>-0.158762</td>\n",
219
       "      <td>0.362390</td>\n",
220
       "      <td>0.661316</td>\n",
221
       "      <td>21.981667</td>\n",
222
       "      <td>1.363001</td>\n",
223
       "      <td>0.191929</td>\n",
224
       "    </tr>\n",
225
       "    <tr>\n",
226
       "      <th>542754</th>\n",
227
       "      <td>22</td>\n",
228
       "      <td>rs3810648</td>\n",
229
       "      <td>51175626</td>\n",
230
       "      <td>G</td>\n",
231
       "      <td>A</td>\n",
232
       "      <td>9931</td>\n",
233
       "      <td>0.058856</td>\n",
234
       "      <td>0.272493</td>\n",
235
       "      <td>0.352508</td>\n",
236
       "      <td>0.439515</td>\n",
237
       "      <td>34.619377</td>\n",
238
       "      <td>1.804193</td>\n",
239
       "      <td>0.597548</td>\n",
240
       "    </tr>\n",
241
       "    <tr>\n",
242
       "      <th>542755</th>\n",
243
       "      <td>22</td>\n",
244
       "      <td>rs5771002</td>\n",
245
       "      <td>51183255</td>\n",
246
       "      <td>A</td>\n",
247
       "      <td>G</td>\n",
248
       "      <td>9840</td>\n",
249
       "      <td>0.333638</td>\n",
250
       "      <td>0.116325</td>\n",
251
       "      <td>0.175675</td>\n",
252
       "      <td>0.507869</td>\n",
253
       "      <td>16.231083</td>\n",
254
       "      <td>1.273770</td>\n",
255
       "      <td>0.438456</td>\n",
256
       "    </tr>\n",
257
       "    <tr>\n",
258
       "      <th>542756</th>\n",
259
       "      <td>22</td>\n",
260
       "      <td>rs3865764</td>\n",
261
       "      <td>51185848</td>\n",
262
       "      <td>G</td>\n",
263
       "      <td>A</td>\n",
264
       "      <td>9974</td>\n",
265
       "      <td>0.051133</td>\n",
266
       "      <td>-0.026670</td>\n",
267
       "      <td>0.376132</td>\n",
268
       "      <td>0.943472</td>\n",
269
       "      <td>18.649513</td>\n",
270
       "      <td>1.010000</td>\n",
271
       "      <td>0.005028</td>\n",
272
       "    </tr>\n",
273
       "    <tr>\n",
274
       "      <th>542757</th>\n",
275
       "      <td>22</td>\n",
276
       "      <td>rs142680588</td>\n",
277
       "      <td>51193629</td>\n",
278
       "      <td>G</td>\n",
279
       "      <td>A</td>\n",
280
       "      <td>9981</td>\n",
281
       "      <td>0.076595</td>\n",
282
       "      <td>-0.109532</td>\n",
283
       "      <td>0.312971</td>\n",
284
       "      <td>0.726358</td>\n",
285
       "      <td>52.471287</td>\n",
286
       "      <td>1.873861</td>\n",
287
       "      <td>0.122482</td>\n",
288
       "    </tr>\n",
289
       "  </tbody>\n",
290
       "</table>\n",
291
       "<p>542758 rows × 13 columns</p>\n",
292
       "</div>"
293
      ],
294
      "text/plain": [
295
       "        #CHROM           ID       POS A1 A2     N       AF1      BETA  \\\n",
296
       "0            1    rs3131962    756604  A  G  9988  0.131007 -0.117134   \n",
297
       "1            1   rs12562034    768448  A  G  9978  0.104981 -0.064894   \n",
298
       "2            1    rs4040617    779322  G  A  9975  0.129123 -0.001462   \n",
299
       "3            1   rs79373928    801536  G  T  9994  0.014659  0.081544   \n",
300
       "4            1   rs11240779    808631  G  A  9919  0.226737 -0.184268   \n",
301
       "...        ...          ...       ... .. ..   ...       ...       ...   \n",
302
       "542753      22   rs73174435  51174939  T  C  9979  0.056118 -0.158762   \n",
303
       "542754      22    rs3810648  51175626  G  A  9931  0.058856  0.272493   \n",
304
       "542755      22    rs5771002  51183255  A  G  9840  0.333638  0.116325   \n",
305
       "542756      22    rs3865764  51185848  G  A  9974  0.051133 -0.026670   \n",
306
       "542757      22  rs142680588  51193629  G  A  9981  0.076595 -0.109532   \n",
307
       "\n",
308
       "              SE         P   ld_score  w_ld_score         y  \n",
309
       "0       0.246231  0.634282  72.862240    4.474788  0.226298  \n",
310
       "1       0.273746  0.812611  34.749233    1.877341  0.056197  \n",
311
       "2       0.247254  0.995281  72.271390    4.208873  0.000035  \n",
312
       "3       0.688261  0.905688  16.740126    1.949177  0.014037  \n",
313
       "4       0.198982  0.354418  50.215000    2.825456  0.857575  \n",
314
       "...          ...       ...        ...         ...       ...  \n",
315
       "542753  0.362390  0.661316  21.981667    1.363001  0.191929  \n",
316
       "542754  0.352508  0.439515  34.619377    1.804193  0.597548  \n",
317
       "542755  0.175675  0.507869  16.231083    1.273770  0.438456  \n",
318
       "542756  0.376132  0.943472  18.649513    1.010000  0.005028  \n",
319
       "542757  0.312971  0.726358  52.471287    1.873861  0.122482  \n",
320
       "\n",
321
       "[542758 rows x 13 columns]"
322
      ]
323
     },
324
     "execution_count": 3,
325
     "metadata": {},
326
     "output_type": "execute_result"
327
    }
328
   ],
329
   "source": [
330
    "data.lr_uni"
331
   ]
332
  },
333
  {
334
   "cell_type": "markdown",
335
   "metadata": {},
336
   "source": [
337
    "Next, we are ready to train the model! Here we are using epoch = 1 for the demo purpose, but in reality, you should use a higher number of epochs for better performance."
338
   ]
339
  },
340
  {
341
   "cell_type": "code",
342
   "execution_count": 4,
343
   "metadata": {},
344
   "outputs": [
345
    {
346
     "name": "stderr",
347
     "output_type": "stream",
348
     "text": [
349
      "Creating data loader...\n",
350
      "Start Training...\n",
351
      "Training Progress Epoch 1/1:  52%|█████▏    | 500/956 [12:56<15:47,  2.08s/it]Epoch 1 Step 501 Train Loss: 1.8115\n",
352
      "Training Progress Epoch 1/1: 100%|██████████| 956/956 [24:26<00:00,  1.53s/it]\n",
353
      "100%|██████████| 50/50 [00:58<00:00,  1.17s/it]\n",
354
      "Epoch 1: Validation MSE: 2.1730 Validation Pearson: 0.0096. \n",
355
      "Saving models to ./data//model/test\n",
356
      "100%|██████████| 54/54 [00:56<00:00,  1.04s/it]\n",
357
      "100%|██████████| 1061/1061 [05:40<00:00,  3.11it/s]\n"
358
     ]
359
    },
360
    {
361
     "name": "stdout",
362
     "output_type": "stream",
363
     "text": [
364
      "KGWAS prediction and p-values saved to ./data//model_pred/new_experiments/test_pred.csv\n"
365
     ]
366
    }
367
   ],
368
   "source": [
369
    "run = KGWAS(data, device = 'cuda:9', exp_name = 'test')\n",
370
    "run.initialize_model()\n",
371
    "run.train(epoch = 1)"
372
   ]
373
  },
374
  {
375
   "cell_type": "markdown",
376
   "metadata": {},
377
   "source": [
378
    "The output of the model is saved to `/model_pred/new_experiments/{exp_name}_pred.csv`. You can also load it via `run.kgwas_res`. The model is also saved to `/model/{exp_name}`."
379
   ]
380
  },
381
  {
382
   "cell_type": "code",
383
   "execution_count": 5,
384
   "metadata": {},
385
   "outputs": [
386
    {
387
     "data": {
388
      "text/html": [
389
       "<div>\n",
390
       "<style scoped>\n",
391
       "    .dataframe tbody tr th:only-of-type {\n",
392
       "        vertical-align: middle;\n",
393
       "    }\n",
394
       "\n",
395
       "    .dataframe tbody tr th {\n",
396
       "        vertical-align: top;\n",
397
       "    }\n",
398
       "\n",
399
       "    .dataframe thead th {\n",
400
       "        text-align: right;\n",
401
       "    }\n",
402
       "</style>\n",
403
       "<table border=\"1\" class=\"dataframe\">\n",
404
       "  <thead>\n",
405
       "    <tr style=\"text-align: right;\">\n",
406
       "      <th></th>\n",
407
       "      <th>#CHROM</th>\n",
408
       "      <th>ID</th>\n",
409
       "      <th>POS</th>\n",
410
       "      <th>A1</th>\n",
411
       "      <th>A2</th>\n",
412
       "      <th>N</th>\n",
413
       "      <th>AF1</th>\n",
414
       "      <th>BETA</th>\n",
415
       "      <th>SE</th>\n",
416
       "      <th>P</th>\n",
417
       "      <th>ld_score</th>\n",
418
       "      <th>w_ld_score</th>\n",
419
       "      <th>y</th>\n",
420
       "      <th>pred</th>\n",
421
       "      <th>P_weighted</th>\n",
422
       "      <th>KGWAS_P</th>\n",
423
       "    </tr>\n",
424
       "  </thead>\n",
425
       "  <tbody>\n",
426
       "    <tr>\n",
427
       "      <th>0</th>\n",
428
       "      <td>1</td>\n",
429
       "      <td>rs3131962</td>\n",
430
       "      <td>756604</td>\n",
431
       "      <td>A</td>\n",
432
       "      <td>G</td>\n",
433
       "      <td>9988</td>\n",
434
       "      <td>0.131007</td>\n",
435
       "      <td>-0.117134</td>\n",
436
       "      <td>0.246231</td>\n",
437
       "      <td>0.634282</td>\n",
438
       "      <td>72.862240</td>\n",
439
       "      <td>4.474788</td>\n",
440
       "      <td>0.226298</td>\n",
441
       "      <td>1.082365</td>\n",
442
       "      <td>0.234167</td>\n",
443
       "      <td>0.346428</td>\n",
444
       "    </tr>\n",
445
       "    <tr>\n",
446
       "      <th>1</th>\n",
447
       "      <td>1</td>\n",
448
       "      <td>rs12562034</td>\n",
449
       "      <td>768448</td>\n",
450
       "      <td>A</td>\n",
451
       "      <td>G</td>\n",
452
       "      <td>9978</td>\n",
453
       "      <td>0.104981</td>\n",
454
       "      <td>-0.064894</td>\n",
455
       "      <td>0.273746</td>\n",
456
       "      <td>0.812611</td>\n",
457
       "      <td>34.749233</td>\n",
458
       "      <td>1.877341</td>\n",
459
       "      <td>0.056197</td>\n",
460
       "      <td>1.087724</td>\n",
461
       "      <td>0.382894</td>\n",
462
       "      <td>0.566456</td>\n",
463
       "    </tr>\n",
464
       "    <tr>\n",
465
       "      <th>2</th>\n",
466
       "      <td>1</td>\n",
467
       "      <td>rs4040617</td>\n",
468
       "      <td>779322</td>\n",
469
       "      <td>G</td>\n",
470
       "      <td>A</td>\n",
471
       "      <td>9975</td>\n",
472
       "      <td>0.129123</td>\n",
473
       "      <td>-0.001462</td>\n",
474
       "      <td>0.247254</td>\n",
475
       "      <td>0.995281</td>\n",
476
       "      <td>72.271390</td>\n",
477
       "      <td>4.208873</td>\n",
478
       "      <td>0.000035</td>\n",
479
       "      <td>1.058530</td>\n",
480
       "      <td>0.995281</td>\n",
481
       "      <td>1</td>\n",
482
       "    </tr>\n",
483
       "    <tr>\n",
484
       "      <th>3</th>\n",
485
       "      <td>1</td>\n",
486
       "      <td>rs79373928</td>\n",
487
       "      <td>801536</td>\n",
488
       "      <td>G</td>\n",
489
       "      <td>T</td>\n",
490
       "      <td>9994</td>\n",
491
       "      <td>0.014659</td>\n",
492
       "      <td>0.081544</td>\n",
493
       "      <td>0.688261</td>\n",
494
       "      <td>0.905688</td>\n",
495
       "      <td>16.740126</td>\n",
496
       "      <td>1.949177</td>\n",
497
       "      <td>0.014037</td>\n",
498
       "      <td>1.105125</td>\n",
499
       "      <td>0.225107</td>\n",
500
       "      <td>0.333025</td>\n",
501
       "    </tr>\n",
502
       "    <tr>\n",
503
       "      <th>4</th>\n",
504
       "      <td>1</td>\n",
505
       "      <td>rs11240779</td>\n",
506
       "      <td>808631</td>\n",
507
       "      <td>G</td>\n",
508
       "      <td>A</td>\n",
509
       "      <td>9919</td>\n",
510
       "      <td>0.226737</td>\n",
511
       "      <td>-0.184268</td>\n",
512
       "      <td>0.198982</td>\n",
513
       "      <td>0.354418</td>\n",
514
       "      <td>50.215000</td>\n",
515
       "      <td>2.825456</td>\n",
516
       "      <td>0.857575</td>\n",
517
       "      <td>1.081468</td>\n",
518
       "      <td>0.041646</td>\n",
519
       "      <td>0.061612</td>\n",
520
       "    </tr>\n",
521
       "    <tr>\n",
522
       "      <th>...</th>\n",
523
       "      <td>...</td>\n",
524
       "      <td>...</td>\n",
525
       "      <td>...</td>\n",
526
       "      <td>...</td>\n",
527
       "      <td>...</td>\n",
528
       "      <td>...</td>\n",
529
       "      <td>...</td>\n",
530
       "      <td>...</td>\n",
531
       "      <td>...</td>\n",
532
       "      <td>...</td>\n",
533
       "      <td>...</td>\n",
534
       "      <td>...</td>\n",
535
       "      <td>...</td>\n",
536
       "      <td>...</td>\n",
537
       "      <td>...</td>\n",
538
       "      <td>...</td>\n",
539
       "    </tr>\n",
540
       "    <tr>\n",
541
       "      <th>542753</th>\n",
542
       "      <td>22</td>\n",
543
       "      <td>rs73174435</td>\n",
544
       "      <td>51174939</td>\n",
545
       "      <td>T</td>\n",
546
       "      <td>C</td>\n",
547
       "      <td>9979</td>\n",
548
       "      <td>0.056118</td>\n",
549
       "      <td>-0.158762</td>\n",
550
       "      <td>0.362390</td>\n",
551
       "      <td>0.661316</td>\n",
552
       "      <td>21.981667</td>\n",
553
       "      <td>1.363001</td>\n",
554
       "      <td>0.191929</td>\n",
555
       "      <td>1.008835</td>\n",
556
       "      <td>0.233609</td>\n",
557
       "      <td>0.345602</td>\n",
558
       "    </tr>\n",
559
       "    <tr>\n",
560
       "      <th>542754</th>\n",
561
       "      <td>22</td>\n",
562
       "      <td>rs3810648</td>\n",
563
       "      <td>51175626</td>\n",
564
       "      <td>G</td>\n",
565
       "      <td>A</td>\n",
566
       "      <td>9931</td>\n",
567
       "      <td>0.058856</td>\n",
568
       "      <td>0.272493</td>\n",
569
       "      <td>0.352508</td>\n",
570
       "      <td>0.439515</td>\n",
571
       "      <td>34.619377</td>\n",
572
       "      <td>1.804193</td>\n",
573
       "      <td>0.597548</td>\n",
574
       "      <td>1.034187</td>\n",
575
       "      <td>0.439515</td>\n",
576
       "      <td>0.650221</td>\n",
577
       "    </tr>\n",
578
       "    <tr>\n",
579
       "      <th>542755</th>\n",
580
       "      <td>22</td>\n",
581
       "      <td>rs5771002</td>\n",
582
       "      <td>51183255</td>\n",
583
       "      <td>A</td>\n",
584
       "      <td>G</td>\n",
585
       "      <td>9840</td>\n",
586
       "      <td>0.333638</td>\n",
587
       "      <td>0.116325</td>\n",
588
       "      <td>0.175675</td>\n",
589
       "      <td>0.507869</td>\n",
590
       "      <td>16.231083</td>\n",
591
       "      <td>1.273770</td>\n",
592
       "      <td>0.438456</td>\n",
593
       "      <td>1.093221</td>\n",
594
       "      <td>0.449038</td>\n",
595
       "      <td>0.66431</td>\n",
596
       "    </tr>\n",
597
       "    <tr>\n",
598
       "      <th>542756</th>\n",
599
       "      <td>22</td>\n",
600
       "      <td>rs3865764</td>\n",
601
       "      <td>51185848</td>\n",
602
       "      <td>G</td>\n",
603
       "      <td>A</td>\n",
604
       "      <td>9974</td>\n",
605
       "      <td>0.051133</td>\n",
606
       "      <td>-0.026670</td>\n",
607
       "      <td>0.376132</td>\n",
608
       "      <td>0.943472</td>\n",
609
       "      <td>18.649513</td>\n",
610
       "      <td>1.010000</td>\n",
611
       "      <td>0.005028</td>\n",
612
       "      <td>0.987747</td>\n",
613
       "      <td>0.943472</td>\n",
614
       "      <td>1</td>\n",
615
       "    </tr>\n",
616
       "    <tr>\n",
617
       "      <th>542757</th>\n",
618
       "      <td>22</td>\n",
619
       "      <td>rs142680588</td>\n",
620
       "      <td>51193629</td>\n",
621
       "      <td>G</td>\n",
622
       "      <td>A</td>\n",
623
       "      <td>9981</td>\n",
624
       "      <td>0.076595</td>\n",
625
       "      <td>-0.109532</td>\n",
626
       "      <td>0.312971</td>\n",
627
       "      <td>0.726358</td>\n",
628
       "      <td>52.471287</td>\n",
629
       "      <td>1.873861</td>\n",
630
       "      <td>0.122482</td>\n",
631
       "      <td>1.082649</td>\n",
632
       "      <td>0.26816</td>\n",
633
       "      <td>0.396718</td>\n",
634
       "    </tr>\n",
635
       "  </tbody>\n",
636
       "</table>\n",
637
       "<p>542758 rows × 16 columns</p>\n",
638
       "</div>"
639
      ],
640
      "text/plain": [
641
       "        #CHROM           ID       POS A1 A2     N       AF1      BETA  \\\n",
642
       "0            1    rs3131962    756604  A  G  9988  0.131007 -0.117134   \n",
643
       "1            1   rs12562034    768448  A  G  9978  0.104981 -0.064894   \n",
644
       "2            1    rs4040617    779322  G  A  9975  0.129123 -0.001462   \n",
645
       "3            1   rs79373928    801536  G  T  9994  0.014659  0.081544   \n",
646
       "4            1   rs11240779    808631  G  A  9919  0.226737 -0.184268   \n",
647
       "...        ...          ...       ... .. ..   ...       ...       ...   \n",
648
       "542753      22   rs73174435  51174939  T  C  9979  0.056118 -0.158762   \n",
649
       "542754      22    rs3810648  51175626  G  A  9931  0.058856  0.272493   \n",
650
       "542755      22    rs5771002  51183255  A  G  9840  0.333638  0.116325   \n",
651
       "542756      22    rs3865764  51185848  G  A  9974  0.051133 -0.026670   \n",
652
       "542757      22  rs142680588  51193629  G  A  9981  0.076595 -0.109532   \n",
653
       "\n",
654
       "              SE         P   ld_score  w_ld_score         y      pred  \\\n",
655
       "0       0.246231  0.634282  72.862240    4.474788  0.226298  1.082365   \n",
656
       "1       0.273746  0.812611  34.749233    1.877341  0.056197  1.087724   \n",
657
       "2       0.247254  0.995281  72.271390    4.208873  0.000035  1.058530   \n",
658
       "3       0.688261  0.905688  16.740126    1.949177  0.014037  1.105125   \n",
659
       "4       0.198982  0.354418  50.215000    2.825456  0.857575  1.081468   \n",
660
       "...          ...       ...        ...         ...       ...       ...   \n",
661
       "542753  0.362390  0.661316  21.981667    1.363001  0.191929  1.008835   \n",
662
       "542754  0.352508  0.439515  34.619377    1.804193  0.597548  1.034187   \n",
663
       "542755  0.175675  0.507869  16.231083    1.273770  0.438456  1.093221   \n",
664
       "542756  0.376132  0.943472  18.649513    1.010000  0.005028  0.987747   \n",
665
       "542757  0.312971  0.726358  52.471287    1.873861  0.122482  1.082649   \n",
666
       "\n",
667
       "       P_weighted   KGWAS_P  \n",
668
       "0        0.234167  0.346428  \n",
669
       "1        0.382894  0.566456  \n",
670
       "2        0.995281         1  \n",
671
       "3        0.225107  0.333025  \n",
672
       "4        0.041646  0.061612  \n",
673
       "...           ...       ...  \n",
674
       "542753   0.233609  0.345602  \n",
675
       "542754   0.439515  0.650221  \n",
676
       "542755   0.449038   0.66431  \n",
677
       "542756   0.943472         1  \n",
678
       "542757    0.26816  0.396718  \n",
679
       "\n",
680
       "[542758 rows x 16 columns]"
681
      ]
682
     },
683
     "execution_count": 5,
684
     "metadata": {},
685
     "output_type": "execute_result"
686
    }
687
   ],
688
   "source": [
689
    "run.kgwas_res"
690
   ]
691
  },
692
  {
693
   "cell_type": "markdown",
694
   "metadata": {},
695
   "source": [
696
    "If needed, you can load the pre-trained model via `run.load_pretrained()`."
697
   ]
698
  },
699
  {
700
   "cell_type": "code",
701
   "execution_count": null,
702
   "metadata": {},
703
   "outputs": [],
704
   "source": [
705
    "run.load_pretrained('./data/model/test')"
706
   ]
707
  },
708
  {
709
   "cell_type": "markdown",
710
   "metadata": {},
711
   "source": [
712
    "If you want to (1) use the full mode of KGWAS (i.e. larger node embeddings) or (2) access the null/causal simulations or (3) access the 21 subsampled GWAS sumstats across various sample sizes or (4) analyze the KGWAS sumstats for subsampled data or (5) analyze the KGWAS sumstats for all UKBB ICD10 diseases, please use [this link](https://drive.google.com/file/d/14UcHzPRIbdMmnLPZCHx_4G-gz2pipeg9/view?usp=sharing). Note that this file is large (around 45GB) and may take a while to download. After unzipping it, you can use that directory as the data directory for the KGWAS API."
713
   ]
714
  },
715
  {
716
   "cell_type": "code",
717
   "execution_count": 2,
718
   "metadata": {},
719
   "outputs": [
720
    {
721
     "name": "stdout",
722
     "output_type": "stream",
723
     "text": [
724
      "All required data files are present.\n"
725
     ]
726
    }
727
   ],
728
   "source": [
729
    "from kgwas import KGWAS, KGWAS_Data\n",
730
    "data = KGWAS_Data(data_path = '/dfs/project/datasets/20220524-ukbiobank/data/kgwas_data/')"
731
   ]
732
  },
733
  {
734
   "cell_type": "markdown",
735
   "metadata": {},
736
   "source": [
737
    "Now that you can use various variant, gene, and program embeddings. For example, for the result in the paper, we use the baselineLD for variant and PoPS for gene."
738
   ]
739
  },
740
  {
741
   "cell_type": "code",
742
   "execution_count": 3,
743
   "metadata": {},
744
   "outputs": [
745
    {
746
     "name": "stdout",
747
     "output_type": "stream",
748
     "text": [
749
      "--loading KG---\n",
750
      "--using baselineLD SNP embedding--\n",
751
      "--using random go embedding--\n",
752
      "--using PoPs expression+PPI+pathways gene embedding--\n"
753
     ]
754
    }
755
   ],
756
   "source": [
757
    "data.load_kg(snp_init_emb = 'baselineLD', \n",
758
    "             go_init_emb = 'random',\n",
759
    "             gene_init_emb = 'pops')"
760
   ]
761
  },
762
  {
763
   "cell_type": "markdown",
764
   "metadata": {},
765
   "source": [
766
    "There are many alternative embeddings as well. \n",
767
    "- For variant: `enformer` (default), `baselineLD`, `SLDSC`, `cadd`, `kg`, `random`\n",
768
    "- For gene: `esm` (default), `pops_expression`, `pops`, `kg`, `random`\n",
769
    "- For program/go: `random` (default), `biogpt`, `kg`\n",
770
    "\n",
771
    "In additional to more embeddings, the full data folder contains summary statistics used in each analysis in the paper. For example, for the simulations, you can load it via:"
772
   ]
773
  },
774
  {
775
   "cell_type": "code",
776
   "execution_count": null,
777
   "metadata": {},
778
   "outputs": [
779
    {
780
     "name": "stdout",
781
     "output_type": "stream",
782
     "text": [
783
      "All required data files are present.\n",
784
      "Using simulation data....\n"
785
     ]
786
    },
787
    {
788
     "data": {
789
      "text/html": [
790
       "<div>\n",
791
       "<style scoped>\n",
792
       "    .dataframe tbody tr th:only-of-type {\n",
793
       "        vertical-align: middle;\n",
794
       "    }\n",
795
       "\n",
796
       "    .dataframe tbody tr th {\n",
797
       "        vertical-align: top;\n",
798
       "    }\n",
799
       "\n",
800
       "    .dataframe thead th {\n",
801
       "        text-align: right;\n",
802
       "    }\n",
803
       "</style>\n",
804
       "<table border=\"1\" class=\"dataframe\">\n",
805
       "  <thead>\n",
806
       "    <tr style=\"text-align: right;\">\n",
807
       "      <th></th>\n",
808
       "      <th>#CHROM</th>\n",
809
       "      <th>ID</th>\n",
810
       "      <th>POS</th>\n",
811
       "      <th>A1</th>\n",
812
       "      <th>A2</th>\n",
813
       "      <th>N</th>\n",
814
       "      <th>AF1</th>\n",
815
       "      <th>BETA</th>\n",
816
       "      <th>SE</th>\n",
817
       "      <th>P</th>\n",
818
       "    </tr>\n",
819
       "  </thead>\n",
820
       "  <tbody>\n",
821
       "    <tr>\n",
822
       "      <th>0</th>\n",
823
       "      <td>1</td>\n",
824
       "      <td>rs3131962</td>\n",
825
       "      <td>756604</td>\n",
826
       "      <td>A</td>\n",
827
       "      <td>G</td>\n",
828
       "      <td>4993</td>\n",
829
       "      <td>0.129882</td>\n",
830
       "      <td>14.559400</td>\n",
831
       "      <td>17.1871</td>\n",
832
       "      <td>0.396933</td>\n",
833
       "    </tr>\n",
834
       "    <tr>\n",
835
       "      <th>1</th>\n",
836
       "      <td>1</td>\n",
837
       "      <td>rs12562034</td>\n",
838
       "      <td>768448</td>\n",
839
       "      <td>A</td>\n",
840
       "      <td>G</td>\n",
841
       "      <td>4994</td>\n",
842
       "      <td>0.103124</td>\n",
843
       "      <td>-15.034400</td>\n",
844
       "      <td>19.0234</td>\n",
845
       "      <td>0.429345</td>\n",
846
       "    </tr>\n",
847
       "    <tr>\n",
848
       "      <th>2</th>\n",
849
       "      <td>1</td>\n",
850
       "      <td>rs4040617</td>\n",
851
       "      <td>779322</td>\n",
852
       "      <td>G</td>\n",
853
       "      <td>A</td>\n",
854
       "      <td>4979</td>\n",
855
       "      <td>0.127435</td>\n",
856
       "      <td>15.537200</td>\n",
857
       "      <td>17.3933</td>\n",
858
       "      <td>0.371704</td>\n",
859
       "    </tr>\n",
860
       "    <tr>\n",
861
       "      <th>3</th>\n",
862
       "      <td>1</td>\n",
863
       "      <td>rs79373928</td>\n",
864
       "      <td>801536</td>\n",
865
       "      <td>G</td>\n",
866
       "      <td>T</td>\n",
867
       "      <td>4996</td>\n",
868
       "      <td>0.015012</td>\n",
869
       "      <td>16.142600</td>\n",
870
       "      <td>47.7752</td>\n",
871
       "      <td>0.735448</td>\n",
872
       "    </tr>\n",
873
       "    <tr>\n",
874
       "      <th>4</th>\n",
875
       "      <td>1</td>\n",
876
       "      <td>rs11240779</td>\n",
877
       "      <td>808631</td>\n",
878
       "      <td>G</td>\n",
879
       "      <td>A</td>\n",
880
       "      <td>4961</td>\n",
881
       "      <td>0.222233</td>\n",
882
       "      <td>0.859838</td>\n",
883
       "      <td>13.9158</td>\n",
884
       "      <td>0.950731</td>\n",
885
       "    </tr>\n",
886
       "    <tr>\n",
887
       "      <th>...</th>\n",
888
       "      <td>...</td>\n",
889
       "      <td>...</td>\n",
890
       "      <td>...</td>\n",
891
       "      <td>...</td>\n",
892
       "      <td>...</td>\n",
893
       "      <td>...</td>\n",
894
       "      <td>...</td>\n",
895
       "      <td>...</td>\n",
896
       "      <td>...</td>\n",
897
       "      <td>...</td>\n",
898
       "    </tr>\n",
899
       "    <tr>\n",
900
       "      <th>542753</th>\n",
901
       "      <td>22</td>\n",
902
       "      <td>rs73174435</td>\n",
903
       "      <td>51174939</td>\n",
904
       "      <td>T</td>\n",
905
       "      <td>C</td>\n",
906
       "      <td>4991</td>\n",
907
       "      <td>0.057103</td>\n",
908
       "      <td>53.082400</td>\n",
909
       "      <td>24.8130</td>\n",
910
       "      <td>0.032412</td>\n",
911
       "    </tr>\n",
912
       "    <tr>\n",
913
       "      <th>542754</th>\n",
914
       "      <td>22</td>\n",
915
       "      <td>rs3810648</td>\n",
916
       "      <td>51175626</td>\n",
917
       "      <td>G</td>\n",
918
       "      <td>A</td>\n",
919
       "      <td>4959</td>\n",
920
       "      <td>0.066243</td>\n",
921
       "      <td>17.689800</td>\n",
922
       "      <td>23.2562</td>\n",
923
       "      <td>0.446867</td>\n",
924
       "    </tr>\n",
925
       "    <tr>\n",
926
       "      <th>542755</th>\n",
927
       "      <td>22</td>\n",
928
       "      <td>rs5771002</td>\n",
929
       "      <td>51183255</td>\n",
930
       "      <td>A</td>\n",
931
       "      <td>G</td>\n",
932
       "      <td>4937</td>\n",
933
       "      <td>0.334414</td>\n",
934
       "      <td>-12.170400</td>\n",
935
       "      <td>12.3314</td>\n",
936
       "      <td>0.323670</td>\n",
937
       "    </tr>\n",
938
       "    <tr>\n",
939
       "      <th>542756</th>\n",
940
       "      <td>22</td>\n",
941
       "      <td>rs3865764</td>\n",
942
       "      <td>51185848</td>\n",
943
       "      <td>G</td>\n",
944
       "      <td>A</td>\n",
945
       "      <td>4984</td>\n",
946
       "      <td>0.050662</td>\n",
947
       "      <td>-43.871900</td>\n",
948
       "      <td>26.3007</td>\n",
949
       "      <td>0.095299</td>\n",
950
       "    </tr>\n",
951
       "    <tr>\n",
952
       "      <th>542757</th>\n",
953
       "      <td>22</td>\n",
954
       "      <td>rs142680588</td>\n",
955
       "      <td>51193629</td>\n",
956
       "      <td>G</td>\n",
957
       "      <td>A</td>\n",
958
       "      <td>4994</td>\n",
959
       "      <td>0.073388</td>\n",
960
       "      <td>11.338700</td>\n",
961
       "      <td>22.2066</td>\n",
962
       "      <td>0.609630</td>\n",
963
       "    </tr>\n",
964
       "  </tbody>\n",
965
       "</table>\n",
966
       "<p>542758 rows × 10 columns</p>\n",
967
       "</div>"
968
      ],
969
      "text/plain": [
970
       "        #CHROM           ID       POS A1 A2     N       AF1       BETA  \\\n",
971
       "0            1    rs3131962    756604  A  G  4993  0.129882  14.559400   \n",
972
       "1            1   rs12562034    768448  A  G  4994  0.103124 -15.034400   \n",
973
       "2            1    rs4040617    779322  G  A  4979  0.127435  15.537200   \n",
974
       "3            1   rs79373928    801536  G  T  4996  0.015012  16.142600   \n",
975
       "4            1   rs11240779    808631  G  A  4961  0.222233   0.859838   \n",
976
       "...        ...          ...       ... .. ..   ...       ...        ...   \n",
977
       "542753      22   rs73174435  51174939  T  C  4991  0.057103  53.082400   \n",
978
       "542754      22    rs3810648  51175626  G  A  4959  0.066243  17.689800   \n",
979
       "542755      22    rs5771002  51183255  A  G  4937  0.334414 -12.170400   \n",
980
       "542756      22    rs3865764  51185848  G  A  4984  0.050662 -43.871900   \n",
981
       "542757      22  rs142680588  51193629  G  A  4994  0.073388  11.338700   \n",
982
       "\n",
983
       "             SE         P  \n",
984
       "0       17.1871  0.396933  \n",
985
       "1       19.0234  0.429345  \n",
986
       "2       17.3933  0.371704  \n",
987
       "3       47.7752  0.735448  \n",
988
       "4       13.9158  0.950731  \n",
989
       "...         ...       ...  \n",
990
       "542753  24.8130  0.032412  \n",
991
       "542754  23.2562  0.446867  \n",
992
       "542755  12.3314  0.323670  \n",
993
       "542756  26.3007  0.095299  \n",
994
       "542757  22.2066  0.609630  \n",
995
       "\n",
996
       "[542758 rows x 10 columns]"
997
      ]
998
     },
999
     "execution_count": 1,
1000
     "metadata": {},
1001
     "output_type": "execute_result"
1002
    }
1003
   ],
1004
   "source": [
    "data.load_simulation_gwas('causal', seed=1)  # seed can range from 1 to 500\n",
    "data.lr_uni  # the loaded summary statistics as a DataFrame"
1007
   ]
1008
  },
1009
  {
1010
   "cell_type": "markdown",
1011
   "metadata": {},
1012
   "source": [
1013
    "Similarly for null simulations, you can load it via:"
1014
   ]
1015
  },
1016
  {
1017
   "cell_type": "code",
1018
   "execution_count": 2,
1019
   "metadata": {},
1020
   "outputs": [
1021
    {
1022
     "name": "stdout",
1023
     "output_type": "stream",
1024
     "text": [
1025
      "Using simulation data....\n"
1026
     ]
1027
    },
1028
    {
1029
     "data": {
1030
      "text/html": [
1031
       "<div>\n",
1032
       "<style scoped>\n",
1033
       "    .dataframe tbody tr th:only-of-type {\n",
1034
       "        vertical-align: middle;\n",
1035
       "    }\n",
1036
       "\n",
1037
       "    .dataframe tbody tr th {\n",
1038
       "        vertical-align: top;\n",
1039
       "    }\n",
1040
       "\n",
1041
       "    .dataframe thead th {\n",
1042
       "        text-align: right;\n",
1043
       "    }\n",
1044
       "</style>\n",
1045
       "<table border=\"1\" class=\"dataframe\">\n",
1046
       "  <thead>\n",
1047
       "    <tr style=\"text-align: right;\">\n",
1048
       "      <th></th>\n",
1049
       "      <th>#CHROM</th>\n",
1050
       "      <th>ID</th>\n",
1051
       "      <th>POS</th>\n",
1052
       "      <th>A1</th>\n",
1053
       "      <th>A2</th>\n",
1054
       "      <th>N</th>\n",
1055
       "      <th>AF1</th>\n",
1056
       "      <th>BETA</th>\n",
1057
       "      <th>SE</th>\n",
1058
       "      <th>P</th>\n",
1059
       "    </tr>\n",
1060
       "  </thead>\n",
1061
       "  <tbody>\n",
1062
       "    <tr>\n",
1063
       "      <th>0</th>\n",
1064
       "      <td>1</td>\n",
1065
       "      <td>rs3131962</td>\n",
1066
       "      <td>756604</td>\n",
1067
       "      <td>A</td>\n",
1068
       "      <td>G</td>\n",
1069
       "      <td>4993</td>\n",
1070
       "      <td>0.129882</td>\n",
1071
       "      <td>-2.960260</td>\n",
1072
       "      <td>7.66276</td>\n",
1073
       "      <td>0.699261</td>\n",
1074
       "    </tr>\n",
1075
       "    <tr>\n",
1076
       "      <th>1</th>\n",
1077
       "      <td>1</td>\n",
1078
       "      <td>rs12562034</td>\n",
1079
       "      <td>768448</td>\n",
1080
       "      <td>A</td>\n",
1081
       "      <td>G</td>\n",
1082
       "      <td>4994</td>\n",
1083
       "      <td>0.103124</td>\n",
1084
       "      <td>-19.335700</td>\n",
1085
       "      <td>8.47710</td>\n",
1086
       "      <td>0.022552</td>\n",
1087
       "    </tr>\n",
1088
       "    <tr>\n",
1089
       "      <th>2</th>\n",
1090
       "      <td>1</td>\n",
1091
       "      <td>rs4040617</td>\n",
1092
       "      <td>779322</td>\n",
1093
       "      <td>G</td>\n",
1094
       "      <td>A</td>\n",
1095
       "      <td>4979</td>\n",
1096
       "      <td>0.127435</td>\n",
1097
       "      <td>-3.287600</td>\n",
1098
       "      <td>7.75475</td>\n",
1099
       "      <td>0.671605</td>\n",
1100
       "    </tr>\n",
1101
       "    <tr>\n",
1102
       "      <th>3</th>\n",
1103
       "      <td>1</td>\n",
1104
       "      <td>rs79373928</td>\n",
1105
       "      <td>801536</td>\n",
1106
       "      <td>G</td>\n",
1107
       "      <td>T</td>\n",
1108
       "      <td>4996</td>\n",
1109
       "      <td>0.015012</td>\n",
1110
       "      <td>-12.530000</td>\n",
1111
       "      <td>21.29860</td>\n",
1112
       "      <td>0.556329</td>\n",
1113
       "    </tr>\n",
1114
       "    <tr>\n",
1115
       "      <th>4</th>\n",
1116
       "      <td>1</td>\n",
1117
       "      <td>rs11240779</td>\n",
1118
       "      <td>808631</td>\n",
1119
       "      <td>G</td>\n",
1120
       "      <td>A</td>\n",
1121
       "      <td>4961</td>\n",
1122
       "      <td>0.222233</td>\n",
1123
       "      <td>-8.564830</td>\n",
1124
       "      <td>6.20273</td>\n",
1125
       "      <td>0.167335</td>\n",
1126
       "    </tr>\n",
1127
       "    <tr>\n",
1128
       "      <th>...</th>\n",
1129
       "      <td>...</td>\n",
1130
       "      <td>...</td>\n",
1131
       "      <td>...</td>\n",
1132
       "      <td>...</td>\n",
1133
       "      <td>...</td>\n",
1134
       "      <td>...</td>\n",
1135
       "      <td>...</td>\n",
1136
       "      <td>...</td>\n",
1137
       "      <td>...</td>\n",
1138
       "      <td>...</td>\n",
1139
       "    </tr>\n",
1140
       "    <tr>\n",
1141
       "      <th>542753</th>\n",
1142
       "      <td>22</td>\n",
1143
       "      <td>rs73174435</td>\n",
1144
       "      <td>51174939</td>\n",
1145
       "      <td>T</td>\n",
1146
       "      <td>C</td>\n",
1147
       "      <td>4991</td>\n",
1148
       "      <td>0.057103</td>\n",
1149
       "      <td>-24.859400</td>\n",
1150
       "      <td>11.06160</td>\n",
1151
       "      <td>0.024617</td>\n",
1152
       "    </tr>\n",
1153
       "    <tr>\n",
1154
       "      <th>542754</th>\n",
1155
       "      <td>22</td>\n",
1156
       "      <td>rs3810648</td>\n",
1157
       "      <td>51175626</td>\n",
1158
       "      <td>G</td>\n",
1159
       "      <td>A</td>\n",
1160
       "      <td>4959</td>\n",
1161
       "      <td>0.066243</td>\n",
1162
       "      <td>-0.725793</td>\n",
1163
       "      <td>10.36870</td>\n",
1164
       "      <td>0.944195</td>\n",
1165
       "    </tr>\n",
1166
       "    <tr>\n",
1167
       "      <th>542755</th>\n",
1168
       "      <td>22</td>\n",
1169
       "      <td>rs5771002</td>\n",
1170
       "      <td>51183255</td>\n",
1171
       "      <td>A</td>\n",
1172
       "      <td>G</td>\n",
1173
       "      <td>4937</td>\n",
1174
       "      <td>0.334414</td>\n",
1175
       "      <td>-5.555300</td>\n",
1176
       "      <td>5.49753</td>\n",
1177
       "      <td>0.312251</td>\n",
1178
       "    </tr>\n",
1179
       "    <tr>\n",
1180
       "      <th>542756</th>\n",
1181
       "      <td>22</td>\n",
1182
       "      <td>rs3865764</td>\n",
1183
       "      <td>51185848</td>\n",
1184
       "      <td>G</td>\n",
1185
       "      <td>A</td>\n",
1186
       "      <td>4984</td>\n",
1187
       "      <td>0.050662</td>\n",
1188
       "      <td>12.588200</td>\n",
1189
       "      <td>11.72730</td>\n",
1190
       "      <td>0.283085</td>\n",
1191
       "    </tr>\n",
1192
       "    <tr>\n",
1193
       "      <th>542757</th>\n",
1194
       "      <td>22</td>\n",
1195
       "      <td>rs142680588</td>\n",
1196
       "      <td>51193629</td>\n",
1197
       "      <td>G</td>\n",
1198
       "      <td>A</td>\n",
1199
       "      <td>4994</td>\n",
1200
       "      <td>0.073388</td>\n",
1201
       "      <td>-13.533700</td>\n",
1202
       "      <td>9.89851</td>\n",
1203
       "      <td>0.171548</td>\n",
1204
       "    </tr>\n",
1205
       "  </tbody>\n",
1206
       "</table>\n",
1207
       "<p>542758 rows × 10 columns</p>\n",
1208
       "</div>"
1209
      ],
1210
      "text/plain": [
1211
       "        #CHROM           ID       POS A1 A2     N       AF1       BETA  \\\n",
1212
       "0            1    rs3131962    756604  A  G  4993  0.129882  -2.960260   \n",
1213
       "1            1   rs12562034    768448  A  G  4994  0.103124 -19.335700   \n",
1214
       "2            1    rs4040617    779322  G  A  4979  0.127435  -3.287600   \n",
1215
       "3            1   rs79373928    801536  G  T  4996  0.015012 -12.530000   \n",
1216
       "4            1   rs11240779    808631  G  A  4961  0.222233  -8.564830   \n",
1217
       "...        ...          ...       ... .. ..   ...       ...        ...   \n",
1218
       "542753      22   rs73174435  51174939  T  C  4991  0.057103 -24.859400   \n",
1219
       "542754      22    rs3810648  51175626  G  A  4959  0.066243  -0.725793   \n",
1220
       "542755      22    rs5771002  51183255  A  G  4937  0.334414  -5.555300   \n",
1221
       "542756      22    rs3865764  51185848  G  A  4984  0.050662  12.588200   \n",
1222
       "542757      22  rs142680588  51193629  G  A  4994  0.073388 -13.533700   \n",
1223
       "\n",
1224
       "              SE         P  \n",
1225
       "0        7.66276  0.699261  \n",
1226
       "1        8.47710  0.022552  \n",
1227
       "2        7.75475  0.671605  \n",
1228
       "3       21.29860  0.556329  \n",
1229
       "4        6.20273  0.167335  \n",
1230
       "...          ...       ...  \n",
1231
       "542753  11.06160  0.024617  \n",
1232
       "542754  10.36870  0.944195  \n",
1233
       "542755   5.49753  0.312251  \n",
1234
       "542756  11.72730  0.283085  \n",
1235
       "542757   9.89851  0.171548  \n",
1236
       "\n",
1237
       "[542758 rows x 10 columns]"
1238
      ]
1239
     },
1240
     "execution_count": 2,
1241
     "metadata": {},
1242
     "output_type": "execute_result"
1243
    }
1244
   ],
1245
   "source": [
    "data.load_simulation_gwas('null', seed=1)  # seed can range from 1 to 500\n",
    "data.lr_uni"
1248
   ]
1249
  },
1250
  {
1251
   "cell_type": "markdown",
1252
   "metadata": {},
1253
   "source": [
1254
    "Now, for the subsampling analysis, you can load any trait out of the 21 subsampled traits in various sample sizes across 5 replicates. The phenotype list can be accessed via:"
1255
   ]
1256
  },
1257
  {
1258
   "cell_type": "code",
1259
   "execution_count": 3,
1260
   "metadata": {},
1261
   "outputs": [
1262
    {
1263
     "data": {
1264
      "text/plain": [
1265
       "['body_BALDING1',\n",
1266
       " 'disease_ALLERGY_ECZEMA_DIAGNOSED',\n",
1267
       " 'disease_HYPOTHYROIDISM_SELF_REP',\n",
1268
       " 'pigment_SUNBURN',\n",
1269
       " '21001',\n",
1270
       " '50',\n",
1271
       " '30080',\n",
1272
       " '30070',\n",
1273
       " '30010',\n",
1274
       " '30000',\n",
1275
       " 'biochemistry_AlkalinePhosphatase',\n",
1276
       " 'biochemistry_AspartateAminotransferase',\n",
1277
       " 'biochemistry_Cholesterol',\n",
1278
       " 'biochemistry_Creatinine',\n",
1279
       " 'biochemistry_IGF1',\n",
1280
       " 'biochemistry_Phosphate',\n",
1281
       " 'biochemistry_Testosterone_Male',\n",
1282
       " 'biochemistry_TotalBilirubin',\n",
1283
       " 'biochemistry_TotalProtein',\n",
1284
       " 'biochemistry_VitaminD',\n",
1285
       " 'bmd_HEEL_TSCOREz']"
1286
      ]
1287
     },
1288
     "execution_count": 3,
1289
     "metadata": {},
1290
     "output_type": "execute_result"
1291
    }
1292
   ],
1293
   "source": [
1294
    "data.get_pheno_list()['21_indep_traits']"
1295
   ]
1296
  },
1297
  {
1298
   "cell_type": "markdown",
1299
   "metadata": {},
1300
   "source": [
1301
    "Usually each trait has the following sample sizes available: 1000, 2500, 5000, 7500, 10000, 50000, 100000, 200000. For example, to load body_BALDING1 at sample size 1000 at replicate 1, you can use:"
1302
   ]
1303
  },
1304
  {
1305
   "cell_type": "code",
1306
   "execution_count": 4,
1307
   "metadata": {},
1308
   "outputs": [
1309
    {
1310
     "data": {
1311
      "text/html": [
1312
       "<div>\n",
1313
       "<style scoped>\n",
1314
       "    .dataframe tbody tr th:only-of-type {\n",
1315
       "        vertical-align: middle;\n",
1316
       "    }\n",
1317
       "\n",
1318
       "    .dataframe tbody tr th {\n",
1319
       "        vertical-align: top;\n",
1320
       "    }\n",
1321
       "\n",
1322
       "    .dataframe thead th {\n",
1323
       "        text-align: right;\n",
1324
       "    }\n",
1325
       "</style>\n",
1326
       "<table border=\"1\" class=\"dataframe\">\n",
1327
       "  <thead>\n",
1328
       "    <tr style=\"text-align: right;\">\n",
1329
       "      <th></th>\n",
1330
       "      <th>#CHROM</th>\n",
1331
       "      <th>POS</th>\n",
1332
       "      <th>ID</th>\n",
1333
       "      <th>REF</th>\n",
1334
       "      <th>ALT</th>\n",
1335
       "      <th>A1</th>\n",
1336
       "      <th>FIRTH?</th>\n",
1337
       "      <th>TEST</th>\n",
1338
       "      <th>OBS_CT</th>\n",
1339
       "      <th>OR</th>\n",
1340
       "      <th>LOG(OR)_SE</th>\n",
1341
       "      <th>Z_STAT</th>\n",
1342
       "      <th>P</th>\n",
1343
       "      <th>ERRCODE</th>\n",
1344
       "      <th>SNP</th>\n",
1345
       "      <th>A2</th>\n",
1346
       "      <th>N</th>\n",
1347
       "    </tr>\n",
1348
       "  </thead>\n",
1349
       "  <tbody>\n",
1350
       "    <tr>\n",
1351
       "      <th>0</th>\n",
1352
       "      <td>1</td>\n",
1353
       "      <td>756604</td>\n",
1354
       "      <td>rs3131962</td>\n",
1355
       "      <td>G</td>\n",
1356
       "      <td>A</td>\n",
1357
       "      <td>A</td>\n",
1358
       "      <td>Y</td>\n",
1359
       "      <td>ADD</td>\n",
1360
       "      <td>999</td>\n",
1361
       "      <td>1.241130</td>\n",
1362
       "      <td>0.209870</td>\n",
1363
       "      <td>1.029320</td>\n",
1364
       "      <td>0.303330</td>\n",
1365
       "      <td>.</td>\n",
1366
       "      <td>rs3131962</td>\n",
1367
       "      <td>G</td>\n",
1368
       "      <td>999</td>\n",
1369
       "    </tr>\n",
1370
       "    <tr>\n",
1371
       "      <th>1</th>\n",
1372
       "      <td>1</td>\n",
1373
       "      <td>768448</td>\n",
1374
       "      <td>rs12562034</td>\n",
1375
       "      <td>G</td>\n",
1376
       "      <td>A</td>\n",
1377
       "      <td>A</td>\n",
1378
       "      <td>Y</td>\n",
1379
       "      <td>ADD</td>\n",
1380
       "      <td>996</td>\n",
1381
       "      <td>0.433894</td>\n",
1382
       "      <td>0.285912</td>\n",
1383
       "      <td>-2.920330</td>\n",
1384
       "      <td>0.003497</td>\n",
1385
       "      <td>.</td>\n",
1386
       "      <td>rs12562034</td>\n",
1387
       "      <td>G</td>\n",
1388
       "      <td>996</td>\n",
1389
       "    </tr>\n",
1390
       "    <tr>\n",
1391
       "      <th>2</th>\n",
1392
       "      <td>1</td>\n",
1393
       "      <td>779322</td>\n",
1394
       "      <td>rs4040617</td>\n",
1395
       "      <td>A</td>\n",
1396
       "      <td>G</td>\n",
1397
       "      <td>G</td>\n",
1398
       "      <td>Y</td>\n",
1399
       "      <td>ADD</td>\n",
1400
       "      <td>996</td>\n",
1401
       "      <td>1.178310</td>\n",
1402
       "      <td>0.211892</td>\n",
1403
       "      <td>0.774379</td>\n",
1404
       "      <td>0.438707</td>\n",
1405
       "      <td>.</td>\n",
1406
       "      <td>rs4040617</td>\n",
1407
       "      <td>A</td>\n",
1408
       "      <td>996</td>\n",
1409
       "    </tr>\n",
1410
       "    <tr>\n",
1411
       "      <th>3</th>\n",
1412
       "      <td>1</td>\n",
1413
       "      <td>801536</td>\n",
1414
       "      <td>rs79373928</td>\n",
1415
       "      <td>T</td>\n",
1416
       "      <td>G</td>\n",
1417
       "      <td>G</td>\n",
1418
       "      <td>Y</td>\n",
1419
       "      <td>ADD</td>\n",
1420
       "      <td>998</td>\n",
1421
       "      <td>0.989852</td>\n",
1422
       "      <td>0.479159</td>\n",
1423
       "      <td>-0.021286</td>\n",
1424
       "      <td>0.983018</td>\n",
1425
       "      <td>.</td>\n",
1426
       "      <td>rs79373928</td>\n",
1427
       "      <td>T</td>\n",
1428
       "      <td>998</td>\n",
1429
       "    </tr>\n",
1430
       "    <tr>\n",
1431
       "      <th>4</th>\n",
1432
       "      <td>1</td>\n",
1433
       "      <td>808631</td>\n",
1434
       "      <td>rs11240779</td>\n",
1435
       "      <td>A</td>\n",
1436
       "      <td>G</td>\n",
1437
       "      <td>G</td>\n",
1438
       "      <td>Y</td>\n",
1439
       "      <td>ADD</td>\n",
1440
       "      <td>994</td>\n",
1441
       "      <td>0.880382</td>\n",
1442
       "      <td>0.173114</td>\n",
1443
       "      <td>-0.735930</td>\n",
1444
       "      <td>0.461773</td>\n",
1445
       "      <td>.</td>\n",
1446
       "      <td>rs11240779</td>\n",
1447
       "      <td>A</td>\n",
1448
       "      <td>994</td>\n",
1449
       "    </tr>\n",
1450
       "    <tr>\n",
1451
       "      <th>...</th>\n",
1452
       "      <td>...</td>\n",
1453
       "      <td>...</td>\n",
1454
       "      <td>...</td>\n",
1455
       "      <td>...</td>\n",
1456
       "      <td>...</td>\n",
1457
       "      <td>...</td>\n",
1458
       "      <td>...</td>\n",
1459
       "      <td>...</td>\n",
1460
       "      <td>...</td>\n",
1461
       "      <td>...</td>\n",
1462
       "      <td>...</td>\n",
1463
       "      <td>...</td>\n",
1464
       "      <td>...</td>\n",
1465
       "      <td>...</td>\n",
1466
       "      <td>...</td>\n",
1467
       "      <td>...</td>\n",
1468
       "      <td>...</td>\n",
1469
       "    </tr>\n",
1470
       "    <tr>\n",
1471
       "      <th>542753</th>\n",
1472
       "      <td>22</td>\n",
1473
       "      <td>51174939</td>\n",
1474
       "      <td>rs73174435</td>\n",
1475
       "      <td>C</td>\n",
1476
       "      <td>T</td>\n",
1477
       "      <td>T</td>\n",
1478
       "      <td>Y</td>\n",
1479
       "      <td>ADD</td>\n",
1480
       "      <td>999</td>\n",
1481
       "      <td>0.642727</td>\n",
1482
       "      <td>0.362564</td>\n",
1483
       "      <td>-1.219190</td>\n",
1484
       "      <td>0.222772</td>\n",
1485
       "      <td>.</td>\n",
1486
       "      <td>rs73174435</td>\n",
1487
       "      <td>C</td>\n",
1488
       "      <td>999</td>\n",
1489
       "    </tr>\n",
1490
       "    <tr>\n",
1491
       "      <th>542754</th>\n",
1492
       "      <td>22</td>\n",
1493
       "      <td>51175626</td>\n",
1494
       "      <td>rs3810648</td>\n",
1495
       "      <td>A</td>\n",
1496
       "      <td>G</td>\n",
1497
       "      <td>G</td>\n",
1498
       "      <td>Y</td>\n",
1499
       "      <td>ADD</td>\n",
1500
       "      <td>996</td>\n",
1501
       "      <td>0.752885</td>\n",
1502
       "      <td>0.286799</td>\n",
1503
       "      <td>-0.989690</td>\n",
1504
       "      <td>0.322326</td>\n",
1505
       "      <td>.</td>\n",
1506
       "      <td>rs3810648</td>\n",
1507
       "      <td>A</td>\n",
1508
       "      <td>996</td>\n",
1509
       "    </tr>\n",
1510
       "    <tr>\n",
1511
       "      <th>542755</th>\n",
1512
       "      <td>22</td>\n",
1513
       "      <td>51183255</td>\n",
1514
       "      <td>rs5771002</td>\n",
1515
       "      <td>G</td>\n",
1516
       "      <td>A</td>\n",
1517
       "      <td>A</td>\n",
1518
       "      <td>Y</td>\n",
1519
       "      <td>ADD</td>\n",
1520
       "      <td>981</td>\n",
1521
       "      <td>0.792577</td>\n",
1522
       "      <td>0.150356</td>\n",
1523
       "      <td>-1.546100</td>\n",
1524
       "      <td>0.122080</td>\n",
1525
       "      <td>.</td>\n",
1526
       "      <td>rs5771002</td>\n",
1527
       "      <td>G</td>\n",
1528
       "      <td>981</td>\n",
1529
       "    </tr>\n",
1530
       "    <tr>\n",
1531
       "      <th>542756</th>\n",
1532
       "      <td>22</td>\n",
1533
       "      <td>51185848</td>\n",
1534
       "      <td>rs3865764</td>\n",
1535
       "      <td>A</td>\n",
1536
       "      <td>G</td>\n",
1537
       "      <td>G</td>\n",
1538
       "      <td>Y</td>\n",
1539
       "      <td>ADD</td>\n",
1540
       "      <td>996</td>\n",
1541
       "      <td>1.004930</td>\n",
1542
       "      <td>0.386700</td>\n",
1543
       "      <td>0.012715</td>\n",
1544
       "      <td>0.989855</td>\n",
1545
       "      <td>.</td>\n",
1546
       "      <td>rs3865764</td>\n",
1547
       "      <td>A</td>\n",
1548
       "      <td>996</td>\n",
1549
       "    </tr>\n",
1550
       "    <tr>\n",
1551
       "      <th>542757</th>\n",
1552
       "      <td>22</td>\n",
1553
       "      <td>51193629</td>\n",
1554
       "      <td>rs142680588</td>\n",
1555
       "      <td>A</td>\n",
1556
       "      <td>G</td>\n",
1557
       "      <td>G</td>\n",
1558
       "      <td>Y</td>\n",
1559
       "      <td>ADD</td>\n",
1560
       "      <td>1000</td>\n",
1561
       "      <td>1.497360</td>\n",
1562
       "      <td>0.267489</td>\n",
1563
       "      <td>1.509230</td>\n",
1564
       "      <td>0.131240</td>\n",
1565
       "      <td>.</td>\n",
1566
       "      <td>rs142680588</td>\n",
1567
       "      <td>A</td>\n",
1568
       "      <td>1000</td>\n",
1569
       "    </tr>\n",
1570
       "  </tbody>\n",
1571
       "</table>\n",
1572
       "<p>542758 rows × 17 columns</p>\n",
1573
       "</div>"
1574
      ],
1575
      "text/plain": [
1576
       "        #CHROM       POS           ID REF ALT A1 FIRTH? TEST  OBS_CT  \\\n",
1577
       "0            1    756604    rs3131962   G   A  A      Y  ADD     999   \n",
1578
       "1            1    768448   rs12562034   G   A  A      Y  ADD     996   \n",
1579
       "2            1    779322    rs4040617   A   G  G      Y  ADD     996   \n",
1580
       "3            1    801536   rs79373928   T   G  G      Y  ADD     998   \n",
1581
       "4            1    808631   rs11240779   A   G  G      Y  ADD     994   \n",
1582
       "...        ...       ...          ...  ..  .. ..    ...  ...     ...   \n",
1583
       "542753      22  51174939   rs73174435   C   T  T      Y  ADD     999   \n",
1584
       "542754      22  51175626    rs3810648   A   G  G      Y  ADD     996   \n",
1585
       "542755      22  51183255    rs5771002   G   A  A      Y  ADD     981   \n",
1586
       "542756      22  51185848    rs3865764   A   G  G      Y  ADD     996   \n",
1587
       "542757      22  51193629  rs142680588   A   G  G      Y  ADD    1000   \n",
1588
       "\n",
1589
       "              OR  LOG(OR)_SE    Z_STAT         P ERRCODE          SNP A2     N  \n",
1590
       "0       1.241130    0.209870  1.029320  0.303330       .    rs3131962  G   999  \n",
1591
       "1       0.433894    0.285912 -2.920330  0.003497       .   rs12562034  G   996  \n",
1592
       "2       1.178310    0.211892  0.774379  0.438707       .    rs4040617  A   996  \n",
1593
       "3       0.989852    0.479159 -0.021286  0.983018       .   rs79373928  T   998  \n",
1594
       "4       0.880382    0.173114 -0.735930  0.461773       .   rs11240779  A   994  \n",
1595
       "...          ...         ...       ...       ...     ...          ... ..   ...  \n",
1596
       "542753  0.642727    0.362564 -1.219190  0.222772       .   rs73174435  C   999  \n",
1597
       "542754  0.752885    0.286799 -0.989690  0.322326       .    rs3810648  A   996  \n",
1598
       "542755  0.792577    0.150356 -1.546100  0.122080       .    rs5771002  G   981  \n",
1599
       "542756  1.004930    0.386700  0.012715  0.989855       .    rs3865764  A   996  \n",
1600
       "542757  1.497360    0.267489  1.509230  0.131240       .  rs142680588  A  1000  \n",
1601
       "\n",
1602
       "[542758 rows x 17 columns]"
1603
      ]
1604
     },
1605
     "execution_count": 4,
1606
     "metadata": {},
1607
     "output_type": "execute_result"
1608
    }
1609
   ],
1610
   "source": [
    "data.load_gwas_subsample(pheno='body_BALDING1', sample_size=1000, seed=1)\n",
    "data.lr_uni"
1613
   ]
1614
  },
1615
  {
1616
   "cell_type": "markdown",
1617
   "metadata": {},
1618
   "source": [
1619
    "You can also load the full cohort GWAS for these 21 traits via:"
1620
   ]
1621
  },
1622
  {
1623
   "cell_type": "code",
1624
   "execution_count": null,
1625
   "metadata": {},
1626
   "outputs": [
1627
    {
1628
     "data": {
1629
      "text/html": [
1630
       "<div>\n",
1631
       "<style scoped>\n",
1632
       "    .dataframe tbody tr th:only-of-type {\n",
1633
       "        vertical-align: middle;\n",
1634
       "    }\n",
1635
       "\n",
1636
       "    .dataframe tbody tr th {\n",
1637
       "        vertical-align: top;\n",
1638
       "    }\n",
1639
       "\n",
1640
       "    .dataframe thead th {\n",
1641
       "        text-align: right;\n",
1642
       "    }\n",
1643
       "</style>\n",
1644
       "<table border=\"1\" class=\"dataframe\">\n",
1645
       "  <thead>\n",
1646
       "    <tr style=\"text-align: right;\">\n",
1647
       "      <th></th>\n",
1648
       "      <th>#CHROM</th>\n",
1649
       "      <th>ID</th>\n",
1650
       "      <th>POS</th>\n",
1651
       "      <th>A1</th>\n",
1652
       "      <th>A2</th>\n",
1653
       "      <th>N</th>\n",
1654
       "      <th>AF1</th>\n",
1655
       "      <th>BETA</th>\n",
1656
       "      <th>SE</th>\n",
1657
       "      <th>P</th>\n",
1658
       "    </tr>\n",
1659
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>rs3131962</td>\n",
       "      <td>756604</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>407023</td>\n",
       "      <td>0.129655</td>\n",
       "      <td>0.000286</td>\n",
       "      <td>0.001048</td>\n",
       "      <td>0.784760</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>rs12562034</td>\n",
       "      <td>768448</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>407057</td>\n",
       "      <td>0.104966</td>\n",
       "      <td>-0.001491</td>\n",
       "      <td>0.001147</td>\n",
       "      <td>0.193592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>rs4040617</td>\n",
       "      <td>779322</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>406623</td>\n",
       "      <td>0.127520</td>\n",
       "      <td>0.000108</td>\n",
       "      <td>0.001056</td>\n",
       "      <td>0.918404</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>rs79373928</td>\n",
       "      <td>801536</td>\n",
       "      <td>G</td>\n",
       "      <td>T</td>\n",
       "      <td>407517</td>\n",
       "      <td>0.014884</td>\n",
       "      <td>0.004382</td>\n",
       "      <td>0.002904</td>\n",
       "      <td>0.131349</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>rs11240779</td>\n",
       "      <td>808631</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>404493</td>\n",
       "      <td>0.224886</td>\n",
       "      <td>-0.001155</td>\n",
       "      <td>0.000846</td>\n",
       "      <td>0.172345</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542753</th>\n",
       "      <td>22</td>\n",
       "      <td>rs73174435</td>\n",
       "      <td>51174939</td>\n",
       "      <td>T</td>\n",
       "      <td>C</td>\n",
       "      <td>407201</td>\n",
       "      <td>0.053846</td>\n",
       "      <td>-0.001980</td>\n",
       "      <td>0.001559</td>\n",
       "      <td>0.203959</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542754</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3810648</td>\n",
       "      <td>51175626</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>404901</td>\n",
       "      <td>0.060979</td>\n",
       "      <td>0.001922</td>\n",
       "      <td>0.001474</td>\n",
       "      <td>0.192116</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542755</th>\n",
       "      <td>22</td>\n",
       "      <td>rs5771002</td>\n",
       "      <td>51183255</td>\n",
       "      <td>A</td>\n",
       "      <td>G</td>\n",
       "      <td>401398</td>\n",
       "      <td>0.333603</td>\n",
       "      <td>-0.000165</td>\n",
       "      <td>0.000751</td>\n",
       "      <td>0.826494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542756</th>\n",
       "      <td>22</td>\n",
       "      <td>rs3865764</td>\n",
       "      <td>51185848</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>406611</td>\n",
       "      <td>0.050601</td>\n",
       "      <td>-0.001311</td>\n",
       "      <td>0.001605</td>\n",
       "      <td>0.413994</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>542757</th>\n",
       "      <td>22</td>\n",
       "      <td>rs142680588</td>\n",
       "      <td>51193629</td>\n",
       "      <td>G</td>\n",
       "      <td>A</td>\n",
       "      <td>407108</td>\n",
       "      <td>0.075912</td>\n",
       "      <td>-0.002861</td>\n",
       "      <td>0.001329</td>\n",
       "      <td>0.031362</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>542758 rows × 10 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        #CHROM           ID       POS A1 A2       N       AF1      BETA  \\\n",
       "0            1    rs3131962    756604  A  G  407023  0.129655  0.000286   \n",
       "1            1   rs12562034    768448  A  G  407057  0.104966 -0.001491   \n",
       "2            1    rs4040617    779322  G  A  406623  0.127520  0.000108   \n",
       "3            1   rs79373928    801536  G  T  407517  0.014884  0.004382   \n",
       "4            1   rs11240779    808631  G  A  404493  0.224886 -0.001155   \n",
       "...        ...          ...       ... .. ..     ...       ...       ...   \n",
       "542753      22   rs73174435  51174939  T  C  407201  0.053846 -0.001980   \n",
       "542754      22    rs3810648  51175626  G  A  404901  0.060979  0.001922   \n",
       "542755      22    rs5771002  51183255  A  G  401398  0.333603 -0.000165   \n",
       "542756      22    rs3865764  51185848  G  A  406611  0.050601 -0.001311   \n",
       "542757      22  rs142680588  51193629  G  A  407108  0.075912 -0.002861   \n",
       "\n",
       "              SE         P  \n",
       "0       0.001048  0.784760  \n",
       "1       0.001147  0.193592  \n",
       "2       0.001056  0.918404  \n",
       "3       0.002904  0.131349  \n",
       "4       0.000846  0.172345  \n",
       "...          ...       ...  \n",
       "542753  0.001559  0.203959  \n",
       "542754  0.001474  0.192116  \n",
       "542755  0.000751  0.826494  \n",
       "542756  0.001605  0.413994  \n",
       "542757  0.001329  0.031362  \n",
       "\n",
       "[542758 rows x 10 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.load_full_gwas(pheno = 'body_BALDING1')\n",
    "data.lr_uni"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the basic KGWAS interface! Check out the other notebooks to explore more of KGWAS's capabilities!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "a100_env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}