1824 lines (1823 with data), 69.3 kB
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building a Drug Response Report\n",
"\n",
"For this notebook, you are going to focus on a single region in the genome, defined as chromosome 22, for all 2,548 samples in the Thousand Genomes dataset. As chromosome 22 was the first chromosome to be sequenced as part of the Human Genome Project, it is your first here as well. \n",
"\n",
"What small molecules/drugs are most likely to affect a subpopulation of individuals (ancestry, age, etc.) based on their genomic information?\n",
"\n",
"In this query, assume that you have some phenotype data about your population. In this case, also assume that all samples sharing the pattern “NA12” are part of a specific demographic.\n",
"\n",
"**NOTE: Declare the names of the \"variant\" and \"annotation\" tables in the \"Define Variables\" section based on the names given at the \"Create Resource Link\" stage of the solution**\n",
"\n",
"In this query, use sampleid as your predicate pushdown. The general steps are:\n",
"\n",
"1. Filter by the samples in your subpopulation\n",
"2. Aggregate variant frequencies for the subpopulation-of-interest\n",
"3. Join on ClinVar dataset\n",
"4. Filter by variants that have been implicated in drug-response\n",
"5. Order by highest frequency variants\n",
"\n",
"The raw clinvar data and a parquet version of chromosome 22 of 1000 genomes, partitioned by sample id, are in your data lake. You also have a VCF in your data lake for chromosome 22 of 1000 genomes."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import Dependencies"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import boto3, os\n",
"\n",
"s3 = boto3.resource('s3')\n",
"glue = boto3.client('glue')\n",
"cfn = boto3.client('cloudformation')"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[33mWARNING: You are using pip version 22.0.4; however, version 22.3.1 is available.\r\n",
"You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\r\n",
"\u001b[0m"
]
}
],
"source": [
"import sys\n",
"!{sys.executable} -m pip install PyAthena --quiet"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"from pyathena import connect\n",
"import pandas as pd\n",
"from pyathena.pandas.util import as_pandas"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Define Variables"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"us-east-1\n",
"Project Name: GenomicsAnalysis\n",
"Database Name: genomicsanalysis\n",
"Workgroup Name: genomicsanalysis-us-east-1\n",
"Data lake bucket: genomicsanalysis-pipeline-datalakebucket-mdr2tq03e5w4\n"
]
}
],
"source": [
"import jmespath\n",
"\n",
"session = boto3.session.Session()\n",
"region = session.region_name\n",
"print(region)\n",
"\n",
"project_name = os.environ.get('RESOURCE_PREFIX')\n",
"database_name = project_name.lower()\n",
"work_group_name = project_name.lower() + '-' + region\n",
"print(f'Project Name: {project_name}')\n",
"print(f'Database Name: {database_name}')\n",
"print(f'Workgroup Name: {work_group_name}')\n",
"\n",
"resources = cfn.describe_stacks(StackName='{0}-Pipeline'.format(project_name))\n",
"query = 'Stacks[].Outputs[?OutputKey==`DataLakeBucket`].OutputValue'\n",
"data_lake_bucket = path = jmespath.search(query, resources)[0][0]\n",
"print(f'Data lake bucket: {data_lake_bucket}')\n",
"\n",
"variant_table_name = 'variants'\n",
"annotation_table_name = 'annotations'\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create drug response result set"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>genotypefrequency</th>\n",
" <th>rs_id</th>\n",
" <th>clinvar_disease_name</th>\n",
" <th>clinical_significance</th>\n",
" <th>contigname</th>\n",
" <th>start</th>\n",
" <th>end</th>\n",
" <th>referenceallele</th>\n",
" <th>alternatealleles</th>\n",
" <th>calls</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.000000</td>\n",
" <td>6267</td>\n",
" <td>Schizophrenia,_susceptibility_to|Tramadol_resp...</td>\n",
" <td>drug_response|_risk_factor</td>\n",
" <td>22</td>\n",
" <td>19962739</td>\n",
" <td>19962740</td>\n",
" <td>G</td>\n",
" <td>[T]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.000000</td>\n",
" <td>554056486</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19970008</td>\n",
" <td>19970009</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1.000000</td>\n",
" <td>544846648</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19962831</td>\n",
" <td>19962832</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.000000</td>\n",
" <td>11569716</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19962223</td>\n",
" <td>19962224</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1.000000</td>\n",
" <td>201225516</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19964186</td>\n",
" <td>19964187</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>1.000000</td>\n",
" <td>561536243</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19962540</td>\n",
" <td>19962541</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>1.000000</td>\n",
" <td>548235125</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969904</td>\n",
" <td>19969905</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>0.984615</td>\n",
" <td>35481270</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969443</td>\n",
" <td>19969444</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>0.969231</td>\n",
" <td>188159376</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19951706</td>\n",
" <td>19951707</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>0.953846</td>\n",
" <td>35478083</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969361</td>\n",
" <td>19969362</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>0.923077</td>\n",
" <td>36082074</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969339</td>\n",
" <td>19969340</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>0.892308</td>\n",
" <td>165728</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969499</td>\n",
" <td>19969500</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[1, 1]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>0.446154</td>\n",
" <td>165599</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969257</td>\n",
" <td>19969258</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>[1, 1]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>0.230769</td>\n",
" <td>165599</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969257</td>\n",
" <td>19969258</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>[1, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>0.215385</td>\n",
" <td>165599</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969257</td>\n",
" <td>19969258</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>[0, 1]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>0.107692</td>\n",
" <td>165599</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969257</td>\n",
" <td>19969258</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>0.046154</td>\n",
" <td>165728</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969499</td>\n",
" <td>19969500</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[0, 1]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>0.046154</td>\n",
" <td>165728</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969499</td>\n",
" <td>19969500</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[1, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>0.046154</td>\n",
" <td>36082074</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969339</td>\n",
" <td>19969340</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>[1, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>0.030769</td>\n",
" <td>36082074</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969339</td>\n",
" <td>19969340</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>[0, 1]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>0.030769</td>\n",
" <td>35478083</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969361</td>\n",
" <td>19969362</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>[0, 1]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>0.015385</td>\n",
" <td>165728</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969499</td>\n",
" <td>19969500</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[0, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>0.015385</td>\n",
" <td>188159376</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19951706</td>\n",
" <td>19951707</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[0, 1]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>0.015385</td>\n",
" <td>188159376</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19951706</td>\n",
" <td>19951707</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[1, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>0.015385</td>\n",
" <td>35478083</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969361</td>\n",
" <td>19969362</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>[1, 0]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>0.015385</td>\n",
" <td>35481270</td>\n",
" <td>Tramadol_response</td>\n",
" <td>drug_response</td>\n",
" <td>22</td>\n",
" <td>19969443</td>\n",
" <td>19969444</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>[1, 0]</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" genotypefrequency rs_id \\\n",
"0 1.000000 6267 \n",
"1 1.000000 554056486 \n",
"2 1.000000 544846648 \n",
"3 1.000000 11569716 \n",
"4 1.000000 201225516 \n",
"5 1.000000 561536243 \n",
"6 1.000000 548235125 \n",
"7 0.984615 35481270 \n",
"8 0.969231 188159376 \n",
"9 0.953846 35478083 \n",
"10 0.923077 36082074 \n",
"11 0.892308 165728 \n",
"12 0.446154 165599 \n",
"13 0.230769 165599 \n",
"14 0.215385 165599 \n",
"15 0.107692 165599 \n",
"16 0.046154 165728 \n",
"17 0.046154 165728 \n",
"18 0.046154 36082074 \n",
"19 0.030769 36082074 \n",
"20 0.030769 35478083 \n",
"21 0.015385 165728 \n",
"22 0.015385 188159376 \n",
"23 0.015385 188159376 \n",
"24 0.015385 35478083 \n",
"25 0.015385 35481270 \n",
"\n",
" clinvar_disease_name \\\n",
"0 Schizophrenia,_susceptibility_to|Tramadol_resp... \n",
"1 Tramadol_response \n",
"2 Tramadol_response \n",
"3 Tramadol_response \n",
"4 Tramadol_response \n",
"5 Tramadol_response \n",
"6 Tramadol_response \n",
"7 Tramadol_response \n",
"8 Tramadol_response \n",
"9 Tramadol_response \n",
"10 Tramadol_response \n",
"11 Tramadol_response \n",
"12 Tramadol_response \n",
"13 Tramadol_response \n",
"14 Tramadol_response \n",
"15 Tramadol_response \n",
"16 Tramadol_response \n",
"17 Tramadol_response \n",
"18 Tramadol_response \n",
"19 Tramadol_response \n",
"20 Tramadol_response \n",
"21 Tramadol_response \n",
"22 Tramadol_response \n",
"23 Tramadol_response \n",
"24 Tramadol_response \n",
"25 Tramadol_response \n",
"\n",
" clinical_significance contigname start end referenceallele \\\n",
"0 drug_response|_risk_factor 22 19962739 19962740 G \n",
"1 drug_response 22 19970008 19970009 G \n",
"2 drug_response 22 19962831 19962832 C \n",
"3 drug_response 22 19962223 19962224 T \n",
"4 drug_response 22 19964186 19964187 C \n",
"5 drug_response 22 19962540 19962541 G \n",
"6 drug_response 22 19969904 19969905 T \n",
"7 drug_response 22 19969443 19969444 C \n",
"8 drug_response 22 19951706 19951707 C \n",
"9 drug_response 22 19969361 19969362 T \n",
"10 drug_response 22 19969339 19969340 G \n",
"11 drug_response 22 19969499 19969500 C \n",
"12 drug_response 22 19969257 19969258 G \n",
"13 drug_response 22 19969257 19969258 G \n",
"14 drug_response 22 19969257 19969258 G \n",
"15 drug_response 22 19969257 19969258 G \n",
"16 drug_response 22 19969499 19969500 C \n",
"17 drug_response 22 19969499 19969500 C \n",
"18 drug_response 22 19969339 19969340 G \n",
"19 drug_response 22 19969339 19969340 G \n",
"20 drug_response 22 19969361 19969362 T \n",
"21 drug_response 22 19969499 19969500 C \n",
"22 drug_response 22 19951706 19951707 C \n",
"23 drug_response 22 19951706 19951707 C \n",
"24 drug_response 22 19969361 19969362 T \n",
"25 drug_response 22 19969443 19969444 C \n",
"\n",
" alternatealleles calls \n",
"0 [T] [0, 0] \n",
"1 [A] [0, 0] \n",
"2 [T] [0, 0] \n",
"3 [C] [0, 0] \n",
"4 [T] [0, 0] \n",
"5 [A] [0, 0] \n",
"6 [C] [0, 0] \n",
"7 [T] [0, 0] \n",
"8 [T] [0, 0] \n",
"9 [C] [0, 0] \n",
"10 [A] [0, 0] \n",
"11 [T] [1, 1] \n",
"12 [A] [1, 1] \n",
"13 [A] [1, 0] \n",
"14 [A] [0, 1] \n",
"15 [A] [0, 0] \n",
"16 [T] [0, 1] \n",
"17 [T] [1, 0] \n",
"18 [A] [1, 0] \n",
"19 [A] [0, 1] \n",
"20 [C] [0, 1] \n",
"21 [T] [0, 0] \n",
"22 [T] [0, 1] \n",
"23 [T] [1, 0] \n",
"24 [C] [1, 0] \n",
"25 [T] [1, 0] "
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conn = connect(s3_staging_dir='s3://%s/results/drug_response' % data_lake_bucket, region_name=region, schema_name=database_name)\n",
"cursor = conn.cursor(work_group=work_group_name)\n",
"query = f\"\"\"\n",
"SELECT count(*)/cast(numsamples AS DOUBLE) AS genotypefrequency \n",
" ,cv.attributes['RS'] as rs_id\n",
" ,cv.attributes['CLNDN'] as clinvar_disease_name\n",
" ,cv.attributes['CLNSIG'] as clinical_significance\n",
" ,sv.contigname\n",
" ,sv.start\n",
" ,sv.\"end\"\n",
" ,sv.referenceallele\n",
" ,sv.alternatealleles\n",
" ,sv.calls\n",
" FROM {variant_table_name} sv \n",
" CROSS JOIN \n",
" (SELECT count(1) AS numsamples \n",
" FROM \n",
" (SELECT DISTINCT vs.sampleid \n",
" FROM {variant_table_name} vs\n",
" WHERE vs.sampleid LIKE 'NA12%')) \n",
" JOIN {annotation_table_name} cv \n",
" ON sv.contigname = cv.contigname \n",
" AND sv.start = cv.start \n",
" AND sv.\"end\" = cv.\"end\" \n",
" AND sv.referenceallele = cv.referenceallele \n",
" AND sv.alternatealleles = cv.alternatealleles\n",
" AND cv.attributes['CLNSIG'] LIKE '%response%' \n",
" AND sv.sampleid LIKE 'NA12%' \n",
" GROUP BY sv.contigname \n",
" ,sv.start \n",
" ,sv.\"end\" \n",
" ,sv.referenceallele \n",
" ,sv.alternatealleles\n",
" ,sv.calls\n",
" ,cv.attributes['RS']\n",
" ,cv.attributes['CLNDN']\n",
" ,cv.attributes['CLNSIG'] \n",
" ,numsamples \n",
" ORDER BY genotypefrequency DESC LIMIT 50 \n",
" \"\"\"\n",
"cursor.execute(query)\n",
"\n",
"df = as_pandas(cursor)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Query annotation dataset"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>importjobid</th>\n",
" <th>contigname</th>\n",
" <th>start</th>\n",
" <th>end</th>\n",
" <th>names</th>\n",
" <th>referenceallele</th>\n",
" <th>alternatealleles</th>\n",
" <th>qual</th>\n",
" <th>filters</th>\n",
" <th>splitfrommultiallelic</th>\n",
" <th>...</th>\n",
" <th>calls</th>\n",
" <th>genotypelikelihoods</th>\n",
" <th>phredlikelihoods</th>\n",
" <th>alleledepths</th>\n",
" <th>conditionalquality</th>\n",
" <th>spl</th>\n",
" <th>depth</th>\n",
" <th>ps</th>\n",
" <th>sampleid</th>\n",
" <th>information</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>efdbfdf9-81b8-43e7-bca8-60bc3834a2bc</td>\n",
" <td>11</td>\n",
" <td>17442745</td>\n",
" <td>17442746</td>\n",
" <td>[370657]</td>\n",
" <td>G</td>\n",
" <td>[GT]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>efdbfdf9-81b8-43e7-bca8-60bc3834a2bc</td>\n",
" <td>12</td>\n",
" <td>132638015</td>\n",
" <td>132638016</td>\n",
" <td>[1008494]</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>efdbfdf9-81b8-43e7-bca8-60bc3834a2bc</td>\n",
" <td>4</td>\n",
" <td>177439635</td>\n",
" <td>177439637</td>\n",
" <td>[371375]</td>\n",
" <td>TA</td>\n",
" <td>[T]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>efdbfdf9-81b8-43e7-bca8-60bc3834a2bc</td>\n",
" <td>11</td>\n",
" <td>17442747</td>\n",
" <td>17442748</td>\n",
" <td>[1107489]</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>efdbfdf9-81b8-43e7-bca8-60bc3834a2bc</td>\n",
" <td>11</td>\n",
" <td>17442751</td>\n",
" <td>17442752</td>\n",
" <td>[1502031]</td>\n",
" <td>A</td>\n",
" <td>[G]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>efdbfdf9-81b8-43e7-bca8-60bc3834a2bc</td>\n",
" <td>15</td>\n",
" <td>89776452</td>\n",
" <td>89776453</td>\n",
" <td>[1575155]</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>efdbfdf9-81b8-43e7-bca8-60bc3834a2bc</td>\n",
" <td>1</td>\n",
" <td>925951</td>\n",
" <td>925952</td>\n",
" <td>[1019397]</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>efdbfdf9-81b8-43e7-bca8-60bc3834a2bc</td>\n",
" <td>12</td>\n",
" <td>132638016</td>\n",
" <td>132638020</td>\n",
" <td>[484509]</td>\n",
" <td>CTGG</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>efdbfdf9-81b8-43e7-bca8-60bc3834a2bc</td>\n",
" <td>12</td>\n",
" <td>132638019</td>\n",
" <td>132638020</td>\n",
" <td>[1405963]</td>\n",
" <td>G</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>efdbfdf9-81b8-43e7-bca8-60bc3834a2bc</td>\n",
" <td>19</td>\n",
" <td>42293924</td>\n",
" <td>42293925</td>\n",
" <td>[717577]</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 22 columns</p>\n",
"</div>"
],
"text/plain": [
" importjobid contigname start end \\\n",
"0 efdbfdf9-81b8-43e7-bca8-60bc3834a2bc 11 17442745 17442746 \n",
"1 efdbfdf9-81b8-43e7-bca8-60bc3834a2bc 12 132638015 132638016 \n",
"2 efdbfdf9-81b8-43e7-bca8-60bc3834a2bc 4 177439635 177439637 \n",
"3 efdbfdf9-81b8-43e7-bca8-60bc3834a2bc 11 17442747 17442748 \n",
"4 efdbfdf9-81b8-43e7-bca8-60bc3834a2bc 11 17442751 17442752 \n",
"5 efdbfdf9-81b8-43e7-bca8-60bc3834a2bc 15 89776452 89776453 \n",
"6 efdbfdf9-81b8-43e7-bca8-60bc3834a2bc 1 925951 925952 \n",
"7 efdbfdf9-81b8-43e7-bca8-60bc3834a2bc 12 132638016 132638020 \n",
"8 efdbfdf9-81b8-43e7-bca8-60bc3834a2bc 12 132638019 132638020 \n",
"9 efdbfdf9-81b8-43e7-bca8-60bc3834a2bc 19 42293924 42293925 \n",
"\n",
" names referenceallele alternatealleles qual filters \\\n",
"0 [370657] G [GT] None None \n",
"1 [1008494] G [A] None None \n",
"2 [371375] TA [T] None None \n",
"3 [1107489] C [T] None None \n",
"4 [1502031] A [G] None None \n",
"5 [1575155] C [T] None None \n",
"6 [1019397] G [A] None None \n",
"7 [484509] CTGG [C] None None \n",
"8 [1405963] G [C] None None \n",
"9 [717577] G [A] None None \n",
"\n",
" splitfrommultiallelic ... calls genotypelikelihoods phredlikelihoods \\\n",
"0 False ... None None None \n",
"1 False ... None None None \n",
"2 False ... None None None \n",
"3 False ... None None None \n",
"4 False ... None None None \n",
"5 False ... None None None \n",
"6 False ... None None None \n",
"7 False ... None None None \n",
"8 False ... None None None \n",
"9 False ... None None None \n",
"\n",
" alleledepths conditionalquality spl depth ps sampleid information \n",
"0 None None None None None None None \n",
"1 None None None None None None None \n",
"2 None None None None None None None \n",
"3 None None None None None None None \n",
"4 None None None None None None None \n",
"5 None None None None None None None \n",
"6 None None None None None None None \n",
"7 None None None None None None None \n",
"8 None None None None None None None \n",
"9 None None None None None None None \n",
"\n",
"[10 rows x 22 columns]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conn = connect(s3_staging_dir='s3://%s/results/annotation/clinvar' % data_lake_bucket, region_name=region, schema_name=database_name)\n",
"cursor = conn.cursor(work_group=work_group_name)\n",
"cursor.execute(f'SELECT * FROM {annotation_table_name} limit 10')\n",
"df = as_pandas(cursor)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Query cohort dataset"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>importjobid</th>\n",
" <th>contigname</th>\n",
" <th>start</th>\n",
" <th>end</th>\n",
" <th>names</th>\n",
" <th>referenceallele</th>\n",
" <th>alternatealleles</th>\n",
" <th>qual</th>\n",
" <th>filters</th>\n",
" <th>splitfrommultiallelic</th>\n",
" <th>...</th>\n",
" <th>calls</th>\n",
" <th>genotypelikelihoods</th>\n",
" <th>phredlikelihoods</th>\n",
" <th>alleledepths</th>\n",
" <th>conditionalquality</th>\n",
" <th>spl</th>\n",
" <th>depth</th>\n",
" <th>ps</th>\n",
" <th>sampleid</th>\n",
" <th>information</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>204afc74-76ae-44db-9428-89cc546c3c7c</td>\n",
" <td>22</td>\n",
" <td>41061122</td>\n",
" <td>41061123</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>[PASS]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 0]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NA12003</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>204afc74-76ae-44db-9428-89cc546c3c7c</td>\n",
" <td>22</td>\n",
" <td>41061122</td>\n",
" <td>41061123</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>[PASS]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 0]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NA12004</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>204afc74-76ae-44db-9428-89cc546c3c7c</td>\n",
" <td>22</td>\n",
" <td>41061122</td>\n",
" <td>41061123</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>[PASS]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 0]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NA12005</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>204afc74-76ae-44db-9428-89cc546c3c7c</td>\n",
" <td>22</td>\n",
" <td>41061122</td>\n",
" <td>41061123</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>[PASS]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 0]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NA12006</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>204afc74-76ae-44db-9428-89cc546c3c7c</td>\n",
" <td>22</td>\n",
" <td>41061122</td>\n",
" <td>41061123</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>[PASS]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 0]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NA12043</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>204afc74-76ae-44db-9428-89cc546c3c7c</td>\n",
" <td>22</td>\n",
" <td>41061122</td>\n",
" <td>41061123</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>[PASS]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 0]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NA12044</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>204afc74-76ae-44db-9428-89cc546c3c7c</td>\n",
" <td>22</td>\n",
" <td>41061122</td>\n",
" <td>41061123</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>[PASS]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 0]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NA12045</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>204afc74-76ae-44db-9428-89cc546c3c7c</td>\n",
" <td>22</td>\n",
" <td>41061122</td>\n",
" <td>41061123</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>[PASS]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 0]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NA12046</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>204afc74-76ae-44db-9428-89cc546c3c7c</td>\n",
" <td>22</td>\n",
" <td>41061122</td>\n",
" <td>41061123</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>[PASS]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 0]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NA12058</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>204afc74-76ae-44db-9428-89cc546c3c7c</td>\n",
" <td>22</td>\n",
" <td>41061122</td>\n",
" <td>41061123</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>None</td>\n",
" <td>[PASS]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 0]</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>None</td>\n",
" <td>NA12144</td>\n",
" <td>None</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 22 columns</p>\n",
"</div>"
],
"text/plain": [
" importjobid contigname start end names \\\n",
"0 204afc74-76ae-44db-9428-89cc546c3c7c 22 41061122 41061123 None \n",
"1 204afc74-76ae-44db-9428-89cc546c3c7c 22 41061122 41061123 None \n",
"2 204afc74-76ae-44db-9428-89cc546c3c7c 22 41061122 41061123 None \n",
"3 204afc74-76ae-44db-9428-89cc546c3c7c 22 41061122 41061123 None \n",
"4 204afc74-76ae-44db-9428-89cc546c3c7c 22 41061122 41061123 None \n",
"5 204afc74-76ae-44db-9428-89cc546c3c7c 22 41061122 41061123 None \n",
"6 204afc74-76ae-44db-9428-89cc546c3c7c 22 41061122 41061123 None \n",
"7 204afc74-76ae-44db-9428-89cc546c3c7c 22 41061122 41061123 None \n",
"8 204afc74-76ae-44db-9428-89cc546c3c7c 22 41061122 41061123 None \n",
"9 204afc74-76ae-44db-9428-89cc546c3c7c 22 41061122 41061123 None \n",
"\n",
" referenceallele alternatealleles qual filters splitfrommultiallelic ... \\\n",
"0 T [C] None [PASS] False ... \n",
"1 T [C] None [PASS] False ... \n",
"2 T [C] None [PASS] False ... \n",
"3 T [C] None [PASS] False ... \n",
"4 T [C] None [PASS] False ... \n",
"5 T [C] None [PASS] False ... \n",
"6 T [C] None [PASS] False ... \n",
"7 T [C] None [PASS] False ... \n",
"8 T [C] None [PASS] False ... \n",
"9 T [C] None [PASS] False ... \n",
"\n",
" calls genotypelikelihoods phredlikelihoods alleledepths \\\n",
"0 [0, 0] None None None \n",
"1 [0, 0] None None None \n",
"2 [0, 0] None None None \n",
"3 [0, 0] None None None \n",
"4 [0, 0] None None None \n",
"5 [0, 0] None None None \n",
"6 [0, 0] None None None \n",
"7 [0, 0] None None None \n",
"8 [0, 0] None None None \n",
"9 [0, 0] None None None \n",
"\n",
" conditionalquality spl depth ps sampleid information \n",
"0 None None None None NA12003 None \n",
"1 None None None None NA12004 None \n",
"2 None None None None NA12005 None \n",
"3 None None None None NA12006 None \n",
"4 None None None None NA12043 None \n",
"5 None None None None NA12044 None \n",
"6 None None None None NA12045 None \n",
"7 None None None None NA12046 None \n",
"8 None None None None NA12058 None \n",
"9 None None None None NA12144 None \n",
"\n",
"[10 rows x 22 columns]"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conn = connect(s3_staging_dir='s3://%s/results/variants/' % data_lake_bucket,region_name=region, schema_name=database_name)\n",
"cursor = conn.cursor(work_group=work_group_name)\n",
"cursor.execute(f\"SELECT * FROM {variant_table_name} WHERE sampleid LIKE 'NA12%' limit 10\")\n",
"df = as_pandas(cursor)\n",
"df\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Query individual variant dataset"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>importjobid</th>\n",
" <th>contigname</th>\n",
" <th>start</th>\n",
" <th>end</th>\n",
" <th>names</th>\n",
" <th>referenceallele</th>\n",
" <th>alternatealleles</th>\n",
" <th>qual</th>\n",
" <th>filters</th>\n",
" <th>splitfrommultiallelic</th>\n",
" <th>...</th>\n",
" <th>calls</th>\n",
" <th>genotypelikelihoods</th>\n",
" <th>phredlikelihoods</th>\n",
" <th>alleledepths</th>\n",
" <th>conditionalquality</th>\n",
" <th>spl</th>\n",
" <th>depth</th>\n",
" <th>ps</th>\n",
" <th>sampleid</th>\n",
" <th>information</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>571ce61e-f0b5-40b1-993b-650ab2af8284</td>\n",
" <td>chr6</td>\n",
" <td>118833153</td>\n",
" <td>118833154</td>\n",
" <td>None</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>75.0</td>\n",
" <td>[LowGQX]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[1, 1]</td>\n",
" <td>None</td>\n",
" <td>[111, 9, 0]</td>\n",
" <td>[0, 3]</td>\n",
" <td>7</td>\n",
" <td>None</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>default</td>\n",
" <td>{min_dp=null, adf=[0, 2], genotype_filters=[Lo...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>571ce61e-f0b5-40b1-993b-650ab2af8284</td>\n",
" <td>chr6</td>\n",
" <td>118840547</td>\n",
" <td>118840548</td>\n",
" <td>None</td>\n",
" <td>G</td>\n",
" <td>[T]</td>\n",
" <td>11.0</td>\n",
" <td>[LowDepth, LowGQX]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 1]</td>\n",
" <td>None</td>\n",
" <td>[46, 3, 0]</td>\n",
" <td>[0, 1]</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>1</td>\n",
" <td>None</td>\n",
" <td>default</td>\n",
" <td>{min_dp=null, adf=[0, 1], genotype_filters=[Lo...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>571ce61e-f0b5-40b1-993b-650ab2af8284</td>\n",
" <td>chr6</td>\n",
" <td>118850771</td>\n",
" <td>118850772</td>\n",
" <td>None</td>\n",
" <td>G</td>\n",
" <td>[T]</td>\n",
" <td>10.0</td>\n",
" <td>[LowDepth, LowGQX]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 1]</td>\n",
" <td>None</td>\n",
" <td>[44, 3, 0]</td>\n",
" <td>[0, 1]</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>1</td>\n",
" <td>None</td>\n",
" <td>default</td>\n",
" <td>{min_dp=null, adf=[0, 0], genotype_filters=[Lo...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>571ce61e-f0b5-40b1-993b-650ab2af8284</td>\n",
" <td>chr6</td>\n",
" <td>118856568</td>\n",
" <td>118856569</td>\n",
" <td>None</td>\n",
" <td>C</td>\n",
" <td>[T]</td>\n",
" <td>11.0</td>\n",
" <td>[LowDepth, LowGQX]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 1]</td>\n",
" <td>None</td>\n",
" <td>[46, 3, 0]</td>\n",
" <td>[0, 1]</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>1</td>\n",
" <td>None</td>\n",
" <td>default</td>\n",
" <td>{min_dp=null, adf=[0, 0], genotype_filters=[Lo...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>571ce61e-f0b5-40b1-993b-650ab2af8284</td>\n",
" <td>chr6</td>\n",
" <td>118872336</td>\n",
" <td>118872337</td>\n",
" <td>None</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>11.0</td>\n",
" <td>[LowDepth, LowGQX]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 1]</td>\n",
" <td>None</td>\n",
" <td>[46, 3, 0]</td>\n",
" <td>[0, 1]</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>1</td>\n",
" <td>None</td>\n",
" <td>default</td>\n",
" <td>{min_dp=null, adf=[0, 0], genotype_filters=[Lo...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>571ce61e-f0b5-40b1-993b-650ab2af8284</td>\n",
" <td>chr6</td>\n",
" <td>118874817</td>\n",
" <td>118874818</td>\n",
" <td>None</td>\n",
" <td>T</td>\n",
" <td>[C]</td>\n",
" <td>7.0</td>\n",
" <td>[LowDepth, LowGQX]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 1]</td>\n",
" <td>None</td>\n",
" <td>[41, 3, 0]</td>\n",
" <td>[0, 1]</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>1</td>\n",
" <td>None</td>\n",
" <td>default</td>\n",
" <td>{min_dp=null, adf=[0, 0], genotype_filters=[Lo...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>571ce61e-f0b5-40b1-993b-650ab2af8284</td>\n",
" <td>chr6</td>\n",
" <td>118890083</td>\n",
" <td>118890084</td>\n",
" <td>None</td>\n",
" <td>G</td>\n",
" <td>[T]</td>\n",
" <td>11.0</td>\n",
" <td>[LowDepth, LowGQX]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 1]</td>\n",
" <td>None</td>\n",
" <td>[46, 3, 0]</td>\n",
" <td>[0, 1]</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>1</td>\n",
" <td>None</td>\n",
" <td>default</td>\n",
" <td>{min_dp=null, adf=[0, 1], genotype_filters=[Lo...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>571ce61e-f0b5-40b1-993b-650ab2af8284</td>\n",
" <td>chr6</td>\n",
" <td>118893858</td>\n",
" <td>118893859</td>\n",
" <td>None</td>\n",
" <td>A</td>\n",
" <td>[C]</td>\n",
" <td>4.0</td>\n",
" <td>[LowDepth, LowGQX]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 1]</td>\n",
" <td>None</td>\n",
" <td>[37, 3, 0]</td>\n",
" <td>[0, 1]</td>\n",
" <td>3</td>\n",
" <td>None</td>\n",
" <td>1</td>\n",
" <td>None</td>\n",
" <td>default</td>\n",
" <td>{min_dp=null, adf=[0, 1], genotype_filters=[Lo...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>571ce61e-f0b5-40b1-993b-650ab2af8284</td>\n",
" <td>chr6</td>\n",
" <td>118894541</td>\n",
" <td>118894542</td>\n",
" <td>None</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>1.0</td>\n",
" <td>[LowGQX]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 1]</td>\n",
" <td>None</td>\n",
" <td>[27, 0, 81]</td>\n",
" <td>[4, 1]</td>\n",
" <td>25</td>\n",
" <td>None</td>\n",
" <td>5</td>\n",
" <td>None</td>\n",
" <td>default</td>\n",
" <td>{min_dp=null, adf=[0, 0], genotype_filters=[Lo...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>571ce61e-f0b5-40b1-993b-650ab2af8284</td>\n",
" <td>chr6</td>\n",
" <td>118894557</td>\n",
" <td>118894558</td>\n",
" <td>None</td>\n",
" <td>G</td>\n",
" <td>[A]</td>\n",
" <td>26.0</td>\n",
" <td>[LowGQX]</td>\n",
" <td>False</td>\n",
" <td>...</td>\n",
" <td>[0, 1]</td>\n",
" <td>None</td>\n",
" <td>[60, 0, 82]</td>\n",
" <td>[5, 4]</td>\n",
" <td>58</td>\n",
" <td>None</td>\n",
" <td>9</td>\n",
" <td>None</td>\n",
" <td>default</td>\n",
" <td>{min_dp=null, adf=[0, 0], genotype_filters=[Lo...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>10 rows × 22 columns</p>\n",
"</div>"
],
"text/plain": [
" importjobid contigname start end \\\n",
"0 571ce61e-f0b5-40b1-993b-650ab2af8284 chr6 118833153 118833154 \n",
"1 571ce61e-f0b5-40b1-993b-650ab2af8284 chr6 118840547 118840548 \n",
"2 571ce61e-f0b5-40b1-993b-650ab2af8284 chr6 118850771 118850772 \n",
"3 571ce61e-f0b5-40b1-993b-650ab2af8284 chr6 118856568 118856569 \n",
"4 571ce61e-f0b5-40b1-993b-650ab2af8284 chr6 118872336 118872337 \n",
"5 571ce61e-f0b5-40b1-993b-650ab2af8284 chr6 118874817 118874818 \n",
"6 571ce61e-f0b5-40b1-993b-650ab2af8284 chr6 118890083 118890084 \n",
"7 571ce61e-f0b5-40b1-993b-650ab2af8284 chr6 118893858 118893859 \n",
"8 571ce61e-f0b5-40b1-993b-650ab2af8284 chr6 118894541 118894542 \n",
"9 571ce61e-f0b5-40b1-993b-650ab2af8284 chr6 118894557 118894558 \n",
"\n",
" names referenceallele alternatealleles qual filters \\\n",
"0 None C [T] 75.0 [LowGQX] \n",
"1 None G [T] 11.0 [LowDepth, LowGQX] \n",
"2 None G [T] 10.0 [LowDepth, LowGQX] \n",
"3 None C [T] 11.0 [LowDepth, LowGQX] \n",
"4 None G [A] 11.0 [LowDepth, LowGQX] \n",
"5 None T [C] 7.0 [LowDepth, LowGQX] \n",
"6 None G [T] 11.0 [LowDepth, LowGQX] \n",
"7 None A [C] 4.0 [LowDepth, LowGQX] \n",
"8 None G [A] 1.0 [LowGQX] \n",
"9 None G [A] 26.0 [LowGQX] \n",
"\n",
" splitfrommultiallelic ... calls genotypelikelihoods phredlikelihoods \\\n",
"0 False ... [1, 1] None [111, 9, 0] \n",
"1 False ... [0, 1] None [46, 3, 0] \n",
"2 False ... [0, 1] None [44, 3, 0] \n",
"3 False ... [0, 1] None [46, 3, 0] \n",
"4 False ... [0, 1] None [46, 3, 0] \n",
"5 False ... [0, 1] None [41, 3, 0] \n",
"6 False ... [0, 1] None [46, 3, 0] \n",
"7 False ... [0, 1] None [37, 3, 0] \n",
"8 False ... [0, 1] None [27, 0, 81] \n",
"9 False ... [0, 1] None [60, 0, 82] \n",
"\n",
" alleledepths conditionalquality spl depth ps sampleid \\\n",
"0 [0, 3] 7 None 3 None default \n",
"1 [0, 1] 3 None 1 None default \n",
"2 [0, 1] 3 None 1 None default \n",
"3 [0, 1] 3 None 1 None default \n",
"4 [0, 1] 3 None 1 None default \n",
"5 [0, 1] 3 None 1 None default \n",
"6 [0, 1] 3 None 1 None default \n",
"7 [0, 1] 3 None 1 None default \n",
"8 [4, 1] 25 None 5 None default \n",
"9 [5, 4] 58 None 9 None default \n",
"\n",
" information \n",
"0 {min_dp=null, adf=[0, 2], genotype_filters=[Lo... \n",
"1 {min_dp=null, adf=[0, 1], genotype_filters=[Lo... \n",
"2 {min_dp=null, adf=[0, 0], genotype_filters=[Lo... \n",
"3 {min_dp=null, adf=[0, 0], genotype_filters=[Lo... \n",
"4 {min_dp=null, adf=[0, 0], genotype_filters=[Lo... \n",
"5 {min_dp=null, adf=[0, 0], genotype_filters=[Lo... \n",
"6 {min_dp=null, adf=[0, 1], genotype_filters=[Lo... \n",
"7 {min_dp=null, adf=[0, 1], genotype_filters=[Lo... \n",
"8 {min_dp=null, adf=[0, 0], genotype_filters=[Lo... \n",
"9 {min_dp=null, adf=[0, 0], genotype_filters=[Lo... \n",
"\n",
"[10 rows x 22 columns]"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conn = connect(s3_staging_dir='s3://%s/results/vcf/' % data_lake_bucket,region_name=region, schema_name=database_name)\n",
"cursor = conn.cursor(work_group=work_group_name)\n",
"cursor.execute(f\"SELECT * FROM {variant_table_name} where sampleid='default' limit 10\")\n",
"df = as_pandas(cursor)\n",
"df\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"celltoolbar": "Raw Cell Format",
"kernelspec": {
"display_name": "conda_python3",
"language": "python",
"name": "conda_python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}