{
"cells": [
{
"cell_type": "markdown",
"id": "44dece74-3ef8-411c-abfc-53f43efe531d",
"metadata": {},
"source": [
"# Clinical Trial Embedding Tutorial\n",
"\n",
"In this tutorial I will show you how to obtain clinical trial information and use embeddings for different types of clinical trial data.\n",
"\n",
"Agenda:\n",
"- Collect all clinical trial records from clinicaltrials.gov\n",
"- Read and parse the obtained XML files \n",
"- Embed **disease indications** using 'nlpie/tiny-biobert', a compact version of BioBERT\n",
"- Embed clinical trial **inclusion-/exclusion criteria** using 'nlpie/tiny-biobert', a compact version of BioBERT\n",
"- Embed **sponsor information** using 'all-MiniLM-L6-v2', a powerful pre-trained sentence encoder\n",
"- Convert **drug names** to their SMILES representation and then to their Morgan fingerprint\n",
"\n",
"Let's start!"
]
},
{
"cell_type": "markdown",
"id": "f4d7eab1-02de-4df6-a2a3-330fbb021bc4",
"metadata": {},
"source": [
"# Import libraries"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "3a40705f-1f55-4295-8681-dde080015885",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from tqdm import tqdm\n",
"import pickle\n",
"from functools import reduce"
]
},
{
"cell_type": "markdown",
"id": "4e920522-40f7-4425-a5e1-684b6598ce7e",
"metadata": {},
"source": [
"# Collect all the clinical trial records"
]
},
{
"cell_type": "markdown",
"id": "fdfc28e2-ce4b-411c-8fc7-ea214be329b8",
"metadata": {},
"source": [
"I suggest running the whole process in the command line since it is time- and space consuming. "
]
},
{
"cell_type": "markdown",
"id": "9ebfecbe-a157-4c9d-b697-0161700150fb",
"metadata": {},
"source": [
"### 1. Download data\n",
"mkdir -p raw_data \\\n",
"cd raw_data \\\n",
"wget https://clinicaltrials.gov/AllPublicXML.zip # This will take 10-20 minutes to download\n",
"\n",
"### 2. Unzip the ZIP file.\n",
"### The unzipped file occupies approximately 11 GB. Please make sure you have enough space. \n",
"unzip AllPublicXML.zip # This might take over an hour to run, depending on your system \\\n",
"cd ../\n",
"\n",
"### 3. Collect and sort all the XML files\n",
"find raw_data/ -name NCT*.xml | sort > data/all_xml \\\n",
"head -3 data/all_xml\n",
"\n",
"### Output:\n",
"raw_data/NCT0000xxxx/NCT00000102.xml \\\n",
"raw_data/NCT0000xxxx/NCT00000104.xml \\\n",
"raw_data/NCT0000xxxx/NCT00000105.xml \n",
"\n",
"NCTID is the identifier of a clinical trial. `NCT00000102`, `NCT00000104`, `NCT00000105` are all NCTIDs. \n",
"\n",
"### 4. Remove ZIP file to recover some disk space\n",
"rm raw_data/AllPublicXML.zip"
]
},
{
"cell_type": "markdown",
"id": "4eabf5a9-570f-4c42-a378-0f11ffcea651",
"metadata": {},
"source": [
"# Parse XML clinical trial files"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "7f8ffad7-d3cb-46d9-a2c8-0d422dd90fa5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>nctid</th>\n",
" <th>study_type</th>\n",
" <th>drug_interventions</th>\n",
" <th>overall_status</th>\n",
" <th>why_stopped</th>\n",
" <th>phase</th>\n",
" <th>indications</th>\n",
" <th>criteria</th>\n",
" <th>enrollment</th>\n",
" <th>lead_sponsor</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NCT00040014</td>\n",
" <td>Interventional</td>\n",
" <td>[exemestane]</td>\n",
" <td>Terminated</td>\n",
" <td></td>\n",
" <td>Phase 2</td>\n",
" <td>[Breast Neoplasms]</td>\n",
" <td>\\n Inclusion Criteria:\\r\\n\\r\\n ...</td>\n",
" <td>100</td>\n",
" <td>Pfizer</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" nctid study_type drug_interventions overall_status why_stopped \\\n",
"0 NCT00040014 Interventional [exemestane] Terminated \n",
"\n",
" phase indications \\\n",
"0 Phase 2 [Breast Neoplasms] \n",
"\n",
" criteria enrollment lead_sponsor \n",
"0 \\n Inclusion Criteria:\\r\\n\\r\\n ... 100 Pfizer "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from xml.etree import ElementTree as ET\n",
"# function adapted from https://github.com/futianfan/clinical-trial-outcome-prediction\n",
"def xmlfile2results(xml_file):\n",
" tree = ET.parse(xml_file)\n",
" root = tree.getroot()\n",
" nctid = root.find('id_info').find('nct_id').text\t### nctid: 'NCT00000102'\n",
" # print(\"nctid is\", nctid)\n",
" study_type = root.find('study_type').text\n",
" # print(\"study type is\", study_type)\n",
" interventions = [i for i in root.findall('intervention')]\n",
" drug_interventions = [i.find('intervention_name').text for i in interventions \\\n",
"\t\t\t\t\t\t\t\t\t\t\t\t\t\tif i.find('intervention_type').text=='Drug']\n",
" # print(\"drug intervention:\", drug_interventions)\n",
" ### remove 'biologics', \n",
" ### non-interventions \n",
" if len(drug_interventions)==0:\n",
" return (None,)\n",
"\n",
" try:\n",
" status = root.find('overall_status').text \n",
" # print(\"status:\", status)\n",
" except:\n",
" status = ''\n",
"\n",
" try:\n",
" why_stop = root.find('why_stopped').text\n",
" # print(\"why stop:\", why_stop)\n",
" except:\n",
" why_stop = ''\n",
"\n",
" try:\n",
" phase = root.find('phase').text\n",
" # print(\"phase:\", phase)\n",
" except:\n",
" phase = ''\n",
" conditions = [i.text for i in root.findall('condition')] ### disease \n",
" # print(\"disease\", conditions)\n",
"\n",
" try:\n",
" criteria = root.find('eligibility').find('criteria').find('textblock').text\n",
" # print('found criteria')\n",
" except:\n",
" criteria = ''\n",
"\n",
" try:\n",
" enrollment = root.find('enrollment').text\n",
" # print(\"enrollment:\", enrollment)\n",
" except:\n",
" enrollment = ''\n",
"\n",
" try:\n",
" lead_sponsor = root.find('sponsors').find('lead_sponsor').find('agency').text \n",
" # print(\"lead_sponsor:\", lead_sponsor)\n",
" except:\n",
" lead_sponsor = ''\n",
"\n",
" data = {'nctid':nctid,\n",
" 'study_type':study_type,\n",
" 'drug_interventions':[drug_interventions],\n",
" 'overall_status':status,\n",
" 'why_stopped':why_stop,\n",
" 'phase':phase,\n",
" 'indications':[conditions],\n",
" 'criteria':criteria,\n",
" 'enrollment':enrollment,\n",
" 'lead_sponsor':lead_sponsor}\n",
" return pd.DataFrame(data)\n",
" \n",
"xmlfile = \"data/NCT00040014.xml\"\n",
"df = xmlfile2results(xmlfile)\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96508ffb",
"metadata": {},
"outputs": [],
"source": [
"# We will only use a limited selection of trials, the same as the HINT paper (https://www.cell.com/patterns/pdf/S2666-3899(22)00018-6.pdf)\n",
"# This way, we can later on compare performaces of the clinical trial outcome prediction\n",
"df_selected = pd.read_pickle('data/selected_trials_df.pkl')\n",
"toy_nctids = df_selected[df_selected['dataset']=='toy']['nctid'].tolist()\n",
"toy_df = pd.DataFrame()\n",
"\n",
"#Parse the XML file for each selected trial and save resulting dataframe\n",
"for nctid in tqdm(toy_nctids):\n",
" try:\n",
" xml_file = 'raw_data/'+nctid[:7]+'xxxx/'+nctid+'.xml'\n",
" df = xmlfile2results(xml_file)\n",
" toy_df = pd.concat([toy_df, df], axis=0) \n",
" except FileNotFoundError:\n",
" print(f\"The file {file} does not exist.\")\n",
" continue\n",
"\n",
"toy_df = toy_df.merge(df_selected[df_selected['dataset']=='toy'], on='nctid', how='left')\n",
"pickle.dump(toy_df, open('data/toy_df.pkl', 'wb')) \n",
"toy_df.head()"
]
},
{
"cell_type": "markdown",
"id": "2bb3b091-fdaa-4da2-8eee-5c907086906a",
"metadata": {},
"source": [
"# Using sentence-transformers to embed information - Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3eccb526-ab65-4a82-8d5d-d69548cecd8b",
"metadata": {},
"outputs": [],
"source": [
"!pip install -U sentence-transformers"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "b9d84339",
"metadata": {},
"outputs": [],
"source": [
"from sentence_transformers import SentenceTransformer"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "f4b23bd8-a8ad-4673-b54a-4066703f71f4",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"No sentence-transformers model found with name C:\\Users\\Lennart/.cache\\torch\\sentence_transformers\\nlpie_tiny-biobert. Creating a new one with MEAN pooling.\n",
"Some weights of BertModel were not initialized from the model checkpoint at C:\\Users\\Lennart/.cache\\torch\\sentence_transformers\\nlpie_tiny-biobert and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"(2, 312)\n"
]
}
],
"source": [
"sentences = [\"This is an example sentence\", \"Each sentence is converted\"]\n",
"model = SentenceTransformer('all-MiniLM-L6-v2')\n",
"embeddings = model.encode(sentences)\n",
"print(embeddings.shape)\n",
"print(embeddings)"
]
},
{
"cell_type": "markdown",
"id": "4fc0db9c",
"metadata": {},
"source": [
"# Indication Embedding \n",
"### Create indication2embedding_dict using nlpie/tiny-biobert"
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "492df441",
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"No sentence-transformers model found with name C:\\Users\\Lennart/.cache\\torch\\sentence_transformers\\nlpie_tiny-biobert. Creating a new one with MEAN pooling.\n",
"Some weights of BertModel were not initialized from the model checkpoint at C:\\Users\\Lennart/.cache\\torch\\sentence_transformers\\nlpie_tiny-biobert and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']\n",
"You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "7eb382096c5c49b1be8d593dc159104f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Batches: 0%| | 0/43 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
" 57%|███████████████████████████████████████████▌ | 831/1469 [00:00<00:00, 3910.09it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"(2, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(9, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(8, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(6, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(7, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(6, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(7, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(8, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(6, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(8, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(10, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(6, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(6, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(6, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(8, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(8, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(9, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(10, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(7, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(14, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(34, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(21, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(16, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(5, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(7, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(2, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n",
"(4, 312)\n",
"(312,)\n",
"(3, 312)\n",
"(312,)\n",
"(1, 312)\n",
"(312,)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████| 1469/1469 [00:00<00:00, 3675.90it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"(1469, 312)\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
   ],
   "source": [
    "def create_indication2embedding_dict():\n",
    "    # Import toy dataset\n",
    "    toy_df = pd.read_pickle('data/toy_df.pkl')\n",
    "\n",
    "    # Collect all unique indications; each one is encoded into a 312-dimensional vector\n",
    "    all_indications = sorted(set(reduce(lambda x, y: x + y, toy_df['indications'].tolist())))\n",
    "\n",
    "    # Using 'nlpie/tiny-biobert', a smaller version of BioBERT\n",
    "    model = SentenceTransformer('nlpie/tiny-biobert')\n",
    "    embeddings = model.encode(all_indications, show_progress_bar=True)\n",
    "\n",
    "    # Create dictionary mapping indications to embeddings\n",
    "    indication2embedding_dict = dict(zip(all_indications, embeddings))\n",
    "    with open('data/indication2embedding_dict.pkl', 'wb') as f:\n",
    "        pickle.dump(indication2embedding_dict, f)\n",
    "\n",
    "    # Average the indication embeddings of each trial into a single vector\n",
    "    embedding = []\n",
    "    for indication_lst in tqdm(toy_df['indications'].tolist()):\n",
    "        vec = [indication2embedding_dict[indication] for indication in indication_lst]\n",
    "        # (n_indications, 312) -> (312,)\n",
    "        embedding.append(np.mean(np.array(vec), axis=0))\n",
    "    embedding = np.array(embedding)\n",
    "    print(embedding.shape)\n",
    "\n",
    "    # Create dictionary mapping NCT IDs to disease embeddings\n",
    "    nctid2disease_embedding_dict = dict(zip(toy_df['nctid'], embedding))\n",
    "    with open('data/nctid2disease_embedding_dict.pkl', 'wb') as f:\n",
    "        pickle.dump(nctid2disease_embedding_dict, f)\n",
    "\n",
    "create_indication2embedding_dict()"
]
},
{
"cell_type": "markdown",
"id": "ec8eed52",
"metadata": {},
"source": [
"# Sponsor Embedding \n",
"### Create sponsor2embedding_dict using all-MiniLM-L6-v2"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "e8f7c027",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3eb038b50679442b9e95a04928db5170",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Batches: 0%| | 0/15 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"(459, 384)\n"
]
}
   ],
   "source": [
    "def create_sponsor2embedding_dict():\n",
    "    # Import toy dataset\n",
    "    toy_df = pd.read_pickle('data/toy_df.pkl')\n",
    "\n",
    "    # Collect all unique lead sponsors; each one is encoded into a 384-dimensional vector\n",
    "    all_sponsors = sorted(set(toy_df['lead_sponsor'].tolist()))\n",
    "\n",
    "    # Using 'all-MiniLM-L6-v2', a compact pre-trained sentence encoder\n",
    "    model = SentenceTransformer('all-MiniLM-L6-v2')\n",
    "    embeddings = model.encode(all_sponsors, show_progress_bar=True)\n",
    "    print(embeddings.shape)\n",
    "\n",
    "    # Create dictionary mapping sponsors to embeddings\n",
    "    sponsor2embedding_dict = dict(zip(all_sponsors, embeddings))\n",
    "    with open('data/sponsor2embedding_dict.pkl', 'wb') as f:\n",
    "        pickle.dump(sponsor2embedding_dict, f)\n",
    "\n",
    "create_sponsor2embedding_dict()"
]
},
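  {
   "cell_type": "markdown",
   "id": "sketch-sponsor-similarity-md",
   "metadata": {},
   "source": [
    "The sponsor dictionary saved above can be used for nearest-neighbour lookups. The next cell is a minimal sketch and not part of the original pipeline: `most_similar` is a hypothetical helper, and the three 4-dimensional vectors are made-up toy data standing in for the 384-dimensional embeddings stored in `data/sponsor2embedding_dict.pkl`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-sponsor-similarity-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def most_similar(query, name2vec, k=2):\n",
    "    # Hypothetical helper: rank all other keys by cosine similarity to `query`\n",
    "    q = name2vec[query] / np.linalg.norm(name2vec[query])\n",
    "    scores = {}\n",
    "    for name, vec in name2vec.items():\n",
    "        if name != query:\n",
    "            scores[name] = float(np.dot(q, vec / np.linalg.norm(vec)))\n",
    "    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]\n",
    "\n",
    "# Toy vectors (assumption); in practice, load the pickled sponsor2embedding_dict instead\n",
    "toy_embeddings = {\n",
    "    'Sponsor A': np.array([1.0, 0.0, 0.0, 0.0]),\n",
    "    'Sponsor B': np.array([0.9, 0.1, 0.0, 0.0]),\n",
    "    'Sponsor C': np.array([0.0, 0.0, 1.0, 0.0]),\n",
    "}\n",
    "most_similar('Sponsor A', toy_embeddings)"
   ]
  },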
{
"cell_type": "markdown",
"id": "40baac38",
"metadata": {},
"source": [
"# Protocol Embedding"
]
},
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "f5757802",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper functions to clean up protocols from https://github.com/futianfan/clinical-trial-outcome-prediction/blob/main/HINT/protocol_encode.py\n",
    "def clean_protocol(protocol):\n",
    "    # Lowercase the text, split it into lines, and drop empty lines\n",
    "    lines = protocol.lower().split('\\n')\n",
    "    return [line.strip() for line in lines if line.strip()]\n",
    "\n",
    "def split_protocol(protocol):\n",
    "    protocol_split = clean_protocol(protocol)\n",
    "    inclusion_idx, exclusion_idx = len(protocol_split), len(protocol_split)\n",
    "    # Index of the first line mentioning 'inclusion'\n",
    "    for idx, sentence in enumerate(protocol_split):\n",
    "        if 'inclusion' in sentence:\n",
    "            inclusion_idx = idx\n",
    "            break\n",
    "    # Index of the first line mentioning 'exclusion'\n",
    "    for idx, sentence in enumerate(protocol_split):\n",
    "        if 'exclusion' in sentence:\n",
    "            exclusion_idx = idx\n",
    "            break\n",
    "    # Split only if the inclusion header precedes the exclusion header\n",
    "    # and at least one line follows the exclusion header\n",
    "    if inclusion_idx < exclusion_idx < len(protocol_split) - 1:\n",
    "        inclusion_criteria = protocol_split[inclusion_idx:exclusion_idx]\n",
    "        exclusion_criteria = protocol_split[exclusion_idx:]\n",
    "        return inclusion_criteria, exclusion_criteria  # list, list\n",
    "    # Otherwise return a 1-tuple holding all lines; callers check len(...) == 2\n",
    "    return (protocol_split,)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "b4c611d3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(['inclusion criteria:',\n",
" '-',\n",
" 'patients must have:',\n",
" 'unipolar major depression (per diagnostic and statistical manuel-iv criteria) with or',\n",
" 'without melancholia.'],\n",
" ['exclusion criteria:',\n",
" '-',\n",
" 'patients with the following symptoms or conditions are excluded:',\n",
" 'psychotic or atypical subtype of unipolar major depression.'])"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Example of clean-up functions\n",
"# Import toy dataset\n",
"toy_df = pd.read_pickle('data/toy_df.pkl')\n",
"# split_protocol() cleans and splits web-scraped criteria into lists of inclusion and exclusion criteria\n",
"split_protocol(toy_df['criteria'][0])"
]
},
{
"cell_type": "markdown",
"id": "289f0ae8",
"metadata": {},
"source": [
"### Create nctid2protocol_embedding_dict using nlpie/tiny-biobert"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6d584a3e-a0ec-4804-93b8-164d9ab21f25",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
   },
   "outputs": [],
   "source": [
    "def create_nctid2protocol_embedding_dict():\n",
    "    # Import toy dataset\n",
    "    toy_df = pd.read_pickle('data/toy_df.pkl')\n",
    "\n",
    "    # Using 'nlpie/tiny-biobert', a smaller version of BioBERT\n",
    "    model = SentenceTransformer('nlpie/tiny-biobert')\n",
    "\n",
    "    def criteria2vec(criteria):\n",
    "        # Encode each criterion line, then average: (n_lines, 312) -> (312,)\n",
    "        embeddings = model.encode(criteria)\n",
    "        return np.mean(embeddings, axis=0)\n",
    "\n",
    "    nctid_2_protocol_embedding = dict()\n",
    "    print(f\"Embedding {len(toy_df) * 2} inclusion/exclusion criteria...\")\n",
    "    for nctid, protocol in tqdm(zip(toy_df['nctid'].tolist(), toy_df['criteria'].tolist()), total=len(toy_df)):\n",
    "        split = split_protocol(protocol)\n",
    "        if len(split) == 2:\n",
    "            # Concatenate the inclusion and exclusion embeddings: (624,)\n",
    "            embedding = np.concatenate((criteria2vec(split[0]), criteria2vec(split[1])))\n",
    "        else:\n",
    "            # No separate exclusion section found: pad the second half with zeros\n",
    "            embedding = np.concatenate((criteria2vec(split[0]), np.zeros(312)))\n",
    "        nctid_2_protocol_embedding[nctid] = embedding\n",
    "    with open('data/nctid_2_protocol_embedding_dict.pkl', 'wb') as f:\n",
    "        pickle.dump(nctid_2_protocol_embedding, f)\n",
    "\n",
    "create_nctid2protocol_embedding_dict()"
]
},
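  {
   "cell_type": "markdown",
   "id": "sketch-protocol-layout-md",
   "metadata": {},
   "source": [
    "Each protocol embedding built above concatenates the mean inclusion vector with the mean exclusion vector, so it has 2 * 312 = 624 dimensions; when no exclusion section is found, the second half is zero-padded. The next cell is a minimal sketch of that layout. It uses random arrays in place of real `model.encode` output, which is an assumption made purely for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-protocol-layout-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "dim = 312  # embedding size of nlpie/tiny-biobert, as used above\n",
    "\n",
    "# Stand-ins (assumption) for model.encode(...) output: one row per criterion line\n",
    "inclusion_vecs = rng.normal(size=(5, dim))\n",
    "exclusion_vecs = rng.normal(size=(3, dim))\n",
    "\n",
    "# Mean-pool each section, then concatenate: (312,) + (312,) -> (624,)\n",
    "both = np.concatenate((inclusion_vecs.mean(axis=0), exclusion_vecs.mean(axis=0)))\n",
    "\n",
    "# No exclusion section found: pad the second half with zeros\n",
    "only_inclusion = np.concatenate((inclusion_vecs.mean(axis=0), np.zeros(dim)))\n",
    "\n",
    "assert both.shape == only_inclusion.shape == (2 * dim,)"
   ]
  },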
{
"cell_type": "markdown",
"id": "f8f7abe3-5b22-4e84-b1d9-f9b5284fc4a4",
"metadata": {},
"source": [
"# Drug molecule embedding\n",
"### Converting drug names to their SMILES representation"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4e4ec86b-2024-44bf-8482-5d24763315f9",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Drug Name: aspirin\n",
"SMILES: CC(=O)Oc1ccccc1C(O)=O\n"
]
},
{
"data": {
"text/plain": [
"'CC(=O)Oc1ccccc1C(O)=O'"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
   ],
   "source": [
    "import requests\n",
    "from urllib.parse import quote\n",
    "\n",
    "def get_smiles(drug_name):\n",
    "    # URL for the NCI/CADD Chemical Identifier Resolver (CIR) API\n",
    "    base_url = \"https://cactus.nci.nih.gov/chemical/structure\"\n",
    "    # Percent-encode the name so spaces and special characters are safe in the URL path\n",
    "    url = f\"{base_url}/{quote(drug_name)}/smiles\"\n",
    "\n",
    "    smiles = ''  # Returned as-is if the lookup fails\n",
    "    try:\n",
    "        # Send a GET request to retrieve the SMILES representation\n",
    "        response = requests.get(url, timeout=30)\n",
    "\n",
    "        if response.status_code == 200:\n",
    "            smiles = response.text.strip()  # Get the SMILES string\n",
    "            print(f\"Drug Name: {drug_name}\")\n",
    "            print(f\"SMILES: {smiles}\")\n",
    "        else:\n",
    "            print(f\"Failed to retrieve SMILES for {drug_name}. Status code: {response.status_code}\")\n",
    "\n",
    "    except requests.exceptions.RequestException as e:\n",
    "        print(f\"An error occurred: {e}\")\n",
    "\n",
    "    return smiles\n",
    "\n",
    "# Define the drug name you want to convert\n",
    "drug_name = \"aspirin\"  # Replace with the drug name of your choice\n",
    "get_smiles(drug_name)"
]
},
{
"cell_type": "markdown",
"id": "1e28e892-c63e-4b89-a9f4-46e7f54fe9a0",
"metadata": {},
"source": [
"### Create drug2smiles_dict"
]
},
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7045a68-e1b4-4bf9-b1c0-4c61306bcdcd",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "import pandas as pd\n",
    "from functools import reduce\n",
    "\n",
    "# Import toy dataset\n",
    "toy_df = pd.read_pickle('data/toy_df.pkl')\n",
    "\n",
    "# Collect all unique drug names across trials\n",
    "all_drugs = sorted(set(reduce(lambda x, y: x + y, toy_df['drug_interventions'].tolist())))\n",
    "\n",
    "# Create dictionary mapping drug names to SMILES strings (one CIR request per drug)\n",
    "drug2smiles_dict = {}\n",
    "for drug in all_drugs:\n",
    "    drug2smiles_dict[drug] = get_smiles(drug)\n",
    "with open('data/drug2smiles_dict.pkl', 'wb') as f:\n",
    "    pickle.dump(drug2smiles_dict, f)"
]
},
{
"cell_type": "markdown",
"id": "403e4d9f-2a09-4999-a072-8271933006e7",
"metadata": {},
"source": [
"### Converting SMILES to Morgan Fingerprint"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1541bc23-3c7c-45be-b764-0db5398fb0a8",
"metadata": {},
"outputs": [],
"source": [
"!pip install DeepPurpose"
]
},
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c7a4897-3f2a-4d74-8607-355962f77df2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from DeepPurpose.utils import encode_drug\n",
    "import pandas as pd\n",
    "\n",
    "# Example DataFrame of SMILES strings representing drug molecules\n",
    "smiles_df = pd.DataFrame(['O=C(C)Oc1ccccc1C(=O)O', 'CC(CC1=CC=CC=C1)C(=O)O', 'CN1CCN(CC1)C2=C(C=CC(=C2)OC)OC'], columns=['SMILES'])\n",
    "\n",
    "# Encode the drug molecules as Morgan fingerprints\n",
    "drug_encodings = encode_drug(smiles_df, drug_encoding='Morgan', column_name='SMILES', save_column_name='drug_encoding')\n",
    "\n",
    "# Print the shape of each encoded representation\n",
    "for x in drug_encodings['drug_encoding']:\n",
    "    print(x.shape)\n",
    "\n",
    "drug_encodings.head()"
]
},
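  {
   "cell_type": "markdown",
   "id": "sketch-rdkit-morgan-md",
   "metadata": {},
   "source": [
    "For reference, a Morgan fingerprint can also be computed directly with RDKit. The next cell is a sketch under assumptions: it presumes RDKit is installed, `morgan_fingerprint` is a hypothetical helper, and the radius of 2 and length of 1024 bits are illustrative choices rather than values confirmed by this notebook. Recent RDKit releases may emit a deprecation warning for `GetMorganFingerprintAsBitVect`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "sketch-rdkit-morgan-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from rdkit import Chem, DataStructs\n",
    "from rdkit.Chem import AllChem\n",
    "\n",
    "def morgan_fingerprint(smiles, radius=2, n_bits=1024):\n",
    "    # Hypothetical helper: returns a 0/1 vector, or None if the SMILES cannot be parsed\n",
    "    mol = Chem.MolFromSmiles(smiles)\n",
    "    if mol is None:\n",
    "        return None\n",
    "    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)\n",
    "    arr = np.zeros((n_bits,))\n",
    "    DataStructs.ConvertToNumpyArray(fp, arr)\n",
    "    return arr\n",
    "\n",
    "fp = morgan_fingerprint('CC(=O)Oc1ccccc1C(O)=O')  # aspirin, as returned by get_smiles above\n",
    "print(fp.shape)"
   ]
  },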
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b03ebcc-bcdc-4d92-800b-a38935a6ab45",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper function to parse a stringified list of SMILES, from https://github.com/futianfan/clinical-trial-outcome-prediction/blob/main/HINT/protocol_encode.py\n",
    "def txt_to_lst(text):\n",
    "    \"\"\"\n",
    "    Convert a string such as\n",
    "    \"['CN[C@H]1CC[C@@H](C2=CC(Cl)=C(Cl)C=C2)C2=CC=CC=C12', 'CNCCC=C1C2=CC=CC=C2CCC2=CC=CC=C12']\"\n",
    "    into a list of SMILES strings.\n",
    "    \"\"\"\n",
    "    text = text[1:-1]  # Drop the surrounding brackets\n",
    "    # Strip whitespace and the surrounding quotes; skip empty entries (e.g. from '[]')\n",
    "    lst = [i.strip()[1:-1] for i in text.split(',') if i.strip()]\n",
    "    return lst\n",
    "\n",
    "def create_smiles2morgan_dict():\n",
    "    from DeepPurpose.utils import smiles2morgan\n",
    "\n",
    "    # Import toy dataset\n",
    "    toy_df = pd.read_csv('data/toy_df.csv')\n",
    "\n",
    "    # Collect the unique SMILES strings across all trials\n",
    "    smiles_lst = list(map(txt_to_lst, toy_df['smiless'].tolist()))\n",
    "    unique_smiles = sorted(set(reduce(lambda x, y: x + y, smiles_lst)))\n",
    "\n",
    "    # Map each SMILES string to its Morgan fingerprint\n",
    "    smiles2morgan_dict = {smiles: smiles2morgan(smiles) for smiles in unique_smiles}\n",
    "    with open('data/smiles2morgan_dict.pkl', 'wb') as f:\n",
    "        pickle.dump(smiles2morgan_dict, f)\n",
    "\n",
    "create_smiles2morgan_dict()\n",
    "\n",
    "def load_smiles2morgan_dict():\n",
    "    with open('data/smiles2morgan_dict.pkl', 'rb') as pickle_file:\n",
    "        return pickle.load(pickle_file)"
]
},
{
"cell_type": "markdown",
"id": "f0005502-efaa-4008-8bc0-bd2f19bb6b2c",
"metadata": {},
"source": [
"### Create nctid2molecule_embedding_dict"
]
},
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22cbd6ae-3ce2-4ce7-8841-e27332919079",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from tqdm import tqdm\n",
    "\n",
    "def create_nctid2molecule_embedding_dict():\n",
    "    # Import toy dataset\n",
    "    toy_df = pd.read_csv('data/toy_df.csv')\n",
    "    smiles_lst = list(map(txt_to_lst, toy_df['smiless'].tolist()))\n",
    "    smiles2morgan_dict = load_smiles2morgan_dict()\n",
    "\n",
    "    # Average the Morgan fingerprints of each trial's drugs into a single vector\n",
    "    embedding = []\n",
    "    for drugs in tqdm(smiles_lst):\n",
    "        vec = [smiles2morgan_dict[drug] for drug in drugs]\n",
    "        embedding.append(np.mean(np.array(vec), axis=0))\n",
    "    embedding = np.array(embedding)\n",
    "    print(embedding.shape)\n",
    "\n",
    "    # Create dictionary mapping NCT IDs to molecule embeddings\n",
    "    nctid2molecule_embedding_dict = dict(zip(toy_df['nctid'], embedding))\n",
    "    with open('data/nctid2molecule_embedding_dict.pkl', 'wb') as f:\n",
    "        pickle.dump(nctid2molecule_embedding_dict, f)\n",
    "\n",
    "create_nctid2molecule_embedding_dict()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}