{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "44dece74-3ef8-411c-abfc-53f43efe531d",
   "metadata": {},
   "source": [
    "# Clinical Trial Embedding Tutorial\n",
    "\n",
    "In this tutorial I will show you how to obtain clinical trial information and use embeddings for different types of clinical trial data.\n",
    "\n",
    "Agenda:\n",
    "- Collect all clinical trial records from clinicaltrials.gov\n",
    "- Read and parse the obtained XML files \n",
    "- Embed **disease indications** using 'nlpie/tiny-biobert', a compact version of BioBERT\n",
    "- Embed clinical trial **inclusion-/exclusion criteria** using 'nlpie/tiny-biobert', a compact version of BioBERT\n",
    "- Embed **sponsor information** using 'all-MiniLM-L6-v2', a powerful pre-trained sentence encoder\n",
    "- Convert **drug names** to their SMILES representation and then to their Morgan fingerprint\n",
    "\n",
    "Let's start!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f4d7eab1-02de-4df6-a2a3-330fbb021bc4",
   "metadata": {},
   "source": [
    "# Import libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3a40705f-1f55-4295-8681-dde080015885",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from tqdm import tqdm\n",
    "import pickle\n",
    "from functools import reduce"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e920522-40f7-4425-a5e1-684b6598ce7e",
   "metadata": {},
   "source": [
    "# Collect all the clinical trial records"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fdfc28e2-ce4b-411c-8fc7-ea214be329b8",
   "metadata": {},
   "source": [
    "I suggest running the whole process in the command line since it is time- and space consuming. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ebfecbe-a157-4c9d-b697-0161700150fb",
   "metadata": {},
   "source": [
    "### 1. Download data\n",
    "mkdir -p raw_data \\\n",
    "cd raw_data \\\n",
    "wget https://clinicaltrials.gov/AllPublicXML.zip # This will take 10-20 minutes to download\n",
    "\n",
    "### 2. Unzip the ZIP file.\n",
    "### The unzipped file occupies approximately 11 GB. Please make sure you have enough space. \n",
    "unzip AllPublicXML.zip # This might take over an hour to run, depending on your system \\\n",
    "cd ../\n",
    "\n",
    "### 3. Collect and sort all the XML files\n",
    "find raw_data/ -name NCT*.xml | sort > data/all_xml \\\n",
    "head -3 data/all_xml\n",
    "\n",
    "### Output:\n",
    "raw_data/NCT0000xxxx/NCT00000102.xml \\\n",
    "raw_data/NCT0000xxxx/NCT00000104.xml \\\n",
    "raw_data/NCT0000xxxx/NCT00000105.xml \n",
    "\n",
    "NCTID is the identifier of a clinical trial. `NCT00000102`, `NCT00000104`, `NCT00000105` are all NCTIDs. \n",
    "\n",
    "### 4. Remove ZIP file to recover some disk space\n",
    "rm raw_data/AllPublicXML.zip"
   ]
  },
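  {
   "cell_type": "markdown",
   "id": "list-xml-files-sketch-md",
   "metadata": {},
   "source": [
    "If you prefer to stay inside the notebook, step 3 above can also be sketched in Python. This is only a minimal sketch, not part of the original pipeline: the helper name `list_trial_xml_files` is hypothetical, and it assumes the archive was unzipped into `raw_data/` as described above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "list-xml-files-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "def list_trial_xml_files(root='raw_data'):\n",
    "    # Hypothetical helper: recursively collect all NCT*.xml files, sorted by path\n",
    "    return sorted(str(path) for path in Path(root).rglob('NCT*.xml'))\n",
    "\n",
    "xml_files = list_trial_xml_files()\n",
    "print(len(xml_files))  # number of trial records found\n",
    "print(xml_files[:3])   # first three paths, comparable to `head -3 data/all_xml`"
   ]
  },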
  {
   "cell_type": "markdown",
   "id": "4eabf5a9-570f-4c42-a378-0f11ffcea651",
   "metadata": {},
   "source": [
    "# Parse XML clinical trial files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "7f8ffad7-d3cb-46d9-a2c8-0d422dd90fa5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>nctid</th>\n",
       "      <th>study_type</th>\n",
       "      <th>drug_interventions</th>\n",
       "      <th>overall_status</th>\n",
       "      <th>why_stopped</th>\n",
       "      <th>phase</th>\n",
       "      <th>indications</th>\n",
       "      <th>criteria</th>\n",
       "      <th>enrollment</th>\n",
       "      <th>lead_sponsor</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NCT00040014</td>\n",
       "      <td>Interventional</td>\n",
       "      <td>[exemestane]</td>\n",
       "      <td>Terminated</td>\n",
       "      <td></td>\n",
       "      <td>Phase 2</td>\n",
       "      <td>[Breast Neoplasms]</td>\n",
       "      <td>\\n        Inclusion Criteria:\\r\\n\\r\\n         ...</td>\n",
       "      <td>100</td>\n",
       "      <td>Pfizer</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         nctid      study_type drug_interventions overall_status why_stopped  \\\n",
       "0  NCT00040014  Interventional       [exemestane]     Terminated               \n",
       "\n",
       "     phase         indications  \\\n",
       "0  Phase 2  [Breast Neoplasms]   \n",
       "\n",
       "                                            criteria enrollment lead_sponsor  \n",
       "0  \\n        Inclusion Criteria:\\r\\n\\r\\n         ...        100       Pfizer  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from xml.etree import ElementTree as ET\n",
    "# function adapted from https://github.com/futianfan/clinical-trial-outcome-prediction\n",
    "def xmlfile2results(xml_file):\n",
    "    tree = ET.parse(xml_file)\n",
    "    root = tree.getroot()\n",
    "    nctid = root.find('id_info').find('nct_id').text\t### nctid: 'NCT00000102'\n",
    "    # print(\"nctid is\", nctid)\n",
    "    study_type = root.find('study_type').text\n",
    "    # print(\"study type is\", study_type)\n",
    "    interventions = [i for i in root.findall('intervention')]\n",
    "    drug_interventions = [i.find('intervention_name').text for i in interventions \\\n",
    "\t\t\t\t\t\t\t\t\t\t\t\t\t\tif i.find('intervention_type').text=='Drug']\n",
    "    # print(\"drug intervention:\", drug_interventions)\n",
    "    ### remove 'biologics', \n",
    "    ### non-interventions \n",
    "    if len(drug_interventions)==0:\n",
    "        return (None,)\n",
    "\n",
    "    try:\n",
    "        status = root.find('overall_status').text \n",
    "        # print(\"status:\", status)\n",
    "    except:\n",
    "        status = ''\n",
    "\n",
    "    try:\n",
    "        why_stop = root.find('why_stopped').text\n",
    "        # print(\"why stop:\", why_stop)\n",
    "    except:\n",
    "        why_stop = ''\n",
    "\n",
    "    try:\n",
    "        phase = root.find('phase').text\n",
    "        # print(\"phase:\", phase)\n",
    "    except:\n",
    "        phase = ''\n",
    "    conditions = [i.text for i in root.findall('condition')] ### disease \n",
    "    # print(\"disease\", conditions)\n",
    "\n",
    "    try:\n",
    "        criteria = root.find('eligibility').find('criteria').find('textblock').text\n",
    "        # print('found criteria')\n",
    "    except:\n",
    "        criteria = ''\n",
    "\n",
    "    try:\n",
    "        enrollment = root.find('enrollment').text\n",
    "        # print(\"enrollment:\", enrollment)\n",
    "    except:\n",
    "        enrollment = ''\n",
    "\n",
    "    try:\n",
    "        lead_sponsor = root.find('sponsors').find('lead_sponsor').find('agency').text \n",
    "        # print(\"lead_sponsor:\", lead_sponsor)\n",
    "    except:\n",
    "        lead_sponsor = ''\n",
    "\n",
    "    data = {'nctid':nctid,\n",
    "           'study_type':study_type,\n",
    "           'drug_interventions':[drug_interventions],\n",
    "           'overall_status':status,\n",
    "           'why_stopped':why_stop,\n",
    "           'phase':phase,\n",
    "           'indications':[conditions],\n",
    "           'criteria':criteria,\n",
    "           'enrollment':enrollment,\n",
    "           'lead_sponsor':lead_sponsor}\n",
    "    return pd.DataFrame(data)\n",
    "    \n",
    "xmlfile = \"data/NCT00040014.xml\"\n",
    "df = xmlfile2results(xmlfile)\n",
    "df.head()"
   ]
  },
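  {
   "cell_type": "markdown",
   "id": "findtext-sketch-md",
   "metadata": {},
   "source": [
    "The repeated `try`/`except` blocks in `xmlfile2results` guard against missing elements. As an alternative, `ElementTree` offers `findtext`, which accepts a path and a default value. The sketch below shows the idea on a made-up record (the XML snippet, the ID `NCT00000000`, and the sponsor name are illustrative, not real data); it is not the parser used in the rest of this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "findtext-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from xml.etree import ElementTree as ET\n",
    "\n",
    "# Made-up minimal record that mimics the structure parsed above\n",
    "sample_xml = '''<clinical_study>\n",
    "  <id_info><nct_id>NCT00000000</nct_id></id_info>\n",
    "  <overall_status>Completed</overall_status>\n",
    "  <sponsors><lead_sponsor><agency>Example Sponsor</agency></lead_sponsor></sponsors>\n",
    "</clinical_study>'''\n",
    "\n",
    "sample_root = ET.fromstring(sample_xml)\n",
    "\n",
    "# findtext follows the path and falls back to the default if any element is missing\n",
    "print(sample_root.findtext('id_info/nct_id', default=''))\n",
    "print(sample_root.findtext('sponsors/lead_sponsor/agency', default=''))\n",
    "print(repr(sample_root.findtext('why_stopped', default='')))  # element is absent, so ''"
   ]
  },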
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96508ffb",
   "metadata": {},
   "outputs": [],
   "source": [
    "# We will only use a limited selection of trials, the same as the HINT paper (https://www.cell.com/patterns/pdf/S2666-3899(22)00018-6.pdf)\n",
    "# This way, we can later on compare performaces of the clinical trial outcome prediction\n",
    "df_selected = pd.read_pickle('data/selected_trials_df.pkl')\n",
    "toy_nctids = df_selected[df_selected['dataset']=='toy']['nctid'].tolist()\n",
    "toy_df = pd.DataFrame()\n",
    "\n",
    "#Parse the XML file for each selected trial and save resulting dataframe\n",
    "for nctid in tqdm(toy_nctids):\n",
    "    try:\n",
    "        xml_file = 'raw_data/'+nctid[:7]+'xxxx/'+nctid+'.xml'\n",
    "        df = xmlfile2results(xml_file)\n",
    "        toy_df = pd.concat([toy_df, df], axis=0)  \n",
    "    except FileNotFoundError:\n",
    "        print(f\"The file {file} does not exist.\")\n",
    "        continue\n",
    "\n",
    "toy_df = toy_df.merge(df_selected[df_selected['dataset']=='toy'], on='nctid', how='left')\n",
    "pickle.dump(toy_df, open('data/toy_df.pkl', 'wb'))  \n",
    "toy_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2bb3b091-fdaa-4da2-8eee-5c907086906a",
   "metadata": {},
   "source": [
    "# Using sentence-transformers to embed information - Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3eccb526-ab65-4a82-8d5d-d69548cecd8b",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -U sentence-transformers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "b9d84339",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sentence_transformers import SentenceTransformer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "f4b23bd8-a8ad-4673-b54a-4066703f71f4",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "No sentence-transformers model found with name C:\\Users\\Lennart/.cache\\torch\\sentence_transformers\\nlpie_tiny-biobert. Creating a new one with MEAN pooling.\n",
      "Some weights of BertModel were not initialized from the model checkpoint at C:\\Users\\Lennart/.cache\\torch\\sentence_transformers\\nlpie_tiny-biobert and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2, 312)\n"
     ]
    }
   ],
   "source": [
    "sentences = [\"This is an example sentence\", \"Each sentence is converted\"]\n",
    "model = SentenceTransformer('all-MiniLM-L6-v2')\n",
    "embeddings = model.encode(sentences)\n",
    "print(embeddings.shape)\n",
    "print(embeddings)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4fc0db9c",
   "metadata": {},
   "source": [
    "# Indication Embedding \n",
    "### Create indication2embedding_dict using nlpie/tiny-biobert"
   ]
  },
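  {
   "cell_type": "markdown",
   "id": "mean-pooling-sketch-md",
   "metadata": {},
   "source": [
    "A trial can list several indications, while downstream models usually expect one fixed-size vector per trial. The sketch below assumes the per-indication embeddings are combined by averaging; this is an assumption for illustration (other pooling choices are possible), and random vectors stand in for the output of `model.encode(...)`. The variable names are illustrative, and 312 is the embedding size that appears in the cell outputs of this notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "mean-pooling-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "\n",
    "# Stand-in for model.encode(indications): 3 indications, 312 dimensions each\n",
    "indication_embeddings = rng.normal(size=(3, 312))\n",
    "\n",
    "# Average over the indication axis to get a single vector for the trial\n",
    "trial_embedding = indication_embeddings.mean(axis=0)\n",
    "\n",
    "print(indication_embeddings.shape)  # (3, 312)\n",
    "print(trial_embedding.shape)        # (312,)"
   ]
  },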
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "492df441",
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "No sentence-transformers model found with name C:\\Users\\Lennart/.cache\\torch\\sentence_transformers\\nlpie_tiny-biobert. Creating a new one with MEAN pooling.\n",
      "Some weights of BertModel were not initialized from the model checkpoint at C:\\Users\\Lennart/.cache\\torch\\sentence_transformers\\nlpie_tiny-biobert and are newly initialized: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']\n",
      "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "7eb382096c5c49b1be8d593dc159104f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Batches:   0%|          | 0/43 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      " 57%|███████████████████████████████████████████▌                                 | 831/1469 [00:00<00:00, 3910.09it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(9, 312)\n",
      "(312,)\n",
      "(5, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(8, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(5, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(5, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(5, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(5, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(6, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(7, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(5, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(6, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(5, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(7, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(8, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(6, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(5, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(5, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(8, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|████████████████████████████████████████████████████████████████████████████| 1469/1469 [00:00<00:00, 3675.90it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1, 312)\n",
      "(312,)\n",
      "(5, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(9, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(6, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(3, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(4, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(1, 312)\n",
      "(312,)\n",
      "(2, 312)\n",
      "(312,)\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1469, 312)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "def create_indication2embedding_dict():\n",
    "    # Import toy dataset\n",
    "    toy_df = pd.read_pickle('data/toy_df.pkl')\n",
    "\n",
    "    # Create list with all indications and encode each one into a 312-dimensional vector\n",
    "    all_indications = sorted(set(reduce(lambda x, y: x + y, toy_df['indications'].tolist())))     \n",
    "\n",
    "    # Using 'nlpie/tiny-biobert', a smaller version of BioBERT\n",
    "    model = SentenceTransformer('nlpie/tiny-biobert')\n",
    "    embeddings = model.encode(all_indications, show_progress_bar=True)\n",
    "\n",
    "    # Create dictionary mapping indications to embeddings\n",
    "    indication2embedding_dict = {}\n",
    "    for key, row in zip(all_indications, embeddings):\n",
    "        indication2embedding_dict[key] = row\n",
    "    pickle.dump(indication2embedding_dict, open('data/indication2embedding_dict.pkl', 'wb')) \n",
    "        \n",
    "    embedding = []\n",
    "    for indication_lst in tqdm(toy_df['indications'].tolist()):\n",
    "        vec = []\n",
    "        for indication in indication_lst:\n",
    "            vec.append(indication2embedding_dict[indication])\n",
    "        print(np.array(vec).shape) # DEBUG\n",
    "        vec = np.mean(np.array(vec), axis=0)\n",
    "        print(vec.shape) # DEBUG\n",
    "        embedding.append(vec)\n",
    "    print(np.array(embedding).shape)\n",
    "    \n",
    "    dict = zip(toy_df['nctid'], np.array(embedding))\n",
    "    nctid2disease_embedding_dict = {}\n",
    "    for key, row in zip(toy_df['nctid'], np.array(embedding)):\n",
    "        nctid2disease_embedding_dict[key] = row\n",
    "    pickle.dump(nctid2disease_embedding_dict, open('data/nctid2disease_embedding_dict.pkl', 'wb'))  \n",
    "    \n",
    "create_indication2embedding_dict()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec8eed52",
   "metadata": {},
   "source": [
    "# Sponsor Embedding \n",
    "### Create sponsor2embedding_dict using all-MiniLM-L6-v2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "e8f7c027",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "3eb038b50679442b9e95a04928db5170",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Batches:   0%|          | 0/15 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(459, 384)\n"
     ]
    }
   ],
   "source": [
    "def create_sponsor2embedding_dict():\n",
    "    # Import toy dataset\n",
    "    toy_df = pd.read_pickle('data/toy_df.pkl')\n",
    "\n",
    "    # Create list with all indications and encode each one into a 384-dimensional vector\n",
    "    all_sponsors = sorted(set(toy_df['lead_sponsor'].tolist()))     \n",
    "\n",
    "    # Using 'all-MiniLM-L6-v2', a pre-trained model with excellent performance and speed\n",
    "    model = SentenceTransformer('all-MiniLM-L6-v2')\n",
    "    embeddings = model.encode(all_sponsors, show_progress_bar=True)\n",
    "    print(embeddings.shape)\n",
    "\n",
    "    # Create dictionary mapping indications to embeddings\n",
    "    sponsor2embedding_dict = {}\n",
    "    for key, row in zip(all_sponsors, embeddings):\n",
    "        sponsor2embedding_dict[key] = row\n",
    "    pickle.dump(sponsor2embedding_dict, open('data/sponsor2embedding_dict.pkl', 'wb'))\n",
    "    \n",
    "create_sponsor2embedding_dict()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40baac38",
   "metadata": {},
   "source": [
    "# Protocol Embedding"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "f5757802",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper functions to clean up protocols from https://github.com/futianfan/clinical-trial-outcome-prediction/blob/main/HINT/protocol_encode.py\n",
    "def clean_protocol(protocol):\n",
    "    protocol = protocol.lower()\n",
    "    protocol_split = protocol.split('\\n')\n",
    "    filter_out_empty_fn = lambda x: len(x.strip())>0\n",
    "    strip_fn = lambda x:x.strip()\n",
    "    protocol_split = list(filter(filter_out_empty_fn, protocol_split))\n",
    "    protocol_split = list(map(strip_fn, protocol_split))\n",
    "    return protocol_split \n",
    "\n",
    "def split_protocol(protocol):\n",
    "    protocol_split = clean_protocol(protocol)\n",
    "    inclusion_idx, exclusion_idx = len(protocol_split), len(protocol_split)\n",
    "    for idx, sentence in enumerate(protocol_split):\n",
    "        if \"inclusion\" in sentence:\n",
    "            inclusion_idx = idx\n",
    "            break\n",
    "    for idx, sentence in enumerate(protocol_split):\n",
    "        if \"exclusion\" in sentence:\n",
    "            exclusion_idx = idx \n",
    "            break \t\t\n",
    "    if inclusion_idx + 1 < exclusion_idx + 1 < len(protocol_split):\n",
    "        inclusion_criteria = protocol_split[inclusion_idx:exclusion_idx]\n",
    "        exclusion_criteria = protocol_split[exclusion_idx:]\n",
    "        if not (len(inclusion_criteria) > 0 and len(exclusion_criteria) > 0):\n",
    "            print(len(inclusion_criteria), len(exclusion_criteria), len(protocol_split))\n",
    "            exit()\n",
    "        return inclusion_criteria, exclusion_criteria ## list, list \n",
    "    else:\n",
    "        return protocol_split, "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "b4c611d3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(['inclusion criteria:',\n",
       "  '-',\n",
       "  'patients must have:',\n",
       "  'unipolar major depression (per diagnostic and statistical manuel-iv criteria) with or',\n",
       "  'without melancholia.'],\n",
       " ['exclusion criteria:',\n",
       "  '-',\n",
       "  'patients with the following symptoms or conditions are excluded:',\n",
       "  'psychotic or atypical subtype of unipolar major depression.'])"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Example of clean-up functions\n",
    "# Import toy dataset\n",
    "toy_df = pd.read_pickle('data/toy_df.pkl')\n",
    "# split_protocol() cleans and splits web-scraped criteria into lists of inclusion and exclusion criteria\n",
    "split_protocol(toy_df['criteria'][0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "289f0ae8",
   "metadata": {},
   "source": [
    "### Create nctid2protocol_embedding_dict using nlpie/tiny-biobert"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d584a3e-a0ec-4804-93b8-164d9ab21f25",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "outputs": [],
   "source": [
    "def create_nctid2protocol_embedding_dict():\n",
    "     # Import toy dataset\n",
    "    toy_df = pd.read_pickle('data/toy_df.pkl')\n",
    "    \n",
    "    # Using 'nlpie/tiny-biobert', a smaller version of BioBERT\n",
    "    model = SentenceTransformer('nlpie/tiny-biobert')\n",
    "    \n",
    "    def criteria2vec(criteria):\n",
    "        embeddings = model.encode(criteria)\n",
    "#         print(embeddings.shape) # DEBUG\n",
    "        embeddings_avg = np.mean(embeddings, axis=0)\n",
    "#         print(embeddings_avg.shape) # DEBUG\n",
    "        return embeddings_avg\n",
    "    \n",
    "    nctid_2_protocol_embedding = dict()\n",
    "    print(f\"Embedding {len(toy_df)*2} inclusion/exclusion criteria..\")\n",
    "    for nctid, protocol in tqdm(zip(toy_df['nctid'].tolist(), toy_df['criteria'].tolist())):    \n",
    "#         if(nctid == 'NCT00003567'): break #DEBUG\n",
    "        split = split_protocol(protocol)\n",
    "        if len(split)==2:\n",
    "            embedding = np.concatenate((criteria2vec(split[0]), criteria2vec(split[1])))\n",
    "        else: \n",
    "            embedding = np.concatenate((criteria2vec(split[0]), np.zeros(312)))\n",
    "        nctid_2_protocol_embedding[nctid] = embedding\n",
    "#         for key in nctid_2_protocol_embedding: #DEBUG\n",
    "#             print(f\"{key}:{nctid_2_protocol_embedding[key].shape}\") #DEBUG\n",
    "    pickle.dump(nctid_2_protocol_embedding, open('data/nctid_2_protocol_embedding_dict.pkl', 'wb'))   \n",
    "    return \n",
    "\n",
    "create_nctid2protocol_embedding_dict()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8f7abe3-5b22-4e84-b1d9-f9b5284fc4a4",
   "metadata": {},
   "source": [
    "# Drug molecule embedding\n",
    "### Converting drug names to their SMILES representation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4e4ec86b-2024-44bf-8482-5d24763315f9",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Drug Name: aspirin\n",
      "SMILES: CC(=O)Oc1ccccc1C(O)=O\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'CC(=O)Oc1ccccc1C(O)=O'"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import requests\n",
    "\n",
    "def get_smiles(drug_name):\n",
    "    # URL for the CIR API\n",
    "    base_url = \"https://cactus.nci.nih.gov/chemical/structure\"\n",
    "    url = f\"{base_url}/{drug_name}/smiles\"\n",
    "    \n",
    "    try:\n",
    "        # Send a GET request to retrieve the SMILES representation\n",
    "        response = requests.get(url)\n",
    "    \n",
    "        if response.status_code == 200:\n",
    "            smiles = response.text.strip()  # Get the SMILES string\n",
    "            print(f\"Drug Name: {drug_name}\")\n",
    "            print(f\"SMILES: {smiles}\")\n",
    "        else:\n",
    "            print(f\"Failed to retrieve SMILES for {drug_name}. Status code: {response.status_code}\")\n",
    "            smiles = ''\n",
    "    \n",
    "    except requests.exceptions.RequestException as e:\n",
    "        print(f\"An error occurred: {e}\")\n",
    "\n",
    "    return smiles\n",
    "\n",
    "# Define the drug name you want to convert\n",
    "drug_name = \"aspirin\"  # Replace with the drug name of your choice\n",
    "get_smiles(drug_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e28e892-c63e-4b89-a9f4-46e7f54fe9a0",
   "metadata": {},
   "source": [
    "### Create drug2smiles_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7045a68-e1b4-4bf9-b1c0-4c61306bcdcd",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from functools import reduce\n",
    "\n",
    "# Import toy dataset\n",
    "toy_df = pd.read_pickle('data/toy_df.pkl')\n",
    "\n",
    "# Create list with all drugs and encode each one into its SMILES representation\n",
    "all_drugs = sorted(set(reduce(lambda x, y: x + y, toy_df['drug_interventions'].tolist())))     \n",
    "\n",
    "# Create dictionary mapping indications to embeddings\n",
    "drug2smiles_dict = {}\n",
    "for drug in all_drugs:\n",
    "    drug2smiles_dict[drug] = get_smiles(drug)\n",
    "pickle.dump(drug2smiles_dict, open('data/drug2smiles_dict.pkl', 'wb')) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "403e4d9f-2a09-4999-a072-8271933006e7",
   "metadata": {},
   "source": [
    "### Converting SMILES to Morgan Fingerprint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1541bc23-3c7c-45be-b764-0db5398fb0a8",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install DeepPurpose"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c7a4897-3f2a-4d74-8607-355962f77df2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from DeepPurpose.utils import encode_drug \n",
    "import pandas as pd\n",
    "\n",
    "# Example list of SMILES strings representing drug molecules\n",
    "smiles_list = pd.DataFrame(['O=C(C)Oc1ccccc1C(=O)O', 'CC(CC1=CC=CC=C1)C(=O)O', 'CN1CCN(CC1)C2=C(C=CC(=C2)OC)OC'], columns=['SMILES'])\n",
    "\n",
    "# Encode the drug molecules\n",
    "drug_encodings = encode_drug(smiles_list, drug_encoding='Morgan', column_name = 'SMILES', save_column_name = 'drug_encoding')\n",
    "\n",
    "# Print the encoded representations\n",
    "for x in drug_encodings['drug_encoding']:\n",
    "    print(x.shape)\n",
    "\n",
    "drug_encodings.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b03ebcc-bcdc-4d92-800b-a38935a6ab45",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper function to clean up protocols from https://github.com/futianfan/clinical-trial-outcome-prediction/blob/main/HINT/protocol_encode.py\n",
    "def txt_to_lst(text):\n",
    "    \"\"\"\n",
    "        \"['CN[C@H]1CC[C@@H](C2=CC(Cl)=C(Cl)C=C2)C2=CC=CC=C12', 'CNCCC=C1C2=CC=CC=C2CCC2=CC=CC=C12']\" \n",
    "    \"\"\"\n",
    "    text = text[1:-1]\n",
    "    lst = [i.strip()[1:-1] for i in text.split(',')]\n",
    "    return lst \n",
    "\n",
    "def create_smiles2morgan_dict():\n",
    "    from DeepPurpose.utils import smiles2morgan \n",
    "\n",
    "    # Import toy dataset\n",
    "    toy_df = pd.read_csv('data/toy_df.csv')\n",
    "        \n",
    "    smiles_lst = list(map(txt_to_lst, toy_df['smiless'].tolist()))\n",
    "    unique_smiles = set(reduce(lambda x, y: x + y, smiles_lst))\n",
    "    \n",
    "    morgan = pd.Series(list(unique_smiles)).apply(smiles2morgan)\n",
    "    smiles2morgan_dict = dict(zip(unique_smiles, morgan))\n",
    "    pickle.dump(smiles2morgan_dict, open('data/smiles2morgan_dict.pkl', 'wb')) \n",
    "\n",
    "create_smiles2morgan_dict()\n",
    "\n",
    "def load_smiles2morgan_dict():\n",
    "    with open('data/smiles2morgan_dict.pkl', 'rb') as pickle_file:\n",
    "        return pickle.load(pickle_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f0005502-efaa-4008-8bc0-bd2f19bb6b2c",
   "metadata": {},
   "source": [
    "### Create nctid2molecule_embedding_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22cbd6ae-3ce2-4ce7-8841-e27332919079",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from tqdm import tqdm\n",
    "\n",
    "def create_nctid2molecule_embedding_dict():\n",
    "    # Import toy dataset\n",
    "    toy_df = pd.read_csv('data/toy_df.csv')\n",
    "    smiles_lst = list(map(txt_to_lst, toy_df['smiless'].tolist()))\n",
    "    smiles2morgan_dict = load_smiles2morgan_dict()\n",
    "    \n",
    "    embedding = []\n",
    "    for drugs in tqdm(smiles_lst):\n",
    "        vec = []\n",
    "        for drug in drugs:\n",
    "            vec.append(smiles2morgan_dict[drug])\n",
    "        # print(np.array(vec).shape) # DEBUG\n",
    "        vec = np.mean(np.array(vec), axis=0)\n",
    "        # print(vec.shape) # DEBUG\n",
    "        embedding.append(vec)\n",
    "    print(np.array(embedding).shape)\n",
    "    \n",
    "    dict = zip(toy_df['nctid'], np.array(embedding))\n",
    "    nctid2molecule_embedding_dict = {}\n",
    "    for key, row in zip(toy_df['nctid'], np.array(embedding)):\n",
    "        nctid2molecule_embedding_dict[key] = row\n",
    "    pickle.dump(nctid2molecule_embedding_dict, open('data/nctid2molecule_embedding_dict.pkl', 'wb'))  \n",
    "\n",
    "create_nctid2molecule_embedding_dict()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}