{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/francescopatane96/Computer_aided_drug_discovery_kit/blob/main/Data_aquisition_ChEMBL.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# 1. Compound data acquisition from the ChEMBL database"
      ],
      "metadata": {
        "id": "_FxrSEOKBc7r"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "In this module, you will learn how to obtain compound data from the ChEMBL database for a molecular target of interest. The resulting data sets can then be used for many cheminformatics tasks, e.g. similarity search, clustering, or machine learning.\n",
        "\n",
        "In this notebook you will find compounds that were tested against a specific target and learn how to filter the available bioactivity data."
      ],
      "metadata": {
        "id": "Oe1lkXvnZMPn"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Some theoretical concepts:\n",
        "1. ChEMBL is a manually curated database of bioactive molecules with drug-like properties. Its web resource client (`chembl_webresource_client`) can be used from Python.\n",
        "2. From it we can retrieve many compound activity measures, such as IC50 (half-maximal inhibitory concentration), pIC50 (the negative base-10 logarithm of the IC50 in molar units, which makes values easier to compare), EC50, etc.\n",
        "3. These measures are the information we need to build a system that predicts how likely a molecule is to be a drug candidate."
      ],
      "metadata": {
        "id": "pv3S4yULCjC4"
      }
    },
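    {
      "cell_type": "markdown",
      "source": [
        "The pIC50 definition in point 2 can be sketched as follows. This is a minimal illustration, assuming the IC50 is given in nM; the function name `ic50_nm_to_pic50` is hypothetical and not part of any library."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import math\n",
        "\n",
        "\n",
        "def ic50_nm_to_pic50(ic50_nm):\n",
        "    \"\"\"Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in nM (assumed unit).\"\"\"\n",
        "    if ic50_nm <= 0:\n",
        "        raise ValueError(\"IC50 must be a positive concentration\")\n",
        "    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)\n",
        "    return 9 - math.log10(ic50_nm)\n",
        "\n",
        "\n",
        "# A lower IC50 (a more potent compound) gives a higher pIC50\n",
        "print(ic50_nm_to_pic50(41.0))\n",
        "print(ic50_nm_to_pic50(9300.0))"
      ]
    },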
    {
      "cell_type": "markdown",
      "source": [
        "Let's start\n",
        "\n",
        "## Install and import dependencies"
      ],
      "metadata": {
        "id": "5EbLmBkDBpGa"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "e106nlobq1s0"
      },
      "outputs": [],
      "source": [
        "# install dependencies\n",
        "!pip install chembl_webresource_client\n",
        "!pip install rdkit"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "GheijY-dsL_O"
      },
      "outputs": [],
      "source": [
        "# import dependencies\n",
        "import pandas as pd\n",
        "import math\n",
        "import rdkit\n",
        "from tqdm.auto import tqdm\n",
        "from chembl_webresource_client.new_client import new_client\n",
        "from pandas import DataFrame\n",
        "import numpy as np\n",
        "from rdkit import Chem\n",
        "from rdkit.Chem import Descriptors, Lipinski, PandasTools\n",
        "import seaborn as sns\n",
        "import matplotlib.pyplot as plt\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.ensemble import RandomForestRegressor\n",
        "from sklearn.feature_selection import VarianceThreshold\n",
        "from pathlib import Path\n",
        "from zipfile import ZipFile\n",
        "from tempfile import TemporaryDirectory"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# define local paths (`_dh` is IPython's directory history; `_dh[-1]` is the most recent working directory)\n",
        "HERE = Path(_dh[-1])\n",
        "DATA = HERE / \"data\""
      ],
      "metadata": {
        "id": "1qRTJhkZcEu4"
      },
      "execution_count": 4,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Create the resource objects for API access"
      ],
      "metadata": {
        "id": "9kcRsLOUcyDt"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "targets_api = new_client.target\n",
        "compounds_api = new_client.molecule\n",
        "bioactivities_api = new_client.activity"
      ],
      "metadata": {
        "id": "Lit-Q2R8cPWG"
      },
      "execution_count": 5,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "type(targets_api)  #show the type of the object"
      ],
      "metadata": {
        "id": "XVU-T3BJcUg-",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "c446e264-bf50-43bc-e5b9-b943dd74086c"
      },
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "chembl_webresource_client.query_set.QuerySet"
            ]
          },
          "metadata": {},
          "execution_count": 6
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Obtain molecular target data\n",
        "\n",
        "Now select a molecular target of interest. In this case we choose the protein with UniProt ID P00533, the epidermal growth factor receptor (EGFR).\n"
      ],
      "metadata": {
        "id": "tXQMUbdME_mz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "uniprot_id = \"P00533\"    #change the uniprot ID for your project"
      ],
      "metadata": {
        "id": "qKpN49tuckve"
      },
      "execution_count": 13,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Acquire target data from the ChEMBL database"
      ],
      "metadata": {
        "id": "LXL1ThN-eG_X"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Get target information from ChEMBL, restricted to the specified fields only\n",
        "# `targets` holds the result of the query\n",
        "targets = targets_api.get(target_components__accession=uniprot_id).only(\n",
        "    \"target_chembl_id\", \"organism\", \"pref_name\", \"target_type\"\n",
        ")\n",
        "print(f'The type of the targets is \"{type(targets)}\"')"
      ],
      "metadata": {
        "id": "4UKb5NCHeRHN",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "19c0c4f3-3d97-4945-8ce3-b00c67e2413a"
      },
      "execution_count": 14,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The type of the targets is \"<class 'chembl_webresource_client.query_set.QuerySet'>\"\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Download target data from ChEMBL"
      ],
      "metadata": {
        "id": "sSItAfALerr9"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# use pandas to convert data to a dataframe\n",
        "targets = pd.DataFrame(targets)\n",
        "targets"
      ],
      "metadata": {
        "id": "D5TXIYlSeqF1",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 300
        },
        "outputId": "b5591066-670e-40ba-eaa8-7de12da0f378"
      },
      "execution_count": 15,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "       organism                                          pref_name  \\\n",
              "0  Homo sapiens             Epidermal growth factor receptor erbB1   \n",
              "1  Homo sapiens  Epidermal growth factor receptor and ErbB2 (HE...   \n",
              "2  Homo sapiens                   Epidermal growth factor receptor   \n",
              "3  Homo sapiens  MER intracellular domain/EGFR extracellular do...   \n",
              "4  Homo sapiens  Protein cereblon/Epidermal growth factor receptor   \n",
              "5  Homo sapiens                                        EGFR/PPP1CA   \n",
              "6  Homo sapiens                                           VHL/EGFR   \n",
              "7  Homo sapiens  Baculoviral IAP repeat-containing protein 2/Ep...   \n",
              "\n",
              "  target_chembl_id                  target_type  \n",
              "0        CHEMBL203               SINGLE PROTEIN  \n",
              "1    CHEMBL2111431               PROTEIN FAMILY  \n",
              "2    CHEMBL2363049               PROTEIN FAMILY  \n",
              "3    CHEMBL3137284             CHIMERIC PROTEIN  \n",
              "4    CHEMBL4523680  PROTEIN-PROTEIN INTERACTION  \n",
              "5    CHEMBL4523747  PROTEIN-PROTEIN INTERACTION  \n",
              "6    CHEMBL4523998  PROTEIN-PROTEIN INTERACTION  \n",
              "7    CHEMBL4802031  PROTEIN-PROTEIN INTERACTION  "
            ]
          },
          "metadata": {},
          "execution_count": 15
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Select the target (ChEMBL ID)"
      ],
      "metadata": {
        "id": "bx1U-xWjfo23"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "target = targets.iloc[0]\n",
        "target"
      ],
      "metadata": {
        "id": "sewq42tUgorQ",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "c96afefb-80bd-4da0-cecd-0afe850fcbf4"
      },
      "execution_count": 16,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "organism                                      Homo sapiens\n",
              "pref_name           Epidermal growth factor receptor erbB1\n",
              "target_chembl_id                                 CHEMBL203\n",
              "target_type                                 SINGLE PROTEIN\n",
              "Name: 0, dtype: object"
            ]
          },
          "metadata": {},
          "execution_count": 16
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "This is our target 💪"
      ],
      "metadata": {
        "id": "suaF4JvQHCKZ"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let's save the target's ChEMBL ID"
      ],
      "metadata": {
        "id": "cbgfetyajN_d"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "target_id = target.target_chembl_id\n",
        "print(f\"The target ChEMBL ID is {target_id}\")"
      ],
      "metadata": {
        "id": "hJTy18a9iQs1",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "a5f9e875-a736-4ee4-d7f4-2a1ebae9dad5"
      },
      "execution_count": 17,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "The target ChEMBL ID is CHEMBL203\n"
          ]
        }
      ]
    },
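    {
      "cell_type": "markdown",
      "source": [
        "Picking `targets.iloc[0]` relies on the single-protein entry being the first row of the table. A hedged alternative is to select by `target_type` instead. The sketch below assumes a DataFrame with the columns shown in the target table above; the helper `pick_single_protein` is hypothetical, and the demo rows are copied (in a different order) from that table. In this notebook you would pass `targets` instead of `demo_targets`."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "\n",
        "\n",
        "def pick_single_protein(targets_df):\n",
        "    \"\"\"Return the first row whose target_type is SINGLE PROTEIN.\"\"\"\n",
        "    single = targets_df[targets_df[\"target_type\"] == \"SINGLE PROTEIN\"]\n",
        "    if single.empty:\n",
        "        raise ValueError(\"No SINGLE PROTEIN target found\")\n",
        "    return single.iloc[0]\n",
        "\n",
        "\n",
        "# Demo rows copied from the target table above, deliberately reordered\n",
        "demo_targets = pd.DataFrame(\n",
        "    {\n",
        "        \"pref_name\": [\n",
        "            \"Epidermal growth factor receptor\",\n",
        "            \"Epidermal growth factor receptor erbB1\",\n",
        "        ],\n",
        "        \"target_chembl_id\": [\"CHEMBL2363049\", \"CHEMBL203\"],\n",
        "        \"target_type\": [\"PROTEIN FAMILY\", \"SINGLE PROTEIN\"],\n",
        "    }\n",
        ")\n",
        "print(pick_single_protein(demo_targets).target_chembl_id)"
      ]
    },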
    {
      "cell_type": "markdown",
      "source": [
        "## Get bioactivity data about tested ligands\n",
        "\n",
        "In this step we query bioactivity data for the selected human target. We download the data and keep only IC50 values, exact measurements (relation `=`), and binding assays (assay type `B`)."
      ],
      "metadata": {
        "id": "XHHY1Oa6kk8W"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "bioactivities = bioactivities_api.filter(\n",
        "    target_chembl_id=target_id, type=\"IC50\", relation=\"=\", assay_type=\"B\"\n",
        ").only(\n",
        "    \"activity_id\",\n",
        "    \"assay_chembl_id\",\n",
        "    \"assay_description\",\n",
        "    \"assay_type\",\n",
        "    \"molecule_chembl_id\",\n",
        "    \"type\",\n",
        "    \"standard_units\",\n",
        "    \"relation\",\n",
        "    \"standard_value\",\n",
        "    \"target_chembl_id\",\n",
        "    \"target_organism\",\n",
        ")\n",
        "\n",
        "print(f\"Length and type of bioactivities object: {len(bioactivities)}, {type(bioactivities)}\")"
      ],
      "metadata": {
        "id": "DDowRa21ft0l",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "84183f53-454c-4910-d3db-2cb73240f49e"
      },
      "execution_count": 28,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Length and type of bioactivities object: 10420, <class 'chembl_webresource_client.query_set.QuerySet'>\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Lots of data 😃"
      ],
      "metadata": {
        "id": "kEHRSaD6Ifgl"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# each retrieved element is a dict of fields; inspect the first one\n",
        "print(f\"Length and type of first element: {len(bioactivities[0])}, {type(bioactivities[0])}\")\n",
        "bioactivities[0]"
      ],
      "metadata": {
        "id": "87Lg52G3lM_9",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "1faf57a2-89cb-4e73-bdbd-60cd06255bc2"
      },
      "execution_count": 29,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Length and type of first element: 13, <class 'dict'>\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'activity_id': 32260,\n",
              " 'assay_chembl_id': 'CHEMBL674637',\n",
              " 'assay_description': 'Inhibitory activity towards tyrosine phosphorylation for the epidermal growth factor-receptor kinase',\n",
              " 'assay_type': 'B',\n",
              " 'molecule_chembl_id': 'CHEMBL68920',\n",
              " 'relation': '=',\n",
              " 'standard_units': 'nM',\n",
              " 'standard_value': '41.0',\n",
              " 'target_chembl_id': 'CHEMBL203',\n",
              " 'target_organism': 'Homo sapiens',\n",
              " 'type': 'IC50',\n",
              " 'units': 'uM',\n",
              " 'value': '0.041'}"
            ]
          },
          "metadata": {},
          "execution_count": 29
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Download bioactivity data from ChEMBL"
      ],
      "metadata": {
        "id": "h_OiF1rdldYd"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# load the query results into a pandas DataFrame; this will take a while :)\n",
        "bioactivities_df = pd.DataFrame.from_records(bioactivities)\n",
        "print(f\"DataFrame shape: {bioactivities_df.shape}\")\n",
        "bioactivities_df.head()"
      ],
      "metadata": {
        "id": "tSBwsyeulhJf",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 479
        },
        "outputId": "93faff7a-d83d-451d-95ea-3afb2f21faa6"
      },
      "execution_count": 30,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (10421, 13)\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   activity_id assay_chembl_id  \\\n",
              "0        32260    CHEMBL674637   \n",
              "1        32260    CHEMBL674637   \n",
              "2        32267    CHEMBL674637   \n",
              "3        32680    CHEMBL677833   \n",
              "4        32770    CHEMBL674643   \n",
              "\n",
              "                                   assay_description assay_type  \\\n",
              "0  Inhibitory activity towards tyrosine phosphory...          B   \n",
              "1  Inhibitory activity towards tyrosine phosphory...          B   \n",
              "2  Inhibitory activity towards tyrosine phosphory...          B   \n",
              "3  In vitro inhibition of Epidermal growth factor...          B   \n",
              "4  Inhibitory concentration of EGF dependent auto...          B   \n",
              "\n",
              "  molecule_chembl_id relation standard_units standard_value target_chembl_id  \\\n",
              "0        CHEMBL68920        =             nM           41.0        CHEMBL203   \n",
              "1        CHEMBL68920        =             nM           41.0        CHEMBL203   \n",
              "2        CHEMBL69960        =             nM          170.0        CHEMBL203   \n",
              "3       CHEMBL137635        =             nM         9300.0        CHEMBL203   \n",
              "4       CHEMBL306988        =             nM       500000.0        CHEMBL203   \n",
              "\n",
              "  target_organism  type units  value  \n",
              "0    Homo sapiens  IC50    uM  0.041  \n",
              "1    Homo sapiens  IC50    uM  0.041  \n",
              "2    Homo sapiens  IC50    uM   0.17  \n",
              "3    Homo sapiens  IC50    uM    9.3  \n",
              "4    Homo sapiens  IC50    uM  500.0  "
            ],
            "text/html": [
              "\n",
              "\n",
              "  <div id=\"df-9f89aef4-a445-4601-b081-7e976e5915fe\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>activity_id</th>\n",
              "      <th>assay_chembl_id</th>\n",
              "      <th>assay_description</th>\n",
              "      <th>assay_type</th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>relation</th>\n",
              "      <th>standard_units</th>\n",
              "      <th>standard_value</th>\n",
              "      <th>target_chembl_id</th>\n",
              "      <th>target_organism</th>\n",
              "      <th>type</th>\n",
              "      <th>units</th>\n",
              "      <th>value</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>32260</td>\n",
              "      <td>CHEMBL674637</td>\n",
              "      <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL68920</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>41.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>0.041</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>32260</td>\n",
              "      <td>CHEMBL674637</td>\n",
              "      <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL68920</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>41.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>0.041</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>32267</td>\n",
              "      <td>CHEMBL674637</td>\n",
              "      <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL69960</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>170.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>0.17</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>32680</td>\n",
              "      <td>CHEMBL677833</td>\n",
              "      <td>In vitro inhibition of Epidermal growth factor...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL137635</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>9300.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>9.3</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>32770</td>\n",
              "      <td>CHEMBL674643</td>\n",
              "      <td>Inhibitory concentration of EGF dependent auto...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL306988</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>500000.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>500.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-9f89aef4-a445-4601-b081-7e976e5915fe')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "\n",
              "\n",
              "\n",
              "    <div id=\"df-e44d476d-9e2d-48dd-84f8-041dd0063fd9\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-e44d476d-9e2d-48dd-84f8-041dd0063fd9')\"\n",
              "              title=\"Suggest charts.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "    </div>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "    <script>\n",
              "      async function quickchart(key) {\n",
              "        const containerElement = document.querySelector('#' + key);\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      }\n",
              "    </script>\n",
              "\n",
              "      <script>\n",
              "\n",
              "function displayQuickchartButton(domScope) {\n",
              "  let quickchartButtonEl =\n",
              "    domScope.querySelector('#df-e44d476d-9e2d-48dd-84f8-041dd0063fd9 button.colab-df-quickchart');\n",
              "  quickchartButtonEl.style.display =\n",
              "    google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "}\n",
              "\n",
              "        displayQuickchartButton(document);\n",
              "      </script>\n",
              "      <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-9f89aef4-a445-4601-b081-7e976e5915fe button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-9f89aef4-a445-4601-b081-7e976e5915fe');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 30
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "The columns of interest are 'standard_value' and 'standard_units': they hold the IC50 values that ChEMBL has standardized, mostly to nM."
      ],
      "metadata": {
        "id": "V11L6RfQKbml"
      }
    },
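    {
      "cell_type": "markdown",
      "source": [
        "Use IC50 values in nM\n",
        "\n",
        "The original 'units' column mixes many different units (listed below). Instead of converting them by hand, we drop 'units' and 'value' and keep the standardized 'standard_units' and 'standard_value' columns."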
      ],
      "metadata": {
        "id": "kEeVkM9cp03M"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "bioactivities_df['units'].unique()"
      ],
      "metadata": {
        "id": "WxNCpyg8pqDt",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "9a5109f0-9c65-46e4-b0a2-81f2a689422f"
      },
      "execution_count": 31,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "array(['uM', 'nM', 'pM', 'M', \"10'3 uM\", \"10'1 ug/ml\", 'ug ml-1',\n",
              "       \"10'-1microM\", \"10'1 uM\", \"10'-1 ug/ml\", \"10'-2 ug/ml\", \"10'2 uM\",\n",
              "       \"10'-3 ug/ml\", \"10'-2microM\", '/uM', \"10'-6g/ml\", 'mM', 'umol/L',\n",
              "       'nmol/L', \"10'-10M\", \"10'-7M\", 'nmol', '10^-8M', 'µM'],\n",
              "      dtype=object)"
            ]
          },
          "metadata": {},
          "execution_count": 31
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Drop the raw columns; the standardized columns are kept\n",
        "bioactivities_df.drop(columns=[\"units\", \"value\"], inplace=True)\n",
        "bioactivities_df.head()"
      ],
      "metadata": {
        "id": "-2Xi9Z0sqLOc",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 461
        },
        "outputId": "c65680da-3f0c-4e2b-f18e-4493300b81bc"
      },
      "execution_count": 32,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   activity_id assay_chembl_id  \\\n",
              "0        32260    CHEMBL674637   \n",
              "1        32260    CHEMBL674637   \n",
              "2        32267    CHEMBL674637   \n",
              "3        32680    CHEMBL677833   \n",
              "4        32770    CHEMBL674643   \n",
              "\n",
              "                                   assay_description assay_type  \\\n",
              "0  Inhibitory activity towards tyrosine phosphory...          B   \n",
              "1  Inhibitory activity towards tyrosine phosphory...          B   \n",
              "2  Inhibitory activity towards tyrosine phosphory...          B   \n",
              "3  In vitro inhibition of Epidermal growth factor...          B   \n",
              "4  Inhibitory concentration of EGF dependent auto...          B   \n",
              "\n",
              "  molecule_chembl_id relation standard_units standard_value target_chembl_id  \\\n",
              "0        CHEMBL68920        =             nM           41.0        CHEMBL203   \n",
              "1        CHEMBL68920        =             nM           41.0        CHEMBL203   \n",
              "2        CHEMBL69960        =             nM          170.0        CHEMBL203   \n",
              "3       CHEMBL137635        =             nM         9300.0        CHEMBL203   \n",
              "4       CHEMBL306988        =             nM       500000.0        CHEMBL203   \n",
              "\n",
              "  target_organism  type  \n",
              "0    Homo sapiens  IC50  \n",
              "1    Homo sapiens  IC50  \n",
              "2    Homo sapiens  IC50  \n",
              "3    Homo sapiens  IC50  \n",
              "4    Homo sapiens  IC50  "
            ],
            "text/html": [
              "\n",
              "\n",
              "  <div id=\"df-f9cbbcc5-9a72-4262-83d4-cac05188f6e8\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>activity_id</th>\n",
              "      <th>assay_chembl_id</th>\n",
              "      <th>assay_description</th>\n",
              "      <th>assay_type</th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>relation</th>\n",
              "      <th>standard_units</th>\n",
              "      <th>standard_value</th>\n",
              "      <th>target_chembl_id</th>\n",
              "      <th>target_organism</th>\n",
              "      <th>type</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>32260</td>\n",
              "      <td>CHEMBL674637</td>\n",
              "      <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL68920</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>41.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>32260</td>\n",
              "      <td>CHEMBL674637</td>\n",
              "      <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL68920</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>41.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>32267</td>\n",
              "      <td>CHEMBL674637</td>\n",
              "      <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL69960</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>170.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>32680</td>\n",
              "      <td>CHEMBL677833</td>\n",
              "      <td>In vitro inhibition of Epidermal growth factor...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL137635</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>9300.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>32770</td>\n",
              "      <td>CHEMBL674643</td>\n",
              "      <td>Inhibitory concentration of EGF dependent auto...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL306988</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>500000.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-f9cbbcc5-9a72-4262-83d4-cac05188f6e8')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "\n",
              "\n",
              "\n",
              "    <div id=\"df-aeab1b0d-83aa-4ac0-9276-307cbdbe4c5d\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-aeab1b0d-83aa-4ac0-9276-307cbdbe4c5d')\"\n",
              "              title=\"Suggest charts.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "    </div>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "    <script>\n",
              "      async function quickchart(key) {\n",
              "        const containerElement = document.querySelector('#' + key);\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      }\n",
              "    </script>\n",
              "\n",
              "      <script>\n",
              "\n",
              "function displayQuickchartButton(domScope) {\n",
              "  let quickchartButtonEl =\n",
              "    domScope.querySelector('#df-aeab1b0d-83aa-4ac0-9276-307cbdbe4c5d button.colab-df-quickchart');\n",
              "  quickchartButtonEl.style.display =\n",
              "    google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "}\n",
              "\n",
              "        displayQuickchartButton(document);\n",
              "      </script>\n",
              "      <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-f9cbbcc5-9a72-4262-83d4-cac05188f6e8 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-f9cbbcc5-9a72-4262-83d4-cac05188f6e8');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 32
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Preprocess and filter bioactivity data\n",
        "\n",
        "1. Convert the datatype of “standard_value” from “object” to “float”"
      ],
      "metadata": {
        "id": "mL0SzwR1zJSN"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "bioactivities_df.dtypes"
      ],
      "metadata": {
        "id": "VhRtswiNzdE7"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "bioactivities_df = bioactivities_df.astype({\"standard_value\" : \"float64\"})"
      ],
      "metadata": {
        "id": "jPct7o-OzlVT"
      },
      "execution_count": 34,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "bioactivities_df.dtypes"
      ],
      "metadata": {
        "id": "G24NPnfm0Yf0"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "2. Delete entries with missing values"
      ],
      "metadata": {
        "id": "VwtlFAr60p_W"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Drop rows that contain any missing value\n",
        "bioactivities_df.dropna(axis=0, how=\"any\", inplace=True)\n",
        "print(f\"DataFrame shape: {bioactivities_df.shape}\")"
      ],
      "metadata": {
        "id": "4Fh1HrHO0qY7",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "34744c0b-f626-4c97-c81d-c5a9265cecb4"
      },
      "execution_count": 36,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (10420, 11)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "3. Keep only entries with “standard_units == nM”"
      ],
      "metadata": {
        "id": "X7zrbbG02Dk9"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(f\"Units in downloaded data: {bioactivities_df['standard_units'].unique()}\")\n",
        "# Count the rows whose standardized unit is not nM\n",
        "non_nm_count = (bioactivities_df[\"standard_units\"] != \"nM\").sum()\n",
        "print(f\"Number of non-nM entries: {non_nm_count}\")"
      ],
      "metadata": {
        "id": "07ywlhOp2Ful",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "9e4f2d24-3fe2-4d89-c022-bf5403eb8727"
      },
      "execution_count": 37,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Units in downloaded data: ['nM' 'ug.mL-1' '/uM' 'µM']\n",
            "Number of non-nM entries: 70\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "bioactivities_df = bioactivities_df[bioactivities_df[\"standard_units\"] == \"nM\"]\n",
        "print(f\"Units after filtering: {bioactivities_df['standard_units'].unique()}\")"
      ],
      "metadata": {
        "id": "FTwQNUNm3b6T",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "beaeb407-22f5-4978-ee32-ac119921e51f"
      },
      "execution_count": 38,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Units after filtering: ['nM']\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(f\"DataFrame shape: {bioactivities_df.shape}\")"
      ],
      "metadata": {
        "id": "gZXOar0b4JCb",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "a1eb7e3c-b1ff-4f59-ecbf-122ed054d6c9"
      },
      "execution_count": 39,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (10350, 11)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "4. Delete duplicate molecules (keep only the first entry for each “molecule_chembl_id”)"
      ],
      "metadata": {
        "id": "wRL-r3Mk4Qpa"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "bioactivities_df.drop_duplicates(\"molecule_chembl_id\", keep=\"first\", inplace=True)\n",
        "print(f\"DataFrame shape: {bioactivities_df.shape}\")"
      ],
      "metadata": {
        "id": "HxpzxMOn4RTA",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "ca76d3f2-0d6e-416d-810a-7a0ac3cea74c"
      },
      "execution_count": 40,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (6823, 11)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "5. Reset “DataFrame” index"
      ],
      "metadata": {
        "id": "0TCyGhxc7lG0"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "bioactivities_df.reset_index(drop=True, inplace=True)\n",
        "bioactivities_df.head()\n"
      ],
      "metadata": {
        "id": "6E8aR1_h5ODX",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 479
        },
        "outputId": "af732783-6938-48be-ca2b-ef371ed89ba5"
      },
      "execution_count": 41,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   activity_id assay_chembl_id  \\\n",
              "0        32260    CHEMBL674637   \n",
              "1        32267    CHEMBL674637   \n",
              "2        32680    CHEMBL677833   \n",
              "3        32770    CHEMBL674643   \n",
              "4        32772    CHEMBL674643   \n",
              "\n",
              "                                   assay_description assay_type  \\\n",
              "0  Inhibitory activity towards tyrosine phosphory...          B   \n",
              "1  Inhibitory activity towards tyrosine phosphory...          B   \n",
              "2  In vitro inhibition of Epidermal growth factor...          B   \n",
              "3  Inhibitory concentration of EGF dependent auto...          B   \n",
              "4  Inhibitory concentration of EGF dependent auto...          B   \n",
              "\n",
              "  molecule_chembl_id relation standard_units  standard_value target_chembl_id  \\\n",
              "0        CHEMBL68920        =             nM            41.0        CHEMBL203   \n",
              "1        CHEMBL69960        =             nM           170.0        CHEMBL203   \n",
              "2       CHEMBL137635        =             nM          9300.0        CHEMBL203   \n",
              "3       CHEMBL306988        =             nM        500000.0        CHEMBL203   \n",
              "4        CHEMBL66879        =             nM       3000000.0        CHEMBL203   \n",
              "\n",
              "  target_organism  type  \n",
              "0    Homo sapiens  IC50  \n",
              "1    Homo sapiens  IC50  \n",
              "2    Homo sapiens  IC50  \n",
              "3    Homo sapiens  IC50  \n",
              "4    Homo sapiens  IC50  "
            ],
            "text/html": [
              "\n",
              "\n",
              "  <div id=\"df-e8756456-057c-435e-9af6-2de19f7904d6\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>activity_id</th>\n",
              "      <th>assay_chembl_id</th>\n",
              "      <th>assay_description</th>\n",
              "      <th>assay_type</th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>relation</th>\n",
              "      <th>standard_units</th>\n",
              "      <th>standard_value</th>\n",
              "      <th>target_chembl_id</th>\n",
              "      <th>target_organism</th>\n",
              "      <th>type</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>32260</td>\n",
              "      <td>CHEMBL674637</td>\n",
              "      <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL68920</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>41.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>32267</td>\n",
              "      <td>CHEMBL674637</td>\n",
              "      <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL69960</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>170.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>32680</td>\n",
              "      <td>CHEMBL677833</td>\n",
              "      <td>In vitro inhibition of Epidermal growth factor...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL137635</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>9300.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>32770</td>\n",
              "      <td>CHEMBL674643</td>\n",
              "      <td>Inhibitory concentration of EGF dependent auto...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL306988</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>500000.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>32772</td>\n",
              "      <td>CHEMBL674643</td>\n",
              "      <td>Inhibitory concentration of EGF dependent auto...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL66879</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>3000000.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-e8756456-057c-435e-9af6-2de19f7904d6')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "\n",
              "\n",
              "\n",
              "    <div id=\"df-99c21fcc-6e48-402c-b666-75ed683f59d0\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-99c21fcc-6e48-402c-b666-75ed683f59d0')\"\n",
              "              title=\"Suggest charts.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "    </div>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "    <script>\n",
              "      async function quickchart(key) {\n",
              "        const containerElement = document.querySelector('#' + key);\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      }\n",
              "    </script>\n",
              "\n",
              "      <script>\n",
              "\n",
              "function displayQuickchartButton(domScope) {\n",
              "  let quickchartButtonEl =\n",
              "    domScope.querySelector('#df-99c21fcc-6e48-402c-b666-75ed683f59d0 button.colab-df-quickchart');\n",
              "  quickchartButtonEl.style.display =\n",
              "    google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "}\n",
              "\n",
              "        displayQuickchartButton(document);\n",
              "      </script>\n",
              "      <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-e8756456-057c-435e-9af6-2de19f7904d6 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-e8756456-057c-435e-9af6-2de19f7904d6');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 41
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "6. Rename columns"
      ],
      "metadata": {
        "id": "tW6-uahs7wrL"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "bioactivities_df.rename(\n",
        "    columns={\"standard_value\": \"IC50\", \"standard_units\": \"units\"}, inplace=True\n",
        ")\n",
        "bioactivities_df.head()"
      ],
      "metadata": {
        "id": "KaAy_0Mc7xOs",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 479
        },
        "outputId": "8be364cc-d848-4d0b-9aec-5e09252fe77d"
      },
      "execution_count": 42,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   activity_id assay_chembl_id  \\\n",
              "0        32260    CHEMBL674637   \n",
              "1        32267    CHEMBL674637   \n",
              "2        32680    CHEMBL677833   \n",
              "3        32770    CHEMBL674643   \n",
              "4        32772    CHEMBL674643   \n",
              "\n",
              "                                   assay_description assay_type  \\\n",
              "0  Inhibitory activity towards tyrosine phosphory...          B   \n",
              "1  Inhibitory activity towards tyrosine phosphory...          B   \n",
              "2  In vitro inhibition of Epidermal growth factor...          B   \n",
              "3  Inhibitory concentration of EGF dependent auto...          B   \n",
              "4  Inhibitory concentration of EGF dependent auto...          B   \n",
              "\n",
              "  molecule_chembl_id relation units       IC50 target_chembl_id  \\\n",
              "0        CHEMBL68920        =    nM       41.0        CHEMBL203   \n",
              "1        CHEMBL69960        =    nM      170.0        CHEMBL203   \n",
              "2       CHEMBL137635        =    nM     9300.0        CHEMBL203   \n",
              "3       CHEMBL306988        =    nM   500000.0        CHEMBL203   \n",
              "4        CHEMBL66879        =    nM  3000000.0        CHEMBL203   \n",
              "\n",
              "  target_organism  type  \n",
              "0    Homo sapiens  IC50  \n",
              "1    Homo sapiens  IC50  \n",
              "2    Homo sapiens  IC50  \n",
              "3    Homo sapiens  IC50  \n",
              "4    Homo sapiens  IC50  "
            ],
            "text/html": [
              "\n",
              "\n",
              "  <div id=\"df-67b9c8dd-c35c-4cac-bfef-d9ca6f250da3\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>activity_id</th>\n",
              "      <th>assay_chembl_id</th>\n",
              "      <th>assay_description</th>\n",
              "      <th>assay_type</th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>relation</th>\n",
              "      <th>units</th>\n",
              "      <th>IC50</th>\n",
              "      <th>target_chembl_id</th>\n",
              "      <th>target_organism</th>\n",
              "      <th>type</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>32260</td>\n",
              "      <td>CHEMBL674637</td>\n",
              "      <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL68920</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>41.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>32267</td>\n",
              "      <td>CHEMBL674637</td>\n",
              "      <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL69960</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>170.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>32680</td>\n",
              "      <td>CHEMBL677833</td>\n",
              "      <td>In vitro inhibition of Epidermal growth factor...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL137635</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>9300.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>32770</td>\n",
              "      <td>CHEMBL674643</td>\n",
              "      <td>Inhibitory concentration of EGF dependent auto...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL306988</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>500000.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>32772</td>\n",
              "      <td>CHEMBL674643</td>\n",
              "      <td>Inhibitory concentration of EGF dependent auto...</td>\n",
              "      <td>B</td>\n",
              "      <td>CHEMBL66879</td>\n",
              "      <td>=</td>\n",
              "      <td>nM</td>\n",
              "      <td>3000000.0</td>\n",
              "      <td>CHEMBL203</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>IC50</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-67b9c8dd-c35c-4cac-bfef-d9ca6f250da3')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "\n",
              "\n",
              "\n",
              "    <div id=\"df-9a2e74a9-2bdd-4398-b163-6674cbb79c55\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-9a2e74a9-2bdd-4398-b163-6674cbb79c55')\"\n",
              "              title=\"Suggest charts.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "    </div>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "    <script>\n",
              "      async function quickchart(key) {\n",
              "        const containerElement = document.querySelector('#' + key);\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      }\n",
              "    </script>\n",
              "\n",
              "      <script>\n",
              "\n",
              "function displayQuickchartButton(domScope) {\n",
              "  let quickchartButtonEl =\n",
              "    domScope.querySelector('#df-9a2e74a9-2bdd-4398-b163-6674cbb79c55 button.colab-df-quickchart');\n",
              "  quickchartButtonEl.style.display =\n",
              "    google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "}\n",
              "\n",
              "        displayQuickchartButton(document);\n",
              "      </script>\n",
              "      <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-67b9c8dd-c35c-4cac-bfef-d9ca6f250da3 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-67b9c8dd-c35c-4cac-bfef-d9ca6f250da3');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 42
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(f\"DataFrame shape: {bioactivities_df.shape}\")"
      ],
      "metadata": {
        "id": "Hof3VZxZ77gj",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "f6cf9905-b07f-4574-d389-7f59288e2244"
      },
      "execution_count": 43,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (6823, 11)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Fetch compound data (molecular structures) from ChEMBL ..."
      ],
      "metadata": {
        "id": "aIS9YaVV8TQK"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "compounds_provider = compounds_api.filter(\n",
        "    molecule_chembl_id__in=list(bioactivities_df[\"molecule_chembl_id\"])\n",
        ").only(\"molecule_chembl_id\", \"molecule_structures\")"
      ],
      "metadata": {
        "id": "hXA6DPwi8Vxb"
      },
      "execution_count": 44,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "... and download it"
      ],
      "metadata": {
        "id": "0S6rNxRi8flL"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "compounds = list(tqdm(compounds_provider))"
      ],
      "metadata": {
        "id": "yAknPJXu8gOD",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 49,
          "referenced_widgets": [
            "b15d0f15b84b4def826e1d239e28fdf1",
            "eeda925759c04afb881a76e88e3b0c66",
            "e83ec3ff21c7471894cdcf073964d992",
            "751660ec13fa410c8eb1385f719103d4",
            "d16a5b33961e4a849bf687d8746fd2e4",
            "0ae3990397f94a449a159faab3251615",
            "fa08ea49c3ac43afb324fcaf8340a1e6",
            "7f61a7b60b274044a7618ced159a9597",
            "d98b21f92e444e408761d83a7c8d9b1a",
            "2039e2ed16ed479fb7f26731397481f9",
            "79ad69fdca97433984baf06f544de798"
          ]
        },
        "outputId": "a7e3b99b-2d6a-42ba-9312-ff3e4397e79a"
      },
      "execution_count": 45,
      "outputs": [
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "  0%|          | 0/6823 [00:00<?, ?it/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "b15d0f15b84b4def826e1d239e28fdf1"
            }
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "compounds_df = pd.DataFrame.from_records(\n",
        "    compounds,\n",
        ")\n",
        "print(f\"DataFrame shape: {compounds_df.shape}\")"
      ],
      "metadata": {
        "id": "D5GZ956xStWC",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "2327bcb4-08e9-4d5a-f09d-60ad8fe805b6"
      },
      "execution_count": 46,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (6823, 2)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "compounds_df.head()"
      ],
      "metadata": {
        "id": "rp7AZJuoTAkJ",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "outputId": "69392761-eaee-451a-a51e-d1cf5b134067"
      },
      "execution_count": 47,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "  molecule_chembl_id                                molecule_structures\n",
              "0         CHEMBL6246  {'canonical_smiles': 'O=c1oc2c(O)c(O)cc3c(=O)o...\n",
              "1           CHEMBL10  {'canonical_smiles': 'C[S+]([O-])c1ccc(-c2nc(-...\n",
              "2         CHEMBL6976  {'canonical_smiles': 'COc1cc2c(cc1OC)Nc1ncn(C)...\n",
              "3         CHEMBL7002  {'canonical_smiles': 'CC1(COc2ccc(CC3SC(=O)NC3...\n",
              "4       CHEMBL414013  {'canonical_smiles': 'COc1cc2c(cc1OC)Nc1ncnc(O..."
            ],
            "text/html": [
              "\n",
              "\n",
              "  <div id=\"df-564d03ac-e511-44b8-a476-44f8ec3086d9\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>molecule_structures</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>CHEMBL6246</td>\n",
              "      <td>{'canonical_smiles': 'O=c1oc2c(O)c(O)cc3c(=O)o...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>CHEMBL10</td>\n",
              "      <td>{'canonical_smiles': 'C[S+]([O-])c1ccc(-c2nc(-...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>CHEMBL6976</td>\n",
              "      <td>{'canonical_smiles': 'COc1cc2c(cc1OC)Nc1ncn(C)...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>CHEMBL7002</td>\n",
              "      <td>{'canonical_smiles': 'CC1(COc2ccc(CC3SC(=O)NC3...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>CHEMBL414013</td>\n",
              "      <td>{'canonical_smiles': 'COc1cc2c(cc1OC)Nc1ncnc(O...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-564d03ac-e511-44b8-a476-44f8ec3086d9')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "\n",
              "\n",
              "\n",
              "    <div id=\"df-97f702ff-b084-4446-a681-865032f54046\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-97f702ff-b084-4446-a681-865032f54046')\"\n",
              "              title=\"Suggest charts.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "    </div>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "    <script>\n",
              "      async function quickchart(key) {\n",
              "        const containerElement = document.querySelector('#' + key);\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      }\n",
              "    </script>\n",
              "\n",
              "      <script>\n",
              "\n",
              "function displayQuickchartButton(domScope) {\n",
              "  let quickchartButtonEl =\n",
              "    domScope.querySelector('#df-97f702ff-b084-4446-a681-865032f54046 button.colab-df-quickchart');\n",
              "  quickchartButtonEl.style.display =\n",
              "    google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "}\n",
              "\n",
              "        displayQuickchartButton(document);\n",
              "      </script>\n",
              "      <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-564d03ac-e511-44b8-a476-44f8ec3086d9 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-564d03ac-e511-44b8-a476-44f8ec3086d9');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 47
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Preprocess and filter compound data"
      ],
      "metadata": {
        "id": "aYNuT613TY3A"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "1. Remove entries with missing molecule structure entry"
      ],
      "metadata": {
        "id": "mKvosdsvTetA"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "compounds_df.dropna(axis=0, how=\"any\", inplace=True)\n",
        "print(f\"DataFrame shape: {compounds_df.shape}\")"
      ],
      "metadata": {
        "id": "ih416BHBTcLw",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "c43f3bb4-22c8-4053-f7fc-806ba6844a8f"
      },
      "execution_count": 48,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (6816, 2)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "2. Delete duplicate molecules"
      ],
      "metadata": {
        "id": "4mn6curmUYgY"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "compounds_df.drop_duplicates(\"molecule_chembl_id\", keep=\"first\", inplace=True)\n",
        "print(f\"DataFrame shape: {compounds_df.shape}\")"
      ],
      "metadata": {
        "id": "eeDEjxiMUZEU",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "bea2b3c6-13ad-4ddc-b9d8-7e234a67ed04"
      },
      "execution_count": 49,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (6816, 2)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "3. Get molecules with canonical SMILES"
      ],
      "metadata": {
        "id": "DNAz4-D2VraP"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "compounds_df.iloc[0].molecule_structures.keys()"
      ],
      "metadata": {
        "id": "34Jox4MjVxlH",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "abf6526c-1f8c-40f7-cfe9-2dccdc2725f7"
      },
      "execution_count": 50,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "dict_keys(['canonical_smiles', 'molfile', 'standard_inchi', 'standard_inchi_key'])"
            ]
          },
          "metadata": {},
          "execution_count": 50
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "canonical_smiles = []\n",
        "\n",
        "for i, compounds in compounds_df.iterrows():\n",
        "    try:\n",
        "        canonical_smiles.append(compounds[\"molecule_structures\"][\"canonical_smiles\"])\n",
        "    except KeyError:\n",
        "        canonical_smiles.append(None)\n",
        "\n",
        "compounds_df[\"smiles\"] = canonical_smiles\n",
        "compounds_df.drop(\"molecule_structures\", axis=1, inplace=True)\n",
        "print(f\"DataFrame shape: {compounds_df.shape}\")"
      ],
      "metadata": {
        "id": "X5n6vGUBWBxw",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "f59b106a-2287-4f18-b219-b08c02654639"
      },
      "execution_count": 51,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (6816, 2)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "compounds_df.dropna(axis=0, how=\"any\", inplace=True)\n",
        "print(f\"DataFrame shape: {compounds_df.shape}\")"
      ],
      "metadata": {
        "id": "Dl68xc3NWKag",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "5988d653-d8ff-4492-8457-7f1111d6b109"
      },
      "execution_count": 52,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (6816, 2)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Summary of compound and bioactivity data"
      ],
      "metadata": {
        "id": "afNMfjvpWTFd"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "print(f\"Bioactivities filtered: {bioactivities_df.shape[0]}\")\n",
        "bioactivities_df.columns"
      ],
      "metadata": {
        "id": "sGYwdkcCWTq6",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "6ce8ca91-bb21-4903-c5fb-2d31a4782306"
      },
      "execution_count": 53,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Bioactivities filtered: 6823\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "Index(['activity_id', 'assay_chembl_id', 'assay_description', 'assay_type',\n",
              "       'molecule_chembl_id', 'relation', 'units', 'IC50', 'target_chembl_id',\n",
              "       'target_organism', 'type'],\n",
              "      dtype='object')"
            ]
          },
          "metadata": {},
          "execution_count": 53
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "print(f\"Compounds filtered: {compounds_df.shape[0]}\")\n",
        "compounds_df.columns"
      ],
      "metadata": {
        "id": "7MGMVVJvZt0n",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "530e5bf9-bfae-4526-a0ef-7af9fa87e55a"
      },
      "execution_count": 54,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Compounds filtered: 6816\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "Index(['molecule_chembl_id', 'smiles'], dtype='object')"
            ]
          },
          "metadata": {},
          "execution_count": 54
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Merge both datasets"
      ],
      "metadata": {
        "id": "FGcP0j4uZ2nA"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "# Merge DataFrames\n",
        "output_df = pd.merge(\n",
        "    bioactivities_df[[\"molecule_chembl_id\", \"IC50\", \"units\"]],\n",
        "    compounds_df,\n",
        "    on=\"molecule_chembl_id\",\n",
        ")\n",
        "\n",
        "# Reset row indices\n",
        "output_df.reset_index(drop=True, inplace=True)\n",
        "\n",
        "print(f\"Dataset with {output_df.shape[0]} entries.\")"
      ],
      "metadata": {
        "id": "0zDTBibAZ_Rf",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "411bc6c4-a08d-4d7e-9d84-cac4b7260ad2"
      },
      "execution_count": 55,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Dataset with 6816 entries.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "output_df.dtypes"
      ],
      "metadata": {
        "id": "pDdiJPS5a94A",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "485a3859-1d7a-4523-c3e6-b3a657ea6610"
      },
      "execution_count": 56,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "molecule_chembl_id     object\n",
              "IC50                  float64\n",
              "units                  object\n",
              "smiles                 object\n",
              "dtype: object"
            ]
          },
          "metadata": {},
          "execution_count": 56
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "output_df.head(10)"
      ],
      "metadata": {
        "id": "aBuWZ-YIbDZY",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 363
        },
        "outputId": "5634f1cb-3adb-4de8-bf8c-bd56f5fa5f55"
      },
      "execution_count": 57,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "  molecule_chembl_id       IC50 units  \\\n",
              "0        CHEMBL68920       41.0    nM   \n",
              "1        CHEMBL69960      170.0    nM   \n",
              "2       CHEMBL137635     9300.0    nM   \n",
              "3       CHEMBL306988   500000.0    nM   \n",
              "4        CHEMBL66879  3000000.0    nM   \n",
              "5        CHEMBL77085    96000.0    nM   \n",
              "6       CHEMBL443268     5310.0    nM   \n",
              "7        CHEMBL76979   264000.0    nM   \n",
              "8        CHEMBL76589      125.0    nM   \n",
              "9        CHEMBL76904    35000.0    nM   \n",
              "\n",
              "                                              smiles  \n",
              "0  Cc1cc(C)c(/C=C2\\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...  \n",
              "1  Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\\C(=O)Nc2ncnc(N...  \n",
              "2        CN(c1ccccc1)c1ncnc2ccc(N/N=N/Cc3ccccn3)cc12  \n",
              "3             CC(=C(C#N)C#N)c1ccc(NC(=O)CCC(=O)O)cc1  \n",
              "4                             O=C(O)/C=C/c1ccc(O)cc1  \n",
              "5                 N#CC(C#N)=Cc1cc(O)ccc1[N+](=O)[O-]  \n",
              "6  Cc1cc(C(=O)NCCN2CCOCC2)[nH]c1/C=C1\\C(=O)N(C)c2...  \n",
              "7                  COc1cc(/C=C(\\C#N)C(=O)O)cc(OC)c1O  \n",
              "8                N#CC(C#N)=C(N)/C(C#N)=C/c1ccc(O)cc1  \n",
              "9                          N#CC(C#N)=Cc1ccc(O)c(O)c1  "
            ],
            "text/html": [
              "\n",
              "\n",
              "  <div id=\"df-416c8062-6f38-4d01-b42e-e6b854f0fb5e\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>IC50</th>\n",
              "      <th>units</th>\n",
              "      <th>smiles</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>CHEMBL68920</td>\n",
              "      <td>41.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>Cc1cc(C)c(/C=C2\\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>CHEMBL69960</td>\n",
              "      <td>170.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\\C(=O)Nc2ncnc(N...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>CHEMBL137635</td>\n",
              "      <td>9300.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>CN(c1ccccc1)c1ncnc2ccc(N/N=N/Cc3ccccn3)cc12</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>CHEMBL306988</td>\n",
              "      <td>500000.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>CC(=C(C#N)C#N)c1ccc(NC(=O)CCC(=O)O)cc1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>CHEMBL66879</td>\n",
              "      <td>3000000.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>O=C(O)/C=C/c1ccc(O)cc1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>CHEMBL77085</td>\n",
              "      <td>96000.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>N#CC(C#N)=Cc1cc(O)ccc1[N+](=O)[O-]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>CHEMBL443268</td>\n",
              "      <td>5310.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>Cc1cc(C(=O)NCCN2CCOCC2)[nH]c1/C=C1\\C(=O)N(C)c2...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>CHEMBL76979</td>\n",
              "      <td>264000.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>COc1cc(/C=C(\\C#N)C(=O)O)cc(OC)c1O</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>8</th>\n",
              "      <td>CHEMBL76589</td>\n",
              "      <td>125.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>N#CC(C#N)=C(N)/C(C#N)=C/c1ccc(O)cc1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>9</th>\n",
              "      <td>CHEMBL76904</td>\n",
              "      <td>35000.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>N#CC(C#N)=Cc1ccc(O)c(O)c1</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-416c8062-6f38-4d01-b42e-e6b854f0fb5e')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "\n",
              "\n",
              "\n",
              "    <div id=\"df-bbee8165-7caa-4792-a790-ae3807e42167\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-bbee8165-7caa-4792-a790-ae3807e42167')\"\n",
              "              title=\"Suggest charts.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "    </div>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "    <script>\n",
              "      async function quickchart(key) {\n",
              "        const containerElement = document.querySelector('#' + key);\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      }\n",
              "    </script>\n",
              "\n",
              "      <script>\n",
              "\n",
              "function displayQuickchartButton(domScope) {\n",
              "  let quickchartButtonEl =\n",
              "    domScope.querySelector('#df-bbee8165-7caa-4792-a790-ae3807e42167 button.colab-df-quickchart');\n",
              "  quickchartButtonEl.style.display =\n",
              "    google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "}\n",
              "\n",
              "        displayQuickchartButton(document);\n",
              "      </script>\n",
              "      <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-416c8062-6f38-4d01-b42e-e6b854f0fb5e button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-416c8062-6f38-4d01-b42e-e6b854f0fb5e');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 57
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Add pIC50 values (for a better visualization and processing)"
      ],
      "metadata": {
        "id": "PNqDKdFlbH2_"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "def convert_ic50_to_pic50(IC50_value):\n",
        "    pIC50_value = 9 - math.log10(IC50_value)\n",
        "    return pIC50_value"
      ],
      "metadata": {
        "id": "MSdbsgQgbQvg"
      },
      "execution_count": 58,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Apply conversion to each row of the compounds DataFrame\n",
        "output_df[\"pIC50\"] = output_df.apply(lambda x: convert_ic50_to_pic50(x.IC50), axis=1)"
      ],
      "metadata": {
        "id": "gCjIDvlYbVUv"
      },
      "execution_count": 59,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "output_df.head()"
      ],
      "metadata": {
        "id": "tH7o_kLibyLy",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "outputId": "174d4c58-cb45-4b92-9773-3e30d4164cae"
      },
      "execution_count": 60,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "  molecule_chembl_id       IC50 units  \\\n",
              "0        CHEMBL68920       41.0    nM   \n",
              "1        CHEMBL69960      170.0    nM   \n",
              "2       CHEMBL137635     9300.0    nM   \n",
              "3       CHEMBL306988   500000.0    nM   \n",
              "4        CHEMBL66879  3000000.0    nM   \n",
              "\n",
              "                                              smiles     pIC50  \n",
              "0  Cc1cc(C)c(/C=C2\\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...  7.387216  \n",
              "1  Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\\C(=O)Nc2ncnc(N...  6.769551  \n",
              "2        CN(c1ccccc1)c1ncnc2ccc(N/N=N/Cc3ccccn3)cc12  5.031517  \n",
              "3             CC(=C(C#N)C#N)c1ccc(NC(=O)CCC(=O)O)cc1  3.301030  \n",
              "4                             O=C(O)/C=C/c1ccc(O)cc1  2.522879  "
            ],
            "text/html": [
              "\n",
              "\n",
              "  <div id=\"df-9b30a801-9db0-4f73-971f-d78e7db8dfcb\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>IC50</th>\n",
              "      <th>units</th>\n",
              "      <th>smiles</th>\n",
              "      <th>pIC50</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>CHEMBL68920</td>\n",
              "      <td>41.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>Cc1cc(C)c(/C=C2\\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...</td>\n",
              "      <td>7.387216</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>CHEMBL69960</td>\n",
              "      <td>170.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\\C(=O)Nc2ncnc(N...</td>\n",
              "      <td>6.769551</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>CHEMBL137635</td>\n",
              "      <td>9300.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>CN(c1ccccc1)c1ncnc2ccc(N/N=N/Cc3ccccn3)cc12</td>\n",
              "      <td>5.031517</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>CHEMBL306988</td>\n",
              "      <td>500000.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>CC(=C(C#N)C#N)c1ccc(NC(=O)CCC(=O)O)cc1</td>\n",
              "      <td>3.301030</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>CHEMBL66879</td>\n",
              "      <td>3000000.0</td>\n",
              "      <td>nM</td>\n",
              "      <td>O=C(O)/C=C/c1ccc(O)cc1</td>\n",
              "      <td>2.522879</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-9b30a801-9db0-4f73-971f-d78e7db8dfcb')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "\n",
              "\n",
              "\n",
              "    <div id=\"df-05f32a15-054d-4ea7-932d-44fcc49b6860\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-05f32a15-054d-4ea7-932d-44fcc49b6860')\"\n",
              "              title=\"Suggest charts.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "    </div>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "    <script>\n",
              "      async function quickchart(key) {\n",
              "        const containerElement = document.querySelector('#' + key);\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      }\n",
              "    </script>\n",
              "\n",
              "      <script>\n",
              "\n",
              "function displayQuickchartButton(domScope) {\n",
              "  let quickchartButtonEl =\n",
              "    domScope.querySelector('#df-05f32a15-054d-4ea7-932d-44fcc49b6860 button.colab-df-quickchart');\n",
              "  quickchartButtonEl.style.display =\n",
              "    google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "}\n",
              "\n",
              "        displayQuickchartButton(document);\n",
              "      </script>\n",
              "      <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-9b30a801-9db0-4f73-971f-d78e7db8dfcb button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-9b30a801-9db0-4f73-971f-d78e7db8dfcb');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 60
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Draw compound data"
      ],
      "metadata": {
        "id": "Fa3HGN67cNgK"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "output_df.hist(column=\"pIC50\")"
      ],
      "metadata": {
        "id": "AerCnuTFcMaI",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 469
        },
        "outputId": "930e6d2f-cbd2-497a-eda3-faf4eb15ac8e"
      },
      "execution_count": 61,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "array([[<Axes: title={'center': 'pIC50'}>]], dtype=object)"
            ]
          },
          "metadata": {},
          "execution_count": 61
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "<Figure size 640x480 with 1 Axes>"
            ],
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAAjgAAAGzCAYAAAAi6m1wAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA2J0lEQVR4nO3de3RU5b3/8c+EJBNAJiF4SJhjwNRl5SqhRCFKFUpIuIgiWEpJMS0cOIcmKqRFoBIMF41EiggilJ4Kugq1tS0UkUJGoERr5BJM5VbEI4qnOEnbACNwmAyZ+f3hyvwcA9bAnkx48n6txYp7P8/e+7u/TXY+nT07YwsEAgEBAAAYJCrSBQAAAFiNgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAdAs3HjjjbrnnnsarL9w4YKeeeYZ9evXT/Hx8YqLi9PXv/515efn67333gvOW7t2rWw22yX/ud3uBvvdtGmTvvGNbyguLk6dO3fW448/rosXL4b1HAE0nehIFwAAl/OPf/xDQ4cOVUVFhe655x6NHz9e1113nY4ePaqXX35Zq1evVm1tbcg28+fPV2pqasi6hISEkOU//vGPGjVqlAYOHKjly5frwIEDWrhwoaqrq7Vy5cpwnxaAJkDAAdBsff/739c777yj3/72txozZkzI2IIFC/TYY4812GbYsGFKT0//0v3++Mc/1q233qrS0lJFR392GXQ4HHryySf1yCOPqGvXrtadBICI4BYVgLAqKiqSzWbTX//6V40dO1YOh0MdOnTQI488ogsXLlx2u927d+u1117TpEmTGoQbSbLb7Vq8ePElt/30009VV1d3ybHDhw/r8OHDmjJlSjDcSNIPf/hDBQIB/fa3v23kGQJojgg4AJrE2LFjdeHCBRUXF2v48OFatmyZpkyZctn5mzZtkiRNmDChUccZNGiQHA6H2rRpo3vvvVfHjh0LGX/nnXckqcGrPE6nUzfccENwHMC1jVtUAJpEamqq/vCHP0iS8vLy5HA49PzzzwdvF33RkSNHJEm9evX6Svtv06aNvv/97wcDTkVFhZYsWaI77rhD+/fvV0pKiiTpk08+kSR16tSpwT46deqkkydPXtH5AWheeAUHQJPIy8sLWX7ooYckSVu2bLnkfI/HI0lq167dV9r/2LFjtWbNGj344IMaNWqUFixYoG3btumf//ynnnjiieC8//u//5P02S2uL4qLiwuOA7i2EXAANImbb745ZPmmm25SVFSUPvzww0vOdzgckj57P82VGjBggPr166fXX389uK5169aSJK/X22D+hQsXguMArm0EHAARYbPZvnS8/kmmAwcOXNVxUlJSVFNTE1yuvzVVf6vq8z755BM5nc6rOh6A5oGAA6BJfPHNvu+//778fr9uvPHGS84fOXKkJOmXv/zlVR33gw8+0L/9278Fl9PS0iRJ+/btC5l38uRJ/e///m9wHMC1jYADoEmsWLEiZHn58uWSPvu7NZeSkZGhoUOH6r//+7+1cePGBuO1tbX68Y9/HFz++9//3mDOli1bVFFRoaFDhwbX9ejRQ127dtXq1atDHiVfuXKlbDabHnjggUadF4DmiaeoADSJ48eP695779XQoUNVXl6uX/7ylxo/frx69+592W1eeuklZWVlafTo0Ro5cqQGDx6stm3b6tixY3r55Zf1ySefBP8Wzh133KE+ffooPT1d8fHx2r9/v1544QWlpKToJz/5Sch+n376ad17773KysrSuHHjdPDgQT333HP6j//4D3Xr1i2sfQDQRAIAEEaPP/54QFLg8OHDgQceeCDQrl27QPv27QP5+fmB//u//wvO69KlS2DEiBENtj9//nxg8eLFgdtuuy1w3XXXBWJjYwM333xz4KGHHgq8//77wXmPPfZYIC0tLRAfHx+IiYkJdO7cOTB16tSA2+
2+ZF0bNmwIpKWlBex2e+CGG24IzJkzJ1BbW2t9AwBEhC0QCAQiHbIAmKuoqEjz5s3T3//+d11//fWRLgdAC8F7cAAAgHEIOAAAwDgEHAAAYBzegwMAAIzDKzgAAMA4BBwAAGAcY//Qn9/v18mTJ9WuXbt/+Zk3AACgeQgEAvr000/ldDoVFXXlr8MYG3BOnjyplJSUSJcBAACuwMcff6wbbrjhirc3NuC0a9dO0mcNcjgcEa7GWj6fT6WlpcrKylJMTEykyzEO/Q0v+hte9De86G/41Pc2IyNDqampwd/jV8rYgFN/W8rhcBgZcNq0aSOHw8EPWBjQ3/Civ+FFf8OL/oZPfW/rg83Vvr2ENxkDAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGCc60gUAwJW4cdZrkS6h0T58akSkSwBaDF7BAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADBOowNOWVmZRo4cKafTKZvNpo0bNzaYc+TIEd17772Kj49X27Ztddttt+nEiRPB8QsXLigvL08dOnTQddddpzFjxqiqqipkHydOnNCIESPUpk0bdezYUTNmzNDFixcbf4YAAKDFaXTAOXfunHr37q0VK1Zccvx//ud/NGDAAHXt2lV/+tOf9O6776qwsFBxcXHBOdOnT9err76qV155Rbt27dLJkyc1evTo4HhdXZ1GjBih2tpavfXWW3rxxRe1du1azZ079wpOEQAAtDTRjd1g2LBhGjZs2GXHH3vsMQ0fPlwlJSXBdTfddFPwv8+cOaNf/OIXWr9+vb71rW9JktasWaNu3brp7bffVv/+/VVaWqrDhw/r9ddfV1JSktLS0rRgwQLNnDlTRUVFio2NbXBcr9crr9cbXPZ4PJIkn88nn8/X2NNs1urPx7Tzai7ob3hZ1V97q4AV5TSppvie4vs3vOhv+FjdW1sgELjiq4TNZtOGDRs0atQoSZLf71d8fLweffRRvfnmm3rnnXeUmpqq2bNnB+fs2LFDgwcP1qlTp5SQkBDcV5cuXTRt2jRNnz5dc+fO1aZNm1RZWRkcP378uL72ta9p//796tOnT4NaioqKNG/evAbr169frzZt2lzpKQIAgCZ0/vx5jR8/XmfOnJHD4bji/TT6FZwvU11drbNnz+qpp57SwoULtWjRIm3dulWjR4/Wzp07dffdd8vtdis2NjYk3EhSUlKS3G63JMntdispKanBeP3YpcyePVsFBQXBZY/Ho5SUFGVlZV1Vg5ojn88nl8ulIUOGKCYmJtLlGIf+hpdV/e1ZtM3CqsxhjwpoQbpfhfui5PXbrnp/B4uyLajKHFwfwqe+t4MGDbJkf5YGHL/fL0m67777NH36dElSWlqa3nrrLa1atUp33323lYcLYbfbZbfbG6yPiYkx9pvQ5HNrDuhveF1tf711V//L22Rev82SHvEzcGlcH8LHqr5a+pj49ddfr+joaHXv3j1kfbdu3YJPUSUnJ6u2tlanT58OmVNVVaXk5OTgnC8+VVW/XD8HAADgciwNOLGxsbrtttt09OjRkPXvvfeeunTpIknq27evYmJitH379uD40aNHdeLECWVkZEiSMjIydODAAVVXVwfnuFwuORyOBuEJAADgixp9i+rs2bN6//33g8vHjx9XZWWlEhMT1blzZ82YMUPf+c53dNddd2nQoEHaunWrXn31Vf3pT3+SJMXHx2vSpEkqKChQYmKiHA6HHnroIWVkZKh///6SpKysLHXv3l0TJkxQSUmJ3G635syZo7y8vEvehgIAAPi8Rgecffv2hbwBqP6Nvbm5uVq7dq3uv/9+rVq1SsXFxXr44Yd1yy236He/+50GDBgQ3OaZZ5
5RVFSUxowZI6/Xq+zsbD3//PPB8VatWmnz5s2aOnWqMjIy1LZtW+Xm5mr+/PlXc64AAKCFaHTAGThwoP7Vk+UTJ07UxIkTLzseFxenFStWXPaPBUqfPTa+ZcuWxpYHAADAZ1EBAADzEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwTnSkCwAQeTfOeq3JjmVvFVDJ7VLPom3y1tma7LgAWhZewQEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjNPogFNWVqaRI0fK6XTKZrNp48aNl537X//1X7LZbFq6dGnI+pqaGuXk5MjhcCghIUGTJk3S2bNnQ+a8++67+uY3v6m4uDilpKSopKSksaUCAIAWqtEB59y5c+rdu7dWrFjxpfM2bNigt99+W06ns8FYTk6ODh06JJfLpc2bN6usrExTpkwJjns8HmVlZalLly6qqKjQ008/raKiIq1evbqx5QIAgBao0R/VMGzYMA0bNuxL5/ztb3/TQw89pG3btmnEiBEhY0eOHNHWrVu1d+9epaenS5KWL1+u4cOHa/HixXI6nVq3bp1qa2v1wgsvKDY2Vj169FBlZaWWLFkSEoQAAAAuxfLPovL7/ZowYYJmzJihHj16NBgvLy9XQkJCMNxIUmZmpqKiorR7927df//9Ki8v11133aXY2NjgnOzsbC1atEinTp1S+/btG+zX6/XK6/UGlz0ejyTJ5/PJ5/NZeYoRV38+pp1Xc9ES+2tvFWi6Y0UFQr7CWlb3tyX9HHwVLfH60FSs7q3lAWfRokWKjo7Www8/fMlxt9utjh07hhYRHa3ExES53e7gnNTU1JA5SUlJwbFLBZzi4mLNmzevwfrS0lK1adPmis6luXO5XJEuwWgtqb8ltzf9MRek+5v+oC2IVf3dsmWLJfsxTUu6PjS1nTt3WrIfSwNORUWFnn32We3fv182W9N+SvDs2bNVUFAQXPZ4PEpJSVFWVpYcDkeT1hJuPp9PLpdLQ4YMUUxMTKTLMU5L7G/Pom1Ndix7VEAL0v0q3Bclr59PE7ea1f09WJRtQVXmaInXh6ZS39tBgwZZsj9LA84bb7yh6upqde7cObiurq5OP/rRj7R06VJ9+OGHSk5OVnV1dch2Fy9eVE1NjZKTkyVJycnJqqqqCplTv1w/54vsdrvsdnuD9TExMcZ+E5p8bs1BS+qvt67pg4bXb4vIcVsKq/rbUn4GGqslXR+amlV9tfTv4EyYMEHvvvuuKisrg/+cTqdmzJihbds++3+IGRkZOn36tCoqKoLb7dixQ36/X/369QvOKSsrC7kP53K5dMstt1zy9hQAAMDnNfoVnLNnz+r9998PLh8/flyVlZVKTExU586d1aFDh5D5MTExSk5O1i233CJJ6tatm4YOHarJkydr1apV8vl8ys/P17hx44KPlI8fP17z5s3TpEmTNHPmTB08eFDPPvusnnnmmas5VwAA0EI0OuDs27cv5P5Y/ftecnNztXbt2q+0j3Xr1ik/P1+DBw9WVFSUxowZo2XLlgXH4+PjVVpaqry8PPXt21fXX3+95s6dyyPiAADgK2l0wBk4cKACga/++OGHH37YYF1iYqLWr1//pdvdeuuteuONNxpbHgAAAJ9FBQAAzEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOA
QcAABgHAIOAAAwTqMDTllZmUaOHCmn0ymbzaaNGzcGx3w+n2bOnKlevXqpbdu2cjqdevDBB3Xy5MmQfdTU1CgnJ0cOh0MJCQmaNGmSzp49GzLn3Xff1Te/+U3FxcUpJSVFJSUlV3aGAACgxWl0wDl37px69+6tFStWNBg7f/689u/fr8LCQu3fv1+///3vdfToUd17770h83JycnTo0CG5XC5t3rxZZWVlmjJlSnDc4/EoKytLXbp0UUVFhZ5++mkVFRVp9erVV3CKAACgpYlu7AbDhg3TsGHDLjkWHx8vl8sVsu65557T7bffrhMnTqhz5846cuSItm7dqr179yo9PV2StHz5cg0fPlyLFy+W0+nUunXrVFtbqxdeeEGxsbHq0aOHKisrtWTJkpAgBAAAcCmNDjiNdebMGdlsNiUkJEiSysvLlZCQEAw3kpSZmamoqCjt3r1b999/v8rLy3XXXXcpNjY2OCc7O1uLFi3SqVOn1L59+wbH8Xq98nq9wWWPxyPps9tmPp8vTGcXGfXnY9p5NRctsb/2VoGmO1ZUIOQrrGV1f1vSz8FX0RKvD03F6t6GNeBcuHBBM2fO1He/+105HA5JktvtVseOHUOLiI5WYmKi3G53cE5qamrInKSkpODYpQJOcXGx5s2b12B9aWmp2rRpY8n5NDdffLUM1mpJ/S25vemPuSDd3/QHbUGs6u+WLVss2Y9pWtL1oant3LnTkv2ELeD4fD6NHTtWgUBAK1euDNdhgmbPnq2CgoLgssfjUUpKirKysoLhyhQ+n08ul0tDhgxRTExMpMsxTkvsb8+ibU12LHtUQAvS/SrcFyWv39Zkx20prO7vwaJsC6oyR0u8PjSV+t4OGjTIkv2FJeDUh5uPPvpIO3bsCAkYycnJqq6uDpl/8eJF1dTUKDk5OTinqqoqZE79cv2cL7Lb7bLb7Q3Wx8TEGPtNaPK5NQctqb/euqYPGl6/LSLHbSms6m9L+RlorJZ0fWhqVvXV8r+DUx9ujh07ptdff10dOnQIGc/IyNDp06dVUVERXLdjxw75/X7169cvOKesrCzkPpzL5dItt9xyydtTAAAAn9fogHP27FlVVlaqsrJSknT8+HFVVlbqxIkT8vl8euCBB7Rv3z6tW7dOdXV1crvdcrvdqq2tlSR169ZNQ4cO1eTJk7Vnzx79+c9/Vn5+vsaNGyen0ylJGj9+vGJjYzVp0iQdOnRIv/71r/Xss8+G3IICAAC4nEbfotq3b1/I/bH60JGbm6uioiJt2rRJkpSWlhay3c6dOzVw4EBJ0rp165Sfn6/BgwcrKipKY8aM0bJly4Jz4+PjVVpaqry8PPXt21fXX3+95s6dyyPiAADgK2l0wBk4cKACgcs/fvhlY/USExO1fv36L51z66236o033mhseQAAAHwWFQAAMA8BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOI0OOGVlZRo5cqScTqdsNps2btwYMh4IBDR37lx16tRJrVu3VmZmpo4dOxYyp6amRjk5OXI4HEpISNCkSZN09uzZkDnvvvuuvvnNbyouLk4pKSkqKSlp/NkBAIAWKbqxG5w7d069e/fWxIkTNXr06AbjJSUlWrZsmV588UWlpqaqsLBQ2dnZOnz4sOLi4iRJOTk5+uSTT+RyueTz+fSDH/xAU6ZM0fr16yVJHo9HWVlZyszM1KpVq3TgwAFNnDhRCQkJmjJlylWeMhBeN856LdIlAECL1+iAM2zYMA0bNuySY4FAQEuXLtWcOXN03333SZJeeuklJSUlaePGjR
o3bpyOHDmirVu3au/evUpPT5ckLV++XMOHD9fixYvldDq1bt061dbW6oUXXlBsbKx69OihyspKLVmyhIADAAD+pUYHnC9z/Phxud1uZWZmBtfFx8erX79+Ki8v17hx41ReXq6EhIRguJGkzMxMRUVFaffu3br//vtVXl6uu+66S7GxscE52dnZWrRokU6dOqX27ds3OLbX65XX6w0uezweSZLP55PP57PyNCOu/nxMO6/m4mr7a28VsLIc49ijAiFfYS2r+8t1JhTX3/CxureWBhy32y1JSkpKClmflJQUHHO73erYsWNoEdHRSkxMDJmTmpraYB/1Y5cKOMXFxZo3b16D9aWlpWrTps0VnlHz5nK5Il2C0a60vyW3W1yIoRak+yNdgtGs6u+WLVss2Y9puP6Gz86dOy3Zj6UBJ5Jmz56tgoKC4LLH41FKSoqysrLkcDgiWJn1fD6fXC6XhgwZopiYmEiXY5yr7W/Pom1hqMoc9qiAFqT7VbgvSl6/LdLlGMfq/h4syragKnNw/Q2f+t4OGjTIkv1ZGnCSk5MlSVVVVerUqVNwfVVVldLS0oJzqqurQ7a7ePGiampqgtsnJyerqqoqZE79cv2cL7Lb7bLb7Q3Wx8TEGPtNaPK5NQdX2l9vHb+0vwqv30avwsiq/nKNuTSuv+FjVV8t/Ts4qampSk5O1vbt24PrPB6Pdu/erYyMDElSRkaGTp8+rYqKiuCcHTt2yO/3q1+/fsE5ZWVlIffhXC6XbrnllkvengIAAPi8Rgecs2fPqrKyUpWVlZI+e2NxZWWlTpw4IZvNpmnTpmnhwoXatGmTDhw4oAcffFBOp1OjRo2SJHXr1k1Dhw7V5MmTtWfPHv35z39Wfn6+xo0bJ6fTKUkaP368YmNjNWnSJB06dEi//vWv9eyzz4bcggIAALicRt+i2rdvX8j9sfrQkZubq7Vr1+rRRx/VuXPnNGXKFJ0+fVoDBgzQ1q1bg38DR5LWrVun/Px8DR48WFFRURozZoyWLVsWHI+Pj1dpaany8vLUt29fXX/99Zo7dy6PiAMAgK+k0QFn4MCBCgQu//ihzWbT/PnzNX/+/MvOSUxMDP5Rv8u59dZb9cYbbzS2PAAAAD6LCgAAmIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHMsDTl1dnQoLC5WamqrWrVvrpptu0oIFCxQIBIJzAoGA5s6dq06dOql169bKzMzUsWPHQvZTU1OjnJwcORwOJSQkaNKkSTp79qzV5QIAAANZHnAWLVqklStX6rnnntORI0e0aNEilZSUaPny5cE5JSUlWrZsmVatWqXdu3erbdu2ys7O1oULF4JzcnJydOjQIblcLm3evFllZWWaMmWK1eUCAAADRVu9w7feekv33XefRowYIUm68cYb9atf/Up79uyR9NmrN0uXLtWcOXN03333SZJeeuklJSUlaePGjRo3bpyOHDmirVu3au/evUpPT5ckLV++XMOHD9fixYvldDobHNfr9crr9QaXPR6PJMnn88nn81l9mhFVfz6mnVdzcbX9tbcK/OtJLZg9KhDyFdayur9cZ0Jx/Q0fq3trC3z+3pEFnnzySa1evVqlpaX6+te/rr/85S/KysrSkiVLlJOTow8++EA33XST3nnnHaWlpQW3u/vuu5WWlqZnn31WL7zwgn70ox/p1KlTwfGLFy8qLi5Or7zyiu6///4Gxy0qKtK8efMarF+/fr3atGlj5SkCAIAwOX/+vMaPH68zZ87I4XBc8X4sfwVn1qxZ8ng86tq1q1q1aqW6uj
o98cQTysnJkSS53W5JUlJSUsh2SUlJwTG3262OHTuGFhodrcTExOCcL5o9e7YKCgqCyx6PRykpKcrKyrqqBjVHPp9PLpdLQ4YMUUxMTKTLMc7V9rdn0bYwVGUOe1RAC9L9KtwXJa/fFulyjGN1fw8WZVtQlTm4/oZPfW8HDRpkyf4sDzi/+c1vtG7dOq1fv149evRQZWWlpk2bJqfTqdzcXKsPF2S322W32xusj4mJMfab0ORzaw6utL/eOn5pfxVev41ehZFV/eUac2lcf8PHqr5aHnBmzJihWbNmady4cZKkXr166aOPPlJxcbFyc3OVnJwsSaqqqlKnTp2C21VVVQVvWSUnJ6u6ujpkvxcvXlRNTU1wewAAgMux/Cmq8+fPKyoqdLetWrWS3++XJKWmpio5OVnbt28Pjns8Hu3evVsZGRmSpIyMDJ0+fVoVFRXBOTt27JDf71e/fv2sLhkAABjG8ldwRo4cqSeeeEKdO3dWjx499M4772jJkiWaOHGiJMlms2natGlauHChbr75ZqWmpqqwsFBOp1OjRo2SJHXr1k1Dhw7V5MmTtWrVKvl8PuXn52vcuHGXfIIKAADg8ywPOMuXL1dhYaF++MMfqrq6Wk6nU//5n/+puXPnBuc8+uijOnfunKZMmaLTp09rwIAB2rp1q+Li4oJz1q1bp/z8fA0ePFhRUVEaM2aMli1bZnW5AADAQJYHnHbt2mnp0qVaunTpZefYbDbNnz9f8+fPv+ycxMRErV+/3uryAABAC8BnUQEAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA40ZEuAADQfN0467VIl9BoHz41ItIloBngFRwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAME5YAs7f/vY3fe9731OHDh3UunVr9erVS/v27QuOBwIBzZ07V506dVLr1q2VmZmpY8eOheyjpqZGOTk5cjgcSkhI0KRJk3T27NlwlAsAAAxjecA5deqU7rzzTsXExOiPf/yjDh8+rJ/+9Kdq3759cE5JSYmWLVumVatWaffu3Wrbtq2ys7N14cKF4JycnBwdOnRILpdLmzdvVllZmaZMmWJ1uQAAwECWf9jmokWLlJKSojVr1gTXpaamBv87EAho6dKlmjNnju677z5J0ksvvaSkpCRt3LhR48aN05EjR7R161bt3btX6enpkqTly5dr+PDhWrx4sZxOp9VlAwAAg1gecDZt2qTs7Gx9+9vf1q5du/Tv//7v+uEPf6jJkydLko4fPy63263MzMzgNvHx8erXr5/Ky8s1btw4lZeXKyEhIRhuJCkzM1NRUVHavXu37r///gbH9Xq98nq9wWWPxyNJ8vl88vl8Vp9mRNWfj2nn1VxcbX/trQJWlmMce1Qg5CusRX/De23k+hs+VvfW8oDzwQcfaOXKlSooKNBPfvIT7d27Vw8//LBiY2OVm5srt9stSUpKSgrZLikpKTjmdrvVsWPH0EKjo5WYmBic80XFxcWaN29eg/WlpaVq06aNFafW7LhcrkiXYLQr7W/J7RYXYqgF6f5Il2C0ltzfLVu2hP0YXH/DZ+fOnZbsx/KA4/f7lZ6erieffFKS1KdPHx08eFCrVq1Sbm6u1YcLmj17tgoKCoLLHo9HKSkpysrKksPhCNtxI8Hn88nlcmnIkCGKiYmJdDnGudr+9izaFoaqzGGPCmhBul+F+6Lk9dsiXY5x6K90sCg7bPvm+hs+9b0dNGiQJfuzPOB06tRJ3bt3D1nXrVs3/e53v5MkJScnS5KqqqrUqVOn4JyqqiqlpaUF51RXV4fs4+LFi6qpqQlu/0V2u112u73B+piYGGO/CU0+t+bgSv
vrrWuZv1Qay+u30aswasn9bYrrItff8LGqr5Y/RXXnnXfq6NGjIevee+89denSRdJnbzhOTk7W9u3bg+Mej0e7d+9WRkaGJCkjI0OnT59WRUVFcM6OHTvk9/vVr18/q0sGAACGsfwVnOnTp+uOO+7Qk08+qbFjx2rPnj1avXq1Vq9eLUmy2WyaNm2aFi5cqJtvvlmpqakqLCyU0+nUqFGjJH32is/QoUM1efJkrVq1Sj6fT/n5+Ro3bhxPUAEAgH/J8oBz2223acOGDZo9e7bmz5+v1NRULV26VDk5OcE5jz76qM6dO6cpU6bo9OnTGjBggLZu3aq4uLjgnHXr1ik/P1+DBw9WVFSUxowZo2XLllldLgAAMJDlAUeS7rnnHt1zzz2XHbfZbJo/f77mz59/2TmJiYlav359OMoDAACG47OoAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGCXvAeeqpp2Sz2TRt2rTgugsXLigvL08dOnTQddddpzFjxqiqqipkuxMnTmjEiBFq06aNOnbsqBkzZujixYvhLhcAABggrAFn7969+tnPfqZbb701ZP306dP16quv6pVXXtGuXbt08uRJjR49OjheV1enESNGqLa2Vm+99ZZefPFFrV27VnPnzg1nuQAAwBBhCzhnz55VTk6Ofv7zn6t9+/bB9WfOnNEvfvELLVmyRN/61rfUt29frVmzRm+99ZbefvttSVJpaakOHz6sX/7yl0pLS9OwYcO0YMECrVixQrW1teEqGQAAGCI6XDvOy8vTiBEjlJmZqYULFwbXV1RUyOfzKTMzM7iua9eu6ty5s8rLy9W/f3+Vl5erV69eSkpKCs7Jzs7W1KlTdejQIfXp06fB8bxer7xeb3DZ4/FIknw+n3w+XzhOMWLqz8e082ourra/9lYBK8sxjj0qEPIV1qK/4b02cv0NH6t7G5aA8/LLL2v//v3au3dvgzG3263Y2FglJCSErE9KSpLb7Q7O+Xy4qR+vH7uU4uJizZs3r8H60tJStWnT5kpOo9lzuVyRLsFoV9rfktstLsRQC9L9kS7BaC25v1u2bAn7Mbj+hs/OnTst2Y/lAefjjz/WI488IpfLpbi4OKt3f1mzZ89WQUFBcNnj8SglJUVZWVlyOBxNVkdT8Pl8crlcGjJkiGJiYiJdjnGutr89i7aFoSpz2KMCWpDuV+G+KHn9tkiXYxz6Kx0syg7bvrn+hk99bwcNGmTJ/iwPOBUVFaqurtY3vvGN4Lq6ujqVlZXpueee07Zt21RbW6vTp0+HvIpTVVWl5ORkSVJycrL27NkTst/6p6zq53yR3W6X3W5vsD4mJsbYb0KTz605uNL+euta5i+VxvL6bfQqjFpyf5viusj1N3ys6qvlbzIePHiwDhw4oMrKyuC/9PR05eTkBP87JiZG27dvD25z9OhRnThxQhkZGZKkjIwMHThwQNXV1cE5LpdLDodD3bt3t7pkAABgGMtfwWnXrp169uwZsq5t27bq0KFDcP2kSZNUUFCgxMREORwOPfTQQ8rIyFD//v0lSVlZWerevbsmTJigkpISud1uzZkzR3l5eZd8lQYAAODzwvYU1Zd55plnFBUVpTFjxsjr9So7O1vPP/98cLxVq1bavHmzpk6dqoyMDLVt21a5ubmaP39+JMoFAADXmCYJOH/6059CluPi4rRixQqtWLHistt06dKlSd4JDwAAzMNnUQEAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAA
CAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjBNt9Q6Li4v1+9//Xn/961/VunVr3XHHHVq0aJFuueWW4JwLFy7oRz/6kV5++WV5vV5lZ2fr+eefV1JSUnDOiRMnNHXqVO3cuVPXXXedcnNzVVxcrOhoy0tGM3bjrNea/Jj2VgGV3C71LNomb52tyY8PALh6lr+Cs2vXLuXl5entt9+Wy+WSz+dTVlaWzp07F5wzffp0vfrqq3rllVe0a9cunTx5UqNHjw6O19XVacSIEaqtrdVbb72lF198UWvXrtXcuXOtLhcAABjI8pdDtm7dGrK8du1adezYURUVFbrrrrt05swZ/eIXv9D69ev1rW99S5K0Zs0adevWTW+//bb69++v0tJSHT58WK+//rqSkpKUlpamBQsWaObMmSoqKlJsbKzVZQMAAIOE/X7PmTNnJEmJiYmSpIqKCvl8PmVmZgbndO3aVZ07d1Z5ebn69++v8vJy9erVK+SWVXZ2tqZOnapDhw6pT58+DY7j9Xrl9XqDyx6PR5Lk8/nk8/nCcm6RUn8+pp3XpdhbBZr+mFGBkK+wFv0NL/ob3mtjS7r+NjWrexvWgOP3+zVt2jTdeeed6tmzpyTJ7XYrNjZWCQkJIXOTkpLkdruDcz4fburH68cupbi4WPPmzWuwvrS0VG3atLnaU2mWXC5XpEsIu5LbI3fsBen+yB28BaC/4dWS+7tly5awH6MlXH8jZefOnZbsJ6wBJy8vTwcPHtSbb74ZzsNIkmbPnq2CgoLgssfjUUpKirKysuRwOMJ+/Kbk8/nkcrk0ZMgQxcTERLqcsOpZtK3Jj2mPCmhBul+F+6Lk9fMmY6vR3/Civ9LBouyw7bslXX+bWn1vBw0aZMn+whZw8vPztXnzZpWVlemGG24Irk9OTlZtba1Onz4d8ipOVVWVkpOTg3P27NkTsr+qqqrg2KXY7XbZ7fYG62NiYoz9JjT53OpF8ikmr9/GU1RhRH/DqyX3tymuiy3h+hspVvXV8qeoAoGA8vPztWHDBu3YsUOpqakh43379lVMTIy2b98eXHf06FGdOHFCGRkZkqSMjAwdOHBA1dXVwTkul0sOh0Pdu3e3umQAAGAYy1/BycvL0/r16/WHP/xB7dq1C75nJj4+Xq1bt1Z8fLwmTZqkgoICJSYmyuFw6KGHHlJGRob69+8vScrKylL37t01YcIElZSUyO12a86cOcrLy7vkqzQAAACfZ3nAWblypSRp4MCBIevXrFmj73//+5KkZ555RlFRURozZkzIH/qr16pVK23evFlTp05VRkaG2rZtq9zcXM2fP9/qcgEAgIEsDziBwL9+NDEuLk4rVqzQihUrLjunS5cuTfJOeAAAYB4+iwoAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjGP5RzUAABBJN856LWz7trcKqOR2qWfRNnnrbJbt98OnRli2L3yGV3AAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA40RHugA0nRtnvRbpEgAAaBK8ggMAAIxDwAEAAMZp1gFnxYoVuvHGGxUXF6d+/fppz549kS4JAABcA5ptwPn1r3+tgoICPf7449q/f7969+6t7OxsVVdXR7o0AADQzDXbNxkvWbJEkydP1g9+8ANJ0qpVq/Taa6/phRde0KxZsyJcHQAA1rkWHwL58KkRkS7hSzXLgFNbW6uKigrNnj07uC4qKkqZmZkqLy+/5DZer1derz
e4fObMGUlSTU2NfD6f5TX2K95u+T6/KntUQHP6+JX22O/l9du+8nbN8n/sZijaH9D5835F+6JU14j+4quhv+FFf8OL/v5///znPy3dn8/n0/nz51VTUyNJCgQCV7W/Zvk77x//+Ifq6uqUlJQUsj4pKUl//etfL7lNcXGx5s2b12B9ampqWGqMtPGRLsBw9De86G940d/wor+fuf6n4d3/p59+qvj4+CvevlkGnCsxe/ZsFRQUBJf9fr9qamrUoUMH2WxmpWyPx6OUlBR9/PHHcjgckS7HOPQ3vOhveNHf8KK/4VPf2xMnTshms8npdF7V/pplwLn++uvVqlUrVVVVhayvqqpScnLyJbex2+2y2+0h6xISEsJVYrPgcDj4AQsj+hte9De86G940d/wiY+Pt6S3zfIpqtjYWPXt21fbt///97n4/X5t375dGRkZEawMAABcC5rlKziSVFBQoNzcXKWnp+v222/X0qVLde7cueBTVQAAAJfTbAPOd77zHf3973/X3Llz5Xa7lZaWpq1btzZ443FLZLfb9fjjjze4JQdr0N/wor/hRX/Di/6Gj9W9tQWu9jksAACAZqZZvgcHAADgahBwAACAcQg4AADAOAQcAABgHAIOAAAwDgHnGlJcXKzbbrtN7dq1U8eOHTVq1CgdPXo00mUZ6amnnpLNZtO0adMiXYox/va3v+l73/ueOnTooNatW6tXr17at29fpMsyQl1dnQoLC5WamqrWrVvrpptu0oIFC676wwpbqrKyMo0cOVJOp1M2m00bN24MGQ8EApo7d646deqk1q1bKzMzU8eOHYtMsdegL+uvz+fTzJkz1atXL7Vt21ZOp1MPPvigTp482ejjEHCuIbt27VJeXp7efvttuVwu+Xw+ZWVl6dy5c5EuzSh79+7Vz372M916662RLsUYp06d0p133qmYmBj98Y9/1OHDh/XTn/5U7du3j3RpRli0aJFWrlyp5557TkeOHNGiRYtUUlKi5cuXR7q0a9K5c+fUu3dvrVix4pLjJSUlWrZsmVatWqXdu3erbdu2ys7O1oULF5q40mvTl/X3/Pnz2r9/vwoLC7V//379/ve/19GjR3Xvvfc2/kABXLOqq6sDkgK7du2KdCnG+PTTTwM333xzwOVyBe6+++7AI488EumSjDBz5szAgAEDIl2GsUaMGBGYOHFiyLrRo0cHcnJyIlSROSQFNmzYEFz2+/2B5OTkwNNPPx1cd/r06YDdbg/86le/ikCF17Yv9vdS9uzZE5AU+Oijjxq1b17BuYadOXNGkpSYmBjhSsyRl5enESNGKDMzM9KlGGXTpk1KT0/Xt7/9bXXs2FF9+vTRz3/+80iXZYw77rhD27dv13vvvSdJ+stf/qI333xTw4YNi3Bl5jl+/LjcbnfINSI+Pl79+vVTeXl5BCsz15kzZ2Sz2Rr9AdrN9qMa8OX8fr+mTZumO++8Uz179ox0OUZ4+eWXtX//fu3duzfSpRjngw8+0MqVK1VQUKCf/OQn2rt3rx5++GHFxsYqNzc30uVd82bNmiWPx6OuXbuqVatWqqur0xNPPKGcnJxIl2Yct9stSQ0+NigpKSk4ButcuHBBM2fO1He/+91Gf8I4AecalZeXp4MHD+rNN9+MdClG+Pjjj/XII4/I5XIpLi4u0uUYx+/3Kz09XU8++aQkqU+fPjp48KBWrVpFwLHAb37zG61bt07r169Xjx49VFlZqWnTpsnpdNJfXLN8Pp/Gjh2rQCCglStXNnp7blFdg/Lz87V582bt3LlTN9xwQ6TLMUJFRYWqq6v1jW98Q9HR0YqOjtauXbu0bNkyRUdHq66uLtIlXtM6deqk7t27h6zr1q2bTpw4EaGKzDJjxgzNmjVL48aNU69evTRhwgRNnz5dxcXFkS7NOMnJyZKkqqqqkPVVVVXBMVy9+nDz0UcfyeVyNfrVG4mAc00JBALKz8/Xhg0btGPHDqWmpka6JGMMHjxYBw4cUGVlZf
Bfenq6cnJyVFlZqVatWkW6xGvanXfe2eBPGrz33nvq0qVLhCoyy/nz5xUVFXo5b9Wqlfx+f4QqMldqaqqSk5O1ffv24DqPx6Pdu3crIyMjgpWZoz7cHDt2TK+//ro6dOhwRfvhFtU1JC8vT+vXr9cf/vAHtWvXLni/Nz4+Xq1bt45wdde2du3aNXgvU9u2bdWhQwfe42SB6dOn64477tCTTz6psWPHas+ePVq9erVWr14d6dKMMHLkSD3xxBPq3LmzevTooXfeeUdLlizRxIkTI13aNens2bN6//33g8vHjx9XZWWlEhMT1blzZ02bNk0LFy7UzTffrNTUVBUWFsrpdGrUqFGRK/oa8mX97dSpkx544AHt379fmzdvVl1dXfB3XWJiomJjY7/6ga742S40OUmX/LdmzZpIl2YkHhO31quvvhro2bNnwG63B7p27RpYvXp1pEsyhsfjCTzyyCOBzp07B+Li4gJf+9rXAo899ljA6/VGurRr0s6dOy95rc3NzQ0EAp89Kl5YWBhISkoK2O32wODBgwNHjx6NbNHXkC/r7/Hjxy/7u27nzp2NOo4tEOBPXQIAALPwHhwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGOf/AaIwOkGDv6xOAAAAAElFTkSuQmCC\n"
          },
          "metadata": {}
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Add molecule column\n",
        "PandasTools.AddMoleculeColumnToFrame(output_df, smilesCol=\"smiles\")"
      ],
      "metadata": {
        "id": "YAYGFRulcybY"
      },
      "execution_count": 62,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Sort molecules by pIC50\n",
        "output_df.sort_values(by=\"pIC50\", ascending=False, inplace=True)"
      ],
      "metadata": {
        "id": "hZFyN8i5c7WK"
      },
      "execution_count": 63,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Reset index\n",
        "output_df.reset_index(drop=True, inplace=True)"
      ],
      "metadata": {
        "id": "cahO1MlXc9xo"
      },
      "execution_count": 64,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "output_df.drop(\"smiles\", axis=1).head(10)"
      ],
      "metadata": {
        "id": "PjF0ghp5dAS3"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "print(f\"DataFrame shape: {output_df.shape}\")"
      ],
      "metadata": {
        "id": "vDcnL71Vd3_H",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "01bbd1ce-bfc2-49b5-87ff-1e3705723403"
      },
      "execution_count": 66,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "DataFrame shape: (6816, 6)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "output_df.to_csv(\"EGFR_compounds.csv\")\n",
        "output_df.head()"
      ],
      "metadata": {
        "id": "GuvoBlHmd_zh",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 337
        },
        "outputId": "90355bd5-153a-4154-9453-ceac65b2e778"
      },
      "execution_count": 67,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "  molecule_chembl_id   IC50 units                               smiles  \\\n",
              "0        CHEMBL63786  0.003    nM    Brc1cccc(Nc2ncnc3cc4ccccc4cc23)c1   \n",
              "1        CHEMBL53711  0.006    nM   CN(C)c1cc2c(Nc3cccc(Br)c3)ncnc2cn1   \n",
              "2        CHEMBL35820  0.006    nM  CCOc1cc2ncnc(Nc3cccc(Br)c3)c2cc1OCC   \n",
              "3        CHEMBL53753  0.008    nM      CNc1cc2c(Nc3cccc(Br)c3)ncnc2cn1   \n",
              "4        CHEMBL66031  0.008    nM  Brc1cccc(Nc2ncnc3cc4[nH]cnc4cc23)c1   \n",
              "\n",
              "       pIC50                                             ROMol  \n",
              "0  11.522879  <rdkit.Chem.rdchem.Mol object at 0x78dddbff6ff0>  \n",
              "1  11.221849  <rdkit.Chem.rdchem.Mol object at 0x78dddbffa7a0>  \n",
              "2  11.221849  <rdkit.Chem.rdchem.Mol object at 0x78dddc005930>  \n",
              "3  11.096910  <rdkit.Chem.rdchem.Mol object at 0x78dddc0035a0>  \n",
              "4  11.096910  <rdkit.Chem.rdchem.Mol object at 0x78dddbffcb30>  "
            ],
            "text/html": [
              "\n",
              "\n",
              "  <div id=\"df-756707e0-08d9-46ed-860e-11c85cc2ec52\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>IC50</th>\n",
              "      <th>units</th>\n",
              "      <th>smiles</th>\n",
              "      <th>pIC50</th>\n",
              "      <th>ROMol</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>CHEMBL63786</td>\n",
              "      <td>0.003</td>\n",
              "      <td>nM</td>\n",
              "      <td>Brc1cccc(Nc2ncnc3cc4ccccc4cc23)c1</td>\n",
              "      <td>11.522879</td>\n",
              "      <td>&lt;rdkit.Chem.rdchem.Mol object at 0x78dddbff6ff0&gt;</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>CHEMBL53711</td>\n",
              "      <td>0.006</td>\n",
              "      <td>nM</td>\n",
              "      <td>CN(C)c1cc2c(Nc3cccc(Br)c3)ncnc2cn1</td>\n",
              "      <td>11.221849</td>\n",
              "      <td>&lt;rdkit.Chem.rdchem.Mol object at 0x78dddbffa7a0&gt;</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>CHEMBL35820</td>\n",
              "      <td>0.006</td>\n",
              "      <td>nM</td>\n",
              "      <td>CCOc1cc2ncnc(Nc3cccc(Br)c3)c2cc1OCC</td>\n",
              "      <td>11.221849</td>\n",
              "      <td>&lt;rdkit.Chem.rdchem.Mol object at 0x78dddc005930&gt;</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>CHEMBL53753</td>\n",
              "      <td>0.008</td>\n",
              "      <td>nM</td>\n",
              "      <td>CNc1cc2c(Nc3cccc(Br)c3)ncnc2cn1</td>\n",
              "      <td>11.096910</td>\n",
              "      <td>&lt;rdkit.Chem.rdchem.Mol object at 0x78dddc0035a0&gt;</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>CHEMBL66031</td>\n",
              "      <td>0.008</td>\n",
              "      <td>nM</td>\n",
              "      <td>Brc1cccc(Nc2ncnc3cc4[nH]cnc4cc23)c1</td>\n",
              "      <td>11.096910</td>\n",
              "      <td>&lt;rdkit.Chem.rdchem.Mol object at 0x78dddbffcb30&gt;</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-756707e0-08d9-46ed-860e-11c85cc2ec52')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "\n",
              "\n",
              "\n",
              "    <div id=\"df-8ca104ca-8aae-4952-98cc-56bb45da041a\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-8ca104ca-8aae-4952-98cc-56bb45da041a')\"\n",
              "              title=\"Suggest charts.\"\n",
              "              style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "    </div>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "    background-color: #E8F0FE;\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: #1967D2;\n",
              "    height: 32px;\n",
              "    padding: 0 0 0 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: #E2EBFA;\n",
              "    box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: #174EA6;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "    background-color: #3B4455;\n",
              "    fill: #D2E3FC;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart:hover {\n",
              "    background-color: #434B5C;\n",
              "    box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "    filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "    fill: #FFFFFF;\n",
              "  }\n",
              "</style>\n",
              "\n",
              "    <script>\n",
              "      async function quickchart(key) {\n",
              "        const containerElement = document.querySelector('#' + key);\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      }\n",
              "    </script>\n",
              "\n",
              "      <script>\n",
              "\n",
              "function displayQuickchartButton(domScope) {\n",
              "  let quickchartButtonEl =\n",
              "    domScope.querySelector('#df-8ca104ca-8aae-4952-98cc-56bb45da041a button.colab-df-quickchart');\n",
              "  quickchartButtonEl.style.display =\n",
              "    google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "}\n",
              "\n",
              "        displayQuickchartButton(document);\n",
              "      </script>\n",
              "      <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-756707e0-08d9-46ed-860e-11c85cc2ec52 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-756707e0-08d9-46ed-860e-11c85cc2ec52');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n"
            ]
          },
          "metadata": {},
          "execution_count": 67
        }
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": [],
      "authorship_tag": "ABX9TyNBRbQjyFSY30ZjBFgxg+ka",
      "include_colab_link": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    },
    "gpuClass": "standard",
    "widgets": {
      "application/vnd.jupyter.widget-state+json": {
        "b15d0f15b84b4def826e1d239e28fdf1": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HBoxModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HBoxModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HBoxView",
            "box_style": "",
            "children": [
              "IPY_MODEL_eeda925759c04afb881a76e88e3b0c66",
              "IPY_MODEL_e83ec3ff21c7471894cdcf073964d992",
              "IPY_MODEL_751660ec13fa410c8eb1385f719103d4"
            ],
            "layout": "IPY_MODEL_d16a5b33961e4a849bf687d8746fd2e4"
          }
        },
        "eeda925759c04afb881a76e88e3b0c66": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_0ae3990397f94a449a159faab3251615",
            "placeholder": "​",
            "style": "IPY_MODEL_fa08ea49c3ac43afb324fcaf8340a1e6",
            "value": "100%"
          }
        },
        "e83ec3ff21c7471894cdcf073964d992": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "FloatProgressModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "FloatProgressModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "ProgressView",
            "bar_style": "success",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_7f61a7b60b274044a7618ced159a9597",
            "max": 6823,
            "min": 0,
            "orientation": "horizontal",
            "style": "IPY_MODEL_d98b21f92e444e408761d83a7c8d9b1a",
            "value": 6823
          }
        },
        "751660ec13fa410c8eb1385f719103d4": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "HTMLModel",
          "model_module_version": "1.5.0",
          "state": {
            "_dom_classes": [],
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "HTMLModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/controls",
            "_view_module_version": "1.5.0",
            "_view_name": "HTMLView",
            "description": "",
            "description_tooltip": null,
            "layout": "IPY_MODEL_2039e2ed16ed479fb7f26731397481f9",
            "placeholder": "​",
            "style": "IPY_MODEL_79ad69fdca97433984baf06f544de798",
            "value": " 6823/6823 [11:21&lt;00:00, 10.52it/s]"
          }
        },
        "d16a5b33961e4a849bf687d8746fd2e4": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "0ae3990397f94a449a159faab3251615": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "fa08ea49c3ac43afb324fcaf8340a1e6": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        },
        "7f61a7b60b274044a7618ced159a9597": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "d98b21f92e444e408761d83a7c8d9b1a": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "ProgressStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "ProgressStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "bar_color": null,
            "description_width": ""
          }
        },
        "2039e2ed16ed479fb7f26731397481f9": {
          "model_module": "@jupyter-widgets/base",
          "model_name": "LayoutModel",
          "model_module_version": "1.2.0",
          "state": {
            "_model_module": "@jupyter-widgets/base",
            "_model_module_version": "1.2.0",
            "_model_name": "LayoutModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "LayoutView",
            "align_content": null,
            "align_items": null,
            "align_self": null,
            "border": null,
            "bottom": null,
            "display": null,
            "flex": null,
            "flex_flow": null,
            "grid_area": null,
            "grid_auto_columns": null,
            "grid_auto_flow": null,
            "grid_auto_rows": null,
            "grid_column": null,
            "grid_gap": null,
            "grid_row": null,
            "grid_template_areas": null,
            "grid_template_columns": null,
            "grid_template_rows": null,
            "height": null,
            "justify_content": null,
            "justify_items": null,
            "left": null,
            "margin": null,
            "max_height": null,
            "max_width": null,
            "min_height": null,
            "min_width": null,
            "object_fit": null,
            "object_position": null,
            "order": null,
            "overflow": null,
            "overflow_x": null,
            "overflow_y": null,
            "padding": null,
            "right": null,
            "top": null,
            "visibility": null,
            "width": null
          }
        },
        "79ad69fdca97433984baf06f544de798": {
          "model_module": "@jupyter-widgets/controls",
          "model_name": "DescriptionStyleModel",
          "model_module_version": "1.5.0",
          "state": {
            "_model_module": "@jupyter-widgets/controls",
            "_model_module_version": "1.5.0",
            "_model_name": "DescriptionStyleModel",
            "_view_count": null,
            "_view_module": "@jupyter-widgets/base",
            "_view_module_version": "1.2.0",
            "_view_name": "StyleView",
            "description_width": ""
          }
        }
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}