{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/francescopatane96/Computer_aided_drug_discovery_kit/blob/main/Data_aquisition_ChEMBL.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# 1. Compound data acquisition from the ChEMBL database"
],
"metadata": {
"id": "_FxrSEOKBc7r"
}
},
{
"cell_type": "markdown",
"source": [
"In this module, you will learn how to obtain compound data from the ChEMBL database for a molecular target of interest. The resulting data sets can then be used for many cheminformatics tasks, e.g. similarity search, clustering, or machine learning.\n",
"\n",
"In this notebook you will find compounds that were tested against a specific target and learn how to filter the available bioactivity data."
],
"metadata": {
"id": "Oe1lkXvnZMPn"
}
},
{
"cell_type": "markdown",
"source": [
"### Some theoretical concepts:\n",
"1. ChEMBL is a manually curated database of bioactive molecules with drug-like properties. The related web resource client can be used from Python.\n",
"2. From it we can retrieve many compound activity measures, such as IC50 (half-maximal inhibitory concentration), pIC50 (the negative base-10 logarithm of the IC50 in molar units, which makes values easier to compare), EC50, etc.\n",
"3. These measures provide the information we need to build a system capable of predicting how likely a molecule is to be a drug candidate."
],
"metadata": {
"id": "pv3S4yULCjC4"
}
},
{
"cell_type": "markdown",
"source": [
"Let's start\n",
"\n",
"## Install and import dependencies"
],
"metadata": {
"id": "5EbLmBkDBpGa"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "e106nlobq1s0"
},
"outputs": [],
"source": [
"# install dependencies\n",
"!pip install chembl_webresource_client\n",
"!pip install rdkit"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "GheijY-dsL_O"
},
"outputs": [],
"source": [
"# import dependencies\n",
"import pandas as pd\n",
"import math\n",
"import rdkit\n",
"from tqdm.auto import tqdm\n",
"from chembl_webresource_client.new_client import new_client\n",
"from pandas import DataFrame\n",
"import numpy as np\n",
"from rdkit import Chem\n",
"from rdkit.Chem import Descriptors, Lipinski, PandasTools\n",
"import seaborn as sns\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.ensemble import RandomForestRegressor\n",
"from sklearn.feature_selection import VarianceThreshold\n",
"from pathlib import Path\n",
"from zipfile import ZipFile\n",
"from tempfile import TemporaryDirectory"
]
},
{
"cell_type": "code",
"source": [
"# define paths; _dh is IPython's directory history, so _dh[-1] is the current working directory\n",
"HERE = Path(_dh[-1])\n",
"DATA = HERE / \"data\""
],
"metadata": {
"id": "1qRTJhkZcEu4"
},
"execution_count": 4,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Create the resource objects for API access"
],
"metadata": {
"id": "9kcRsLOUcyDt"
}
},
{
"cell_type": "code",
"source": [
"targets_api = new_client.target\n",
"compounds_api = new_client.molecule\n",
"bioactivities_api = new_client.activity"
],
"metadata": {
"id": "Lit-Q2R8cPWG"
},
"execution_count": 5,
"outputs": []
},
{
"cell_type": "code",
"source": [
"type(targets_api) #show the type of the object"
],
"metadata": {
"id": "XVU-T3BJcUg-",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c446e264-bf50-43bc-e5b9-b943dd74086c"
},
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"chembl_webresource_client.query_set.QuerySet"
]
},
"metadata": {},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"source": [
"### Obtain molecular target data\n",
"\n",
"Now select a molecular target of interest. In this case we choose the protein with UniProt ID P00533, the epidermal growth factor receptor (EGFR).\n"
],
"metadata": {
"id": "tXQMUbdME_mz"
}
},
{
"cell_type": "code",
"source": [
"uniprot_id = \"P00533\" #change the uniprot ID for your project"
],
"metadata": {
"id": "qKpN49tuckve"
},
"execution_count": 13,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Acquire target data from the ChEMBL database"
],
"metadata": {
"id": "LXL1ThN-eG_X"
}
},
{
"cell_type": "code",
"source": [
"# Get target information from ChEMBL, restricted to the specified fields only\n",
"targets = targets_api.get(target_components__accession=uniprot_id).only(  # QuerySet that holds the query results\n",
" \"target_chembl_id\", \"organism\", \"pref_name\", \"target_type\"\n",
")\n",
"print(f'The type of the targets is \"{type(targets)}\"')"
],
"metadata": {
"id": "4UKb5NCHeRHN",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "19c0c4f3-3d97-4945-8ce3-b00c67e2413a"
},
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The type of the targets is \"<class 'chembl_webresource_client.query_set.QuerySet'>\"\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Download target data from ChEMBL"
],
"metadata": {
"id": "sSItAfALerr9"
}
},
{
"cell_type": "code",
"source": [
"# use pandas to convert data to a dataframe\n",
"targets = pd.DataFrame(targets)\n",
"targets"
],
"metadata": {
"id": "D5TXIYlSeqF1",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 300
},
"outputId": "b5591066-670e-40ba-eaa8-7de12da0f378"
},
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" organism pref_name \\\n",
"0 Homo sapiens Epidermal growth factor receptor erbB1 \n",
"1 Homo sapiens Epidermal growth factor receptor and ErbB2 (HE... \n",
"2 Homo sapiens Epidermal growth factor receptor \n",
"3 Homo sapiens MER intracellular domain/EGFR extracellular do... \n",
"4 Homo sapiens Protein cereblon/Epidermal growth factor receptor \n",
"5 Homo sapiens EGFR/PPP1CA \n",
"6 Homo sapiens VHL/EGFR \n",
"7 Homo sapiens Baculoviral IAP repeat-containing protein 2/Ep... \n",
"\n",
" target_chembl_id target_type \n",
"0 CHEMBL203 SINGLE PROTEIN \n",
"1 CHEMBL2111431 PROTEIN FAMILY \n",
"2 CHEMBL2363049 PROTEIN FAMILY \n",
"3 CHEMBL3137284 CHIMERIC PROTEIN \n",
"4 CHEMBL4523680 PROTEIN-PROTEIN INTERACTION \n",
"5 CHEMBL4523747 PROTEIN-PROTEIN INTERACTION \n",
"6 CHEMBL4523998 PROTEIN-PROTEIN INTERACTION \n",
"7 CHEMBL4802031 PROTEIN-PROTEIN INTERACTION "
]
},
"metadata": {},
"execution_count": 15
}
]
},
{
"cell_type": "markdown",
"source": [
"Select the target (ChEMBL ID)"
],
"metadata": {
"id": "bx1U-xWjfo23"
}
},
{
"cell_type": "code",
"source": [
"target = targets.iloc[0]\n",
"target"
],
"metadata": {
"id": "sewq42tUgorQ",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c96afefb-80bd-4da0-cecd-0afe850fcbf4"
},
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"organism Homo sapiens\n",
"pref_name Epidermal growth factor receptor erbB1\n",
"target_chembl_id CHEMBL203\n",
"target_type SINGLE PROTEIN\n",
"Name: 0, dtype: object"
]
},
"metadata": {},
"execution_count": 16
}
]
},
{
"cell_type": "markdown",
"source": [
"This is our target 💪"
],
"metadata": {
"id": "suaF4JvQHCKZ"
}
},
{
"cell_type": "markdown",
"source": [
"Let's save the ChEMBL ID of the target"
],
"metadata": {
"id": "cbgfetyajN_d"
}
},
{
"cell_type": "code",
"source": [
"target_id = target.target_chembl_id\n",
"print(f\"The target ChEMBL ID is {target_id}\")"
],
"metadata": {
"id": "hJTy18a9iQs1",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a5f9e875-a736-4ee4-d7f4-2a1ebae9dad5"
},
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"The target ChEMBL ID is CHEMBL203\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## Get bioactivity data about tested ligands\n",
"\n",
"In this step we query the bioactivity data for the selected target. We download the data and filter it, considering only human proteins, IC50 values, exact measurements (relation '='), and binding assay data (assay type 'B')."
],
"metadata": {
"id": "XHHY1Oa6kk8W"
}
},
{
"cell_type": "code",
"source": [
"bioactivities = bioactivities_api.filter(\n",
" target_chembl_id=target_id, type=\"IC50\", relation=\"=\", assay_type=\"B\"\n",
").only(\n",
" \"activity_id\",\n",
" \"assay_chembl_id\",\n",
" \"assay_description\",\n",
" \"assay_type\",\n",
" \"molecule_chembl_id\",\n",
" \"type\",\n",
" \"standard_units\",\n",
" \"relation\",\n",
" \"standard_value\",\n",
" \"target_chembl_id\",\n",
" \"target_organism\",\n",
")\n",
"\n",
"print(f\"Length and type of bioactivities object: {len(bioactivities)}, {type(bioactivities)}\")"
],
"metadata": {
"id": "DDowRa21ft0l",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "84183f53-454c-4910-d3db-2cb73240f49e"
},
"execution_count": 28,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Length and type of bioactivities object: 10420, <class 'chembl_webresource_client.query_set.QuerySet'>\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Lots of data 😃"
],
"metadata": {
"id": "kEHRSaD6Ifgl"
}
},
{
"cell_type": "code",
"source": [
"# each acquired element is a dict of fields; inspect the first one\n",
"print(f\"Length and type of first element: {len(bioactivities[0])}, {type(bioactivities[0])}\")\n",
"bioactivities[0]"
],
"metadata": {
"id": "87Lg52G3lM_9",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "1faf57a2-89cb-4e73-bdbd-60cd06255bc2"
},
"execution_count": 29,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Length and type of first element: 13, <class 'dict'>\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'activity_id': 32260,\n",
" 'assay_chembl_id': 'CHEMBL674637',\n",
" 'assay_description': 'Inhibitory activity towards tyrosine phosphorylation for the epidermal growth factor-receptor kinase',\n",
" 'assay_type': 'B',\n",
" 'molecule_chembl_id': 'CHEMBL68920',\n",
" 'relation': '=',\n",
" 'standard_units': 'nM',\n",
" 'standard_value': '41.0',\n",
" 'target_chembl_id': 'CHEMBL203',\n",
" 'target_organism': 'Homo sapiens',\n",
" 'type': 'IC50',\n",
" 'units': 'uM',\n",
" 'value': '0.041'}"
]
},
"metadata": {},
"execution_count": 29
}
]
},
{
"cell_type": "markdown",
"source": [
"Download Bioactivity data from ChEMBL"
],
"metadata": {
"id": "h_OiF1rdldYd"
}
},
{
"cell_type": "code",
"source": [
"# load the query results into a pandas DataFrame; this may take a while :)\n",
"bioactivities_df = pd.DataFrame.from_records(bioactivities)\n",
"print(f\"DataFrame shape: {bioactivities_df.shape}\")\n",
"bioactivities_df.head()"
],
"metadata": {
"id": "tSBwsyeulhJf",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 479
},
"outputId": "93faff7a-d83d-451d-95ea-3afb2f21faa6"
},
"execution_count": 30,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (10421, 13)\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
" activity_id assay_chembl_id \\\n",
"0 32260 CHEMBL674637 \n",
"1 32260 CHEMBL674637 \n",
"2 32267 CHEMBL674637 \n",
"3 32680 CHEMBL677833 \n",
"4 32770 CHEMBL674643 \n",
"\n",
" assay_description assay_type \\\n",
"0 Inhibitory activity towards tyrosine phosphory... B \n",
"1 Inhibitory activity towards tyrosine phosphory... B \n",
"2 Inhibitory activity towards tyrosine phosphory... B \n",
"3 In vitro inhibition of Epidermal growth factor... B \n",
"4 Inhibitory concentration of EGF dependent auto... B \n",
"\n",
" molecule_chembl_id relation standard_units standard_value target_chembl_id \\\n",
"0 CHEMBL68920 = nM 41.0 CHEMBL203 \n",
"1 CHEMBL68920 = nM 41.0 CHEMBL203 \n",
"2 CHEMBL69960 = nM 170.0 CHEMBL203 \n",
"3 CHEMBL137635 = nM 9300.0 CHEMBL203 \n",
"4 CHEMBL306988 = nM 500000.0 CHEMBL203 \n",
"\n",
" target_organism type units value \n",
"0 Homo sapiens IC50 uM 0.041 \n",
"1 Homo sapiens IC50 uM 0.041 \n",
"2 Homo sapiens IC50 uM 0.17 \n",
"3 Homo sapiens IC50 uM 9.3 \n",
"4 Homo sapiens IC50 uM 500.0 "
]
},
"metadata": {},
"execution_count": 30
}
]
},
{
"cell_type": "markdown",
"source": [
"The columns of interest are 'standard_units' and 'standard_value', because they report the measurements in a standardized unit (nM)."
],
"metadata": {
"id": "V11L6RfQKbml"
}
},
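{
"cell_type": "markdown",
"source": [
"Convert IC50 values to nM\n",
"\n",
"The raw values come in many different units and need to be expressed in nM. ChEMBL already provides standardized values, so we keep 'standard_units' and 'standard_value' and drop the raw 'units' and 'value' columns."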
],
"metadata": {
"id": "kEeVkM9cp03M"
}
},
{
"cell_type": "code",
"source": [
"bioactivities_df['units'].unique()"
],
"metadata": {
"id": "WxNCpyg8pqDt",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "9a5109f0-9c65-46e4-b0a2-81f2a689422f"
},
"execution_count": 31,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array(['uM', 'nM', 'pM', 'M', \"10'3 uM\", \"10'1 ug/ml\", 'ug ml-1',\n",
" \"10'-1microM\", \"10'1 uM\", \"10'-1 ug/ml\", \"10'-2 ug/ml\", \"10'2 uM\",\n",
" \"10'-3 ug/ml\", \"10'-2microM\", '/uM', \"10'-6g/ml\", 'mM', 'umol/L',\n",
" 'nmol/L', \"10'-10M\", \"10'-7M\", 'nmol', '10^-8M', 'µM'],\n",
" dtype=object)"
]
},
"metadata": {},
"execution_count": 31
}
]
},
{
"cell_type": "code",
"source": [
"bioactivities_df.drop([\"units\", \"value\"], axis=1, inplace=True)\n",
"bioactivities_df.head()"
],
"metadata": {
"id": "-2Xi9Z0sqLOc",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 461
},
"outputId": "c65680da-3f0c-4e2b-f18e-4493300b81bc"
},
"execution_count": 32,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" activity_id assay_chembl_id \\\n",
"0 32260 CHEMBL674637 \n",
"1 32260 CHEMBL674637 \n",
"2 32267 CHEMBL674637 \n",
"3 32680 CHEMBL677833 \n",
"4 32770 CHEMBL674643 \n",
"\n",
" assay_description assay_type \\\n",
"0 Inhibitory activity towards tyrosine phosphory... B \n",
"1 Inhibitory activity towards tyrosine phosphory... B \n",
"2 Inhibitory activity towards tyrosine phosphory... B \n",
"3 In vitro inhibition of Epidermal growth factor... B \n",
"4 Inhibitory concentration of EGF dependent auto... B \n",
"\n",
" molecule_chembl_id relation standard_units standard_value target_chembl_id \\\n",
"0 CHEMBL68920 = nM 41.0 CHEMBL203 \n",
"1 CHEMBL68920 = nM 41.0 CHEMBL203 \n",
"2 CHEMBL69960 = nM 170.0 CHEMBL203 \n",
"3 CHEMBL137635 = nM 9300.0 CHEMBL203 \n",
"4 CHEMBL306988 = nM 500000.0 CHEMBL203 \n",
"\n",
" target_organism type \n",
"0 Homo sapiens IC50 \n",
"1 Homo sapiens IC50 \n",
"2 Homo sapiens IC50 \n",
"3 Homo sapiens IC50 \n",
"4 Homo sapiens IC50 "
],
"text/html": [
"\n",
"\n",
" <div id=\"df-f9cbbcc5-9a72-4262-83d4-cac05188f6e8\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>activity_id</th>\n",
" <th>assay_chembl_id</th>\n",
" <th>assay_description</th>\n",
" <th>assay_type</th>\n",
" <th>molecule_chembl_id</th>\n",
" <th>relation</th>\n",
" <th>standard_units</th>\n",
" <th>standard_value</th>\n",
" <th>target_chembl_id</th>\n",
" <th>target_organism</th>\n",
" <th>type</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>32260</td>\n",
" <td>CHEMBL674637</td>\n",
" <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL68920</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>41.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>32260</td>\n",
" <td>CHEMBL674637</td>\n",
" <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL68920</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>41.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>32267</td>\n",
" <td>CHEMBL674637</td>\n",
" <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL69960</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>170.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>32680</td>\n",
" <td>CHEMBL677833</td>\n",
" <td>In vitro inhibition of Epidermal growth factor...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL137635</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>9300.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>32770</td>\n",
" <td>CHEMBL674643</td>\n",
" <td>Inhibitory concentration of EGF dependent auto...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL306988</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>500000.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 32
}
]
},
{
"cell_type": "markdown",
"source": [
"## Preprocess and filter bioactivity data\n",
"\n",
"1. Convert the datatype of “standard_value” from “object” to “float” so that the values can be used in numerical calculations\n"
],
"metadata": {
"id": "mL0SzwR1zJSN"
}
},
{
"cell_type": "code",
"source": [
"bioactivities_df.dtypes"
],
"metadata": {
"id": "VhRtswiNzdE7"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"bioactivities_df = bioactivities_df.astype({\"standard_value\" : \"float64\"})"
],
"metadata": {
"id": "jPct7o-OzlVT"
},
"execution_count": 34,
"outputs": []
},
{
"cell_type": "code",
"source": [
"bioactivities_df.dtypes"
],
"metadata": {
"id": "G24NPnfm0Yf0"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"2. Delete entries with missing values"
],
"metadata": {
"id": "VwtlFAr60p_W"
}
},
{
"cell_type": "code",
"source": [
"bioactivities_df.dropna(axis=0, how=\"any\", inplace=True)  # drop rows that contain missing values\n",
"print(f\"DataFrame shape: {bioactivities_df.shape}\")"
],
"metadata": {
"id": "4Fh1HrHO0qY7",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "34744c0b-f626-4c97-c81d-c5a9265cecb4"
},
"execution_count": 36,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (10420, 11)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"3. Keep only entries with “standard_units == nM”"
],
"metadata": {
"id": "X7zrbbG02Dk9"
}
},
{
"cell_type": "code",
"source": [
"print(f\"Units in downloaded data: {bioactivities_df['standard_units'].unique()}\")\n",
"# count the entries whose unit is not nM\n",
"non_nm_count = (bioactivities_df['standard_units'] != 'nM').sum()\n",
"print(f\"Number of non-nM entries: {non_nm_count}\")"
],
"metadata": {
"id": "07ywlhOp2Ful",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "9e4f2d24-3fe2-4d89-c022-bf5403eb8727"
},
"execution_count": 37,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Units in downloaded data: ['nM' 'ug.mL-1' '/uM' 'µM']\n",
"Number of non-nM entries: 70\n"
]
}
]
},
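{
"cell_type": "code",
"source": [
"# .copy() returns an independent DataFrame, which avoids pandas' SettingWithCopyWarning in later in-place operations\n",
"bioactivities_df = bioactivities_df[bioactivities_df[\"standard_units\"] == \"nM\"].copy()\n",
"print(f\"Units after filtering: {bioactivities_df['standard_units'].unique()}\")"
],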
"metadata": {
"id": "FTwQNUNm3b6T",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "beaeb407-22f5-4978-ee32-ac119921e51f"
},
"execution_count": 38,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Units after filtering: ['nM']\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"print(f\"DataFrame shape: {bioactivities_df.shape}\")"
],
"metadata": {
"id": "gZXOar0b4JCb",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "a1eb7e3c-b1ff-4f59-ecbf-122ed054d6c9"
},
"execution_count": 39,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (10350, 11)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"4. Delete duplicate molecules: keep only the first entry for each “molecule_chembl_id”"
],
"metadata": {
"id": "wRL-r3Mk4Qpa"
}
},
{
"cell_type": "code",
"source": [
"bioactivities_df.drop_duplicates(\"molecule_chembl_id\", keep=\"first\", inplace=True)\n",
"print(f\"DataFrame shape: {bioactivities_df.shape}\")"
],
"metadata": {
"id": "HxpzxMOn4RTA",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "ca76d3f2-0d6e-416d-810a-7a0ac3cea74c"
},
"execution_count": 40,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (6823, 11)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"5. Reset “DataFrame” index"
],
"metadata": {
"id": "0TCyGhxc7lG0"
}
},
{
"cell_type": "code",
"source": [
"bioactivities_df.reset_index(drop=True, inplace=True)\n",
"bioactivities_df.head()\n"
],
"metadata": {
"id": "6E8aR1_h5ODX",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 479
},
"outputId": "af732783-6938-48be-ca2b-ef371ed89ba5"
},
"execution_count": 41,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" activity_id assay_chembl_id \\\n",
"0 32260 CHEMBL674637 \n",
"1 32267 CHEMBL674637 \n",
"2 32680 CHEMBL677833 \n",
"3 32770 CHEMBL674643 \n",
"4 32772 CHEMBL674643 \n",
"\n",
" assay_description assay_type \\\n",
"0 Inhibitory activity towards tyrosine phosphory... B \n",
"1 Inhibitory activity towards tyrosine phosphory... B \n",
"2 In vitro inhibition of Epidermal growth factor... B \n",
"3 Inhibitory concentration of EGF dependent auto... B \n",
"4 Inhibitory concentration of EGF dependent auto... B \n",
"\n",
" molecule_chembl_id relation standard_units standard_value target_chembl_id \\\n",
"0 CHEMBL68920 = nM 41.0 CHEMBL203 \n",
"1 CHEMBL69960 = nM 170.0 CHEMBL203 \n",
"2 CHEMBL137635 = nM 9300.0 CHEMBL203 \n",
"3 CHEMBL306988 = nM 500000.0 CHEMBL203 \n",
"4 CHEMBL66879 = nM 3000000.0 CHEMBL203 \n",
"\n",
" target_organism type \n",
"0 Homo sapiens IC50 \n",
"1 Homo sapiens IC50 \n",
"2 Homo sapiens IC50 \n",
"3 Homo sapiens IC50 \n",
"4 Homo sapiens IC50 "
],
"text/html": [
"\n",
"\n",
" <div id=\"df-e8756456-057c-435e-9af6-2de19f7904d6\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>activity_id</th>\n",
" <th>assay_chembl_id</th>\n",
" <th>assay_description</th>\n",
" <th>assay_type</th>\n",
" <th>molecule_chembl_id</th>\n",
" <th>relation</th>\n",
" <th>standard_units</th>\n",
" <th>standard_value</th>\n",
" <th>target_chembl_id</th>\n",
" <th>target_organism</th>\n",
" <th>type</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>32260</td>\n",
" <td>CHEMBL674637</td>\n",
" <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL68920</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>41.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>32267</td>\n",
" <td>CHEMBL674637</td>\n",
" <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL69960</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>170.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>32680</td>\n",
" <td>CHEMBL677833</td>\n",
" <td>In vitro inhibition of Epidermal growth factor...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL137635</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>9300.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>32770</td>\n",
" <td>CHEMBL674643</td>\n",
" <td>Inhibitory concentration of EGF dependent auto...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL306988</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>500000.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>32772</td>\n",
" <td>CHEMBL674643</td>\n",
" <td>Inhibitory concentration of EGF dependent auto...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL66879</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>3000000.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 41
}
]
},
{
"cell_type": "markdown",
"source": [
"6. Rename columns"
],
"metadata": {
"id": "tW6-uahs7wrL"
}
},
{
"cell_type": "code",
"source": [
"bioactivities_df.rename(\n",
" columns={\"standard_value\": \"IC50\", \"standard_units\": \"units\"}, inplace=True\n",
")\n",
"bioactivities_df.head()"
],
"metadata": {
"id": "KaAy_0Mc7xOs",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 479
},
"outputId": "8be364cc-d848-4d0b-9aec-5e09252fe77d"
},
"execution_count": 42,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" activity_id assay_chembl_id \\\n",
"0 32260 CHEMBL674637 \n",
"1 32267 CHEMBL674637 \n",
"2 32680 CHEMBL677833 \n",
"3 32770 CHEMBL674643 \n",
"4 32772 CHEMBL674643 \n",
"\n",
" assay_description assay_type \\\n",
"0 Inhibitory activity towards tyrosine phosphory... B \n",
"1 Inhibitory activity towards tyrosine phosphory... B \n",
"2 In vitro inhibition of Epidermal growth factor... B \n",
"3 Inhibitory concentration of EGF dependent auto... B \n",
"4 Inhibitory concentration of EGF dependent auto... B \n",
"\n",
" molecule_chembl_id relation units IC50 target_chembl_id \\\n",
"0 CHEMBL68920 = nM 41.0 CHEMBL203 \n",
"1 CHEMBL69960 = nM 170.0 CHEMBL203 \n",
"2 CHEMBL137635 = nM 9300.0 CHEMBL203 \n",
"3 CHEMBL306988 = nM 500000.0 CHEMBL203 \n",
"4 CHEMBL66879 = nM 3000000.0 CHEMBL203 \n",
"\n",
" target_organism type \n",
"0 Homo sapiens IC50 \n",
"1 Homo sapiens IC50 \n",
"2 Homo sapiens IC50 \n",
"3 Homo sapiens IC50 \n",
"4 Homo sapiens IC50 "
],
"text/html": [
"\n",
"\n",
" <div id=\"df-67b9c8dd-c35c-4cac-bfef-d9ca6f250da3\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>activity_id</th>\n",
" <th>assay_chembl_id</th>\n",
" <th>assay_description</th>\n",
" <th>assay_type</th>\n",
" <th>molecule_chembl_id</th>\n",
" <th>relation</th>\n",
" <th>units</th>\n",
" <th>IC50</th>\n",
" <th>target_chembl_id</th>\n",
" <th>target_organism</th>\n",
" <th>type</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>32260</td>\n",
" <td>CHEMBL674637</td>\n",
" <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL68920</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>41.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>32267</td>\n",
" <td>CHEMBL674637</td>\n",
" <td>Inhibitory activity towards tyrosine phosphory...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL69960</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>170.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>32680</td>\n",
" <td>CHEMBL677833</td>\n",
" <td>In vitro inhibition of Epidermal growth factor...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL137635</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>9300.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>32770</td>\n",
" <td>CHEMBL674643</td>\n",
" <td>Inhibitory concentration of EGF dependent auto...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL306988</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>500000.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>32772</td>\n",
" <td>CHEMBL674643</td>\n",
" <td>Inhibitory concentration of EGF dependent auto...</td>\n",
" <td>B</td>\n",
" <td>CHEMBL66879</td>\n",
" <td>=</td>\n",
" <td>nM</td>\n",
" <td>3000000.0</td>\n",
" <td>CHEMBL203</td>\n",
" <td>Homo sapiens</td>\n",
" <td>IC50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 42
}
]
},
{
"cell_type": "code",
"source": [
"print(f\"DataFrame shape: {bioactivities_df.shape}\")"
],
"metadata": {
"id": "Hof3VZxZ77gj",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "f6cf9905-b07f-4574-d389-7f59288e2244"
},
"execution_count": 43,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (6823, 11)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Fetch compound data (molecular structures) from ChEMBL ..."
],
"metadata": {
"id": "aIS9YaVV8TQK"
}
},
{
"cell_type": "code",
"source": [
"compounds_provider = compounds_api.filter(\n",
" molecule_chembl_id__in=list(bioactivities_df[\"molecule_chembl_id\"])\n",
").only(\"molecule_chembl_id\", \"molecule_structures\")"
],
"metadata": {
"id": "hXA6DPwi8Vxb"
},
"execution_count": 44,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"... and download it"
],
"metadata": {
"id": "0S6rNxRi8flL"
}
},
{
"cell_type": "code",
"source": [
"compounds = list(tqdm(compounds_provider))"
],
"metadata": {
"id": "yAknPJXu8gOD",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 49,
"referenced_widgets": [
"b15d0f15b84b4def826e1d239e28fdf1",
"eeda925759c04afb881a76e88e3b0c66",
"e83ec3ff21c7471894cdcf073964d992",
"751660ec13fa410c8eb1385f719103d4",
"d16a5b33961e4a849bf687d8746fd2e4",
"0ae3990397f94a449a159faab3251615",
"fa08ea49c3ac43afb324fcaf8340a1e6",
"7f61a7b60b274044a7618ced159a9597",
"d98b21f92e444e408761d83a7c8d9b1a",
"2039e2ed16ed479fb7f26731397481f9",
"79ad69fdca97433984baf06f544de798"
]
},
"outputId": "a7e3b99b-2d6a-42ba-9312-ff3e4397e79a"
},
"execution_count": 45,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": [
" 0%| | 0/6823 [00:00<?, ?it/s]"
],
"application/vnd.jupyter.widget-view+json": {
"version_major": 2,
"version_minor": 0,
"model_id": "b15d0f15b84b4def826e1d239e28fdf1"
}
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"source": [
"compounds_df = pd.DataFrame.from_records(\n",
" compounds,\n",
")\n",
"print(f\"DataFrame shape: {compounds_df.shape}\")"
],
"metadata": {
"id": "D5GZ956xStWC",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "2327bcb4-08e9-4d5a-f09d-60ad8fe805b6"
},
"execution_count": 46,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (6823, 2)\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"compounds_df.head()"
],
"metadata": {
"id": "rp7AZJuoTAkJ",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"outputId": "69392761-eaee-451a-a51e-d1cf5b134067"
},
"execution_count": 47,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" molecule_chembl_id molecule_structures\n",
"0 CHEMBL6246 {'canonical_smiles': 'O=c1oc2c(O)c(O)cc3c(=O)o...\n",
"1 CHEMBL10 {'canonical_smiles': 'C[S+]([O-])c1ccc(-c2nc(-...\n",
"2 CHEMBL6976 {'canonical_smiles': 'COc1cc2c(cc1OC)Nc1ncn(C)...\n",
"3 CHEMBL7002 {'canonical_smiles': 'CC1(COc2ccc(CC3SC(=O)NC3...\n",
"4 CHEMBL414013 {'canonical_smiles': 'COc1cc2c(cc1OC)Nc1ncnc(O..."
],
"text/html": [
"\n",
"\n",
" <div id=\"df-564d03ac-e511-44b8-a476-44f8ec3086d9\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>molecule_chembl_id</th>\n",
" <th>molecule_structures</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CHEMBL6246</td>\n",
" <td>{'canonical_smiles': 'O=c1oc2c(O)c(O)cc3c(=O)o...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>CHEMBL10</td>\n",
" <td>{'canonical_smiles': 'C[S+]([O-])c1ccc(-c2nc(-...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>CHEMBL6976</td>\n",
" <td>{'canonical_smiles': 'COc1cc2c(cc1OC)Nc1ncn(C)...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>CHEMBL7002</td>\n",
" <td>{'canonical_smiles': 'CC1(COc2ccc(CC3SC(=O)NC3...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>CHEMBL414013</td>\n",
" <td>{'canonical_smiles': 'COc1cc2c(cc1OC)Nc1ncnc(O...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-564d03ac-e511-44b8-a476-44f8ec3086d9')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
"\n",
"\n",
" <div id=\"df-97f702ff-b084-4446-a681-865032f54046\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-97f702ff-b084-4446-a681-865032f54046')\"\n",
" title=\"Suggest charts.\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
" </div>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const containerElement = document.querySelector('#' + key);\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" }\n",
" </script>\n",
"\n",
" <script>\n",
"\n",
"function displayQuickchartButton(domScope) {\n",
" let quickchartButtonEl =\n",
" domScope.querySelector('#df-97f702ff-b084-4446-a681-865032f54046 button.colab-df-quickchart');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"}\n",
"\n",
" displayQuickchartButton(document);\n",
" </script>\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-564d03ac-e511-44b8-a476-44f8ec3086d9 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-564d03ac-e511-44b8-a476-44f8ec3086d9');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 47
}
]
},
{
"cell_type": "markdown",
"source": [
"Preprocess and filter compound data"
],
"metadata": {
"id": "aYNuT613TY3A"
}
},
{
"cell_type": "markdown",
"source": [
"1. Remove entries with a missing molecule structure"
],
"metadata": {
"id": "mKvosdsvTetA"
}
},
{
"cell_type": "code",
"source": [
"compounds_df.dropna(axis=0, how=\"any\", inplace=True)\n",
"print(f\"DataFrame shape: {compounds_df.shape}\")"
],
"metadata": {
"id": "ih416BHBTcLw",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "c43f3bb4-22c8-4053-f7fc-806ba6844a8f"
},
"execution_count": 48,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (6816, 2)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"2. Delete duplicate molecules"
],
"metadata": {
"id": "4mn6curmUYgY"
}
},
{
"cell_type": "code",
"source": [
"compounds_df.drop_duplicates(\"molecule_chembl_id\", keep=\"first\", inplace=True)\n",
"print(f\"DataFrame shape: {compounds_df.shape}\")"
],
"metadata": {
"id": "eeDEjxiMUZEU",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "bea2b3c6-13ad-4ddc-b9d8-7e234a67ed04"
},
"execution_count": 49,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (6816, 2)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"3. Extract the canonical SMILES and keep only molecules that have one"
],
"metadata": {
"id": "DNAz4-D2VraP"
}
},
{
"cell_type": "code",
"source": [
"compounds_df.iloc[0].molecule_structures.keys()"
],
"metadata": {
"id": "34Jox4MjVxlH",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "abf6526c-1f8c-40f7-cfe9-2dccdc2725f7"
},
"execution_count": 50,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"dict_keys(['canonical_smiles', 'molfile', 'standard_inchi', 'standard_inchi_key'])"
]
},
"metadata": {},
"execution_count": 50
}
]
},
{
"cell_type": "code",
"source": [
"# Collect the canonical SMILES of each compound; use None when the key is missing\n",
"canonical_smiles = []\n",
"\n",
"for i, compounds in compounds_df.iterrows():\n",
"    try:\n",
"        canonical_smiles.append(compounds[\"molecule_structures\"][\"canonical_smiles\"])\n",
"    except KeyError:\n",
"        canonical_smiles.append(None)\n",
"\n",
"compounds_df[\"smiles\"] = canonical_smiles\n",
"compounds_df.drop(\"molecule_structures\", axis=1, inplace=True)\n",
"print(f\"DataFrame shape: {compounds_df.shape}\")"
],
"metadata": {
"id": "X5n6vGUBWBxw",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "f59b106a-2287-4f18-b219-b08c02654639"
},
"execution_count": 51,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (6816, 2)\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"compounds_df.dropna(axis=0, how=\"any\", inplace=True)\n",
"print(f\"DataFrame shape: {compounds_df.shape}\")"
],
"metadata": {
"id": "Dl68xc3NWKag",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "5988d653-d8ff-4492-8457-7f1111d6b109"
},
"execution_count": 52,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (6816, 2)\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"Summary of compound and bioactivity data"
],
"metadata": {
"id": "afNMfjvpWTFd"
}
},
{
"cell_type": "code",
"source": [
"print(f\"Bioactivities filtered: {bioactivities_df.shape[0]}\")\n",
"bioactivities_df.columns"
],
"metadata": {
"id": "sGYwdkcCWTq6",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "6ce8ca91-bb21-4903-c5fb-2d31a4782306"
},
"execution_count": 53,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Bioactivities filtered: 6823\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index(['activity_id', 'assay_chembl_id', 'assay_description', 'assay_type',\n",
" 'molecule_chembl_id', 'relation', 'units', 'IC50', 'target_chembl_id',\n",
" 'target_organism', 'type'],\n",
" dtype='object')"
]
},
"metadata": {},
"execution_count": 53
}
]
},
{
"cell_type": "code",
"source": [
"print(f\"Compounds filtered: {compounds_df.shape[0]}\")\n",
"compounds_df.columns"
],
"metadata": {
"id": "7MGMVVJvZt0n",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "530e5bf9-bfae-4526-a0ef-7af9fa87e55a"
},
"execution_count": 54,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Compounds filtered: 6816\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index(['molecule_chembl_id', 'smiles'], dtype='object')"
]
},
"metadata": {},
"execution_count": 54
}
]
},
{
"cell_type": "markdown",
"source": [
"Merge both datasets on `molecule_chembl_id`, keeping only bioactivities that have a matching compound"
],
"metadata": {
"id": "FGcP0j4uZ2nA"
}
},
{
"cell_type": "code",
"source": [
"# Merge DataFrames\n",
"output_df = pd.merge(\n",
" bioactivities_df[[\"molecule_chembl_id\", \"IC50\", \"units\"]],\n",
" compounds_df,\n",
" on=\"molecule_chembl_id\",\n",
")\n",
"\n",
"# Reset row indices\n",
"output_df.reset_index(drop=True, inplace=True)\n",
"\n",
"print(f\"Dataset with {output_df.shape[0]} entries.\")"
],
"metadata": {
"id": "0zDTBibAZ_Rf",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "411bc6c4-a08d-4d7e-9d84-cac4b7260ad2"
},
"execution_count": 55,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Dataset with 6816 entries.\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"output_df.dtypes"
],
"metadata": {
"id": "pDdiJPS5a94A",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "485a3859-1d7a-4523-c3e6-b3a657ea6610"
},
"execution_count": 56,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"molecule_chembl_id object\n",
"IC50 float64\n",
"units object\n",
"smiles object\n",
"dtype: object"
]
},
"metadata": {},
"execution_count": 56
}
]
},
{
"cell_type": "code",
"source": [
"output_df.head(10)"
],
"metadata": {
"id": "aBuWZ-YIbDZY",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 363
},
"outputId": "5634f1cb-3adb-4de8-bf8c-bd56f5fa5f55"
},
"execution_count": 57,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" molecule_chembl_id IC50 units \\\n",
"0 CHEMBL68920 41.0 nM \n",
"1 CHEMBL69960 170.0 nM \n",
"2 CHEMBL137635 9300.0 nM \n",
"3 CHEMBL306988 500000.0 nM \n",
"4 CHEMBL66879 3000000.0 nM \n",
"5 CHEMBL77085 96000.0 nM \n",
"6 CHEMBL443268 5310.0 nM \n",
"7 CHEMBL76979 264000.0 nM \n",
"8 CHEMBL76589 125.0 nM \n",
"9 CHEMBL76904 35000.0 nM \n",
"\n",
" smiles \n",
"0 Cc1cc(C)c(/C=C2\\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)... \n",
"1 Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\\C(=O)Nc2ncnc(N... \n",
"2 CN(c1ccccc1)c1ncnc2ccc(N/N=N/Cc3ccccn3)cc12 \n",
"3 CC(=C(C#N)C#N)c1ccc(NC(=O)CCC(=O)O)cc1 \n",
"4 O=C(O)/C=C/c1ccc(O)cc1 \n",
"5 N#CC(C#N)=Cc1cc(O)ccc1[N+](=O)[O-] \n",
"6 Cc1cc(C(=O)NCCN2CCOCC2)[nH]c1/C=C1\\C(=O)N(C)c2... \n",
"7 COc1cc(/C=C(\\C#N)C(=O)O)cc(OC)c1O \n",
"8 N#CC(C#N)=C(N)/C(C#N)=C/c1ccc(O)cc1 \n",
"9 N#CC(C#N)=Cc1ccc(O)c(O)c1 "
],
"text/html": [
"\n",
"\n",
" <div id=\"df-416c8062-6f38-4d01-b42e-e6b854f0fb5e\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>molecule_chembl_id</th>\n",
" <th>IC50</th>\n",
" <th>units</th>\n",
" <th>smiles</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CHEMBL68920</td>\n",
" <td>41.0</td>\n",
" <td>nM</td>\n",
" <td>Cc1cc(C)c(/C=C2\\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>CHEMBL69960</td>\n",
" <td>170.0</td>\n",
" <td>nM</td>\n",
" <td>Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\\C(=O)Nc2ncnc(N...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>CHEMBL137635</td>\n",
" <td>9300.0</td>\n",
" <td>nM</td>\n",
" <td>CN(c1ccccc1)c1ncnc2ccc(N/N=N/Cc3ccccn3)cc12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>CHEMBL306988</td>\n",
" <td>500000.0</td>\n",
" <td>nM</td>\n",
" <td>CC(=C(C#N)C#N)c1ccc(NC(=O)CCC(=O)O)cc1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>CHEMBL66879</td>\n",
" <td>3000000.0</td>\n",
" <td>nM</td>\n",
" <td>O=C(O)/C=C/c1ccc(O)cc1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>CHEMBL77085</td>\n",
" <td>96000.0</td>\n",
" <td>nM</td>\n",
" <td>N#CC(C#N)=Cc1cc(O)ccc1[N+](=O)[O-]</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>CHEMBL443268</td>\n",
" <td>5310.0</td>\n",
" <td>nM</td>\n",
" <td>Cc1cc(C(=O)NCCN2CCOCC2)[nH]c1/C=C1\\C(=O)N(C)c2...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>CHEMBL76979</td>\n",
" <td>264000.0</td>\n",
" <td>nM</td>\n",
" <td>COc1cc(/C=C(\\C#N)C(=O)O)cc(OC)c1O</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>CHEMBL76589</td>\n",
" <td>125.0</td>\n",
" <td>nM</td>\n",
" <td>N#CC(C#N)=C(N)/C(C#N)=C/c1ccc(O)cc1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>CHEMBL76904</td>\n",
" <td>35000.0</td>\n",
" <td>nM</td>\n",
" <td>N#CC(C#N)=Cc1ccc(O)c(O)c1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-416c8062-6f38-4d01-b42e-e6b854f0fb5e')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
"\n",
"\n",
" <div id=\"df-bbee8165-7caa-4792-a790-ae3807e42167\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-bbee8165-7caa-4792-a790-ae3807e42167')\"\n",
" title=\"Suggest charts.\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
" </div>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const containerElement = document.querySelector('#' + key);\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" }\n",
" </script>\n",
"\n",
" <script>\n",
"\n",
"function displayQuickchartButton(domScope) {\n",
" let quickchartButtonEl =\n",
" domScope.querySelector('#df-bbee8165-7caa-4792-a790-ae3807e42167 button.colab-df-quickchart');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"}\n",
"\n",
" displayQuickchartButton(document);\n",
" </script>\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-416c8062-6f38-4d01-b42e-e6b854f0fb5e button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-416c8062-6f38-4d01-b42e-e6b854f0fb5e');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 57
}
]
},
{
"cell_type": "markdown",
"source": [
"Add pIC50 values (for easier visualization and processing)"
],
"metadata": {
"id": "PNqDKdFlbH2_"
}
},
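{
"cell_type": "markdown",
"source": [
"A short derivation of the conversion applied in the next cells, assuming the IC50 values are reported in nM (as the `units` column above suggests):\n",
"\n",
"$$\\mathrm{pIC50} = -\\log_{10}\\left(\\mathrm{IC50}_{\\mathrm{M}}\\right) = -\\log_{10}\\left(\\mathrm{IC50}_{\\mathrm{nM}} \\times 10^{-9}\\right) = 9 - \\log_{10}\\left(\\mathrm{IC50}_{\\mathrm{nM}}\\right)$$\n",
"\n",
"For example, the first row above has IC50 = 41 nM, so pIC50 = 9 - log10(41) ≈ 7.39. A higher pIC50 corresponds to a more potent compound."
],
"metadata": {}
},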
{
"cell_type": "code",
"source": [
"def convert_ic50_to_pic50(IC50_value):\n",
"    # IC50_value is expected in nM, so pIC50 = 9 - log10(IC50 in nM)\n",
"    pIC50_value = 9 - math.log10(IC50_value)\n",
"    return pIC50_value"
],
"metadata": {
"id": "MSdbsgQgbQvg"
},
"execution_count": 58,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Apply the conversion to each row of the merged DataFrame\n",
"output_df[\"pIC50\"] = output_df.apply(lambda x: convert_ic50_to_pic50(x.IC50), axis=1)"
],
"metadata": {
"id": "gCjIDvlYbVUv"
},
"execution_count": 59,
"outputs": []
},
{
"cell_type": "code",
"source": [
"output_df.head()"
],
"metadata": {
"id": "tH7o_kLibyLy",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"outputId": "174d4c58-cb45-4b92-9773-3e30d4164cae"
},
"execution_count": 60,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" molecule_chembl_id IC50 units \\\n",
"0 CHEMBL68920 41.0 nM \n",
"1 CHEMBL69960 170.0 nM \n",
"2 CHEMBL137635 9300.0 nM \n",
"3 CHEMBL306988 500000.0 nM \n",
"4 CHEMBL66879 3000000.0 nM \n",
"\n",
" smiles pIC50 \n",
"0 Cc1cc(C)c(/C=C2\\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)... 7.387216 \n",
"1 Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\\C(=O)Nc2ncnc(N... 6.769551 \n",
"2 CN(c1ccccc1)c1ncnc2ccc(N/N=N/Cc3ccccn3)cc12 5.031517 \n",
"3 CC(=C(C#N)C#N)c1ccc(NC(=O)CCC(=O)O)cc1 3.301030 \n",
"4 O=C(O)/C=C/c1ccc(O)cc1 2.522879 "
],
"text/html": [
"\n",
"\n",
" <div id=\"df-9b30a801-9db0-4f73-971f-d78e7db8dfcb\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>molecule_chembl_id</th>\n",
" <th>IC50</th>\n",
" <th>units</th>\n",
" <th>smiles</th>\n",
" <th>pIC50</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CHEMBL68920</td>\n",
" <td>41.0</td>\n",
" <td>nM</td>\n",
" <td>Cc1cc(C)c(/C=C2\\C(=O)Nc3ncnc(Nc4ccc(F)c(Cl)c4)...</td>\n",
" <td>7.387216</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>CHEMBL69960</td>\n",
" <td>170.0</td>\n",
" <td>nM</td>\n",
" <td>Cc1cc(C(=O)N2CCOCC2)[nH]c1/C=C1\\C(=O)Nc2ncnc(N...</td>\n",
" <td>6.769551</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>CHEMBL137635</td>\n",
" <td>9300.0</td>\n",
" <td>nM</td>\n",
" <td>CN(c1ccccc1)c1ncnc2ccc(N/N=N/Cc3ccccn3)cc12</td>\n",
" <td>5.031517</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>CHEMBL306988</td>\n",
" <td>500000.0</td>\n",
" <td>nM</td>\n",
" <td>CC(=C(C#N)C#N)c1ccc(NC(=O)CCC(=O)O)cc1</td>\n",
" <td>3.301030</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>CHEMBL66879</td>\n",
" <td>3000000.0</td>\n",
" <td>nM</td>\n",
" <td>O=C(O)/C=C/c1ccc(O)cc1</td>\n",
" <td>2.522879</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-9b30a801-9db0-4f73-971f-d78e7db8dfcb')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
"\n",
"\n",
" <div id=\"df-05f32a15-054d-4ea7-932d-44fcc49b6860\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-05f32a15-054d-4ea7-932d-44fcc49b6860')\"\n",
" title=\"Suggest charts.\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
" </div>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const containerElement = document.querySelector('#' + key);\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" }\n",
" </script>\n",
"\n",
" <script>\n",
"\n",
"function displayQuickchartButton(domScope) {\n",
" let quickchartButtonEl =\n",
" domScope.querySelector('#df-05f32a15-054d-4ea7-932d-44fcc49b6860 button.colab-df-quickchart');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"}\n",
"\n",
" displayQuickchartButton(document);\n",
" </script>\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-9b30a801-9db0-4f73-971f-d78e7db8dfcb button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-9b30a801-9db0-4f73-971f-d78e7db8dfcb');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 60
}
]
},
{
"cell_type": "markdown",
"source": [
"Plot the distribution of pIC50 values"
],
"metadata": {
"id": "Fa3HGN67cNgK"
}
},
{
"cell_type": "code",
"source": [
"output_df.hist(column=\"pIC50\")"
],
"metadata": {
"id": "AerCnuTFcMaI",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 469
},
"outputId": "930e6d2f-cbd2-497a-eda3-faf4eb15ac8e"
},
"execution_count": 61,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([[<Axes: title={'center': 'pIC50'}>]], dtype=object)"
]
},
"metadata": {},
"execution_count": 61
},
{
"output_type": "display_data",
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
],
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAjgAAAGzCAYAAAAi6m1wAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAA2J0lEQVR4nO3de3RU5b3/8c+EJBNAJiF4SJhjwNRl5SqhRCFKFUpIuIgiWEpJMS0cOIcmKqRFoBIMF41EiggilJ4Kugq1tS0UkUJGoERr5BJM5VbEI4qnOEnbACNwmAyZ+f3hyvwcA9bAnkx48n6txYp7P8/e+7u/TXY+nT07YwsEAgEBAAAYJCrSBQAAAFiNgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAdAs3HjjjbrnnnsarL9w4YKeeeYZ9evXT/Hx8YqLi9PXv/515efn67333gvOW7t2rWw22yX/ud3uBvvdtGmTvvGNbyguLk6dO3fW448/rosXL4b1HAE0nehIFwAAl/OPf/xDQ4cOVUVFhe655x6NHz9e1113nY4ePaqXX35Zq1evVm1tbcg28+fPV2pqasi6hISEkOU//vGPGjVqlAYOHKjly5frwIEDWrhwoaqrq7Vy5cpwnxaAJkDAAdBsff/739c777yj3/72txozZkzI2IIFC/TYY4812GbYsGFKT0//0v3++Mc/1q233qrS0lJFR392GXQ4HHryySf1yCOPqGvXrtadBICI4BYVgLAqKiqSzWbTX//6V40dO1YOh0MdOnTQI488ogsXLlx2u927d+u1117TpEmTGoQbSbLb7Vq8ePElt/30009VV1d3ybHDhw/r8OHDmjJlSjDcSNIPf/hDBQIB/fa3v23kGQJojgg4AJrE2LFjdeHCBRUXF2v48OFatmyZpkyZctn5mzZtkiRNmDChUccZNGiQHA6H2rRpo3vvvVfHjh0LGX/nnXckqcGrPE6nUzfccENwHMC1jVtUAJpEamqq/vCHP0iS8vLy5HA49PzzzwdvF33RkSNHJEm9evX6Svtv06aNvv/97wcDTkVFhZYsWaI77rhD+/fvV0pKiiTpk08+kSR16tSpwT46deqkkydPXtH5AWheeAUHQJPIy8sLWX7ooYckSVu2bLnkfI/HI0lq167dV9r/2LFjtWbNGj344IMaNWqUFixYoG3btumf//ynnnjiieC8//u//5P02S2uL4qLiwuOA7i2EXAANImbb745ZPmmm25SVFSUPvzww0vOdzgckj57P82VGjBggPr166fXX389uK5169aSJK/X22D+hQsXguMArm0EHAARYbPZvnS8/kmmAwcOXNVxUlJSVFNTE1yuvzVVf6vq8z755BM5nc6rOh6A5oGAA6BJfPHNvu+//778fr9uvPHGS84fOXKkJOmXv/zlVR33gw8+0L/9278Fl9PS0iRJ+/btC5l38uRJ/e///m9wHMC1jYADoEmsWLEiZHn58uWSPvu7NZeSkZGhoUOH6r//+7+1cePGBuO1tbX68Y9/HFz++9//3mDOli1bVFFRoaFDhwbX9ejRQ127dtXq1atDHiVfuXKlbDabHnjggUadF4DmiaeoADSJ48eP695779XQoUNVXl6uX/7ylxo/frx69+592W1eeuklZWVlafTo0Ro5cqQGDx6stm3b6tixY3r55Zf1ySefBP8Wzh133KE+ffooPT1d8fHx2r9/v1544QWlpKToJz/5Sch+n376ad17773KysrSuHHjdPDgQT333HP6j//4D3Xr1i2sfQDQRAIAEEaPP/54QFLg8OHDgQceeCDQrl27QPv27QP5+fmB//u//wvO69KlS2DEiBENtj9//nxg8eLFgdtuuy1w3XXXBWJjYwM333xz4KGHHgq8//77wXmPPfZYIC0tLRAfHx+IiYkJdO7cOTB16tSA2+2+ZF0bNmwIpK
WlBex2e+CGG24IzJkzJ1BbW2t9AwBEhC0QCAQiHbIAmKuoqEjz5s3T3//+d11//fWRLgdAC8F7cAAAgHEIOAAAwDgEHAAAYBzegwMAAIzDKzgAAMA4BBwAAGAcY//Qn9/v18mTJ9WuXbt/+Zk3AACgeQgEAvr000/ldDoVFXXlr8MYG3BOnjyplJSUSJcBAACuwMcff6wbbrjhirc3NuC0a9dO0mcNcjgcEa7GWj6fT6WlpcrKylJMTEykyzEO/Q0v+hte9De86G/41Pc2IyNDqampwd/jV8rYgFN/W8rhcBgZcNq0aSOHw8EPWBjQ3/Civ+FFf8OL/oZPfW/rg83Vvr2ENxkDAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGCc60gUAwJW4cdZrkS6h0T58akSkSwBaDF7BAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADBOowNOWVmZRo4cKafTKZvNpo0bNzaYc+TIEd17772Kj49X27Ztddttt+nEiRPB8QsXLigvL08dOnTQddddpzFjxqiqqipkHydOnNCIESPUpk0bdezYUTNmzNDFixcbf4YAAKDFaXTAOXfunHr37q0VK1Zccvx//ud/NGDAAHXt2lV/+tOf9O6776qwsFBxcXHBOdOnT9err76qV155Rbt27dLJkyc1evTo4HhdXZ1GjBih2tpavfXWW3rxxRe1du1azZ079wpOEQAAtDTRjd1g2LBhGjZs2GXHH3vsMQ0fPlwlJSXBdTfddFPwv8+cOaNf/OIXWr9+vb71rW9JktasWaNu3brp7bffVv/+/VVaWqrDhw/r9ddfV1JSktLS0rRgwQLNnDlTRUVFio2NbXBcr9crr9cbXPZ4PJIkn88nn8/X2NNs1urPx7Tzai7ob3hZ1V97q4AV5TSppvie4vs3vOhv+FjdW1sgELjiq4TNZtOGDRs0atQoSZLf71d8fLweffRRvfnmm3rnnXeUmpqq2bNnB+fs2LFDgwcP1qlTp5SQkBDcV5cuXTRt2jRNnz5dc+fO1aZNm1RZWRkcP378uL72ta9p//796tOnT4NaioqKNG/evAbr169frzZt2lzpKQIAgCZ0/vx5jR8/XmfOnJHD4bji/TT6FZwvU11drbNnz+qpp57SwoULtWjRIm3dulWjR4/Wzp07dffdd8vtdis2NjYk3EhSUlKS3G63JMntdispKanBeP3YpcyePVsFBQXBZY/Ho5SUFGVlZV1Vg5ojn88nl8ulIUOGKCYmJtLlGIf+hpdV/e1ZtM3CqsxhjwpoQbpfhfui5PXbrnp/B4uyLajKHFwfwqe+t4MGDbJkf5YGHL/fL0m67777NH36dElSWlqa3nrrLa1atUp33323lYcLYbfbZbfbG6yPiYkx9pvQ5HNrDuhveF1tf711V//L22Rev82SHvEzcGlcH8LHqr5a+pj49ddfr+joaHXv3j1kfbdu3YJPUSUnJ6u2tlanT58OmVNVVaXk5OTgnC8+VVW/XD8HAADgciwNOLGxsbrtttt09OjRkPXvvfeeunTpIknq27evYmJitH379uD40aNHdeLECWVkZEiSMjIydODAAVVXVwfnuFwuORyOBuEJAADgixp9i+rs2bN6//33g8vHjx9XZWWlEhMT1blzZ82YMUPf+c53dNddd2nQoEHaunWrXn31Vf3pT3+SJMXHx2vSpEkqKChQYmKiHA6HHnroIWVkZKh///6SpKysLHXv3l0TJkxQSUmJ3G635syZo7y8vEvehgIAAPi8Rgecffv2hbwBqP6Nvbm5uVq7dq3uv/9+rVq1SsXFxXr44Yd1yy236He/+50GDBgQ3OaZZ55RVFSUxowZI6
/Xq+zsbD3//PPB8VatWmnz5s2aOnWqMjIy1LZtW+Xm5mr+/PlXc64AAKCFaHTAGThwoP7Vk+UTJ07UxIkTLzseFxenFStWXPaPBUqfPTa+ZcuWxpYHAADAZ1EBAADzEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwTnSkCwAQeTfOeq3JjmVvFVDJ7VLPom3y1tma7LgAWhZewQEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjNPogFNWVqaRI0fK6XTKZrNp48aNl537X//1X7LZbFq6dGnI+pqaGuXk5MjhcCghIUGTJk3S2bNnQ+a8++67+uY3v6m4uDilpKSopKSksaUCAIAWqtEB59y5c+rdu7dWrFjxpfM2bNigt99+W06ns8FYTk6ODh06JJfLpc2bN6usrExTpkwJjns8HmVlZalLly6qqKjQ008/raKiIq1evbqx5QIAgBao0R/VMGzYMA0bNuxL5/ztb3/TQw89pG3btmnEiBEhY0eOHNHWrVu1d+9epaenS5KWL1+u4cOHa/HixXI6nVq3bp1qa2v1wgsvKDY2Vj169FBlZaWWLFkSEoQAAAAuxfLPovL7/ZowYYJmzJihHj16NBgvLy9XQkJCMNxIUmZmpqKiorR7927df//9Ki8v11133aXY2NjgnOzsbC1atEinTp1S+/btG+zX6/XK6/UGlz0ejyTJ5/PJ5/NZeYoRV38+pp1Xc9ES+2tvFWi6Y0UFQr7CWlb3tyX9HHwVLfH60FSs7q3lAWfRokWKjo7Www8/fMlxt9utjh07hhYRHa3ExES53e7gnNTU1JA5SUlJwbFLBZzi4mLNmzevwfrS0lK1adPmis6luXO5XJEuwWgtqb8ltzf9MRek+5v+oC2IVf3dsmWLJfsxTUu6PjS1nTt3WrIfSwNORUWFnn32We3fv182W9N+SvDs2bNVUFAQXPZ4PEpJSVFWVpYcDkeT1hJuPp9PLpdLQ4YMUUxMTKTLMU5L7G/Pom1Ndix7VEAL0v0q3Bclr59PE7ea1f09WJRtQVXmaInXh6ZS39tBgwZZsj9LA84bb7yh6upqde7cObiurq5OP/rRj7R06VJ9+OGHSk5OVnV1dch2Fy9eVE1NjZKTkyVJycnJqqqqCplTv1w/54vsdrvsdnuD9TExMcZ+E5p8bs1BS+qvt67pg4bXb4vIcVsKq/rbUn4GGqslXR+amlV9tfTv4EyYMEHvvvuuKisrg/+cTqdmzJihbds++3+IGRkZOn36tCoqKoLb7dixQ36/X/369QvOKSsrC7kP53K5dMstt1zy9hQAAMDnNfoVnLNnz+r9998PLh8/flyVlZVKTExU586d1aFDh5D5MTExSk5O1i233CJJ6tatm4YOHarJkydr1apV8vl8ys/P17hx44KPlI8fP17z5s3TpEmTNHPmTB08eFDPPvusnnnmmas5VwAA0EI0OuDs27cv5P5Y/ftecnNztXbt2q+0j3Xr1ik/P1+DBw9WVFSUxowZo2XLlgXH4+PjVVpaqry8PPXt21fXX3+95s6dyyPiAADgK2l0wBk4cKACga/++OGHH37YYF1iYqLWr1//pdvdeuuteuONNxpbHgAAAJ9FBQAAzEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAA
AwTqMDTllZmUaOHCmn0ymbzaaNGzcGx3w+n2bOnKlevXqpbdu2cjqdevDBB3Xy5MmQfdTU1CgnJ0cOh0MJCQmaNGmSzp49GzLn3Xff1Te/+U3FxcUpJSVFJSUlV3aGAACgxWl0wDl37px69+6tFStWNBg7f/689u/fr8LCQu3fv1+///3vdfToUd17770h83JycnTo0CG5XC5t3rxZZWVlmjJlSnDc4/EoKytLXbp0UUVFhZ5++mkVFRVp9erVV3CKAACgpYlu7AbDhg3TsGHDLjkWHx8vl8sVsu65557T7bffrhMnTqhz5846cuSItm7dqr179yo9PV2StHz5cg0fPlyLFy+W0+nUunXrVFtbqxdeeEGxsbHq0aOHKisrtWTJkpAgBAAAcCmNDjiNdebMGdlsNiUkJEiSysvLlZCQEAw3kpSZmamoqCjt3r1b999/v8rLy3XXXXcpNjY2OCc7O1uLFi3SqVOn1L59+wbH8Xq98nq9wWWPxyPps9tmPp8vTGcXGfXnY9p5NRctsb/2VoGmO1ZUIOQrrGV1f1vSz8FX0RKvD03F6t6GNeBcuHBBM2fO1He/+105HA5JktvtVseOHUOLiI5WYmKi3G53cE5qamrInKSkpODYpQJOcXGx5s2b12B9aWmp2rRpY8n5NDdffLUM1mpJ/S25vemPuSDd3/QHbUGs6u+WLVss2Y9pWtL1oant3LnTkv2ELeD4fD6NHTtWgUBAK1euDNdhgmbPnq2CgoLgssfjUUpKirKysoLhyhQ+n08ul0tDhgxRTExMpMsxTkvsb8+ibU12LHtUQAvS/SrcFyWv39Zkx20prO7vwaJsC6oyR0u8PjSV+t4OGjTIkv2FJeDUh5uPPvpIO3bsCAkYycnJqq6uDpl/8eJF1dTUKDk5OTinqqoqZE79cv2cL7Lb7bLb7Q3Wx8TEGPtNaPK5NQctqb/euqYPGl6/LSLHbSms6m9L+RlorJZ0fWhqVvXV8r+DUx9ujh07ptdff10dOnQIGc/IyNDp06dVUVERXLdjxw75/X7169cvOKesrCzkPpzL5dItt9xyydtTAAAAn9fogHP27FlVVlaqsrJSknT8+HFVVlbqxIkT8vl8euCBB7Rv3z6tW7dOdXV1crvdcrvdqq2tlSR169ZNQ4cO1eTJk7Vnzx79+c9/Vn5+vsaNGyen0ylJGj9+vGJjYzVp0iQdOnRIv/71r/Xss8+G3IICAAC4nEbfotq3b1/I/bH60JGbm6uioiJt2rRJkpSWlhay3c6dOzVw4EBJ0rp165Sfn6/BgwcrKipKY8aM0bJly4Jz4+PjVVpaqry8PPXt21fXX3+95s6dyyPiAADgK2l0wBk4cKACgcs/fvhlY/USExO1fv36L51z66236o033mhseQAAAHwWFQAAMA8BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOI0OOGVlZRo5cqScTqdsNps2btwYMh4IBDR37lx16tRJrVu3VmZmpo4dOxYyp6amRjk5OXI4HEpISNCkSZN09uzZkDnvvvuuvvnNbyouLk4pKSkqKSlp/NkBAIAWKbqxG5w7d069e/fWxIkTNXr06AbjJSUlWrZsmV588UWlpqaqsLBQ2dnZOnz4sOLi4iRJOTk5+uSTT+RyueTz+fSDH/xAU6ZM0fr16yVJHo9HWVlZyszM1KpVq3TgwAFNnDhRCQkJmjJlylWeMhBeN856LdIlAECL1+iAM2zYMA0bNuySY4FAQEuXLtWcOXN03333SZJeeuklJSUlaePGjRo3bpyOHDmirV
u3au/evUpPT5ckLV++XMOHD9fixYvldDq1bt061dbW6oUXXlBsbKx69OihyspKLVmyhIADAAD+pUYHnC9z/Phxud1uZWZmBtfFx8erX79+Ki8v17hx41ReXq6EhIRguJGkzMxMRUVFaffu3br//vtVXl6uu+66S7GxscE52dnZWrRokU6dOqX27ds3OLbX65XX6w0uezweSZLP55PP57PyNCOu/nxMO6/m4mr7a28VsLIc49ijAiFfYS2r+8t1JhTX3/CxureWBhy32y1JSkpKClmflJQUHHO73erYsWNoEdHRSkxMDJmTmpraYB/1Y5cKOMXFxZo3b16D9aWlpWrTps0VnlHz5nK5Il2C0a60vyW3W1yIoRak+yNdgtGs6u+WLVss2Y9puP6Gz86dOy3Zj6UBJ5Jmz56tgoKC4LLH41FKSoqysrLkcDgiWJn1fD6fXC6XhgwZopiYmEiXY5yr7W/Pom1hqMoc9qiAFqT7VbgvSl6/LdLlGMfq/h4syragKnNw/Q2f+t4OGjTIkv1ZGnCSk5MlSVVVVerUqVNwfVVVldLS0oJzqqurQ7a7ePGiampqgtsnJyerqqoqZE79cv2cL7Lb7bLb7Q3Wx8TEGPtNaPK5NQdX2l9vHb+0vwqv30avwsiq/nKNuTSuv+FjVV8t/Ts4qampSk5O1vbt24PrPB6Pdu/erYyMDElSRkaGTp8+rYqKiuCcHTt2yO/3q1+/fsE5ZWVlIffhXC6XbrnllkvengIAAPi8Rgecs2fPqrKyUpWVlZI+e2NxZWWlTpw4IZvNpmnTpmnhwoXatGmTDhw4oAcffFBOp1OjRo2SJHXr1k1Dhw7V5MmTtWfPHv35z39Wfn6+xo0bJ6fTKUkaP368YmNjNWnSJB06dEi//vWv9eyzz4bcggIAALicRt+i2rdvX8j9sfrQkZubq7Vr1+rRRx/VuXPnNGXKFJ0+fVoDBgzQ1q1bg38DR5LWrVun/Px8DR48WFFRURozZoyWLVsWHI+Pj1dpaany8vLUt29fXX/99Zo7dy6PiAMAgK+k0QFn4MCBCgQu//ihzWbT/PnzNX/+/MvOSUxMDP5Rv8u59dZb9cYbbzS2PAAAAD6LCgAAmIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHMsDTl1dnQoLC5WamqrWrVvrpptu0oIFCxQIBIJzAoGA5s6dq06dOql169bKzMzUsWPHQvZTU1OjnJwcORwOJSQkaNKkSTp79qzV5QIAAANZHnAWLVqklStX6rnnntORI0e0aNEilZSUaPny5cE5JSUlWrZsmVatWqXdu3erbdu2ys7O1oULF4JzcnJydOjQIblcLm3evFllZWWaMmWK1eUCAAADRVu9w7feekv33XefRowYIUm68cYb9atf/Up79uyR9NmrN0uXLtWcOXN03333SZJeeuklJSUlaePGjRo3bpyOHDmirVu3au/evUpPT5ckLV++XMOHD9fixYvldDobHNfr9crr9QaXPR6PJMnn88nn81l9mhFVfz6mnVdzcbX9tbcK/OtJLZg9KhDyFdayur9cZ0Jx/Q0fq3trC3z+3pEFnnzySa1evVqlpaX6+te/rr/85S/KysrSkiVLlJOTow8++EA33XST3nnnHaWlpQW3u/vuu5WWlqZnn31WL7zwgn70ox/p1KlTwfGLFy8qLi5Or7zyiu6///4Gxy0qKtK8efMarF+/fr3atGlj5SkCAIAwOX/+vMaPH68zZ87I4XBc8X4sfwVn1qxZ8ng86tq1q1q1aqW6ujo98cQTysnJkS
S53W5JUlJSUsh2SUlJwTG3262OHTuGFhodrcTExOCcL5o9e7YKCgqCyx6PRykpKcrKyrqqBjVHPp9PLpdLQ4YMUUxMTKTLMc7V9rdn0bYwVGUOe1RAC9L9KtwXJa/fFulyjGN1fw8WZVtQlTm4/oZPfW8HDRpkyf4sDzi/+c1vtG7dOq1fv149evRQZWWlpk2bJqfTqdzcXKsPF2S322W32xusj4mJMfab0ORzaw6utL/eOn5pfxVev41ehZFV/eUac2lcf8PHqr5aHnBmzJihWbNmady4cZKkXr166aOPPlJxcbFyc3OVnJwsSaqqqlKnTp2C21VVVQVvWSUnJ6u6ujpkvxcvXlRNTU1wewAAgMux/Cmq8+fPKyoqdLetWrWS3++XJKWmpio5OVnbt28Pjns8Hu3evVsZGRmSpIyMDJ0+fVoVFRXBOTt27JDf71e/fv2sLhkAABjG8ldwRo4cqSeeeEKdO3dWjx499M4772jJkiWaOHGiJMlms2natGlauHChbr75ZqWmpqqwsFBOp1OjRo2SJHXr1k1Dhw7V5MmTtWrVKvl8PuXn52vcuHGXfIIKAADg8ywPOMuXL1dhYaF++MMfqrq6Wk6nU//5n/+puXPnBuc8+uijOnfunKZMmaLTp09rwIAB2rp1q+Li4oJz1q1bp/z8fA0ePFhRUVEaM2aMli1bZnW5AADAQJYHnHbt2mnp0qVaunTpZefYbDbNnz9f8+fPv+ycxMRErV+/3uryAABAC8BnUQEAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA40ZEuAADQfN0467VIl9BoHz41ItIloBngFRwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAME5YAs7f/vY3fe9731OHDh3UunVr9erVS/v27QuOBwIBzZ07V506dVLr1q2VmZmpY8eOheyjpqZGOTk5cjgcSkhI0KRJk3T27NlwlAsAAAxjecA5deqU7rzzTsXExOiPf/yjDh8+rJ/+9Kdq3759cE5JSYmWLVumVatWaffu3Wrbtq2ys7N14cKF4JycnBwdOnRILpdLmzdvVllZmaZMmWJ1uQAAwECWf9jmokWLlJKSojVr1gTXpaamBv87EAho6dKlmjNnju677z5J0ksvvaSkpCRt3LhR48aN05EjR7R161bt3btX6enpkqTly5dr+PDhWrx4sZxOp9VlAwAAg1gecDZt2qTs7Gx9+9vf1q5du/Tv//7v+uEPf6jJkydLko4fPy63263MzMzgNvHx8erXr5/Ky8s1btw4lZeXKyEhIRhuJCkzM1NRUVHavXu37r///gbH9Xq98nq9wWWPxyNJ8vl88vl8Vp9mRNWfj2nn1VxcbX/trQJWlmMce1Qg5CusRX/De23k+hs+VvfW8oDzwQcfaOXKlSooKNBPfvIT7d27Vw8//LBiY2OVm5srt9stSUpKSgrZLikpKTjmdrvVsWPH0EKjo5WYmBic80XFxcWaN29eg/WlpaVq06aNFafW7LhcrkiXYLQr7W/J7RYXYqgF6f5Il2C0ltzfLVu2hP0YXH/DZ+fOnZbsx/KA4/f7lZ6erieffFKS1KdPHx08eFCrVq1Sbm6u1YcLmj17tgoKCoLLHo9HKSkpysrKksPhCNtxI8Hn88nlcmnIkCGKiYmJdDnGudr+9izaFoaqzGGPCmhBul+F+6Lk9dsiXY5x6K90sCg7bPvm+hs+9b0dNGiQJfuzPOB06tRJ3bt3D1nXrVs3/e53v5MkJScnS5KqqqrUqVOn4JyqqiqlpaUF51RXV4fs4+LFi6qpqQlu/0V2u112u73B+piYGGO/CU0+t+bgSvvrrWuZv1Qay+
u30aswasn9bYrrItff8LGqr5Y/RXXnnXfq6NGjIevee+89denSRdJnbzhOTk7W9u3bg+Mej0e7d+9WRkaGJCkjI0OnT59WRUVFcM6OHTvk9/vVr18/q0sGAACGsfwVnOnTp+uOO+7Qk08+qbFjx2rPnj1avXq1Vq9eLUmy2WyaNm2aFi5cqJtvvlmpqakqLCyU0+nUqFGjJH32is/QoUM1efJkrVq1Sj6fT/n5+Ro3bhxPUAEAgH/J8oBz2223acOGDZo9e7bmz5+v1NRULV26VDk5OcE5jz76qM6dO6cpU6bo9OnTGjBggLZu3aq4uLjgnHXr1ik/P1+DBw9WVFSUxowZo2XLllldLgAAMJDlAUeS7rnnHt1zzz2XHbfZbJo/f77mz59/2TmJiYlav359OMoDAACG47OoAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOAQcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGCXvAeeqpp2Sz2TRt2rTgugsXLigvL08dOnTQddddpzFjxqiqqipkuxMnTmjEiBFq06aNOnbsqBkzZujixYvhLhcAABggrAFn7969+tnPfqZbb701ZP306dP16quv6pVXXtGuXbt08uRJjR49OjheV1enESNGqLa2Vm+99ZZefPFFrV27VnPnzg1nuQAAwBBhCzhnz55VTk6Ofv7zn6t9+/bB9WfOnNEvfvELLVmyRN/61rfUt29frVmzRm+99ZbefvttSVJpaakOHz6sX/7yl0pLS9OwYcO0YMECrVixQrW1teEqGQAAGCI6XDvOy8vTiBEjlJmZqYULFwbXV1RUyOfzKTMzM7iua9eu6ty5s8rLy9W/f3+Vl5erV69eSkpKCs7Jzs7W1KlTdejQIfXp06fB8bxer7xeb3DZ4/FIknw+n3w+XzhOMWLqz8e082ourra/9lYBK8sxjj0qEPIV1qK/4b02cv0NH6t7G5aA8/LLL2v//v3au3dvgzG3263Y2FglJCSErE9KSpLb7Q7O+Xy4qR+vH7uU4uJizZs3r8H60tJStWnT5kpOo9lzuVyRLsFoV9rfktstLsRQC9L9kS7BaC25v1u2bAn7Mbj+hs/OnTst2Y/lAefjjz/WI488IpfLpbi4OKt3f1mzZ89WQUFBcNnj8SglJUVZWVlyOBxNVkdT8Pl8crlcGjJkiGJiYiJdjnGutr89i7aFoSpz2KMCWpDuV+G+KHn9tkiXYxz6Kx0syg7bvrn+hk99bwcNGmTJ/iwPOBUVFaqurtY3vvGN4Lq6ujqVlZXpueee07Zt21RbW6vTp0+HvIpTVVWl5ORkSVJycrL27NkTst/6p6zq53yR3W6X3W5vsD4mJsbYb0KTz605uNL+euta5i+VxvL6bfQqjFpyf5viusj1N3ys6qvlbzIePHiwDhw4oMrKyuC/9PR05eTkBP87JiZG27dvD25z9OhRnThxQhkZGZKkjIwMHThwQNXV1cE5LpdLDodD3bt3t7pkAABgGMtfwWnXrp169uwZsq5t27bq0KFDcP2kSZNUUFCgxMREORwOPfTQQ8rIyFD//v0lSVlZWerevbsmTJigkpISud1uzZkzR3l5eZd8lQYAAODzwvYU1Zd55plnFBUVpTFjxsjr9So7O1vPP/98cLxVq1bavHmzpk6dqoyMDLVt21a5ubmaP39+JMoFAADXmCYJOH/6059CluPi4rRixQqtWLHistt06dKlSd4JDwAAzMNnUQEAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA4xBwAACAcQg4AADAOA
QcAABgHAIOAAAwDgEHAAAYh4ADAACMQ8ABAADGIeAAAADjEHAAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjBNt9Q6Li4v1+9//Xn/961/VunVr3XHHHVq0aJFuueWW4JwLFy7oRz/6kV5++WV5vV5lZ2fr+eefV1JSUnDOiRMnNHXqVO3cuVPXXXedcnNzVVxcrOhoy0tGM3bjrNea/Jj2VgGV3C71LNomb52tyY8PALh6lr+Cs2vXLuXl5entt9+Wy+WSz+dTVlaWzp07F5wzffp0vfrqq3rllVe0a9cunTx5UqNHjw6O19XVacSIEaqtrdVbb72lF198UWvXrtXcuXOtLhcAABjI8pdDtm7dGrK8du1adezYURUVFbrrrrt05swZ/eIXv9D69ev1rW99S5K0Zs0adevWTW+//bb69++v0tJSHT58WK+//rqSkpKUlpamBQsWaObMmSoqKlJsbKzVZQMAAIOE/X7PmTNnJEmJiYmSpIqKCvl8PmVmZgbndO3aVZ07d1Z5ebn69++v8vJy9erVK+SWVXZ2tqZOnapDhw6pT58+DY7j9Xrl9XqDyx6PR5Lk8/nk8/nCcm6RUn8+pp3XpdhbBZr+mFGBkK+wFv0NL/ob3mtjS7r+NjWrexvWgOP3+zVt2jTdeeed6tmzpyTJ7XYrNjZWCQkJIXOTkpLkdruDcz4fburH68cupbi4WPPmzWuwvrS0VG3atLnaU2mWXC5XpEsIu5LbI3fsBen+yB28BaC/4dWS+7tly5awH6MlXH8jZefOnZbsJ6wBJy8vTwcPHtSbb74ZzsNIkmbPnq2CgoLgssfjUUpKirKysuRwOMJ+/Kbk8/nkcrk0ZMgQxcTERLqcsOpZtK3Jj2mPCmhBul+F+6Lk9fMmY6vR3/Civ9LBouyw7bslXX+bWn1vBw0aZMn+whZw8vPztXnzZpWVlemGG24Irk9OTlZtba1Onz4d8ipOVVWVkpOTg3P27NkTsr+qqqrg2KXY7XbZ7fYG62NiYoz9JjT53OpF8ikmr9/GU1RhRH/DqyX3tymuiy3h+hspVvXV8qeoAoGA8vPztWHDBu3YsUOpqakh43379lVMTIy2b98eXHf06FGdOHFCGRkZkqSMjAwdOHBA1dXVwTkul0sOh0Pdu3e3umQAAGAYy1/BycvL0/r16/WHP/xB7dq1C75nJj4+Xq1bt1Z8fLwmTZqkgoICJSYmyuFw6KGHHlJGRob69+8vScrKylL37t01YcIElZSUyO12a86cOcrLy7vkqzQAAACfZ3nAWblypSRp4MCBIevXrFmj73//+5KkZ555RlFRURozZkzIH/qr16pVK23evFlTp05VRkaG2rZtq9zcXM2fP9/qcgEAgIEsDziBwL9+NDEuLk4rVqzQihUrLjunS5cuTfJOeAAAYB4+iwoAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjGP5RzUAABBJN856LWz7trcKqOR2qWfRNnnrbJbt98OnRli2L3yGV3AAAIBxCDgAAMA4BBwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGIeAAwAAjEPAAQAAxiHgAAAA40RHugA0nRtnvRbpEgAAaBK8ggMAAIxDwAEAAMZp1gFnxYoVuvHGGxUXF6d+/fppz549kS4JAABcA5ptwPn1r3+tgoICPf7449q/f7969+6t7OxsVVdXR7o0AADQzDXbNxkvWbJEkydP1g9+8ANJ0qpVq/Taa6/phRde0KxZsyJcHQAA1rkWHwL58KkRkS7hSzXLgFNbW6uKigrNnj07uC4qKkqZmZkqLy+/5DZer1derze4fObMGUlSTU
2NfD6f5TX2K95u+T6/KntUQHP6+JX22O/l9du+8nbN8n/sZijaH9D5835F+6JU14j+4quhv+FFf8OL/v5///znPy3dn8/n0/nz51VTUyNJCgQCV7W/Zvk77x//+Ifq6uqUlJQUsj4pKUl//etfL7lNcXGx5s2b12B9ampqWGqMtPGRLsBw9De86G940d/wor+fuf6n4d3/p59+qvj4+CvevlkGnCsxe/ZsFRQUBJf9fr9qamrUoUMH2WxmpWyPx6OUlBR9/PHHcjgckS7HOPQ3vOhveNHf8KK/4VPf2xMnTshms8npdF7V/pplwLn++uvVqlUrVVVVhayvqqpScnLyJbex2+2y2+0h6xISEsJVYrPgcDj4AQsj+hte9De86G940d/wiY+Pt6S3zfIpqtjYWPXt21fbt///97n4/X5t375dGRkZEawMAABcC5rlKziSVFBQoNzcXKWnp+v222/X0qVLde7cueBTVQAAAJfTbAPOd77zHf3973/X3Llz5Xa7lZaWpq1btzZ443FLZLfb9fjjjze4JQdr0N/wor/hRX/Di/6Gj9W9tQWu9jksAACAZqZZvgcHAADgahBwAACAcQg4AADAOAQcAABgHAIOAAAwDgHnGlJcXKzbbrtN7dq1U8eOHTVq1CgdPXo00mUZ6amnnpLNZtO0adMiXYox/va3v+l73/ueOnTooNatW6tXr17at29fpMsyQl1dnQoLC5WamqrWrVvrpptu0oIFC676wwpbqrKyMo0cOVJOp1M2m00bN24MGQ8EApo7d646deqk1q1bKzMzU8eOHYtMsdegL+uvz+fTzJkz1atXL7Vt21ZOp1MPPvigTp482ejjEHCuIbt27VJeXp7efvttuVwu+Xw+ZWVl6dy5c5EuzSh79+7Vz372M916662RLsUYp06d0p133qmYmBj98Y9/1OHDh/XTn/5U7du3j3RpRli0aJFWrlyp5557TkeOHNGiRYtUUlKi5cuXR7q0a9K5c+fUu3dvrVix4pLjJSUlWrZsmVatWqXdu3erbdu2ys7O1oULF5q40mvTl/X3/Pnz2r9/vwoLC7V//379/ve/19GjR3Xvvfc2/kABXLOqq6sDkgK7du2KdCnG+PTTTwM333xzwOVyBe6+++7AI488EumSjDBz5szAgAEDIl2GsUaMGBGYOHFiyLrRo0cHcnJyIlSROSQFNmzYEFz2+/2B5OTkwNNPPx1cd/r06YDdbg/86le/ikCF17Yv9vdS9uzZE5AU+Oijjxq1b17BuYadOXNGkpSYmBjhSsyRl5enESNGKDMzM9KlGGXTpk1KT0/Xt7/9bXXs2FF9+vTRz3/+80iXZYw77rhD27dv13vvvSdJ+stf/qI333xTw4YNi3Bl5jl+/LjcbnfINSI+Pl79+vVTeXl5BCsz15kzZ2Sz2Rr9AdrN9qMa8OX8fr+mTZumO++8Uz179ox0OUZ4+eWXtX//fu3duzfSpRjngw8+0MqVK1VQUKCf/OQn2rt3rx5++GHFxsYqNzc30uVd82bNmiWPx6OuXbuqVatWqqur0xNPPKGcnJxIl2Yct9stSQ0+NigpKSk4ButcuHBBM2fO1He/+91Gf8I4AecalZeXp4MHD+rNN9+MdClG+Pjjj/XII4/I5XIpLi4u0uUYx+/3Kz09XU8++aQkqU+fPjp48KBWrVpFwLHAb37zG61bt07r169Xjx49VFlZqWnTpsnpdNJfXLN8Pp/Gjh2rQCCglStXNnp7blFdg/Lz87V582bt3LlTN9xwQ6TLMUJFRYWqq6v1jW98Q9HR0YqOjtauXbu0bNkyRUdHq66uLtIlXtM6deqk7t27h6zr1q2bTpw4EaGKzDJjxgzNmjVL48aNU69evTRhwgRNnz5dxcXFkS7NOMnJyZKkqqqqkPVVVVXBMVy9+nDz0UcfyeVyNfrVG4mAc00JBALKz8/Xhg0btGPHDqWmpka6JGMMHjxYBw4cUGVlZfBfenq6cnJyVF
lZqVatWkW6xGvanXfe2eBPGrz33nvq0qVLhCoyy/nz5xUVFXo5b9Wqlfx+f4QqMldqaqqSk5O1ffv24DqPx6Pdu3crIyMjgpWZoz7cHDt2TK+//ro6dOhwRfvhFtU1JC8vT+vXr9cf/vAHtWvXLni/Nz4+Xq1bt45wdde2du3aNXgvU9u2bdWhQwfe42SB6dOn64477tCTTz6psWPHas+ePVq9erVWr14d6dKMMHLkSD3xxBPq3LmzevTooXfeeUdLlizRxIkTI13aNens2bN6//33g8vHjx9XZWWlEhMT1blzZ02bNk0LFy7UzTffrNTUVBUWFsrpdGrUqFGRK/oa8mX97dSpkx544AHt379fmzdvVl1dXfB3XWJiomJjY7/6ga742S40OUmX/LdmzZpIl2YkHhO31quvvhro2bNnwG63B7p27RpYvXp1pEsyhsfjCTzyyCOBzp07B+Li4gJf+9rXAo899ljA6/VGurRr0s6dOy95rc3NzQ0EAp89Kl5YWBhISkoK2O32wODBgwNHjx6NbNHXkC/r7/Hjxy/7u27nzp2NOo4tEOBPXQIAALPwHhwAAGAcAg4AADAOAQcAABiHgAMAAIxDwAEAAMYh4AAAAOMQcAAAgHEIOAAAwDgEHAAAYBwCDgAAMA4BBwAAGOf/AaIwOkGDv6xOAAAAAElFTkSuQmCC\n"
},
"metadata": {}
}
]
},
{
"cell_type": "code",
"source": [
"# Add an RDKit molecule column (\"ROMol\") built from the SMILES strings\n",
"PandasTools.AddMoleculeColumnToFrame(output_df, smilesCol=\"smiles\")"
],
"metadata": {
"id": "YAYGFRulcybY"
},
"execution_count": 62,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Sort molecules by pIC50\n",
"output_df.sort_values(by=\"pIC50\", ascending=False, inplace=True)"
],
"metadata": {
"id": "hZFyN8i5c7WK"
},
"execution_count": 63,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Reset the index so that row numbers follow the new pIC50 ranking\n",
"output_df.reset_index(drop=True, inplace=True)"
],
"metadata": {
"id": "cahO1MlXc9xo"
},
"execution_count": 64,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Preview the ten most potent compounds (SMILES column hidden for readability)\n",
"output_df.drop(\"smiles\", axis=1).head(10)"
],
"metadata": {
"id": "PjF0ghp5dAS3"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"print(f\"DataFrame shape: {output_df.shape}\")"
],
"metadata": {
"id": "vDcnL71Vd3_H",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "01bbd1ce-bfc2-49b5-87ff-1e3705723403"
},
"execution_count": 66,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"DataFrame shape: (6816, 6)\n"
]
}
]
},
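{
"cell_type": "code",
"source": [
"# Save the dataset to CSV without the ROMol column:\n",
"# RDKit Mol objects would only be written as uninformative object strings\n",
"output_df.drop(\"ROMol\", axis=1).to_csv(\"EGFR_compounds.csv\")\n",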
"output_df.head()"
],
"metadata": {
"id": "GuvoBlHmd_zh",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 337
},
"outputId": "90355bd5-153a-4154-9453-ceac65b2e778"
},
"execution_count": 67,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" molecule_chembl_id IC50 units smiles \\\n",
"0 CHEMBL63786 0.003 nM Brc1cccc(Nc2ncnc3cc4ccccc4cc23)c1 \n",
"1 CHEMBL53711 0.006 nM CN(C)c1cc2c(Nc3cccc(Br)c3)ncnc2cn1 \n",
"2 CHEMBL35820 0.006 nM CCOc1cc2ncnc(Nc3cccc(Br)c3)c2cc1OCC \n",
"3 CHEMBL53753 0.008 nM CNc1cc2c(Nc3cccc(Br)c3)ncnc2cn1 \n",
"4 CHEMBL66031 0.008 nM Brc1cccc(Nc2ncnc3cc4[nH]cnc4cc23)c1 \n",
"\n",
" pIC50 ROMol \n",
"0 11.522879 <rdkit.Chem.rdchem.Mol object at 0x78dddbff6ff0> \n",
"1 11.221849 <rdkit.Chem.rdchem.Mol object at 0x78dddbffa7a0> \n",
"2 11.221849 <rdkit.Chem.rdchem.Mol object at 0x78dddc005930> \n",
"3 11.096910 <rdkit.Chem.rdchem.Mol object at 0x78dddc0035a0> \n",
"4 11.096910 <rdkit.Chem.rdchem.Mol object at 0x78dddbffcb30> "
],
"text/html": [
"\n",
"\n",
" <div id=\"df-756707e0-08d9-46ed-860e-11c85cc2ec52\">\n",
" <div class=\"colab-df-container\">\n",
" <div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>molecule_chembl_id</th>\n",
" <th>IC50</th>\n",
" <th>units</th>\n",
" <th>smiles</th>\n",
" <th>pIC50</th>\n",
" <th>ROMol</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>CHEMBL63786</td>\n",
" <td>0.003</td>\n",
" <td>nM</td>\n",
" <td>Brc1cccc(Nc2ncnc3cc4ccccc4cc23)c1</td>\n",
" <td>11.522879</td>\n",
" <td><rdkit.Chem.rdchem.Mol object at 0x78dddbff6ff0></td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>CHEMBL53711</td>\n",
" <td>0.006</td>\n",
" <td>nM</td>\n",
" <td>CN(C)c1cc2c(Nc3cccc(Br)c3)ncnc2cn1</td>\n",
" <td>11.221849</td>\n",
" <td><rdkit.Chem.rdchem.Mol object at 0x78dddbffa7a0></td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>CHEMBL35820</td>\n",
" <td>0.006</td>\n",
" <td>nM</td>\n",
" <td>CCOc1cc2ncnc(Nc3cccc(Br)c3)c2cc1OCC</td>\n",
" <td>11.221849</td>\n",
" <td><rdkit.Chem.rdchem.Mol object at 0x78dddc005930></td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>CHEMBL53753</td>\n",
" <td>0.008</td>\n",
" <td>nM</td>\n",
" <td>CNc1cc2c(Nc3cccc(Br)c3)ncnc2cn1</td>\n",
" <td>11.096910</td>\n",
" <td><rdkit.Chem.rdchem.Mol object at 0x78dddc0035a0></td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>CHEMBL66031</td>\n",
" <td>0.008</td>\n",
" <td>nM</td>\n",
" <td>Brc1cccc(Nc2ncnc3cc4[nH]cnc4cc23)c1</td>\n",
" <td>11.096910</td>\n",
" <td><rdkit.Chem.rdchem.Mol object at 0x78dddbffcb30></td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>\n",
" <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-756707e0-08d9-46ed-860e-11c85cc2ec52')\"\n",
" title=\"Convert this dataframe to an interactive table.\"\n",
" style=\"display:none;\">\n",
"\n",
" <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
" <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
" </svg>\n",
" </button>\n",
"\n",
"\n",
"\n",
" <div id=\"df-8ca104ca-8aae-4952-98cc-56bb45da041a\">\n",
" <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-8ca104ca-8aae-4952-98cc-56bb45da041a')\"\n",
" title=\"Suggest charts.\"\n",
" style=\"display:none;\">\n",
"\n",
"<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
" width=\"24px\">\n",
" <g>\n",
" <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
" </g>\n",
"</svg>\n",
" </button>\n",
" </div>\n",
"\n",
"<style>\n",
" .colab-df-quickchart {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-quickchart:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-quickchart:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
"</style>\n",
"\n",
" <script>\n",
" async function quickchart(key) {\n",
" const containerElement = document.querySelector('#' + key);\n",
" const charts = await google.colab.kernel.invokeFunction(\n",
" 'suggestCharts', [key], {});\n",
" }\n",
" </script>\n",
"\n",
" <script>\n",
"\n",
"function displayQuickchartButton(domScope) {\n",
" let quickchartButtonEl =\n",
" domScope.querySelector('#df-8ca104ca-8aae-4952-98cc-56bb45da041a button.colab-df-quickchart');\n",
" quickchartButtonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"}\n",
"\n",
" displayQuickchartButton(document);\n",
" </script>\n",
" <style>\n",
" .colab-df-container {\n",
" display:flex;\n",
" flex-wrap:wrap;\n",
" gap: 12px;\n",
" }\n",
"\n",
" .colab-df-convert {\n",
" background-color: #E8F0FE;\n",
" border: none;\n",
" border-radius: 50%;\n",
" cursor: pointer;\n",
" display: none;\n",
" fill: #1967D2;\n",
" height: 32px;\n",
" padding: 0 0 0 0;\n",
" width: 32px;\n",
" }\n",
"\n",
" .colab-df-convert:hover {\n",
" background-color: #E2EBFA;\n",
" box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
" fill: #174EA6;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert {\n",
" background-color: #3B4455;\n",
" fill: #D2E3FC;\n",
" }\n",
"\n",
" [theme=dark] .colab-df-convert:hover {\n",
" background-color: #434B5C;\n",
" box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
" filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
" fill: #FFFFFF;\n",
" }\n",
" </style>\n",
"\n",
" <script>\n",
" const buttonEl =\n",
" document.querySelector('#df-756707e0-08d9-46ed-860e-11c85cc2ec52 button.colab-df-convert');\n",
" buttonEl.style.display =\n",
" google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
"\n",
" async function convertToInteractive(key) {\n",
" const element = document.querySelector('#df-756707e0-08d9-46ed-860e-11c85cc2ec52');\n",
" const dataTable =\n",
" await google.colab.kernel.invokeFunction('convertToInteractive',\n",
" [key], {});\n",
" if (!dataTable) return;\n",
"\n",
" const docLinkHtml = 'Like what you see? Visit the ' +\n",
" '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
" + ' to learn more about interactive tables.';\n",
" element.innerHTML = '';\n",
" dataTable['output_type'] = 'display_data';\n",
" await google.colab.output.renderOutput(dataTable, element);\n",
" const docLink = document.createElement('div');\n",
" docLink.innerHTML = docLinkHtml;\n",
" element.appendChild(docLink);\n",
" }\n",
" </script>\n",
" </div>\n",
" </div>\n"
]
},
"metadata": {},
"execution_count": 67
}
]
}
],
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyNBRbQjyFSY30ZjBFgxg+ka",
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
},
"gpuClass": "standard",
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"b15d0f15b84b4def826e1d239e28fdf1": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HBoxModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_eeda925759c04afb881a76e88e3b0c66",
"IPY_MODEL_e83ec3ff21c7471894cdcf073964d992",
"IPY_MODEL_751660ec13fa410c8eb1385f719103d4"
],
"layout": "IPY_MODEL_d16a5b33961e4a849bf687d8746fd2e4"
}
},
"eeda925759c04afb881a76e88e3b0c66": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_0ae3990397f94a449a159faab3251615",
"placeholder": "",
"style": "IPY_MODEL_fa08ea49c3ac43afb324fcaf8340a1e6",
"value": "100%"
}
},
"e83ec3ff21c7471894cdcf073964d992": {
"model_module": "@jupyter-widgets/controls",
"model_name": "FloatProgressModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_7f61a7b60b274044a7618ced159a9597",
"max": 6823,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_d98b21f92e444e408761d83a7c8d9b1a",
"value": 6823
}
},
"751660ec13fa410c8eb1385f719103d4": {
"model_module": "@jupyter-widgets/controls",
"model_name": "HTMLModel",
"model_module_version": "1.5.0",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_2039e2ed16ed479fb7f26731397481f9",
"placeholder": "",
"style": "IPY_MODEL_79ad69fdca97433984baf06f544de798",
"value": " 6823/6823 [11:21<00:00, 10.52it/s]"
}
},
"d16a5b33961e4a849bf687d8746fd2e4": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"0ae3990397f94a449a159faab3251615": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"fa08ea49c3ac43afb324fcaf8340a1e6": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"7f61a7b60b274044a7618ced159a9597": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"d98b21f92e444e408761d83a7c8d9b1a": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"2039e2ed16ed479fb7f26731397481f9": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"79ad69fdca97433984baf06f544de798": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
}
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}