{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Predicting JunD binding"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this use case, we compare models that use different input features to predict the binding of the JunD transcription factor."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"#!pip install rpy2\n",
"#!pip install tzlocal\n",
"#!conda install --yes -c bioconda bedtools samtools\n",
"#!conda install --yes r-ggplot2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"%load_ext rpy2.ipython\n",
"\n",
"from IPython.display import Image"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"datadir = '../data'\n",
"outputdir = '../jund_results'"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['ggplot2', 'tools', 'stats', 'graphics', 'grDevices', 'utils',\n",
" 'datasets', 'methods', 'base'], dtype='<U9')"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%R library(ggplot2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run a grid search for a DNA-only model.\n",
"(Because the following steps are rather time-consuming, they are commented out; remove the leading `#` to rerun them.)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"#!python dna_only.py -inputpath {datadir} -path {outputdir}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run a grid search for a DNase-only model."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"#!python dnase_only.py -inputpath {datadir} -path {outputdir}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Fit models that use DNA and DNase as input simultaneously."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"#!python dnase_dna_model.py -inputpath {datadir} -path {outputdir}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, the following script produces a few sample plots using the plotGenomeTrack functionality."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"#!python plot_dnase_dna.py -inputpath {datadir} -path {outputdir}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Illustration of an example genomic locus. The figure shows the concordance between the ground-truth JunD-bound region and the predicted binding site. Underneath, the DNase coverage is shown for the two samples."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Image(os.path.join(outputdir, 'jund_input_outout_line.png'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Illustration of input feature importance discovered by integrated gradients. The site in the center, which is highlighted as important, closely resembles a JunD binding motif."
]
},
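{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of how integrated gradients assigns importance to input features, the sketch below applies the method to a toy logistic model in NumPy. The model, its random weights, the toy input, and the function name `integrated_gradients` are assumptions made for this example only; the attribution code that produced the figure above is not shown in this notebook. The final line checks the completeness property: the attributions should sum to roughly the difference between the model output at the input and at the baseline."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + np.exp(-z))\n",
"\n",
"def integrated_gradients(x, baseline, w, b, steps=200):\n",
"    # Toy model (an assumption for this sketch): F(x) = sigmoid(w . x + b).\n",
"    # Midpoints of `steps` equal intervals on [0, 1] (midpoint Riemann sum).\n",
"    alphas = (np.arange(steps) + 0.5) / steps\n",
"    # Points on the straight path from the baseline to the input.\n",
"    path = baseline + alphas[:, None] * (x - baseline)\n",
"    p = sigmoid(path @ w + b)\n",
"    # Analytic gradient of the toy model at every point on the path.\n",
"    grads = (p * (1.0 - p))[:, None] * w\n",
"    # Average gradient along the path, scaled by the input difference.\n",
"    return (x - baseline) * grads.mean(axis=0)\n",
"\n",
"# Hypothetical input: flattened one-hot encoding of 5 positions x 4 nucleotides.\n",
"rng = np.random.default_rng(0)\n",
"w = rng.normal(size=20)\n",
"x = np.zeros(20)\n",
"x[[0, 5, 10, 15, 19]] = 1.0\n",
"baseline = np.zeros_like(x)\n",
"\n",
"attr = integrated_gradients(x, baseline, w, b=0.0)\n",
"# Completeness check: both numbers should be approximately equal.\n",
"print(attr.sum(), sigmoid(x @ w) - sigmoid(baseline @ w))"
]
},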
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Image(os.path.join(outputdir, 'jund_input_attribution_dna.png'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The overall performance using only the DNA sequence as input shows that higher-order sequence features improve the prediction quality."
]
},
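{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to picture higher-order sequence features is as a one-hot encoding of overlapping k-mers instead of single nucleotides. The sketch below is a minimal illustration under that assumption; the function name `kmer_one_hot`, the example sequence, and the handling of symbols other than A, C, G, and T are hypothetical, and the encoding actually used by `dna_only.py` is not shown in this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import itertools\n",
"import numpy as np\n",
"\n",
"def kmer_one_hot(seq, order=1):\n",
"    # Encode `seq` as a (len(seq) - order + 1, 4**order) one-hot matrix\n",
"    # with one row per overlapping k-mer of length `order`.\n",
"    alphabet = 'ACGT'\n",
"    index = {''.join(k): i\n",
"             for i, k in enumerate(itertools.product(alphabet, repeat=order))}\n",
"    n = len(seq) - order + 1\n",
"    onehot = np.zeros((n, len(index)), dtype=np.int8)\n",
"    for pos in range(n):\n",
"        col = index.get(seq[pos:pos + order].upper())\n",
"        # k-mers containing other symbols (e.g. N) stay all-zero rows.\n",
"        if col is not None:\n",
"            onehot[pos, col] = 1\n",
"    return onehot\n",
"\n",
"seq = 'TGACTCA'\n",
"for order in (1, 2, 3):\n",
"    # The feature dimension grows as 4**order.\n",
"    print(order, kmer_one_hot(seq, order).shape)"
]
},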
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"results = os.path.join(outputdir, 'dna_gridsearch_5.tsv')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAeAAAAHgCAMAAABKCk6nAAACuFBMVEUAAAABAQECAgIDAwMEBAQGBgYHBwcICAgJCQkKCgoMDAwNDQ0PDw8SEhITExMUFBQVFRUWFhYXFxcYGBgZGRkaGhobGxscHBwdHR0eHh4fHx8gICAhISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKiorKyswMDAxMTEyMjIzMzM0NDQ1NTU2NjY3Nzc4ODg5OTk7Ozs8PDw9PT0+Pj5AQEBBQUFCQkJDQ0NERERFRUVGRkZHR0dISEhJSUlKSkpLS0tMTExNTU1OTk5PT09QUFBRUVFTU1NUVFRWVlZYWFhZWVlaWlpbW1tcXFxdXV1eXl5fX19gYGBhYWFjY2NkZGRlZWVmZmZnZ2doaGhpaWlqampra2tsbGxtbW1ubm5vb29wcHBxcXFycnJzc3N0dHR1dXV2dnZ3d3d4eHh5eXl6enp7e3t8fHx9fX1+fn5/f3+AgICBgYGCgoKDg4OEhISFhYWGhoaHh4eIiIiJiYmKioqLi4uMjIyNjY2Ojo6Pj4+QkJCRkZGSkpKTk5OUlJSVlZWWlpaXl5eYmJiZmZmbm5udnZ2enp6fn5+goKChoaGjo6OkpKSlpaWmpqanp6eoqKipqamqqqqrq6usrKytra2vr6+wsLCysrKzs7O0tLS1tbW2tra3t7e4uLi5ubm6urq7u7u8vLy9vb2+vr6/v7/AwMDBwcHCwsLDw8PExMTFxcXGxsbHx8fIyMjJycnKysrLy8vMzMzPz8/Q0NDR0dHS0tLT09PU1NTV1dXW1tbX19fY2NjZ2dnc3Nzd3d3e3t7f39/g4ODh4eHi4uLj4+Pk5OTl5eXm5ubn5+fo6Ojp6enq6urr6+vs7Ozt7e3u7u7v7+/w8PDx8fHy8vLz8/P09PT19fX29vb39/f4+Pj5+fn6+vr7+/v8/Pz9/f3+/v7///8jIy7/AAAP8klEQVR4nO3dj39V9X3HcdZ1zq52s11ni7SCTlwXa107LNIxqtJhcdi6sG7FWkVrsiClIFoUs+YHgqAdpLYQqbQ0tGogdIPqCBvODYt0K2hB8ov8IiG5yfff2LnnnEs+TbvDydl535PevJ6PB7nnnnu/H7/wetx7cxMi0xxK2rSsNwAtApc4Apc4Apc4Apc4Ape4C4Grpj2T5T4gciHwh668Oct9QKQQ+F9/54e/fTrTnUCiEPiBhaNXbM5yI9AIA49esd19+aZstwKFMPA/X9qV2/uOX2S7FwiEgb80LW9jtnuBQBA4995veR/v/kS2e4FAELj5krPex32/9Wa2m0H6gsB/+1f5j7nL6zLdCwT4UmWJSxD42SdTVbsh3XlJbKjNegeeuvqUB55PGPjJiS+J0jWU7rwkhjuz3oGntz/VcSNlvfkLAucR2CKwBoF1CGwRWIPAOgS2CKxBYB0CWwTWILAOgS0CaxBYh8AWgTUIrENgi8AaBNYhsEVgDQLrENgisEZbV6rjCGxMisBf35zqOAIbBLYIrEFgHQJbBNYgsA6BLQJrEFiHwBaBNQisQ2CLwBoE1iGwRWANAusQ2CKwBoF1CGwRWIPAOgS2CKxBYB0CWwTWILAOgS0CaxBYh8AWgTUIrENgi8AaBNYhsEVgDQLrTNXAwzVrt/kHR2pd210VFW/5VwiskUHgAzvcYye9y/6qGne0sXCWwBoZBN56yDW2eJebD9e4/bUb9wdnCayRQeBNx13zbudad71d4w691LbqVeceLyv7WhsU1m5IddzbcR7Brf4jeOUjq5e+4F1t+a5zA93dG0ZT1XU+3XlJDHVmvQPP159KdVwuzmvwTld9In/gPYK3H3ENB/2zPEVrZPFZdF11gztW7wc+s2JV3Yh/lsAaWb0Pzr047gSBNbIK3D8w7gSB4/jPzyycoJvnTXTFoqh/TJSvZBmCwC13HpW75T8iNkBgQxH471If+SsWETgmAl
sEjoPABQROjMBxEdgicBwELiBwYgSOi8AWgeMgcAGBEyNwXAS2CBwHgQsInBiB4yKwReA4CFxA4MQIHBeBLQLHQeACAidG4LgIbBE4DgIXEDgxAsd19kDqIwlcMBkCv3ZL6iMJXEDgxAgcF4EtAsdB4AJB4B82TNATN010RcN/XWQPBC4QBP7kQ+vU7lp3kT0QuEAR+ETqI8fbRuC4CJwYgXUIHBuBEyOwDoFjI3BiBNa5eOC/eVPuMwSWuXjgOWVyf05gGQLHJgj8cf2fbdmai+yB1+CCUg18xyG5BVM18NMrJ+jeeRNdsfJifwfk2Oc/N0ELbp3oivLTERso5cATNqX/X5W/gsAaBNYhsEVgDQLrENgisAaBdSZF4O/vTXUcgY1JEbi3P9VxBDYIbBFYg8A6BLY25FLVOZjuvCTOd2a9A09Pb6rjhhIHrhtIVWdfuvOS6O/Iegees92pjuvnKXoMT9EWgTUIrENgi8AaBNYhsEVgDQLrENgisAaBdQhsEViDwDoEtgisQWAdAlsE1iCwDoEtAmsQWIfAFoE1CKxDYIvAGgTWIbBFYA0C6xDYIrAGgXUIbBFYg8A6BLYIrEFgHQJbBNYgsA6BLQJrEFiHwBaBNQisQ2CLwBoE1iGwRWANAusQ2CKwBoF1CGwRWIPAOgS2CKxBYB0CWwTWILAOgS0CaxBYh8AWgTUIrENgi8AaBNYhsEVgDQLrTNXAwzVrt/kHR2rHjgkskkHgAzvcYye9y/6qmgvHBFbJIPDWQ66xxbvcfLjmwjGBVTIIvOm4a97tXOuut2sKx1sWLlzTmar2jnTnJdHRnvUOPO3pbqI9ziO41X/Urnxk9dIXwuNTR4/WDqeqcyDdeUkMdmS9A093b6rjzsd5Dd7pqk/kD7xH8IVjnqJFsvgsuq66wR2r9wP7xz4Ca2T1Pjj34rgTBNbIKnD/wLgTBNbgK1k6BLYIrEFgHQJbBNYgsA6BLQJrEFiHwBaBNQisQ2CLwBoE1iGwRWANAusQ2CKwBoF1CGwRWIPAOgS2CKxBYJ0pFHjkLed290auJLBGUQIfu3q1c3M/+GrUSgJrFCXwp7bkPz53c9RKAmsUJfB7gmfnD0StJLBGUQJf4/9Hzs2MWklgjaIEvn/5iHOjDy6LWklgjaIEHrht1tIvXDenJ2olgTWK9D74357Z+OPolQTWKE7ggdPeh1xkYgJrFCXws5dc8tGXr3vnB6NWElijKIGnv5R7+h17RiJXElijKIEv994j/f5FVhJYoyiB/zD8FYXAGkUJ/L7BwcH8r6iVBNYoSuB3hqJWEliD7wfrTJ3Ap+6e+0DXRVYSWKM43y5c2PDJJRdZSWCNogS+7Kz7WeRXORyBVXibpDN1Ar8vl8vlf0WtJLBGUQJPC0WtJLAGb5N0plLgprz9USsJrFGkwOXl5QsurYpaSWCNIj5Fn14UtZLAGkUMPDI7aiWBNYr3FF1+feQXswisUcRPsvaO/19E/xICaxQp8OhPjxw5fGvUSgJrFCnw8lnvnvMHD0etJLBGkQJfO7T68M8/G7WSwBpFCvz+oaYaVxa1ksAaRQp8z/wzs6vmR60ksEaxPsl6zf3kkTeiVhJYY9J8s6G2L1UdPenOS6KnI+sdeDq7Uh3Xw3eTxvAItgisUaqBN0a+4BcHga2UAy9+Jd15SRDYIrAGgXUIbBFYg8A6BLYIrEFgHQJbBNYgsA6BLQJrEFiHwBaBNQisQ2CLwBoE1iGwRWANAusQ2CKwBoF1CGwRWIPAOgS2CKxBYB0CWwTWILAOgS0CaxBYh8AWgTUIrENgi8AaBNYhsEVgDQLrENiKDDzSPVG375voish/eC8RAluRgZ8p07sp2e84AoGtyMD1TyXbzgR03JD6SAJbBNb4zQhcN/9zand8NNnvOAKBrejARXgNJnAcqsAVzWrf4yk6Dl6DDQJbBNYgcIjAsRDYILBFYA0ChwgcC4ENAlsE1iBwiMCxENggsEVgDQKHCB
xLnMDDNWu3eRedVSvqRtvuqqh4yz9LYI0MAh/Y4R476dxzzW79G0cbC2cJrJFB4K2HXGOLc6fOta1s31+7cX9wlsAaGQTedNw17/YuBysfGjj0UtuqV53bcffdj56N8EQxAkdtIJGu9tRHTlxHR6rjOuM8glv9R/Col3qvd7Xlu8693tz8j4MRaooROGoDifR3pD5y4s52pzruXJzX4J2u+oRzG15332nZfsQ1HPTP8hStkcVn0XXVDe5Y/YnK1dVDZ1asqhvxzxJYI6v3wbkXx50gsEZWgfsHxp0gsAZfyQoROBYCGwS2CKxB4BCBYyGwQWCLwBoEDhE4FgIbBLYIrEHgEIFjIbBBYIvAGgQOETgWAhsEtgisQeAQgWMhsEFgi8AaBA4ROBYCGwS2CKzxGxL43l1qzxI4DlHgl6sm6iuVE11Rnex3HIHAVsr/ZkPXULrzkiCwRWANAusQ2CKwBoF1CGwRWIPAOgS2CKxBYB0CWwTWILAOgS0CaxBYh8AWgTUIrENgi8AaBNYhsEVgDQLrENgisAaBdQhsEViDwDoEtgisQWAdAlsE1iCwDoEtAmsQWIfAFoE1CKxDYKu2L1UdPenOS6KnI+sdeDq7Uh3XwyN4DI9gi8AaBNYhsEVgDQLrENgisAaBdQhsEViDwDoEtgisQWAdAlsE1iCwDoEtAmsQWIfAFoE1CKxDYIvAGgTWIbBFYA0C6xDYIrAGgXUIbBFYg8A6BLYIrEFgHQJbBNYgsA6BLQJrEFiHwBaBNQisQ2CLwBoE1iGwRWANAusQ2CKwBoF1CGwRWIPAOgS2CKxBYB0CWwTWILAOgS0CaxBYh8AWgTUIrENgi8AaBNYhsEVgDQLrENgisEZpBh59atmD/53mwESmauDhmrXbvIvOqhV1o+GxSznwnrKysnvTHJjIVA18YId77KRzzzW79W+Exy7lwBu8wGWjaU5MYqoG3nrINbY4d+pc28r28NilHPgnXt/KNAcmMlUDbzrumnd7l4OVDw2Exz9at25db5q+d9/6N1MdmER3e9Y78HR2pTquO84juNV/1HrPoJv2hsevNDSsP5eqjt505yXR15H1DjxdZ1Md1xfnNXinqz7hvU6+7r7TEh473iapZPFZdF11gztWf6JydfWQf+wjsEZW74NzL447QWCNrAL3D4w7QWCN0vxKFoEvILAOgS0CaxBYh8AWgTUIrENgi8AaBNYhsEVgDQLrENgisMakCfzFqlTdU5HuvCQq7sl6B57ly1Md9w9lfQkDv7QrVZ+uTXdeEpvmZr0DzxcfSHde8F3ABIFT9teHs96Bc0dvy3oHnie+qZiafeBv/k/WO3Du1FNZ78DzwgHF1OwDQ4rAJS77wI+O/xsjxde/asWj5zPeg//jI4K5WQfurbgt+8BNz7vtzRnvwf/xEcHcrAO7ka9mH/hYu9v944z34P/4iGBu5oHdquwDO9e6It0vIyWQ//ERwVgC5+2s7ct6C/6PjwjmEtjz8tasdxD++IhgLoE9W+6rqvqXjPfg//iIYG72gSFF4BJH4BJH4BJH4BJH4BJXEoEHqq55/xzzJnJ9/a+7V/ntCcf/8sL2P044JhslEfhTf985evDDTd5RriN/fSxwcN3Xf0XC6eMWmsBm+qRVCoF/9JER7+O+a93Bxdc9PrTsyhuW1LtvfOiaNf71/B3WXDVzrVv6u0u9w+5bPzBznwtuzi2f8ZEl9U3l3mO0KVxwZ+WixX1u5VXTG8L7ePIL/ePRL//R7OWji9/1pWCJP92/IRw6KZVC4Icfyn8ceVfPwcuOuy1/ef7t6fX7ru/oWfDt/HXPnhv7+294oW9G/njz513zchfcvNW/a1ArXHBpp1v4/efnDZ18T29wxlviLQyOX5s/NDTrp94jOAzsTQ9uCIZOTqUQ+Kur/YvLzhxc4Nwdu52rqP/aVfPmXbs8f91TucW5J1cEgY/MWPWKc8HNd/p3DWqFC+Y6d3/jsmedO5sLzjg/cH
h8Zs+adx8ZC7ygMCgYOjmVQuBdH89/PDzdHfysc0t+4FxV/ePV3ivkcP66p+Jp5zZWBoFd15b5i1xw82L/rvlaC5rGFtzf+IXtzrWdD844P3Bw/PLsb7TOKQRe0JS/c3gnf+jkVAqBR26s6HP/PnuHH/iZW4c7rqpv/ZOeobnPh4F/8ImBcx/bEwReu9r94vdGg5s3+Hfde/3I6Uubxhbc3/jtW4bb33smOOP8wMHxI1Xu55cfbp/pgiX5Owc3BEOz/CP4v5VCYNdz36wr/sz7JDr/Rz607MMfe/hbru6a6RUuDOxWXT1rjQsC/+yGK2duc8HNg+UzblxSP7Dk2kVV+8cWfOX5kQevnvFP4X2cHzg4fuMv/vT2h2/JXV8eLPGn+zeEQyelkgj8//Dr3zKXEAJnvQOxqR54qDvrHYhN9cAlj8AljsAljsAljsAl7n8B0zvSlaU3awUAAAAASUVORK5CYII=\n"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%R -i results\n",
"\n",
"dna = read.table(results)\n",
"#print(aggregate(dna[,2], list(dna[,3], dna[,4]), median))\n",
"\n",
"g1 = ggplot(subset(dna, dropout==0.0), aes(x=as.factor(order), y=auprc_val)) + geom_boxplot() + xlab(\"Order of sequence features\") + ylab(\"auPRC\") + theme_bw() +labs(tag=\"A\")\n",
"print(g1)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"figure = os.path.join(outputdir, 'dna_gridsearch.png')\n",
"\n",
"%R -i figure ggsave(figure, height=4, width=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Furthermore, applying dropout at the initial layer yields a slight performance improvement for the tri-nucleotide-based sequence encoding."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%R\n",
"g1 = ggplot(dna, aes(x=as.factor(order), y=auprc_val, color=as.factor(dropout))) + \n",
" geom_boxplot() + xlab(\"Order of sequence features\") + \n",
" ylab(\"auPRC\") + guides(color=guide_legend(title=\"Dropout rate\")) + \n",
" theme_bw() + labs(tag=\"A\")\n",
"print(g1)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"figure = os.path.join(outputdir, 'dna_gridsearch_drop.png')\n",
"%R -i figure ggsave(figure, height=4, width=3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we explore the JunD predictions when only DNase-seq coverage is used as input. In particular, we investigate the effect of data normalization and augmentation."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"dnaseresults = os.path.join(outputdir, \"dnase_gridsearch_7.tsv\")"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"%%R -i dnaseresults\n",
"\n",
"dnase = read.table(dnaseresults, stringsAsFactors = F)\n",
"dnase$augment[dnase$augment==\"orient\"] = \"Flip orientation\"\n",
"dnase$augment[dnase$augment==\"none\"] = \"None\"\n",
"dnase$normalize[dnase$normalize==\"tpm\"] = \"TPM\"\n",
"dnase$normalize[dnase$normalize==\"none\"] = \"None\"\n",
"dnase$normalize[dnase$normalize==\"zscorelog\"] = \"Z-score of log-counts\"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Normalizing the data, either as transcripts per million (TPM) or as Z-scores of the log-transformed counts, improves the prediction accuracy considerably compared with using no normalization.\n",
"Furthermore, data augmentation by flipping the 5' to 3' orientation of the coverage tracks improves the results further."
]
},
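{
"cell_type": "markdown",
"metadata": {},
"source": [
"The normalization and augmentation options compared above can be sketched in NumPy as follows. This is only an illustration: the exact definitions used by `dnase_only.py` are not shown in this notebook, so the scaling of each track to a total of one million, the pseudocount of 1 before the logarithm, and the function names `tpm`, `zscore_log`, and `flip_orientation` are all assumptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def tpm(cov):\n",
"    # Scale a coverage track so that its values sum to one million.\n",
"    total = cov.sum()\n",
"    return cov * 1e6 / total if total > 0 else cov.astype(float)\n",
"\n",
"def zscore_log(cov, pseudocount=1.0):\n",
"    # Log-transform the counts, then centre and scale to unit variance.\n",
"    logged = np.log(cov + pseudocount)\n",
"    std = logged.std()\n",
"    centred = logged - logged.mean()\n",
"    return centred / std if std > 0 else centred\n",
"\n",
"def flip_orientation(cov):\n",
"    # Reverse the track along the position axis (5'->3' becomes 3'->5').\n",
"    return cov[::-1]\n",
"\n",
"# Hypothetical read counts over seven adjacent bins.\n",
"cov = np.array([0, 2, 10, 35, 12, 1, 0], dtype=float)\n",
"print(tpm(cov).sum())\n",
"print(zscore_log(cov).round(2))\n",
"print(flip_orientation(cov))"
]
},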
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%R\n",
"g2=ggplot(subset(dnase, normalize!=\"zscore\"), aes(x=normalize, y=auprc_test, color=augment)) + \n",
" geom_boxplot() + xlab(\"Normalization\") + \n",
" ylab(\"auPRC\") + \n",
" guides(color=guide_legend(title=\"Data augmentation\")) + theme_bw() + labs(tag=\"B\")\n",
"print(g2)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"figure = os.path.join(outputdir, \"dnase_gridsearch.png\")\n",
"%R -i figure ggsave(figure, height=4, width=6)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we use a model that takes DNA and DNase-seq as input for the prediction of JunD. This model obtains superior performance compared to using either input separately for the prediction."
]
},
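{
"cell_type": "markdown",
"metadata": {},
"source": [
"The models in this notebook are compared by the area under the precision-recall curve (auPRC). A minimal NumPy sketch of one common estimator, average precision, is given below. It assumes there are no tied scores and at least one positive example; the function name `average_precision` and the toy labels and scores are hypothetical, and the notebook does not state which estimator the training scripts use."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def average_precision(y_true, y_score):\n",
"    # Rank the examples by decreasing score.\n",
"    order = np.argsort(-np.asarray(y_score, dtype=float), kind='stable')\n",
"    y = np.asarray(y_true)[order]\n",
"    # Precision after each ranked example.\n",
"    tp = np.cumsum(y)\n",
"    precision = tp / np.arange(1, len(y) + 1)\n",
"    # Average the precision over the ranks of the positive examples.\n",
"    return float((precision * y).sum() / y.sum())\n",
"\n",
"# Hypothetical labels (1 = bound, 0 = unbound) and predicted scores.\n",
"y_true = np.array([1, 0, 1, 1, 0, 0])\n",
"y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])\n",
"print(average_precision(y_true, y_score))"
]
},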
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"randres = os.path.join(outputdir, \"dnase_dna_use_randominit_submodels.tsv\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>auprc_val</th>\n",
" <th>auprc_test</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0.693194</td>\n",
" <td>0.675581</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.709087</td>\n",
" <td>0.702321</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.693212</td>\n",
" <td>0.672333</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.681510</td>\n",
" <td>0.672720</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.703246</td>\n",
" <td>0.693258</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" auprc_val auprc_test\n",
"0 0.693194 0.675581\n",
"1 0.709087 0.702321\n",
"2 0.693212 0.672333\n",
"3 0.681510 0.672720\n",
"4 0.703246 0.693258"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%R -i randres df = read.table(randres, stringsAsFactors = F)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAeAAAAHgCAMAAABKCk6nAAACkVBMVEUAAAACAgIEBAQHBwcICAgJCQkNDQ0PDw8RERESEhITExMUFBQVFRUWFhYXFxcYGBgZGRkcHBwdHR0eHh4fHx8hISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKiorKyswMDAzMzM0NDQ1NTU2NjY3Nzc4ODg7Ozs8PDw9PT0+Pj5BQUFCQkJDQ0NERERFRUVGRkZHR0dISEhJSUlKSkpMTExNTU1OTk5PT09QUFBRUVFSUlJTU1NUVFRVVVVWVlZYWFhaWlpbW1tgYGBhYWFiYmJkZGRlZWVmZmZnZ2doaGhpaWlqampra2tsbGxtbW1ubm5vb29wcHBxcXFycnJzc3N0dHR1dXV2dnZ3d3d4eHh5eXl6enp8fHx9fX1+fn5/f3+AgICBgYGCgoKDg4OEhISFhYWGhoaHh4eIiIiJiYmKioqLi4uMjIyNjY2Ojo6Pj4+QkJCRkZGSkpKTk5OUlJSVlZWWlpaXl5eYmJiZmZmampqbm5udnZ2enp6fn5+goKChoaGjo6OkpKSlpaWmpqanp6eoqKipqamqqqqrq6usrKytra2urq6vr6+wsLCxsbGysrKzs7O0tLS1tbW2tra3t7e4uLi5ubm6urq7u7u8vLy9vb2+vr6/v7/AwMDBwcHCwsLDw8PExMTFxcXGxsbHx8fIyMjJycnKysrLy8vMzMzNzc3Ozs7Q0NDR0dHS0tLT09PV1dXW1tbX19fY2NjZ2dna2trb29vc3Nzd3d3e3t7g4ODh4eHi4uLj4+Pk5OTl5eXm5ubn5+fo6Ojp6enq6urr6+vt7e3u7u7v7+/w8PDx8fHy8vLz8/P09PT19fX29vb39/f4+Pj5+fn6+vr7+/v8/Pz9/f3+/v7////AhdfAAAALzklEQVR4nO3djXMV1R2HcaqlVm1tgVZFqhQL1SrVqghq6yvUWq0CEoMg8mpBSIUEpArWEKU1gEYbReVFsUBfjG9oralaxVIIVZtASExucv6a7r2byM+Ms2aYPbv3fH0+M7m7c2ZP5sw+s5u7NzPJEAdpQ/JeAPwisDgCiyOwOAKLI7C4/sD/uGr4iRNez3Up8KEv8N9OmP3anhtO3pfvYpC+vsDnzCq+jl2Q51LgQxz4n0P2Fzd73851LfAgDrz56zkvA77EgRu+mfMy4Esc+PUhHxc3f/xVrmuBB3HgwqhFxc3FP891LfCg7130M0PnvtE8e+hb+S4G6ev/oKNpwiknXdiU61LgAx9VijuWwA+vklNdk/cKPPjkWAOvOoY5Za61M+8VpK5nzKHihsAlBLYIHAICWwS2CBwCAlsEtggcAgJbBLYIHAICWwS2CBwCAlsEtggcAgJbBLYIHAICWwS2CBwCAlsEtgQDHziU9wpSR2Br6uN5ryB1BLYIbBE4BAS2CGwROAQEtghsETgEBLYIbBE4BAS2CGwROAQEtghsETgEBLYIbBE4BAS2CGwROAQEtghsETgEBLYIbBE4BAS2CGwROAQEtghsETgEBLa+rIG7ly9cF202VVbeWNu37wgchkEF3vWIW/JBaW/1/qP7BA7BoALXvujqny/uvP/g0X0CB2FQgVe/67Y1FneWt/Xv71i1aslhObdszHsFqWsb1BXcFF+1h3/96T6BAzGowLs2uqq90fZPjUf3uUWHYXDvoqur6lxzjbtnb99+CYFDMPjn4MKWAQMEDsHgA7d3DBggcAj4JMsisEXgEBDYIrBF4BAQ2CKwReAQENgisEXgEBDYIrBF4BAQ2CKwReAQENgisEXgEBDYIrBF4BAQ2CKwReAQENgisEXgEBDYIrBF4BAQ2CKwReAQENgisEXgEBDYIrBF4BAQ2CKwReAQENgisEXgEBDYIrBF4BAQ2CKwReAQENgisEXgEBDYIrBF4BAQ2CKwReAQENgisEXgEBDYIrBF4BAQ2CKwReAQEN
gisEXgEBDYIrBF4BAQ2CKwReAQENgisEXgEBDYIrBF4BAQ2CKwtSLvf42bvi/r/w/+fFzBIeAWbRHYInAICGwR2CJwCAhsEdgicAgIbBHYInAICGwR2CJwCAhsEdgicAgIbBHYInAICGwR2CJwCAhsEdgicAgIbBHYInAIyifw+2/mb8p9ea8g8lGqp7VsAveOnTgJkQvuSPW8lk/gMa2pfr9gNcxM9dsRuNwQWByBxRFYHIHFEVgcgcURWByBxRFYHIHFEVgcgcURWByBxRFYHIHFEVgcgcURWByBxRFYHIHFEVgcgcURWByBxRFYHIHFEVgcgcURWByBxRFYHIHFEVgcgcXlEbh7+cJ10aZ37V0re9vm3f5gPEpgL/IIvOsRt+QD55rWu537/7DVLXuvNEpgL/IIXPuiq3/eubqHVm5xK95xDdtKowT2Io/Aq9912xqjzbKWRa8+/sCB255w7u4xY+b+N00HCRxrmJbqeW0Z1BXcVLqC173idmzqrq+pec65jtbWlb1p6iFwrGFmque1MKifwRtd1d5os8HV7vz7m27pf0qj3KK9yOVddHVVnWuu6V56R1Wh/e4lm+JRAnuR23NwYcuAAQJ7kVvg9o4BAwT2gk+yxBFYHIHFEVgcgcURWByBxRFYHIHFEVgcgcURWFxGgXv2Odd4KHEqgb3IJnDzGfOcGz98T9JUAnuRTeAL7y++bvhx0lQCe5FN4BPju/OwpKkE9iKbwKPai69HTk+aSmAvsgl86/Se6JzfdmPSVAJ7kU3gjokjp1w/+ry2pKkE9iKr5+CX1977QvJUAnuRUeCOA9FLITExgb3IJvDDxx//g7+MPm540lQCe5FN4BFbC2u+srkncSqBvcgm8DeiZ6STvmAqgb3IJvC3+r6SENiLbAKf0tnZWfxKmkpgL7IJfFyfpKkE9oLfB4vLJvD+m8bP+PgLphLYi4x+XTip7vxrv2Aqgb3IJvAJ/3P/SvyUwxHYEx6TxGX0mFQoFIpfSVMJ7EU2gYf0SZpKYC94TBKXVeCninYkTSWwF1kFnjx58iVDZyVNJbAXWd6iD1yRNJXAXmQZuOespKkE9iLDW/TksxM/zCKwF1m+ydo+8G+ffQaBvcgqcO9br7320qVJUwnsRVaBp4/82nknL0iaSmAvsgp8Zte8l96/Mmkqgb3IKvC3u55a7sYkTSWwF1kFvvmig2fNuihpKoG9yOxN1hvuz4veSZpKYC/4ZYM4AosjsDgCiyOwOAKLI7A4AosjsDgCiyOwOAKLK5/AKw6n6RCBYw0zUj2vbVzBZaZ8rmACe0FgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHF5BO5evnBdtOlde9fK3va5c6p7S6ME9iKPwLsecUs+cK5pvdu5/9nH3Irm0iiBvcgjcO2Lrv555+oeWrnFvVL90ZwW5/5aV7f0SJraCRxrqEj1vB4eTODV77ptjdFmWcuiV1unzp7X5dwzixcvPpSmNgLHGmakel5bB3UFN5Wu4HWvuB2bfveSe2xraZRbtBe5/Aze6Kr2RpsNrnbnfU1u09OlUQJ7kcu76OqqOtdc0730jqpCy5y5iztLowT2Irfn4MKWAQME9iK3wO0dAwYI7AWfZIkjsDgCiyOwOAKLI7A4AosjsDgCiyOwOAKLI7A4AosjsDgCiyOwOAKLI7A4AosjsDgCiyOwOAKLI7A4AosjsDgCiyOwOAKLI7A4AosjsDgCiyOwOAKLI7A4AosjsDgCiyOwOAKLI7A4AosjsDgCiyOwOAKLI7A4AosjsDjdwL+4CZGfTUv1vJZRYMRuTvW8llHg3z+GyJ0VqZ7XMg
rMz+AS3Z/BBC4hsDgCiyOwuPIJXN2RpiMEjjVUpHpe24898MpCmroJHGuoTPW8dnGLLjPlc4smsBcEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFkdgcQQWR2BxBBZHYHEEFqcb+K1/I1IrGthNPD9/55yX9woiv0n1tJZP4HIw9fG8V5A6AlsEtggcAgJbBLYEA29/O+8VpI7AVmtn3itIHYEtAlsEDgGBLQJbBA4BgS0CWwQOAYEtAlsEDgGBLQJbBA4BgS0CWwQOAYEtAlsEDgGBLQJbBA4BgS0CW7+cJWdqRd4rSN3tYw4fa+Ctef8bx/RdsyDvFaRvizvWwIJmPJn3CnwhcMmGPXmvwBcCiyOwOAKLI7A4AosjsDgCiyOwOAKLI7A4AosjsDgCiyOwOAKLUw/cWfG9E3/SPHB0zYLia/3ZA8eX1nz+d+kf335NmmvLhHrgSdPa2347qmfAaBy47b2BRxM4NC+PLLadt8/NP+30hW73ZVPOrZx52SWda647d8Tkjt1X7r5u5hVXH3b3fHfUfNd143fGXlsM+elR8aS+8dIxBC43a6fE280/bG8f++zukzvbv/q0++nmNcMOdk1YEQUe+pGb9MRzZ3/Ydsn6+y/+pGVEKXD/UfGkeDw+hsDl5t7r4+3M+51bNXv3ROeGfeJuXb9mWtT88ijweOdurZ972gUXnDn9mkbnKkuB+4+KJ8Xj8TEELjcvjO6NXsc3Vq6JYs/cfXmUrlAMPN25p66KAl9ZDHx3lXOF7mufdG5WKXD/UfGkeDw+hsBl50e3dbiNww49eW7HkXGbjwYefrD70gf6Azd9v61r/KNrL+3+8LTPBI4nxePxMQQuO603nDps/B7n7jxj5Pz+dBX19XPGnXpLdxy44lFXPWpEZfRm6tRxCx5y7uhR8aS+8dIxBEa5IbA4AosjsDgCiyOwOAKLI7C4/wO0YIL9grmjJgAAAABJRU5ErkJggg==\n"
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%R\n",
"\n",
"df$init = \"\"\n",
"\n",
"g3 <- ggplot(df, aes(x=init, y=auprc_test)) + geom_boxplot() + xlab(\"Combined model\") + ylab(\"auPRC\") + theme_bw() + labs(tag=\"C\")\n",
"\n",
"print(g3)"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"figure = os.path.join(outputdir, \"dna_dnase_joint.png\")\n",
"%R -i figure ggsave(figure, height=4, width=2)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}