--- a +++ b/.ipynb_checkpoints/SNAREseq_replicate-checkpoint.ipynb @@ -0,0 +1,190 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Notebook for running SCOT on SNARE-seq Cell Mixture Data\n", + "Access to the raw dataset: Gene Expression Omnibus accession no. GSE126074 \n", + "The SNARE-seq data in the `/data` folder contains the version with the dimensionality reduction techniques from the original SNARE-seq paper applied (https://www.nature.com/articles/s41587-019-0290-0) \n", + "The SCOT software was updated on 20 September 2020. It now outputs error statements for convergence issues. When it runs into numerical instabilities in convergence, it outputs `None, None` instead of `X_new, y_new`. If you run into such an error, please try using a larger epsilon value for the entropic regularization. \n", + "If you have any questions, e-mail: ritambhara@brown.edu, pinar_demetci@brown.edu, rebecca_santorella@brown.edu " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Import source code:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "import src.utils as ut\n", + "import src.evals as evals\n", + "from src.scot import *" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Read in the data:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dimensions of input datasets are: X= (1047, 10) y= (1047, 19)\n" + ] + } + ], + "source": [ + "X=np.exp(np.load(\"data/scrna_feat.npy\"))\n", + "y=np.load(\"data/scatac_feat.npy\")\n", + "print(\"Dimensions of input datasets are: \", \"X= \", X.shape, \" y= \", y.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Perform normalization (optional):" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], +
"source": [ + "X=ut.unit_normalize(X)\n", + "y=ut.unit_normalize(y)\n", + "\n", + "## If you'd like to apply z-score normalization instead:\n", + "# X=ut.zscore_standardize(X)\n", + "# y=ut.zscore_standardize(y)\n", + "# Note that zscore_standardize yields worse results on this dataset, and the MMD-MA and UnionCom comparisons \n", + "# also used unit (L2) normalization" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### Set hyperparameters of the algorithm:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "It. |Err \n", + "-------------------\n", + " 0|1.425627e-03|\n", + " 10|1.977665e-04|\n", + " 20|7.060937e-05|\n", + " 30|8.834972e-06|\n", + " 40|7.413147e-07|\n", + " 50|6.468501e-08|\n", + " 60|5.819180e-09|\n", + " 70|5.293865e-10|\n" + ] + } + ], + "source": [ + "# Set hyperparameters of the algorithm:\n", + "k=24\n", + "e=0.0038 \n", + "# Other values to try for very similar alignment results:\n", + "# k=25 with e=0.0018, 0.00182, 0.00185, 0.002, or k=30 with \n", + "# Combinations from a range of k=20 to k=30 and e=0.0015 to e=0.0040 (and around 0.01 for the k=30 setting) seem to \n", + "# yield the best results on this dataset, so if you'd like to perform hyperparameter tuning, \n", + "# you can set a grid between these values.\n", + "X_new,y_new= scot(X, y, k, e)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Average FOSCTTM score for this alignment is: 0.19851720567368114\n" + ] + } + ], + "source": [ + "fracs=evals.calc_domainAveraged_FOSCTTM(X_new, y_new)\n", + "print(\"Average FOSCTTM score for this alignment is: \", np.mean(fracs))" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "<Figure size 432x288
with 1 Axes>" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "legend_label=\"SCOT alignment FOSCTTM \\n average value: \"+str(np.mean(fracs))\n", + "plt.plot(np.arange(len(fracs)), np.sort(fracs), \"r--\", label=legend_label)\n", + "plt.legend()\n", + "plt.xlabel(\"Cells\")\n", + "plt.ylabel(\"Sorted FOSCTTM\")\n", + "plt.show()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.5" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}