[090c8c]: / src / __pycache__ / scot.cpython-39.pyc

a

¶Åña*ã@s\dZddlZddlZddlmZddlmZddlm	Z	ddl
mZmZGdd„de
ƒZdS)	aC
Authors: Pinar Demetci, Rebecca Santorella
Principal Investigator: Ritambhara Singh, Ph.D. from Brown University
12 February 2020
Updated: 27 November 2020
SCOT algorithm (version 1): Single Cell alignment using Optimal Transport
Correspondence: pinar_demetci@brown.edu, rebecca_santorella@brown.edu, ritambhara@brown.edu
éN)Údijkstra)Ú
csr_matrix)Úkneighbors_graph)ÚStandardScalerÚ	normalizec
@sneZdZdZdd„Zdd„Zddd	„Zddd
„Zdd„Zddd„Z	d dd„Z
d!dd„Zd"dd„Zd#dd„Z
dS)$ÚSCOTa›

	SCOT algorithm for unsupervised alignment of single-cell multi-omic data.
	https://www.biorxiv.org/content/10.1101/2020.04.28.066787v2 (original preprint)
	https://www.liebertpub.com/doi/full/10.1089/cmb.2021.0446 (Journal of Computational Biology publication through RECOMB 2021 conference)

	Input: domain1, domain2 in form of numpy arrays/matrices, where the rows correspond to samples and columns correspond to features.
	Returns: aligned domain 1, aligned domain 2 in form of numpy arrays/matrices projected on domain 1

	Example use:
	# Given two numpy matrices, domain1 and domain2, where the rows are cells and columns are different genomic features:
	scot = SCOT(domain1, domain2)
	aligned_domain1, aligned_domain2 = scot.align(k=20, e=1e-3)

	# If you can't pick the parameters k and e, you can try out our unsupervised self-tuning heuristic by running:
	scot = SCOT(domain1, domain2)
	aligned_domain1, aligned_domain2 = scot.align(selfTune=True)

	Required parameters:
	- k: Number of neighbors to be used when constructing kNN graphs. Default= min(min(n_1, n_2), 50), where n_i, for i=1,2 corresponds to the number of samples in the i^th domain.
	- e: Regularization constant for the entropic regularization term in entropic Gromov-Wasserstein optimal transport formulation. Default= 1e-3 
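
To make the role of k concrete, here is a minimal sketch of how a kNN graph can be turned into the normalized shortest-path (geodesic) distance matrix that serves as the intra-domain input to Gromov-Wasserstein. It uses the same library calls this module imports (kneighbors_graph, csr_matrix, dijkstra); the helper name geodesic_distances and the toy data are assumptions made for illustration, not part of SCOT's API.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph


def geodesic_distances(data, k, mode="connectivity", metric="correlation"):
    """Normalized shortest-path distances on a kNN graph (hypothetical helper)."""
    # A connectivity graph counts each sample as its own neighbor.
    include_self = mode == "connectivity"
    graph = kneighbors_graph(data, k, mode=mode, metric=metric,
                             include_self=include_self)
    # Shortest paths over the undirected kNN graph.
    dist = dijkstra(csgraph=csr_matrix(graph), directed=False,
                    return_predecessors=False)
    # Unreachable pairs come back as inf; cap them at the largest finite value.
    finite_max = np.nanmax(dist[dist != np.inf])
    dist[dist > finite_max] = finite_max
    # Rescale so the largest distance is 1.
    return dist / dist.max()


rng = np.random.default_rng(0)
toy = rng.normal(size=(30, 5))   # 30 samples, 5 features (made-up data)
C = geodesic_distances(toy, k=5)
```

A larger k connects more samples directly and shortens the geodesic paths; a smaller k follows the local structure more closely but may leave the graph disconnected, which is why unreachable pairs are capped.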
   
	Optional parameters:

	- normalize: Determines whether to normalize input data ahead of alignment. True or False (boolean parameter). Default = True.
	- norm: Determines which normalization to run: "l2", "l1", "max", or "zscore". Default="l2"
	- mode: "connectivity" or "distance". Determines whether to use a connectivity graph (adjacency matrix of 1s/0s based on whether nodes are connected) or a distance graph (adjacency matrix entries weighted by distances between nodes). Default="connectivity"  
	- metric: Sets the metric to use while constructing nearest neighbor graphs. Some possible choices are "correlation" and "minkowski". "correlation" is the correlation distance (1 - Pearson's correlation) and "minkowski" is equivalent to Euclidean distance in its default form. Default= "correlation".
	- verbose: Prints loss while optimizing the optimal transport formulation. Default=True
	- XontoY: Determines the direction of barycentric projection. True or False (boolean parameter). If True, projects domain1 onto domain2. If False, projects domain2 onto domain1. Default=True.
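
The direction controlled by XontoY can be sketched with plain NumPy. This is a minimal illustration under stated assumptions: the function below is a standalone stand-in for the class method, and the coupling matrix and data are made up; in SCOT the coupling comes from the entropic Gromov-Wasserstein solver.

```python
import numpy as np


def barycentric_projection(coupling, X, y, XontoY=True):
    """Project one domain onto the other using a coupling matrix (sketch)."""
    if XontoY:
        # Each row of the coupling, renormalized, gives the weights of a
        # convex combination of domain-2 samples for one domain-1 sample.
        weights = coupling.sum(axis=1)
        return coupling @ y / weights[:, None], y
    # Otherwise use the columns to combine domain-1 samples.
    weights = coupling.sum(axis=0)
    return X, coupling.T @ X / weights[:, None]


X = np.array([[0.0, 0.0], [1.0, 1.0]])    # 2 samples in domain 1
y = np.array([[10.0], [20.0], [30.0]])    # 3 samples in domain 2
coupling = np.array([[0.5, 0.00, 0.00],
                     [0.0, 0.25, 0.25]])  # made-up transport plan

X_on_y, _ = barycentric_projection(coupling, X, y, XontoY=True)
_, y_on_X = barycentric_projection(coupling, X, y, XontoY=False)
```

With XontoY=True the aligned domain-1 samples live in the feature space of domain 2 (here one column); with XontoY=False the aligned domain-2 samples live in the feature space of domain 1 (here two columns).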

	Note: To specify the marginal distributions of the input domains instead of using the uniform distribution, set the attributes p and q (for domain 1 and domain 2, respectively) to the distributions of your choice
			after initializing a SCOT class instance and before running alignment, and set init_marginals=False in the .align() parameters.
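
A minimal sketch of preparing custom marginals, assuming each marginal is a non-negative vector with one entry per sample that sums to 1. The helper marginal_from_weights and the example weights are hypothetical; only the attributes p and q and the init_marginals argument come from the note above.

```python
import numpy as np


def marginal_from_weights(weights):
    """Turn non-negative per-sample weights into a probability vector."""
    w = np.asarray(weights, dtype=float)
    if w.ndim != 1 or (w < 0).any() or w.sum() == 0:
        raise ValueError("weights must be a non-negative 1-D vector with positive sum")
    return w / w.sum()


# Made-up example: the first sample of domain 1 carries three times the mass.
p = marginal_from_weights([3, 1, 1, 1])
# A uniform marginal over 5 samples, for comparison.
q = np.full(5, 1.0 / 5)

# Intended use with the class, following the note above (not executed here):
#   scot = SCOT(domain1, domain2)
#   scot.p, scot.q = p, q
#   scot.align(k=20, e=1e-3, init_marginals=False)
```

The length of p must match the number of rows in domain 1 and the length of q the number of rows in domain 2.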
	cCsF||_||_d|_d|_d|_d|_d|_d|_d|_d|_	d|_
dS)N)ÚXÚyÚpÚqÚCxÚCyÚcouplingÚgwdistÚflagÚ	X_alignedÚ	y_aligned)ÚselfZdomain1Zdomain2©rúB/Users/pinardemetci/Documents/newSCOT/SCOT/examples/../src/scot.pyÚ__init__;sz
SCOT.__init__cCs,t |jjd¡|_t |jjd¡|_dS)Nr)ÚotZunifrÚshaper
r	r)rrrrÚinit_marginalsMszSCOT.init_marginalsÚl2TcCs‚|dvsJdƒ‚|dks |dkr&d}nd}|dkrXtƒ}| |j¡| |j¡|_|_n&t|j||dt|j||d|_|_dS)N)Úl1rÚmaxÚzscorezåNorm argument has to be either one of 'max', 'l1', 'l2' or 'zscore'. If you would like to perform another type of normalization, please give SCOT the normalize data and set the argument normalize=False when running the algorithm.Térr)ÚnormÚaxis)rZ
fit_transformrr	r)rrZbySampler ZscalerrrrrRs zSCOT.normalizeÚconnectivityÚcorrelationcCsZ|dvsJdƒ‚|dkrd}nd}t|j||||d|_t|j||||d|_|j|jfS)N)r!ZdistancezENorm argument has to be either one of 'connectivity', or 'distance'. r!TF)ÚmodeÚmetricÚinclude_self)rrÚXgraphr	Úygraph)rÚkr#r$r%rrrÚconstruct_graphaszSCOT.construct_graphcCstt|jƒddd}tt|jƒddd}t ||tjk¡}t ||tjk¡}||||k<||||k<|| ¡|_|| ¡|_	|j|j	fS)NF)ZcsgraphZdirectedZreturn_predecessors)
rrr&r'ÚnpZnanmaxÚinfrrr
)rZX_shortestPathZy_shortestPathZX_maxZy_maxrrrÚinit_distancesmszSCOT.init_distancesc
Cs–tjj|j|j|j|jd|d|d\|_}|d|_t	 
|j¡ ¡s‚t	 |jjdd¡s‚t	 |jjdd¡s‚tt|jƒƒdkrŠd	|_
nd|_
|jS)
NZsquare_lossT)Zloss_funÚepsilonÚlogÚverboseZgw_distr©r rgffffffî?F)rZgromovZentropic_gromov_wassersteinrr
r
rrrr*ZisnanÚanyÚsumr)rÚer/r.rrrÚfind_correspondences~s*
NzSCOT.find_correspondencescCsŒ|r@|j|_tj|jdd}t |j|j¡|dd…df|_n@|j|_tj|jdd}t t |j¡|j¡|dd…df|_|j|jfS)Nrr0)	r	rr*r2rÚmatmulrrZ	transpose)rÚXontoYZweightsrrrÚbarycentric_projectionŠs$(zSCOT.barycentric_projectionNçü©ñÒMbP?Fc
CsÊ|r|j|d|
r| ¡|	r.| ¡\}}n‚|dkrdtt|jjddƒt|jjddƒfdƒ}|j|ddd| 	¡|j
||d|jd	kr td
ƒdS|j
|d\}}|||_|_|j|jfS)N©rrgš™™™™™É?é2r!r"©r#r$©r3r/Fz›CONVERGENCE ERROR: Optimization procedure runs into numerical errors with the hyperparameters specified. Please try aligning with higher values of epsilon.)r6)rrÚunsupervised_scotÚminÚintrrr	r)r,r4rÚprintr7rr)
rr(r3r#r$r/rrr6ZselfTunerrrrrrÚalign—s".
z
SCOT.alignc	Cs:|r|j|d|r| ¡t|ƒt|ƒ}	t |	¡}
t |	¡}t |	¡}d}
d}d\}}d\}}|D]¬}|j|||d| ¡|D]Š}t|d|	ƒtd|d|ƒ|j|dd	|j	r†|rà||
|<|||<|j
||<t|j
ƒ|j
|
kr| ¡\}}|j
}
||}}|d}q†qf|r(||||
|fS|||
||fSd
S)zÃ
		Performs a hyperparameter sweep for given values of k and epsilon
		Default: return the parameters corresponding to the lowest GW distance
		(Optional): return all k, epsilon, and GW values
		r9r)NNr;ú/zAligning k: z and e: Fr<N)rrÚlenr*Zzerosr)r,r@r4rrr7)rÚksÚesÚ
all_valuesr#r$rrrÚtotalZk_sweepZe_sweepZgw_sweepZgminÚcounterrrÚe_bestÚk_bestr(r3rrrÚsearch_scot±sB





zSCOT.search_scotcCs®t|jjd|jjdƒ}t|ddƒ}d}d}t dd|¡}|dkrVt dd	|¡}nt |d|d
|¡}| t¡}|j	||d||dd\}	}
}}}
t
d
||
fƒ|	|
fS)z|
		Unsupervised hyperparameter tuning algorithm to find an alignment
		by using the GW distance as a measure of alignment
		rér:ééÿÿÿÿéýÿÿÿéúéédéF)rFrrrzAlignment completed. Hyperparameters selected from the unsupervised hyperparameter sweep are: %d for number of neighbors k and %f for epsilon)r>rrr	r*ZlogspaceZlinspaceZastyper?rKr@)rrrÚnZk_startZnum_epsZnum_krErDrrZg_bestrJrIrrrr=ês
 zSCOT.unsupervised_scot)rT)r!r")T)T)
Nr8r!r"TTrTFT)Fr!r"TrT)Fr)Ú__name__Ú
__module__Ú__qualname__Ú__doc__rrrr)r,r4r7rArKr=rrrrrs"





9r)rXZnumpyr*rZscipy.sparse.csgraphrZscipy.sparserZsklearn.neighborsrZsklearn.preprocessingrrÚobjectrrrrrÚ<module>s