# scAI: a single cell Aggregation and Integration method for analyzing single cell multi-omics data

- scAI is an unsupervised approach for integrative analysis of gene expression and chromatin accessibility or DNA methylation profiles measured in the same individual cells.
- scAI infers a set of biologically relevant factors, which enable various downstream analyses, including the identification of cell clusters, cluster-specific markers and regulatory relationships.
- scAI provides an intuitive way to visualize features (i.e., genes and loci) alongside the cells in two dimensions.
- scAI aggregates chromatin profiles of similar cells in an unsupervised and iterative manner, which opens up new avenues for analyzing extremely sparse, binary scATAC-seq data.

Once the single cell multi-omics data are decomposed into multiple biologically relevant factors, the package provides functionality for further data exploration, analysis, and visualization. Users can:

- Visualize the latent biological patterns of the multi-omics data
- Visualize both genes and loci alongside cells in the same two-dimensional space
- Identify cell clusters from the inferred joint cell loading matrix and cluster-specific markers
- Visualize clusters and gene expression in a low-dimensional space such as VscAI, t-SNE and UMAP
- Infer regulatory relationships between cluster-specific chromatin regions and marker genes

![Overview of scAI](https://github.com/sqjin/scAI/blob/master/overview_scAI.png)


Check out [our paper (Suoqin Jin#, Lihua Zhang# & Qing Nie*, Genome Biology, 2020)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1932-8) for the detailed methods and applications.


## Packages
scAI has been implemented as both an **R package** and a **MATLAB package** under the GPL-3 license. In each package, we provide example workflows that outline the key steps and unique features of scAI. The **MATLAB package and examples** are available [here](https://github.com/amsszlh/scAI).


## Installation of R package

### Install from GitHub using devtools

```
devtools::install_github("sqjin/scAI")
```

### Install from R source code
Download the source code [here](https://github.com/sqjin/scAI/blob/master/scAI_1.0.0.tar.gz) and run the following in R:
```
# path_to_file is the full path and file name of the downloaded source archive
install.packages(path_to_file, type = 'source', repos = NULL)
```
This [website](https://kbroman.org/pkg_primer/pages/build.html) shows other ways to build and install an R package.


## Examples and Walkthroughs

All the R Markdown files used to generate the walkthroughs can be found under the `/examples` directory.

- Simulated single cell RNA-seq and ATAC-seq data [(Walkthrough)](https://htmlpreview.github.io/?https://github.com/sqjin/scAI/blob/master/examples/walkthrough_simulation.html): These simulated data were generated from bulk RNA-seq and DNase-seq profiles of the same sample using the MOSim package.
- Simulated single cell RNA-seq and ATAC-seq dataset 8 [(Walkthrough)](https://htmlpreview.github.io/?https://github.com/sqjin/scAI/blob/master/examples/walkthrough_simulation_dataset8.html): This simulated dataset consists of five imbalanced cell clusters, with five clusters in the scRNA-seq data and three clusters in the scATAC-seq data.
- Paired single cell RNA-seq and ATAC-seq data of A549 cells [(Walkthrough)](https://htmlpreview.github.io/?https://github.com/sqjin/scAI/blob/master/examples/walkthrough_A549dataset.html): These data describe lung adenocarcinoma-derived A549 cells after 0, 1 and 3 hours of 100 nM dexamethasone treatment.
- Paired single-cell RNA-seq and single-cell methylation data of mESC [(Walkthrough)](https://htmlpreview.github.io/?https://github.com/sqjin/scAI/blob/master/examples/walkthrough_mESC_dataset.html): These data describe the differentiation of mouse embryonic stem cells (mESC).
- Paired single cell RNA-seq and ATAC-seq data of kidney cells [(Walkthrough)](https://htmlpreview.github.io/?https://github.com/sqjin/scAI/blob/master/examples/walkthrough_Kidneydataset.html): These data describe various subpopulations of kidney cells, including scRNA-seq and scATAC-seq data of 8837 co-assayed cells.


## Suggestions for speeding up on large-scale datasets

### Using the Python implementation of the scAI model
```
object <- run_scAI(object, K, do.fast = TRUE)
```

### Feature selection

Feature selection can reduce the running time of both the scAI model and downstream analyses such as dimension reduction.

- Using informative genes for scRNA-seq data:

The most informative genes can be selected based on their average expression and Fano factor (see [our paper](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-020-1932-8) for details).
```
object <- selectFeatures(object, assay = "RNA")
object <- run_scAI(object, K, do.fast = TRUE, hvg.use1 = TRUE)
```
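
To make the selection criterion concrete, here is a minimal base-R sketch that ranks genes by Fano factor (variance divided by mean) after a mean-expression filter. It is not the implementation behind `selectFeatures`; the function name `select_by_fano`, the `min_mean` and `top_n` parameters, and the toy matrix are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of scAI: rank genes by Fano factor.
# counts: numeric matrix with genes in rows and cells in columns.
select_by_fano <- function(counts, min_mean = 0.1, top_n = 20) {
  gene_mean <- rowMeans(counts)
  gene_var  <- apply(counts, 1, var)
  # Fano factor = variance / mean; undefined for genes with zero mean
  fano <- ifelse(gene_mean > 0, gene_var / gene_mean, NA_real_)
  # Keep genes that pass the mean-expression filter
  keep <- which(gene_mean >= min_mean & !is.na(fano))
  # Order the kept genes from most to least dispersed
  ranked <- keep[order(fano[keep], decreasing = TRUE)]
  rownames(counts)[head(ranked, top_n)]
}

# Toy data (made up): gene A is bursty, B is constant, C is mildly variable,
# and D is never detected.
toy <- matrix(c(0, 0, 0, 12,
                3, 3, 3, 3,
                2, 4, 2, 4,
                0, 0, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "B", "C", "D"), paste0("cell", 1:4)))

top_genes <- select_by_fano(toy, top_n = 2)
print(top_genes)
```

On this toy matrix the bursty gene A ranks first and the undetected gene D is dropped by the mean filter. Real data would usually be normalized first, and the thresholds would need tuning.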

- Using informative loci for scATAC-seq or single cell methylation data:

Unlike scRNA-seq data, the largely binary nature of scATAC-seq data makes it challenging to perform ‘variable’ feature selection. One option is to select the chromosome regions near the informative genes.
```
object <- selectFeatures(object, assay = "RNA")
loci.use <- searchGeneRegions(genes = object@var.features[[1]], species = "mouse")
object@var.features[[2]] <- loci.use
object <- run_scAI(object, K, do.fast = TRUE, hvg.use1 = TRUE, hvg.use2 = TRUE)
```
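
The idea of keeping loci near informative genes can be sketched in base R as follows. This is only an illustration of the concept and makes no claim about how `searchGeneRegions` works internally: the helper `loci_near_genes`, the 5 kb `window`, and all coordinates below are hypothetical.

```r
# Hypothetical helper, NOT the scAI implementation: keep loci that overlap a
# window around the transcription start site (TSS) of any selected gene.
loci_near_genes <- function(loci, genes, window = 5000) {
  near <- vapply(seq_len(nrow(loci)), function(i) {
    same_chr <- genes$chr == loci$chr[i]
    # The locus overlaps [tss - window, tss + window] on the same chromosome
    any(same_chr &
          loci$end[i] >= genes$tss - window &
          loci$start[i] <= genes$tss + window)
  }, logical(1))
  loci$locus[near]
}

# Made-up gene annotations and peak coordinates for illustration only
genes <- data.frame(gene = c("G1", "G2"),
                    chr  = c("chr1", "chr2"),
                    tss  = c(10000, 50000),
                    stringsAsFactors = FALSE)
loci <- data.frame(locus = c("chr1:9000-9500", "chr1:200000-200500", "chr2:52000-52500"),
                   chr   = c("chr1", "chr1", "chr2"),
                   start = c(9000, 200000, 52000),
                   end   = c(9500, 200500, 52500),
                   stringsAsFactors = FALSE)

nearby <- loci_near_genes(loci, genes, window = 5000)
print(nearby)
```

Here the distal chr1 peak is excluded because it lies far outside the window around G1.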

Another option is to use only the top n% of features, or to remove features present in fewer than n cells. This method is used in [Signac](https://satijalab.org/signac/articles/pbmc_vignette.html).
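
As a rough sketch of this prevalence-based filtering, the following base-R function keeps loci detected in at least `min_cells` cells and, optionally, only the top fraction of loci by number of detecting cells. It is not Signac's or scAI's code; `filter_loci`, its parameters, and the toy matrix are assumptions for illustration.

```r
# Hypothetical helper: filter loci of an accessibility matrix by prevalence.
# acc: matrix with loci in rows and cells in columns (non-zero = accessible).
filter_loci <- function(acc, min_cells = 10, top_fraction = NULL) {
  n_cells <- rowSums(acc > 0)            # number of cells in which each locus is detected
  keep <- n_cells >= min_cells
  if (!is.null(top_fraction)) {
    # Keep only loci at or above the (1 - top_fraction) quantile of prevalence
    cutoff <- quantile(n_cells, probs = 1 - top_fraction)
    keep <- keep & n_cells >= cutoff
  }
  acc[keep, , drop = FALSE]
}

# Made-up binary matrix: 4 loci x 5 cells
acc <- matrix(c(1, 1, 1, 1, 1,
                1, 0, 0, 0, 0,
                1, 1, 0, 0, 0,
                0, 0, 0, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("locus", 1:4), paste0("cell", 1:5)))

kept_min <- filter_loci(acc, min_cells = 2)
kept_top <- filter_loci(acc, min_cells = 1, top_fraction = 0.25)
print(rownames(kept_min))
print(rownames(kept_top))
```

The filtered row names could then be stored as the loci to use, in the same way `loci.use` is assigned above.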


## Additional installation steps (possibly)

- Please consider installing [RcppEigen](https://github.com/RcppCore/RcppEigen) and [rfunctions](https://github.com/jaredhuling/rfunctions) if they are not automatically installed.
```
if(!require(devtools)){ install.packages("devtools")}
install.packages("RcppEigen")
devtools::install_github("jaredhuling/rfunctions")
```
**Troubleshooting**: Installing RcppEigen and rfunctions on R >= 3.5 requires Clang >= 6 and gfortran-6.1. For macOS, we recommend following the guidance on the official R page [here](https://cloud.r-project.org/bin/macosx/tools/) or this [post](https://thecoatlessprofessor.com/programming/r-compiler-tools-for-rcpp-on-macos-before-r-3.6.0/). For Windows, please ensure that [Rtools](https://cran.r-project.org/bin/windows/Rtools/) is installed.
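
To check which of these packages are actually missing before installing anything, a small helper along the following lines may be useful. This is a sketch under the assumption that loading a package's namespace is an acceptable way to test for its presence; `missing_packages` is a hypothetical name and not part of scAI.

```r
# Hypothetical helper: return the subset of pkgs that cannot be loaded.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package is installed and loadable
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("devtools", "RcppEigen", "rfunctions"))
print(to_install)

# The result can then guide installation, for example:
# if ("RcppEigen" %in% to_install) install.packages("RcppEigen")
# if ("rfunctions" %in% to_install) devtools::install_github("jaredhuling/rfunctions")
```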


- Install other dependencies

scAI provides functionality for further data exploration, analysis, and visualization, which relies on several additional packages:
```
library(devtools)
install_github('linxihui/NNLM')
install_github("yanwu2014/swne")
install_github("jokergoo/ComplexHeatmap")
```
- Install the Leiden algorithm for identifying cell clusters: `pip install leidenalg`. Please check [here](https://github.com/vtraag/leidenalg) if you run into any trouble.

- Install UMAP and FIt-SNE for faster dimension reduction in `reducedDims`

Using UMAP and FIt-SNE is recommended for computational efficiency when using `reducedDims` on very large datasets.

-- Install the UMAP Python package: `pip install umap-learn`. Please check [here](https://github.com/lmcinnes/umap) if you run into any trouble.

-- Install the FIt-SNE R package: installing and compiling the necessary software requires FIt-SNE and FFTW. For detailed installation instructions, please visit this [page](https://github.com/KlugerLab/FIt-SNE).


### Troubleshooting on the R Compiler Tools for Rcpp on macOS
If you get the error "clang: error: unsupported option '-fopenmp'" when installing an R package, please check the configuration in ~/.R/Makevars and see this [post](https://thecoatlessprofessor.com/programming/r-compiler-tools-for-rcpp-on-macos-before-r-3.6.0/) for detailed configuration. Alternatively, you can reinstall R, because the -fopenmp option is usually added by R automatically if OpenMP is available.

If you are using macOS Mojave (10.14) and get the error "/usr/local/clang6/bin/../include/c++/v1/math.h:301:15: fatal error: 'math.h' file not found", please check this [post](https://github.com/RcppCore/Rcpp/issues/922). This error can be resolved by running the following in the terminal:

```
sudo installer -pkg \
/Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg \
-target /
```

## Help
If you have any problems, comments or suggestions, please contact Suoqin Jin (suoqin.jin@uci.edu) or Lihua Zhang (lihuaz1@uci.edu).

## How should I cite scAI?
Jin, S., Zhang, L. & Nie, Q. scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles. Genome Biol 21, 25 (2020). https://doi.org/10.1186/s13059-020-1932-8