--- a/README.md
+++ b/README.md
@@ -1,72 +1,72 @@
-[](https://pypi.org/project/sc-libra/)
-[](https://pepy.tech/project/sc_libra)
-[](https://sc-libra.readthedocs.io/en/latest/)
-[](https://doi.org/10.1101/2021.01.27.428400)
-[](http://dx.doi.org/10.6084/m9.figshare.19466246)
-
-LIBRA - Machine Translation between paired <img src="gaf/figures/LIBRA_icon_2.png" width="181px" align="right" />
-Single-Cell Multi-Omics Data
-===========
-This repository contains the [LIBRA code](https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/code_snapshots/) and [online data](#datasets) used for Single-cell multi-omics integration and prediction analysis employed on [LIBRA manuscript](https://www.biorxiv.org/content/10.1101/2021.01.27.428400v2). [Libra metrics](https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/code_snapshots/R/LIBRA_code/) are also available for quantifying outputs quality as well as novel PPJI preservation measurement. [Seurat code](https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/code_snapshots/R/Seurat_code/) employed to analyze LIBRA input omics as well as for clustering and visualization pipelines are providen.
-
-The Python package [sc-Libra](https://pypi.org/project/sc-libra/), has been developed with the aim of extending and summarizing the developer code used on the paper to a user-friendly version and is freely available in the PyPI repository. Read online package [documentation](https://sc-libra.readthedocs.io/en/latest/) for detailled description and guidelines.
-
-- [Summary](#summary)
-- [Installation](#installation)
-- [Datasets](#datasets)
-- [Usage](#usage)
-- [Material of interest](#material-of-interest)
-
-# Summary
-LIBRA is a deep learning model that is designed for Single-cell multi-omics integration and prediction. LIBRA performs this by using an unbalance Autoencoder which learns a shared low-dimensional embedding from both experiment omics, combining each sample's uniqueness for generating a enriched representation of integrated data respect to the original experiment independent data. This tool has been first developed in [R code](https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/code_snapshots/R/LIBRA_code/), a code snapshot is providen for R users. Next, adaptative LIBRA (aLIBRA) tool has been develop for paralellize training of LIBRA models using a grid structure for selecting optimal hyperparameters in a automatic way excluding the requirement of doing this by users saving considerable time. Snapshot code is providen in [Python code](https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/code_snapshots/Python/LIBRA_fine_tune_code/) for conceptual understanding.
-
-As a result from these raw developer-codes provided, [sc-Libra](https://pypi.org/project/sc-libra/) package is provided as a built-in resource to perform the pipeline propossed.
-
-For further details, please refer to the [online manuscript](https://www.biorxiv.org/content/10.1101/2021.01.27.428400v2) currently at biorxiv repository (will be updated asap).
-
-# Installation
-
-To run sc-Libra pipeline the following settings are required:
-- Install Python **>=3.7.0**.
-- Install R **>=3.5.2**.
-- Install sc-libra python package:
-  ```
-  $ pip install sc_libra
-  ```
-
-For stepwise guide follow the online [documentation](https://sc-libra.readthedocs.io/en/latest/).
-
-# Datasets
-Find [Neurips](https://openproblems.bio/neurips_2021/) provided dataset for LIBRA testing at figsahre repository to be downloaded [here](https://figshare.com/s/d7ad0c6b8285e75de40f).
-
-Following datasets consist only on the sparse versions without cell/feature identity, go to corresponding autor references for original datasets.
-| LIBRA name | GSE link | Modalities | Technology | Genomic ref used | Download sparse matrix |
-| :---: | :---: | :---: | :---: | :---: | :---: |
-| DataSet1 | [GSE126074](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126074) | scRNAseq + scATACseq | SNARE-seq | [Mus_musculus.GRCm38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/c9b87f4ac1d1c030e128) and [ATAC](https://figshare.com/s/9ff9ea93a2108478bb36) |
-| DataSet2 | [GSE128639](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scADT | CITE-seq | [Homo_sapiens.GRCh38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/5f5cfa6fda4ae3512c0d) and [ADT](https://figshare.com/s/5e34cd80455398855ad8) |
-| DataSet3 | [GSE130399](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scATACseq | Paired-seq | [Mus_musculus.GRCm38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/a1f4a5ef0735d1b4167d) and [ATAC](https://figshare.com/s/80d9b9d84ada526668a6) |
-| DataSet4 | [GSE140203](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scATACseq | SHARE-seq | [Mus_musculus.GRCm38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/71312a335649b04972b8) and [ATAC](https://figshare.com/s/0b581450cd6e1f8fb64c) |
-| DataSet5 | [10X Genomics](https://support.10xgenomics.com/single-cell-multiome-atac-gex/datasets/1.0.0/pbmc_granulocyte_sorted_10k) | scRNAseq + scATACseq | 10X multiome | [Homo_sapiens.GRCh38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/90b237227f0cc07d075d) and [ATAC](https://figshare.com/s/4086bce6032f6a206a13) |
-| DataSet6 | [GSE194122](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scATACseq | 10X multiome | [Homo_sapiens.GRCh38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/134562c3ec74a3a50c84) and [ATAC](https://figshare.com/s/378a630ec9c6ddadf4f5) |
-| DataSet7 | [GSE194122](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scADT | CITE-seq | [Homo_sapiens.GRCh38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/41bdbfe7479e9729c800) and [ADT](https://figshare.com/s/975cd8a5bbc57c8d2c8c) |
-| DataSet8 | [GSE109262](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scATACseq | scNMT-seq | [Mus_musculus.GRCm38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/4a158e6d243bcd45a171) and [ATAC](https://figshare.com/s/f13136b52f3b387d1a66) |
-
-# Usage
-
-- Easiest way of running LIBRA analysis is though [sc-Libra](https://pypi.org/project/sc-libra/) python package.
-- Package [documentation](https://sc-libra.readthedocs.io/en/latest/) is online available using "Read the Docs" platform.
-
-# Material of interest
-
-### LIBRA benchmarking comparison:
-For validating LIBRA performance we compared it against other:
-
-- Integration performance compared to - published/available: [BABEL](https://github.com/wukevin/babel).
-
-- Prediction performance compared to - published/available: [Seurat3](https://satijalab.org/seurat/articles/integration_mapping.html), [Seurat4](https://github.com/satijalab/seurat), [MOFA+](https://biofam.github.io/MOFA2/index.html), [totalVI](https://github.com/YosefLab/scvi-tools), [BABEL](https://github.com/wukevin/babel), [multiVI](https://github.com/scverse/scvi-tutorials/blob/master/MultiVI_tutorial.ipynb) and [multigrate](https://github.com/theislab/multigrate).
-
-**Further details are provided at supplementary material added at [LIBRA manuscript](https://www.biorxiv.org/content/10.1101/2021.01.27.428400v1).**
-
-### LIBRA visual workflow:
-
-
+[](https://pypi.org/project/sc-libra/)
+[](https://pepy.tech/project/sc_libra)
+[](https://sc-libra.readthedocs.io/en/latest/)
+[](https://doi.org/10.1101/2021.01.27.428400)
+[](http://dx.doi.org/10.6084/m9.figshare.19466246)
+
+LIBRA - Machine Translation between paired <img src="https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/gaf/figures/LIBRA_icon_2.png?raw=true" width="181px" align="right" />
+Single-Cell Multi-Omics Data
+===========
+This repository contains the [LIBRA code](https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/code_snapshots/) and [online data](#datasets) used for Single-cell multi-omics integration and prediction analysis employed on [LIBRA manuscript](https://www.biorxiv.org/content/10.1101/2021.01.27.428400v2). [Libra metrics](https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/code_snapshots/R/LIBRA_code/) are also available for quantifying outputs quality as well as novel PPJI preservation measurement. [Seurat code](https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/code_snapshots/R/Seurat_code/) employed to analyze LIBRA input omics as well as for clustering and visualization pipelines are providen.
+
+The Python package [sc-Libra](https://pypi.org/project/sc-libra/), has been developed with the aim of extending and summarizing the developer code used on the paper to a user-friendly version and is freely available in the PyPI repository. Read online package [documentation](https://sc-libra.readthedocs.io/en/latest/) for detailled description and guidelines.
+
+- [Summary](#summary)
+- [Installation](#installation)
+- [Datasets](#datasets)
+- [Usage](#usage)
+- [Material of interest](#material-of-interest)
+
+# Summary
+LIBRA is a deep learning model that is designed for Single-cell multi-omics integration and prediction. LIBRA performs this by using an unbalance Autoencoder which learns a shared low-dimensional embedding from both experiment omics, combining each sample's uniqueness for generating a enriched representation of integrated data respect to the original experiment independent data. This tool has been first developed in [R code](https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/code_snapshots/R/LIBRA_code/), a code snapshot is providen for R users. Next, adaptative LIBRA (aLIBRA) tool has been develop for paralellize training of LIBRA models using a grid structure for selecting optimal hyperparameters in a automatic way excluding the requirement of doing this by users saving considerable time. Snapshot code is providen in [Python code](https://github.com/TranslationalBioinformaticsUnit/LIBRA/blob/main/code_snapshots/Python/LIBRA_fine_tune_code/) for conceptual understanding.
+
+As a result from these raw developer-codes provided, [sc-Libra](https://pypi.org/project/sc-libra/) package is provided as a built-in resource to perform the pipeline propossed.
+
+For further details, please refer to the [online manuscript](https://www.biorxiv.org/content/10.1101/2021.01.27.428400v2) currently at biorxiv repository (will be updated asap).
+
+# Installation
+
+To run sc-Libra pipeline the following settings are required:
+- Install Python **>=3.7.0**.
+- Install R **>=3.5.2**.
+- Install sc-libra python package:
+  ```
+  $ pip install sc_libra
+  ```
+
+For stepwise guide follow the online [documentation](https://sc-libra.readthedocs.io/en/latest/).
+
+# Datasets
+Find [Neurips](https://openproblems.bio/neurips_2021/) provided dataset for LIBRA testing at figsahre repository to be downloaded [here](https://figshare.com/s/d7ad0c6b8285e75de40f).
+
+Following datasets consist only on the sparse versions without cell/feature identity, go to corresponding autor references for original datasets.
+| LIBRA name | GSE link | Modalities | Technology | Genomic ref used | Download sparse matrix |
+| :---: | :---: | :---: | :---: | :---: | :---: |
+| DataSet1 | [GSE126074](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE126074) | scRNAseq + scATACseq | SNARE-seq | [Mus_musculus.GRCm38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/c9b87f4ac1d1c030e128) and [ATAC](https://figshare.com/s/9ff9ea93a2108478bb36) |
+| DataSet2 | [GSE128639](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scADT | CITE-seq | [Homo_sapiens.GRCh38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/5f5cfa6fda4ae3512c0d) and [ADT](https://figshare.com/s/5e34cd80455398855ad8) |
+| DataSet3 | [GSE130399](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scATACseq | Paired-seq | [Mus_musculus.GRCm38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/a1f4a5ef0735d1b4167d) and [ATAC](https://figshare.com/s/80d9b9d84ada526668a6) |
+| DataSet4 | [GSE140203](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scATACseq | SHARE-seq | [Mus_musculus.GRCm38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/71312a335649b04972b8) and [ATAC](https://figshare.com/s/0b581450cd6e1f8fb64c) |
+| DataSet5 | [10X Genomics](https://support.10xgenomics.com/single-cell-multiome-atac-gex/datasets/1.0.0/pbmc_granulocyte_sorted_10k) | scRNAseq + scATACseq | 10X multiome | [Homo_sapiens.GRCh38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/90b237227f0cc07d075d) and [ATAC](https://figshare.com/s/4086bce6032f6a206a13) |
+| DataSet6 | [GSE194122](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scATACseq | 10X multiome | [Homo_sapiens.GRCh38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/134562c3ec74a3a50c84) and [ATAC](https://figshare.com/s/378a630ec9c6ddadf4f5) |
+| DataSet7 | [GSE194122](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scADT | CITE-seq | [Homo_sapiens.GRCh38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/41bdbfe7479e9729c800) and [ADT](https://figshare.com/s/975cd8a5bbc57c8d2c8c) |
+| DataSet8 | [GSE109262](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi) | scRNAseq + scATACseq | scNMT-seq | [Mus_musculus.GRCm38 Ver: 3.0.0](https://support.10xgenomics.com/single-cell-gene-expression/software/release-notes/build#) | [RNA](https://figshare.com/s/4a158e6d243bcd45a171) and [ATAC](https://figshare.com/s/f13136b52f3b387d1a66) |
+
+# Usage
+
+- Easiest way of running LIBRA analysis is though [sc-Libra](https://pypi.org/project/sc-libra/) python package.
+- Package [documentation](https://sc-libra.readthedocs.io/en/latest/) is online available using "Read the Docs" platform.
+
+# Material of interest
+
+### LIBRA benchmarking comparison:
+For validating LIBRA performance we compared it against other:
+
+- Integration performance compared to - published/available: [BABEL](https://github.com/wukevin/babel).
+
+- Prediction performance compared to - published/available: [Seurat3](https://satijalab.org/seurat/articles/integration_mapping.html), [Seurat4](https://github.com/satijalab/seurat), [MOFA+](https://biofam.github.io/MOFA2/index.html), [totalVI](https://github.com/YosefLab/scvi-tools), [BABEL](https://github.com/wukevin/babel), [multiVI](https://github.com/scverse/scvi-tutorials/blob/master/MultiVI_tutorial.ipynb) and [multigrate](https://github.com/theislab/multigrate).
+
+**Further details are provided at supplementary material added at [LIBRA manuscript](https://www.biorxiv.org/content/10.1101/2021.01.27.428400v1).**
+
+### LIBRA visual workflow:
+
+