--- a
+++ b/vignettes/USCSXenaTools.Rmd
@@ -0,0 +1,267 @@
+---
+title: "UCSCXenaTools: an R package for Accessing Genomics Data from UCSC Xena platform, from Cancer Multi-omics to Single-cell RNA-seq"
+author: "Shixiang Wang \\
+
+        ShanghaiTech University"
+date: "`r Sys.Date()`"
+
+output:
+  prettydoc::html_pretty:
+    toc: true
+    theme: cayman
+    highlight: github
+  pdf_document:
+    toc: true
+vignette: >
+  %\VignetteIndexEntry{Basic usage}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\usepackage[utf8]{inputenc}
+---
+
+```{r setup, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+
+**UCSCXenaTools** is an R package for accessing genomics data from the UCSC Xena platform,
+from cancer multi-omics to single-cell RNA-seq.
+Public omics data from UCSC Xena are supported through [**multiple turn-key Xena Hubs**](https://xenabrowser.net/datapages/), which are a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. Databases are normalized so they can be combined, linked, filtered, explored and downloaded.
+
+**Who is the target audience and what are the scientific applications of this package?**
+
+* Target Audience: cancer and clinical researchers, bioinformaticians
+* Applications: genomic and clinical analyses
+
+## Installation
+
+Install the stable release from CRAN with:
+
+```{r, eval=FALSE}
+install.packages("UCSCXenaTools")
+```
+
+You can also install the development version of **UCSCXenaTools** from GitHub with:
+
+```{r gh-installation, eval = FALSE}
+# install.packages("remotes")
+remotes::install_github("ropensci/UCSCXenaTools")
+```
+
+To build the vignette locally, add two options:
+
+```{r, eval=FALSE}
+remotes::install_github("ropensci/UCSCXenaTools", build_vignettes = TRUE, dependencies = TRUE)
+```
+
+The minimum package version required to run this vignette is `1.2.4`.
+[GitHub Issues](https://github.com/ropensci/UCSCXenaTools/issues) is the place to report and discuss any problem.
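+
+To confirm that the installed copy meets the `1.2.4` minimum stated above, a check along the following lines may help. This is only a sketch: `min_version` is a name introduced here for illustration, and the check relies on base R's `requireNamespace()` and `utils::packageVersion()` rather than on anything provided by **UCSCXenaTools** itself.
+
+```{r, eval=FALSE}
+# Minimum version stated in this vignette (illustrative variable name)
+min_version <- "1.2.4"
+
+# Report whether the package is missing or older than the minimum
+if (!requireNamespace("UCSCXenaTools", quietly = TRUE) ||
+    utils::packageVersion("UCSCXenaTools") < min_version) {
+  message("UCSCXenaTools >= ", min_version, " is needed; please install or update it.")
+} else {
+  message("Installed UCSCXenaTools version is recent enough.")
+}
+```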
+
+## Data Hub List
+
+All datasets are available at <https://xenabrowser.net/datapages/>.
+
+Currently, **UCSCXenaTools** supports the following data hubs of UCSC Xena.
+
+* UCSC Public Hub: <https://ucscpublic.xenahubs.net/>
+* TCGA Hub: <https://tcga.xenahubs.net/>
+* GDC Xena Hub: <https://gdc.xenahubs.net/>
+* ICGC Xena Hub: <https://icgc.xenahubs.net/>
+* Pan-Cancer Atlas Hub: <https://pancanatlas.xenahubs.net/>
+* UCSC Toil RNAseq Recompute Compendium Hub: <https://toil.xenahubs.net/>
+* PCAWG Xena Hub: <https://pcawg.xenahubs.net/>
+* ATAC-seq Hub: <https://atacseq.xenahubs.net/>
+* Single Cell Xena Hub: <https://singlecellnew.xenahubs.net/>
+* Kids First Xena Hub: <https://kidsfirst.xenahubs.net/>
+* Treehouse Xena Hub: <https://xena.treehouse.gi.ucsc.edu:443/>
+
+Users can manually update the dataset list to the newest version of UCSC Xena with the `XenaDataUpdate()` function,
+then restart R and run `library(UCSCXenaTools)`.
+
+If the URL of any data hub changes or a new data hub comes online, please let me know by emailing <w_shixiang@163.com> or [opening an issue on GitHub](https://github.com/ropensci/UCSCXenaTools/issues).
+
+
+## Usage
+
+Downloading UCSC Xena datasets and loading them into R with **UCSCXenaTools** is a workflow of five steps: `generate`, `filter`, `query`, `download` and `prepare`, which are implemented as the `XenaGenerate`, `XenaFilter`, `XenaQuery`, `XenaDownload` and `XenaPrepare` functions, respectively. They are easy to use and combine well with other packages such as `dplyr`.
+
+To show the basic usage of **UCSCXenaTools**, we will download clinical data for LUNG, LUAD and LUSC from the TCGA (hg19 version) data hub.
+
+### XenaData data.frame
+
+**UCSCXenaTools** uses a `data.frame` object, `XenaData` (built into the package), to generate an instance of the `XenaHub` class, which records information about all datasets of UCSC Xena Data Hubs.
+
+You can load `XenaData` after loading `UCSCXenaTools` into R.
+
+```{r}
+library(UCSCXenaTools)
+data(XenaData)
+
+head(XenaData)
+```
+
+### Workflow
+
+Select datasets.
+
+```{r}
+# The filter options of XenaFilter() accept regular expressions
+XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
+  XenaFilter(filterDatasets = "clinical") %>%
+  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo
+
+df_todo
+```
+
+Sometimes we only know some keywords; `XenaScan()` can be used to scan all rows of
+`XenaData` to detect whether the keywords exist.
+
+
+```{r}
+x1 = XenaScan(pattern = 'Blood')
+x2 = XenaScan(pattern = 'LUNG', ignore.case = FALSE)
+
+x1 %>%
+  XenaGenerate()
+x2 %>%
+  XenaGenerate()
+```
+
+Query and download.
+
+```{r}
+XenaQuery(df_todo) %>%
+  XenaDownload() -> xe_download
+```
+
+Load the data into R for analysis.
+
+```{r}
+cli = XenaPrepare(xe_download)
+class(cli)
+names(cli)
+```
+
+### Browse datasets
+
+Create two XenaHub objects:
+
+* `to_browse` - a XenaHub object containing a cohort and a dataset.
+* `to_browse2` - a XenaHub object containing 2 cohorts and 2 datasets.
+
+```{r}
+XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
+  XenaFilter(filterDatasets = "clinical") %>%
+  XenaFilter(filterDatasets = "LUAD") -> to_browse
+
+to_browse
+
+XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
+  XenaFilter(filterDatasets = "clinical") %>%
+  XenaFilter(filterDatasets = "LUAD|LUSC") -> to_browse2
+
+to_browse2
+```
+
+The `XenaBrowse()` function can be used to browse dataset/cohort links using your default web browser.
+By default, this function is limited to one dataset/cohort to prevent users from opening too many links at once.
+
+```{r,eval=FALSE}
+# This will open your web browser
+XenaBrowse(to_browse)
+
+XenaBrowse(to_browse, type = "cohort")
+```
+
+```{r, error=TRUE}
+# This will throw an error
+XenaBrowse(to_browse2)
+
+XenaBrowse(to_browse2, type = "cohort")
+```
+
+When you are sure you want to open multiple links, set the `multiple` option to `TRUE`.
+
+```{r, eval=FALSE}
+XenaBrowse(to_browse2, multiple = TRUE)
+XenaBrowse(to_browse2, type = "cohort", multiple = TRUE)
+```
+
+## More usage
+
+The core functionality has been described above.
+More usage examples are documented on my website rather than here,
+because package checks sometimes fail due to network problems.
+
+- [Introduction and basic usage of UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro/) - [PDF](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro.pdf)
+- [APIs of UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api/) - [PDF](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api.pdf)
+
+Read [Obtain RNAseq Values for a Specific Gene in Xena Database](https://shixiangwang.github.io/home/en/tools/ucscxenatools-single-gene/) to see how to get values for a single gene. A use case for survival analysis based on single gene expression has been published on rOpenSci; please read
+[UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis](https://ropensci.org/technotes/2019/09/06/ucscxenatools-surv/).
+
+## QA
+
+### How to resume a download from a breakpoint
+
+Thanks to the UCSC Xena team, downloads can now be resumed from a breakpoint
+by calling `XenaDownload()` with the `method` and `extra` arguments specified.
+
+Note that the corresponding `wget` or `curl` command must be installed on your system
+and must be discoverable by R.
+
+The following code gives a test example; the data can be viewed on the [web page](https://xenabrowser.net/datapages/?dataset=TcgaTargetGtex_expected_count&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443).
+
+```r
+library(UCSCXenaTools)
+xe = XenaGenerate(subset = XenaDatasets == "TcgaTargetGtex_expected_count")
+xe
+xq = XenaQuery(xe)
+# You cannot resume from a breakpoint in the default mode
+XenaDownload(xq, destdir = "~/test/", force = TRUE)
+# You can do it with the 'curl' command
+XenaDownload(xq, destdir = "~/test/", method = "curl", extra = "-C -", force = TRUE)
+# You can do it with the 'wget' command
+XenaDownload(xq, destdir = "~/test/", method = "wget", extra = "-c", force = TRUE)
+```
+
+## Citation
+
+Cite this package with the following paper.
+
+```
+Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
+  from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq.
+  Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627
+
+# For BibTeX
+
+@article{Wang2019UCSCXenaTools,
+    journal = {Journal of Open Source Software},
+    doi = {10.21105/joss.01627},
+    issn = {2475-9066},
+    number = {40},
+    publisher = {The Open Journal},
+    title = {The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq},
+    url = {http://dx.doi.org/10.21105/joss.01627},
+    volume = {4},
+    author = {Wang, Shixiang and Liu, Xuesong},
+    pages = {1627},
+    date = {2019-08-05},
+    year = {2019},
+    month = {8},
+    day = {5},
+}
+```
+
+Cite UCSC Xena with the following paper.
+
+```
+Goldman, Mary, et al. "The UCSC Xena Platform for cancer genomics data
+    visualization and interpretation." BioRxiv (2019): 326470.
+```
+
+## Acknowledgments
+
+This package is based on [XenaR](https://github.com/mtmorgan/XenaR); thanks to [Martin Morgan](https://github.com/mtmorgan) for his work.