b/vignettes/USCSXenaTools.Rmd

---
title: "UCSCXenaTools: an R package for Accessing Genomics Data from UCSC Xena platform, from Cancer Multi-omics to Single-cell RNA-seq"
author: "Shixiang Wang \\

        ShanghaiTech University"
date: "`r Sys.Date()`"

output:
  prettydoc::html_pretty:
    toc: true
    theme: cayman
    highlight: github
  pdf_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{Basic usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```


**UCSCXenaTools** is an R package for accessing genomics data from the UCSC Xena platform,
from cancer multi-omics to single-cell RNA-seq.
Public omics data from UCSC Xena are served through [**multiple turn-key Xena Hubs**](https://xenabrowser.net/datapages/), a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. The databases are normalized so they can be combined, linked, filtered, explored and downloaded.

**Who is the target audience and what are the scientific applications of this package?**

* Target audience: cancer and clinical researchers, bioinformaticians
* Applications: genomic and clinical analyses

## Installation

Install the stable release from CRAN with:

```{r, eval=FALSE}
install.packages("UCSCXenaTools")
```

You can also install the development version of **UCSCXenaTools** from GitHub with:

```{r gh-installation, eval = FALSE}
# install.packages("remotes")
remotes::install_github("ropensci/UCSCXenaTools")
```

To build the vignette locally, add two options:

```{r, eval=FALSE}
remotes::install_github("ropensci/UCSCXenaTools", build_vignettes = TRUE, dependencies = TRUE)
```

The minimum version required to run the vignette is `1.2.4`.
Any problem can be discussed in [GitHub Issues](https://github.com/ropensci/UCSCXenaTools/issues).

## Data Hub List

All datasets are available at <https://xenabrowser.net/datapages/>.

Currently, **UCSCXenaTools** supports the following UCSC Xena data hubs:

* UCSC Public Hub: <https://ucscpublic.xenahubs.net/>
* TCGA Hub: <https://tcga.xenahubs.net/>
* GDC Xena Hub: <https://gdc.xenahubs.net/>
* ICGC Xena Hub: <https://icgc.xenahubs.net/>
* Pan-Cancer Atlas Hub: <https://pancanatlas.xenahubs.net/>
* UCSC Toil RNAseq Recompute Compendium Hub: <https://toil.xenahubs.net/>
* PCAWG Xena Hub: <https://pcawg.xenahubs.net/>
* ATAC-seq Hub: <https://atacseq.xenahubs.net/>
* Single Cell Xena Hub: <https://singlecellnew.xenahubs.net/>
* Kids First Xena Hub: <https://kidsfirst.xenahubs.net/>
* Treehouse Xena Hub: <https://xena.treehouse.gi.ucsc.edu:443/>

Users can manually update the dataset list to the newest version from UCSC Xena with the `XenaDataUpdate()` function, followed
by restarting R and running `library(UCSCXenaTools)`.

If the URL of a data hub changes or a new data hub comes online, please let me know by emailing <w_shixiang@163.com> or [opening an issue on GitHub](https://github.com/ropensci/UCSCXenaTools/issues).


## Usage

Downloading UCSC Xena datasets and loading them into R with **UCSCXenaTools** is a five-step workflow: `generate`, `filter`, `query`, `download` and `prepare`. These steps are implemented by the `XenaGenerate`, `XenaFilter`, `XenaQuery`, `XenaDownload` and `XenaPrepare` functions, respectively. The functions are easy to use and combine well with other packages such as `dplyr`.

To show the basic usage of **UCSCXenaTools**, we will download the clinical data of LUNG, LUAD and LUSC from the TCGA (hg19 version) data hub.

### XenaData data.frame

**UCSCXenaTools** uses the built-in `data.frame` object `XenaData`, which records information about all datasets of the UCSC Xena data hubs, to generate an instance of the `XenaHub` class.

You can load `XenaData` after loading `UCSCXenaTools` into R.

```{r}
library(UCSCXenaTools)
data(XenaData)

head(XenaData)
```

### Workflow

Select datasets.

```{r}
# The options in the XenaFilter function support regular expressions
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo

df_todo
```
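
Because the `XenaFilter()` options accept regular expressions, a pattern such as `"LUAD|LUSC|LUNG"` keeps every dataset whose name contains any one of the alternatives. The minimal sketch below illustrates only that matching behaviour with base R's `grepl()`. The dataset names are illustrative stand-ins rather than values queried from `XenaData`, and the code is not a description of how `XenaFilter()` is implemented.

```r
# Illustrative dataset names (stand-ins, not queried from XenaData)
datasets <- c(
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix",
  "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
  "TCGA.BRCA.sampleMap/BRCA_clinicalMatrix",
  "TCGA.LUAD.sampleMap/HiSeqV2"
)

# First filter: keep names that contain "clinical"
is_clinical <- grepl("clinical", datasets)

# Second filter: "|" is alternation, so any of the three codes matches
is_lung <- grepl("LUAD|LUSC|LUNG", datasets)

# Chaining the two filters corresponds to requiring both conditions
selected <- datasets[is_clinical & is_lung]
selected
```

Chaining two filter calls narrows the selection step by step, which is why the BRCA clinical table and the LUAD expression table both drop out here.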

Sometimes only a few keywords are known. `XenaScan()` can be used to scan all rows of `XenaData` and
detect whether the keywords exist.


```{r}
x1 = XenaScan(pattern = 'Blood')
x2 = XenaScan(pattern = 'LUNG', ignore.case = FALSE)

x1 %>%
  XenaGenerate()
x2 %>%
  XenaGenerate()
```

Query and download.

```{r}
XenaQuery(df_todo) %>%
  XenaDownload() -> xe_download
```

Load the downloaded data into R for analysis.

```{r}
cli = XenaPrepare(xe_download)
class(cli)
names(cli)
```
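
Once the data are in R, a typical next step is to check what each table contains before combining them. The sketch below assumes that `XenaPrepare()` returned a named list of tables, one per downloaded file, as the `class(cli)` and `names(cli)` calls above suggest. To stay runnable offline it uses a made-up stand-in list (`cli_demo`, with invented rows and columns), so it shows only the inspection pattern, not real TCGA content.

```r
# Stand-in for the list returned by XenaPrepare(); names and values are made up
cli_demo <- list(
  LUAD_clinicalMatrix = data.frame(sampleID = c("s1", "s2"), gender = c("F", "M")),
  LUSC_clinicalMatrix = data.frame(sampleID = "s3", gender = "M", stage = "II")
)

# Number of rows and columns of each table
dims <- t(vapply(cli_demo, dim, integer(2)))
colnames(dims) <- c("rows", "cols")
dims

# Columns present in every table, i.e. safe to stack row-wise
shared <- Reduce(intersect, lapply(cli_demo, colnames))

# Stack the shared columns and record which table each row came from
combined <- do.call(
  rbind,
  lapply(names(cli_demo), function(nm) cbind(source = nm, cli_demo[[nm]][shared]))
)
combined
```

Restricting to the shared columns avoids errors from `rbind()` when the clinical tables of different cohorts carry different sets of fields.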

### Browse datasets

Create two XenaHub objects:

* `to_browse` - a XenaHub object containing one cohort and one dataset.
* `to_browse2` - a XenaHub object containing 2 cohorts and 2 datasets.

```{r}
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD") -> to_browse

to_browse

XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD|LUSC") -> to_browse2

to_browse2
```

The `XenaBrowse()` function opens dataset/cohort links in your default web browser.
By default, it accepts only one dataset/cohort at a time, to prevent users from opening too many links at once.

```{r,eval=FALSE}
# This will open your web browser
XenaBrowse(to_browse)

XenaBrowse(to_browse, type = "cohort")
```

```{r, error=TRUE}
# This will throw an error
XenaBrowse(to_browse2)

XenaBrowse(to_browse2, type = "cohort")
```

If you are sure you want to open multiple links, set the `multiple` option to `TRUE`.

```{r, eval=FALSE}
XenaBrowse(to_browse2, multiple = TRUE)
XenaBrowse(to_browse2, type = "cohort", multiple = TRUE)
```

## More usages

The core functionality has been described above.
More usage examples are documented on my website rather than here,
because the package check sometimes fails due to internet problems.

- [Introduction and basic usage of UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro/) - [PDF](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro.pdf)
- [APIs of UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api/) - [PDF](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api.pdf)

Read [Obtain RNAseq Values for a Specific Gene in Xena Database](https://shixiangwang.github.io/home/en/tools/ucscxenatools-single-gene/) to see how to get values for a single gene. A use case for survival analysis based on single-gene expression has been published on rOpenSci; please read
[UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis](https://ropensci.org/technotes/2019/09/06/ucscxenatools-surv/).

## QA

### How to resume a file download from a breakpoint

Thanks to the UCSC Xena team, a new 'resume from breakpoint' feature has been added. It
can be used through **XenaDownload()** by specifying the `method` and `extra` options.

Of note, the corresponding `wget` or `curl` command must be installed on your OS
and must be discoverable by R.
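
Whether R can find these commands can be checked before starting a large download. The following is a minimal sketch using base R's `Sys.which()`, which returns an empty string for any command that is not on the search path. The `resume_flags` lookup is a hypothetical helper introduced only for this illustration; it pairs each tool with the resume flag quoted in this section.

```r
# Look up the external tools on the PATH; "" means the tool was not found
tools <- Sys.which(c("curl", "wget"))
available <- names(tools)[nzchar(tools)]

# Resume flags as used in this section: "-C -" for curl, "-c" for wget
resume_flags <- c(curl = "-C -", wget = "-c")

if (length(available) > 0) {
  method <- available[1]
  extra <- resume_flags[[method]]
  message("Resumable download possible with method = '", method,
          "' and extra = '", extra, "'")
} else {
  message("Neither curl nor wget was found; only the default download mode is available.")
}
```

If at least one tool is found, the resulting `method` and `extra` values can be passed on to `XenaDownload()`.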

The following code gives a test example; the data can be viewed on the [web page](https://xenabrowser.net/datapages/?dataset=TcgaTargetGtex_expected_count&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443).

```r
library(UCSCXenaTools)
xe = XenaGenerate(subset = XenaDatasets == "TcgaTargetGtex_expected_count")
xe
xq = XenaQuery(xe)
# You cannot resume from a breakpoint in the default mode
XenaDownload(xq, destdir = "~/test/", force = TRUE)
# You can do it with the 'curl' command
XenaDownload(xq, destdir = "~/test/", method = "curl", extra = "-C -", force = TRUE)
# You can do it with the 'wget' command
XenaDownload(xq, destdir = "~/test/", method = "wget", extra = "-c", force = TRUE)
```

## Citation

Please cite this package with the following paper.

```
Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
  from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq.
  Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627

# For BibTeX

@article{Wang2019UCSCXenaTools,
  journal = {Journal of Open Source Software},
  doi = {10.21105/joss.01627},
  issn = {2475-9066},
  number = {40},
  publisher = {The Open Journal},
  title = {The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq},
  url = {http://dx.doi.org/10.21105/joss.01627},
  volume = {4},
  author = {Wang, Shixiang and Liu, Xuesong},
  pages = {1627},
  date = {2019-08-05},
  year = {2019},
  month = {8},
  day = {5},
}
```

Please cite UCSC Xena with the following paper.

```
Goldman, Mary, et al. "The UCSC Xena Platform for cancer genomics data
    visualization and interpretation." BioRxiv (2019): 326470.
```

## Acknowledgments

This package is based on [XenaR](https://github.com/mtmorgan/XenaR). Thanks to [Martin Morgan](https://github.com/mtmorgan) for his work.