---
title: "UCSCXenaTools: an R package for Accessing Genomics Data from UCSC Xena platform, from Cancer Multi-omics to Single-cell RNA-seq"
author: "Shixiang Wang \\

        ShanghaiTech University"
date: "`r Sys.Date()`"

output:
  prettydoc::html_pretty:
    toc: true
    theme: cayman
    highlight: github
  pdf_document:
    toc: true
vignette: >
  %\VignetteIndexEntry{Basic usage}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

**UCSCXenaTools** is an R package for accessing genomics data from the UCSC Xena platform,
from cancer multi-omics to single-cell RNA-seq.
Public omics data from UCSC Xena are supported through [**multiple turn-key Xena Hubs**](https://xenabrowser.net/datapages/), which are a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. Databases are normalized so they can be combined, linked, filtered, explored and downloaded.

**Who is the target audience and what are the scientific applications of this package?**

* Target Audience: cancer and clinical researchers, bioinformaticians
* Applications: genomic and clinical analyses

## Installation

Install the stable release from CRAN with:

```{r, eval=FALSE}
install.packages("UCSCXenaTools")
```

You can also install the development version of **UCSCXenaTools** from GitHub with:

```{r gh-installation, eval = FALSE}
# install.packages("remotes")
remotes::install_github("ropensci/UCSCXenaTools")
```

If you want to build the vignette locally, add two options:

```{r, eval=FALSE}
remotes::install_github("ropensci/UCSCXenaTools", build_vignettes = TRUE, dependencies = TRUE)
```

The minimum version required to run this vignette is `1.2.4`.
[GitHub Issues](https://github.com/ropensci/UCSCXenaTools/issues) is the place to discuss any problem.

## Data Hub List

All datasets are available at <https://xenabrowser.net/datapages/>.

Currently, **UCSCXenaTools** supports the following UCSC Xena data hubs.

* UCSC Public Hub: <https://ucscpublic.xenahubs.net/>
* TCGA Hub: <https://tcga.xenahubs.net/>
* GDC Xena Hub: <https://gdc.xenahubs.net/>
* ICGC Xena Hub: <https://icgc.xenahubs.net/>
* Pan-Cancer Atlas Hub: <https://pancanatlas.xenahubs.net/>
* UCSC Toil RNAseq Recompute Compendium Hub: <https://toil.xenahubs.net/>
* PCAWG Xena Hub: <https://pcawg.xenahubs.net/>
* ATAC-seq Hub: <https://atacseq.xenahubs.net/>
* Single Cell Xena Hub: <https://singlecellnew.xenahubs.net/>
* Kids First Xena Hub: <https://kidsfirst.xenahubs.net/>
* Treehouse Xena Hub: <https://xena.treehouse.gi.ucsc.edu:443/>

Users can manually update the dataset list to the newest version of UCSC Xena with the `XenaDataUpdate()` function,
then restart R and run `library(UCSCXenaTools)` again.

If the URL of any data hub changes or a new data hub comes online, please let me know by emailing <w_shixiang@163.com> or [opening an issue on GitHub](https://github.com/ropensci/UCSCXenaTools/issues).

## Usage

Downloading UCSC Xena datasets and loading them into R with **UCSCXenaTools** is a workflow of five steps: `generate`, `filter`, `query`, `download` and `prepare`. They are implemented as the `XenaGenerate`, `XenaFilter`, `XenaQuery`, `XenaDownload` and `XenaPrepare` functions, respectively. These functions are easy to use and combine well with other packages such as `dplyr`.

To show the basic usage of **UCSCXenaTools**, we will download clinical data of LUNG, LUAD and LUSC from the TCGA (hg19 version) data hub.

### XenaData data.frame

**UCSCXenaTools** uses `XenaData`, a `data.frame` object built into the package, to generate an instance of the `XenaHub` class. `XenaData` records information about all datasets of the UCSC Xena Data Hubs.

You can load `XenaData` after loading `UCSCXenaTools` into R.

```{r}
library(UCSCXenaTools)
data(XenaData)

head(XenaData)
```

### Workflow

Select datasets.

```{r}
# The filter options of XenaFilter() support regular expressions
XenaGenerate(subset = XenaHostNames == "tcgaHub") %>%
  XenaFilter(filterDatasets = "clinical") %>%
  XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo

df_todo
```
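
The two chained `XenaFilter()` calls narrow the selection step by step, so a dataset is kept only if its name matches both patterns. The base R sketch below mimics that logic on a handful of dataset names chosen for illustration. It does not call **UCSCXenaTools**, and the case-insensitive matching is an assumption about the default behaviour of `XenaFilter()`.

```r
# Illustrative dataset names (assumed for this sketch, not queried from a hub)
dataset_names <- c(
  "TCGA.LUAD.sampleMap/LUAD_clinicalMatrix",
  "TCGA.LUSC.sampleMap/LUSC_clinicalMatrix",
  "TCGA.LUNG.sampleMap/LUNG_clinicalMatrix",
  "TCGA.LUAD.sampleMap/HiSeqV2",
  "TCGA.BRCA.sampleMap/BRCA_clinicalMatrix"
)

# First filter: keep clinical datasets
is_clinical <- grepl("clinical", dataset_names, ignore.case = TRUE)
# Second filter: keep any of the three lung cohorts (regular expression alternation)
is_lung <- grepl("LUAD|LUSC|LUNG", dataset_names, ignore.case = TRUE)

# Chaining two filters behaves like a logical AND of both matches
selected <- dataset_names[is_clinical & is_lung]
selected
```

Only the three lung clinical matrices remain; the expression dataset and the BRCA clinical matrix each fail one of the two patterns.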

Sometimes we only know some keywords. In that case, `XenaScan()` can be used to scan all rows of `XenaData`
and detect whether the keywords exist.

```{r}
x1 = XenaScan(pattern = 'Blood')
x2 = XenaScan(pattern = 'LUNG', ignore.case = FALSE)

x1 %>%
    XenaGenerate()
x2 %>%
    XenaGenerate()
```
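
The idea behind a keyword scan can be sketched with base R on a toy table. The `toy` data frame, its values and the helper `scan_rows()` are hypothetical and only illustrate row-wise keyword matching across all columns; they are not the implementation of `XenaScan()`.

```r
# Toy stand-in for a few columns of a dataset table (values are made up)
toy <- data.frame(
  XenaDatasets = c("dataset_a", "dataset_b", "dataset_c"),
  Label = c("Blood counts", "Phenotype", "Gene expression"),
  Tags = c("clinical", "LUNG tissue", "lung cancer"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where any column matches the pattern
scan_rows <- function(df, pattern, ignore.case = TRUE) {
  # One logical vector per column: does each cell match the pattern?
  hits <- vapply(
    df,
    function(col) grepl(pattern, as.character(col), ignore.case = ignore.case),
    logical(nrow(df))
  )
  hits <- matrix(hits, nrow = nrow(df))
  df[rowSums(hits) > 0, , drop = FALSE]
}

blood_hits <- scan_rows(toy, "Blood")
lung_exact <- scan_rows(toy, "LUNG", ignore.case = FALSE)
lung_any   <- scan_rows(toy, "LUNG")
```

With `ignore.case = FALSE` only the row containing the upper-case keyword is kept, which is why the second `XenaScan()` call above can return fewer rows than a case-insensitive scan would.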

Query and download.

```{r}
XenaQuery(df_todo) %>%
  XenaDownload() -> xe_download
```

Load the data into R for analysis.

```{r}
cli = XenaPrepare(xe_download)
class(cli)
names(cli)
```
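
When several datasets are prepared at once, it is often convenient to stack them into one table. The sketch below assumes that `cli` is a named list of data frames, one per downloaded file. `cli_toy`, its column names and its values are made up for illustration and stand in for the real result.

```r
# Toy stand-in for the prepared object (assumed structure: named list of data frames)
cli_toy <- list(
  LUAD_clinicalMatrix = data.frame(
    sampleID = c("S1", "S2"), gender = c("MALE", "FEMALE"),
    stage = c("I", "II"), stringsAsFactors = FALSE
  ),
  LUSC_clinicalMatrix = data.frame(
    sampleID = "S3", gender = "MALE", stringsAsFactors = FALSE
  )
)

# Keep only the columns shared by every table so that rbind() can stack them
common <- Reduce(intersect, lapply(cli_toy, names))

# Add a 'source' column recording which list element each row came from
combined <- do.call(rbind, lapply(names(cli_toy), function(nm) {
  data.frame(source = nm, cli_toy[[nm]][, common, drop = FALSE],
             stringsAsFactors = FALSE)
}))
combined
```

Restricting to the shared columns is a deliberate simplification; a full join that keeps every column (for example with `dplyr::bind_rows()`) may suit real clinical tables better.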
146
147
### Browse datasets
148
149
Create two XenaHub objects:
150
151
* `to_browse` - a XenaHub object containing a cohort and a dataset.
152
* `to_browse2` - a XenaHub object containing 2 cohorts and 2 datasets.
153
154
```{r}
155
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
156
    XenaFilter(filterDatasets = "clinical") %>%
157
    XenaFilter(filterDatasets = "LUAD") -> to_browse
158
159
to_browse
160
161
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>%
162
    XenaFilter(filterDatasets = "clinical") %>%
163
    XenaFilter(filterDatasets = "LUAD|LUSC") -> to_browse2
164
165
to_browse2
166
```
167
The `XenaBrowse()` function opens dataset/cohort links in your default web browser.
By default, it is limited to one dataset/cohort to prevent users from opening too many links at once.

```{r,eval=FALSE}
# This will open your web browser
XenaBrowse(to_browse)

XenaBrowse(to_browse, type = "cohort")
```

```{r, error=TRUE}
# This will throw an error
XenaBrowse(to_browse2)

XenaBrowse(to_browse2, type = "cohort")
```

If you are sure you want to open multiple links, set the `multiple` option to `TRUE`.

```{r, eval=FALSE}
XenaBrowse(to_browse2, multiple = TRUE)
XenaBrowse(to_browse2, type = "cohort", multiple = TRUE)
```

## More usages

The core functionality has been described above.
More usage examples are documented on my website rather than here,
because the package check sometimes fails due to internet problems.

- [Introduction and basic usage of UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro/) - [PDF](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro.pdf)
- [APIs of UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api/) - [PDF](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api.pdf)

Read [Obtain RNAseq Values for a Specific Gene in Xena Database](https://shixiangwang.github.io/home/en/tools/ucscxenatools-single-gene/) to see how to get values for a single gene. A use case for survival analysis based on single-gene expression has been published on rOpenSci; please read
[UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis](https://ropensci.org/technotes/2019/09/06/ucscxenatools-surv/).

## QA

### How to resume a file download from a breakpoint

Thanks to the UCSC Xena team, the 'resume from breakpoint' feature has been added and
can be used through **XenaDownload()** with the `method` and `extra` arguments specified.

Note that the corresponding `wget` or `curl` command must be installed on your OS
and must be discoverable by R.

The following code gives a test example; the data can be viewed on the [web page](https://xenabrowser.net/datapages/?dataset=TcgaTargetGtex_expected_count&host=https%3A%2F%2Ftoil.xenahubs.net&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443).

```r
library(UCSCXenaTools)
xe = XenaGenerate(subset = XenaDatasets == "TcgaTargetGtex_expected_count")
xe
xq = XenaQuery(xe)
# You cannot resume from breakpoint in default mode
XenaDownload(xq, destdir = "~/test/", force = TRUE)
# You can do it with 'curl' command
XenaDownload(xq, destdir = "~/test/", method = "curl", extra = "-C -", force = TRUE)
# You can do it with 'wget' command
XenaDownload(xq, destdir = "~/test/", method = "wget", extra = "-c", force = TRUE)
```
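
Because resuming depends on an external command, it can be useful to check which one R can find before calling `XenaDownload()`. The helper below is a hypothetical sketch (`pick_resume_args()` is not part of **UCSCXenaTools**). It uses base R's `Sys.which()` and returns the `method`/`extra` pair from the example above, or `NULL` if neither command is found.

```r
# Hypothetical helper: choose download arguments based on the commands R can find
pick_resume_args <- function(tools = Sys.which(c("curl", "wget"))) {
  if (nzchar(tools[["curl"]])) {
    list(method = "curl", extra = "-C -")
  } else if (nzchar(tools[["wget"]])) {
    list(method = "wget", extra = "-c")
  } else {
    NULL
  }
}

resume_args <- pick_resume_args()
if (is.null(resume_args)) {
  message("Neither curl nor wget was found; downloads cannot be resumed.")
} else {
  message("Resumable downloads can use method = '", resume_args$method, "'.")
}
```

If a command is found, the result could be passed on as `XenaDownload(xq, destdir = "~/test/", method = resume_args$method, extra = resume_args$extra)`.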

## Citation

Cite this package with the following paper.

```
Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
  from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. 
  Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627

# For BibTex

@article{Wang2019UCSCXenaTools,
    journal = {Journal of Open Source Software},
    doi = {10.21105/joss.01627},
    issn = {2475-9066},
    number = {40},
    publisher = {The Open Journal},
    title = {The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq},
    url = {http://dx.doi.org/10.21105/joss.01627},
    volume = {4},
    author = {Wang, Shixiang and Liu, Xuesong},
    pages = {1627},
    date = {2019-08-05},
    year = {2019},
    month = {8},
    day = {5},
}
```

Cite UCSC Xena with the following paper.

```
Goldman, Mary, et al. "The UCSC Xena Platform for cancer genomics data 
    visualization and interpretation." BioRxiv (2019): 326470.
```

## Acknowledgments

This package is based on [XenaR](https://github.com/mtmorgan/XenaR); thanks to [Martin Morgan](https://github.com/mtmorgan) for his work.