|
a/README.md |
|
b/README.md |
1 |
|
1 |
--- |
2 |
<!-- README.md is generated from README.Rmd. Please edit that file --> |
2 |
output: github_document |
3 |
|
3 |
--- |
4 |
# UCSCXenaTools <img src='man/figures/logo.png' align="right" height="200" alt="logo"/> |
4 |
|
5 |
|
5 |
<!-- README.md is generated from README.Rmd. Please edit that file --> |
6 |
<!-- badges: start --> |
6 |
|
7 |
|
7 |
```{r, echo = FALSE} |
8 |
[](https://cran.r-project.org/package=UCSCXenaTools) |
9 |
collapse = TRUE, |
10 |
[](https://lifecycle.r-lib.org/articles/stages.html) |
10 |
comment = "#>", |
11 |
[](https://github.com/ropensci/UCSCXenaTools/actions/workflows/main.yml) |
11 |
fig.path = "README-" |
12 |
[](https://cran.r-project.org/package=UCSCXenaTools) |
12 |
) |
13 |
[](https://github.com/ropensci/software-review/issues/315) |
13 |
``` |
14 |
[](https://doi.org/10.21105/joss.01627) |
14 |
|
15 |
|
15 |
|
16 |
<!-- badges: end --> |
16 |
<!-- badges: start --> |
17 |
|
17 |
|
18 |
**UCSCXenaTools** is an R package for accessing genomics data from the UCSC |
18 |
[](https://cran.r-project.org/package=UCSCXenaTools) |
19 |
Xena platform, from cancer multi-omics to single-cell RNA-seq. Public |
19 |
[](https://lifecycle.r-lib.org/articles/stages.html) |
20 |
omics data from UCSC Xena are supported through [**multiple turn-key |
20 |
[](https://github.com/ropensci/UCSCXenaTools/actions/workflows/main.yml) |
21 |
Xena Hubs**](https://xenabrowser.net/datapages/), which are a collection |
21 |
[](https://cran.r-project.org/package=UCSCXenaTools) |
22 |
of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, |
22 |
[](https://github.com/ropensci/software-review/issues/315) |
23 |
and others. Databases are normalized so they can be combined, linked, |
23 |
[](https://doi.org/10.21105/joss.01627) |
24 |
filtered, explored and downloaded. |
24 |
|
25 |
|
25 |
<!-- badges: end --> |
26 |
**Who is the target audience and what are scientific applications of |
26 |
|
27 |
this package?** |
27 |
**UCSCXenaTools** is an R package for accessing genomics data from the UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. |
28 |
|
28 |
Public omics data from UCSC Xena are supported through [**multiple turn-key Xena Hubs**](https://xenabrowser.net/datapages/), which are a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others. Databases are normalized so they can be combined, linked, filtered, explored and downloaded. |
29 |
- Target Audience: cancer and clinical researchers, bioinformaticians |
29 |
|
30 |
- Applications: genomic and clinical analyses |
30 |
**Who is the target audience and what are scientific applications of this package?** |
31 |
|
31 |
|
32 |
## Table of Contents |
32 |
* Target Audience: cancer and clinical researchers, bioinformaticians |
33 |
|
33 |
* Applications: genomic and clinical analyses |
34 |
- [Installation](#installation) |
34 |
|
35 |
- [Data Hub List](#data-hub-list) |
35 |
## Table of Contents |
36 |
- [Basic usage](#basic-usage) |
36 |
|
37 |
- [Citation](#citation) |
37 |
* [Installation](#installation) |
38 |
- [How to contribute](#how-to-contribute) |
38 |
* [Data Hub List](#data-hub-list) |
39 |
- [Acknowledgment](#acknowledgment) |
39 |
* [Basic usage](#basic-usage) |
40 |
|
40 |
* [Citation](#citation) |
41 |
## Installation |
41 |
* [How to contribute](#how-to-contribute) |
42 |
|
42 |
* [Acknowledgment](#acknowledgment) |
43 |
Install the stable release from r-universe/CRAN with: |
43 |
|
44 |
|
44 |
## Installation |
45 |
``` r |
45 |
|
46 |
install.packages('UCSCXenaTools', repos = c('https://ropensci.r-universe.dev', 'https://cloud.r-project.org')) |
46 |
Install the stable release from r-universe/CRAN with: |
47 |
#install.packages("UCSCXenaTools") |
47 |
|
48 |
``` |
48 |
```{r, eval=FALSE} |
49 |
|
49 |
install.packages('UCSCXenaTools', repos = c('https://ropensci.r-universe.dev', 'https://cloud.r-project.org')) |
50 |
You can also install the development version of **UCSCXenaTools** from GitHub |
50 |
#install.packages("UCSCXenaTools") |
51 |
with: |
51 |
``` |
52 |
|
52 |
|
53 |
``` r |
53 |
You can also install the development version of **UCSCXenaTools** from GitHub with: |
54 |
# install.packages("remotes") |
54 |
|
55 |
remotes::install_github("ropensci/UCSCXenaTools") |
55 |
```{r gh-installation, eval = FALSE} |
56 |
``` |
56 |
# install.packages("remotes") |
57 |
|
57 |
remotes::install_github("ropensci/UCSCXenaTools") |
58 |
To build the vignettes locally, add two options: |
58 |
``` |
59 |
|
59 |
|
60 |
``` r |
60 |
To build the vignettes locally, add two options: |
61 |
remotes::install_github("ropensci/UCSCXenaTools", build_vignettes = TRUE, dependencies = TRUE) |
61 |
|
62 |
``` |
62 |
```{r, eval=FALSE} |
63 |
|
63 |
remotes::install_github("ropensci/UCSCXenaTools", build_vignettes = TRUE, dependencies = TRUE) |
64 |
## Data Hub List |
64 |
``` |
65 |
|
65 |
|
66 |
All datasets are available at <https://xenabrowser.net/datapages/>. |
66 |
## Data Hub List |
67 |
|
67 |
|
68 |
Currently, **UCSCXenaTools** supports the following data hubs of UCSC |
68 |
All datasets are available at <https://xenabrowser.net/datapages/>. |
69 |
Xena. |
69 |
|
70 |
|
70 |
Currently, **UCSCXenaTools** supports the following data hubs of UCSC Xena. |
71 |
- UCSC Public Hub: <https://ucscpublic.xenahubs.net/> |
71 |
|
72 |
- TCGA Hub: <https://tcga.xenahubs.net/> |
72 |
* UCSC Public Hub: <https://ucscpublic.xenahubs.net/> |
73 |
- GDC Xena Hub (new): <https://gdc.xenahubs.net/> |
73 |
* TCGA Hub: <https://tcga.xenahubs.net/> |
74 |
- GDC v18.0 Xena Hub (old): <https://gdcV18.xenahubs.net/> |
74 |
* GDC Xena Hub (new): <https://gdc.xenahubs.net/> |
75 |
- ICGC Xena Hub: <https://icgc.xenahubs.net/> |
75 |
* GDC v18.0 Xena Hub (old): <https://gdcV18.xenahubs.net/> |
76 |
- Pan-Cancer Atlas Hub: <https://pancanatlas.xenahubs.net/> |
76 |
* ICGC Xena Hub: <https://icgc.xenahubs.net/> |
77 |
- UCSC Toil RNAseq Recompute Compendium Hub: |
77 |
* Pan-Cancer Atlas Hub: <https://pancanatlas.xenahubs.net/> |
78 |
<https://toil.xenahubs.net/> |
78 |
* UCSC Toil RNAseq Recompute Compendium Hub: <https://toil.xenahubs.net/> |
79 |
- PCAWG Xena Hub: <https://pcawg.xenahubs.net/> |
79 |
* PCAWG Xena Hub: <https://pcawg.xenahubs.net/> |
80 |
- ATAC-seq Hub: <https://atacseq.xenahubs.net/> |
80 |
* ATAC-seq Hub: <https://atacseq.xenahubs.net/> |
81 |
- Single Cell Xena Hub: <https://singlecellnew.xenahubs.net/> |
81 |
* Single Cell Xena Hub: <https://singlecellnew.xenahubs.net/> (**Disabled by UCSCXena**) |
82 |
(**Disabled by UCSCXena**) |
82 |
* Kids First Xena Hub: <https://kidsfirst.xenahubs.net/> |
83 |
- Kids First Xena Hub: <https://kidsfirst.xenahubs.net/> |
83 |
* Treehouse Xena Hub: <https://xena.treehouse.gi.ucsc.edu:443/> |
84 |
- Treehouse Xena Hub: <https://xena.treehouse.gi.ucsc.edu:443/> |
84 |
|
85 |
|
85 |
Users can update the dataset list from the newest version of UCSC Xena by hand with the `XenaDataUpdate()` function, followed |
86 |
Users can update the dataset list from the newest version of UCSC Xena by |
86 |
by restarting R and `library(UCSCXenaTools)`. |
87 |
hand with the `XenaDataUpdate()` function, followed by restarting R and |
87 |
|
88 |
`library(UCSCXenaTools)`. |
88 |
If the URL of any data hub changes or a new data hub comes online, please let me know by emailing <w_shixiang@163.com> or [opening an issue on GitHub](https://github.com/ropensci/UCSCXenaTools/issues). |
89 |
|
89 |
|
90 |
If the URL of any data hub changes or a new data hub comes online, please |
90 |
|
91 |
let me know by emailing <w_shixiang@163.com> or [opening an issue on |
91 |
## Basic usage |
92 |
GitHub](https://github.com/ropensci/UCSCXenaTools/issues). |
92 |
|
93 |
|
93 |
Downloading UCSC Xena datasets and loading them into R with **UCSCXenaTools** is a five-step workflow: `generate`, `filter`, `query`, `download` and `prepare`. These steps are implemented as the `XenaGenerate`, `XenaFilter`, `XenaQuery`, `XenaDownload` and `XenaPrepare` functions, respectively. They are easy to use and combine well with other packages like `dplyr`. |
94 |
## Basic usage |
94 |
|
95 |
|
95 |
To show the basic usage of **UCSCXenaTools**, we will download clinical data for LUNG, LUAD and LUSC from the TCGA (hg19 version) data hub. Users can learn more about **UCSCXenaTools** by running `browseVignettes("UCSCXenaTools")` to read the vignettes. |
96 |
Downloading UCSC Xena datasets and loading them into R with **UCSCXenaTools** is |
96 |
|
97 |
a workflow with `generate`, `filter`, `query`, `download` and `prepare` |
97 |
### XenaData data.frame |
98 |
5 steps, which are implemented as `XenaGenerate`, `XenaFilter`, |
98 |
|
99 |
`XenaQuery`, `XenaDownload` and `XenaPrepare` functions, respectively. |
99 |
**UCSCXenaTools** uses `XenaData`, a `data.frame` object built into the package that records information about all datasets of the UCSC Xena Data Hubs, to generate an instance of the `XenaHub` class. |
100 |
They are very clear and easy to use and combine with other packages like |
100 |
|
101 |
`dplyr`. |
101 |
You can load `XenaData` after loading `UCSCXenaTools` into R. |
102 |
|
102 |
|
103 |
To show the basic usage of **UCSCXenaTools**, we will download clinical |
103 |
```{r} |
104 |
data for LUNG, LUAD and LUSC from the TCGA (hg19 version) data hub. Users can |
104 |
library(UCSCXenaTools) |
105 |
learn more about **UCSCXenaTools** by running |
105 |
data(XenaData) |
106 |
`browseVignettes("UCSCXenaTools")` to read the vignettes. |
106 |
|
107 |
|
107 |
head(XenaData) |
108 |
### XenaData data.frame |
108 |
``` |
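
Because `XenaData` is an ordinary data frame, it can be explored with base R before running the workflow. The sketch below is only an illustration: `xena_mock` is a hypothetical stand-in so that the code runs without the package or a network connection. Its column names and the hub names `publicHub` and `tcgaHub` appear in this README; every other value is an invented placeholder.

```r
# Hypothetical stand-in for XenaData. Column names follow the README output;
# dataset names and subtypes below are invented placeholders.
xena_mock <- data.frame(
  XenaHostNames = c("publicHub", "publicHub", "tcgaHub", "tcgaHub"),
  XenaDatasets  = c("demo_copy_number", "demo_phenotype",
                    "demo.LUAD_clinicalMatrix", "demo.BRCA_clinicalMatrix"),
  DataSubtype   = c("copy number", "phenotype", "phenotype", "phenotype"),
  stringsAsFactors = FALSE
)

# Count how many datasets each hub contributes
hub_counts <- table(xena_mock$XenaHostNames)

# Keep rows from one hub whose dataset name matches a regular expression
tcga_lung <- subset(
  xena_mock,
  XenaHostNames == "tcgaHub" & grepl("LUAD|LUSC|LUNG", XenaDatasets)
)

print(hub_counts)
print(tcga_lung$XenaDatasets)
```

With the real `XenaData`, the same `table()` and `subset()` calls should apply unchanged, assuming the column names shown in the printed tibble.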
109 |
|
109 |
|
110 |
**UCSCXenaTools** uses a `data.frame` object (built in package) |
110 |
### Workflow |
111 |
`XenaData` to generate an instance of `XenaHub` class, which records |
111 |
|
112 |
information of all datasets of UCSC Xena Data Hubs. |
112 |
Select datasets. |
113 |
|
113 |
|
114 |
You can load `XenaData` after loading `UCSCXenaTools` into R. |
114 |
```{r} |
115 |
|
115 |
# The options of the XenaFilter function support regular expressions |
116 |
``` r |
116 |
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>% |
117 |
library(UCSCXenaTools) |
117 |
XenaFilter(filterDatasets = "clinical") %>% |
118 |
#> ========================================================================================= |
118 |
XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo |
119 |
#> UCSCXenaTools version 1.6.0 |
119 |
|
120 |
#> Project URL: https://github.com/ropensci/UCSCXenaTools |
120 |
df_todo |
121 |
#> Usages: https://cran.r-project.org/web/packages/UCSCXenaTools/vignettes/USCSXenaTools.html |
121 |
``` |
122 |
#> |
122 |
|
123 |
#> If you use it in published research, please cite: |
123 |
Query and download. |
124 |
#> Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data |
124 |
|
125 |
#> from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. |
125 |
```{r} |
126 |
#> Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627 |
126 |
XenaQuery(df_todo) %>% |
127 |
#> ========================================================================================= |
127 |
XenaDownload() -> xe_download |
128 |
#> --Enjoy it-- |
128 |
``` |
129 |
data(XenaData) |
129 |
|
130 |
|
130 |
Load the data into R for analysis. |
131 |
head(XenaData) |
131 |
|
132 |
#> # A tibble: 6 × 17 |
132 |
```{r} |
133 |
#> XenaHosts XenaHostNames XenaCohorts XenaDatasets SampleCount DataSubtype Label |
133 |
cli = XenaPrepare(xe_download) |
134 |
#> <chr> <chr> <chr> <chr> <int> <chr> <chr> |
134 |
class(cli) |
135 |
#> 1 https://… publicHub Breast Can… ucsfNeve_pu… 51 gene expre… Neve… |
135 |
names(cli) |
136 |
#> 2 https://… publicHub Breast Can… ucsfNeve_pu… 57 phenotype Phen… |
136 |
``` |
137 |
#> 3 https://… publicHub Glioma (Ko… kotliarov20… 194 copy number Kotl… |
137 |
|
138 |
#> 4 https://… publicHub Glioma (Ko… kotliarov20… 194 phenotype Phen… |
138 |
## More to read |
139 |
#> 5 https://… publicHub Lung Cance… weir2007_pu… 383 copy number CGH |
139 |
|
140 |
#> 6 https://… publicHub Lung Cance… weir2007_pu… 383 phenotype Phen… |
140 |
- [Introduction and basic usage of UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro/) |
141 |
#> # ℹ 10 more variables: Type <chr>, AnatomicalOrigin <chr>, SampleType <chr>, |
141 |
- [UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis](https://shixiangwang.github.io/home/en/post/ucscxenatools-201908/) |
142 |
#> # Tags <chr>, ProbeMap <chr>, LongTitle <chr>, Citation <chr>, Version <chr>, |
142 |
- [Obtain RNAseq Values for a Specific Gene in Xena Database](https://shixiangwang.github.io/home/en/post/2020-07-22-ucscxenatools-single-gene/) |
143 |
#> # Unit <chr>, Platform <chr> |
143 |
- [UCSC Xena Access APIs in UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api/) |
144 |
``` |
144 |
|
145 |
|
145 |
## Citation |
146 |
### Workflow |
146 |
|
147 |
|
147 |
Please cite the following paper. |
148 |
Select datasets. |
148 |
|
149 |
|
149 |
``` |
150 |
``` r |
150 |
Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data |
151 |
# The options of the XenaFilter function support regular expressions |
151 |
from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. |
152 |
XenaGenerate(subset = XenaHostNames=="tcgaHub") %>% |
152 |
Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627 |
153 |
XenaFilter(filterDatasets = "clinical") %>% |
153 |
|
154 |
XenaFilter(filterDatasets = "LUAD|LUSC|LUNG") -> df_todo |
154 |
# For BibTeX |
155 |
|
155 |
|
156 |
df_todo |
156 |
@article{Wang2019UCSCXenaTools, |
157 |
#> class: XenaHub |
157 |
journal = {Journal of Open Source Software}, |
158 |
#> hosts(): |
158 |
doi = {10.21105/joss.01627}, |
159 |
#> https://tcga.xenahubs.net |
159 |
issn = {2475-9066}, |
160 |
#> cohorts() (3 total): |
160 |
number = {40}, |
161 |
#> TCGA Lung Cancer (LUNG) |
161 |
publisher = {The Open Journal}, |
162 |
#> TCGA Lung Adenocarcinoma (LUAD) |
162 |
title = {The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq}, |
163 |
#> TCGA Lung Squamous Cell Carcinoma (LUSC) |
163 |
url = {https://dx.doi.org/10.21105/joss.01627}, |
164 |
#> datasets() (3 total): |
164 |
volume = {4}, |
165 |
#> TCGA.LUNG.sampleMap/LUNG_clinicalMatrix |
165 |
author = {Wang, Shixiang and Liu, Xuesong}, |
166 |
#> TCGA.LUAD.sampleMap/LUAD_clinicalMatrix |
166 |
pages = {1627}, |
167 |
#> TCGA.LUSC.sampleMap/LUSC_clinicalMatrix |
167 |
date = {2019-08-05}, |
168 |
``` |
168 |
year = {2019}, |
169 |
|
169 |
month = {8}, |
170 |
Query and download. |
170 |
day = {5}, |
171 |
|
171 |
} |
172 |
``` r |
172 |
``` |
173 |
XenaQuery(df_todo) %>% |
173 |
|
174 |
XenaDownload() -> xe_download |
174 |
Cite UCSC Xena with the following paper. |
175 |
#> This will check url status, please be patient. |
175 |
|
176 |
#> All downloaded files will under directory /tmp/RtmpYsoGw3. |
176 |
``` |
177 |
#> The 'trans_slash' option is FALSE, keep same directory structure as Xena. |
177 |
Goldman, Mary, et al. "The UCSC Xena Platform for cancer genomics data |
178 |
#> Creating directories for datasets... |
178 |
visualization and interpretation." BioRxiv (2019): 326470. |
179 |
#> Downloading TCGA.LUNG.sampleMap/LUNG_clinicalMatrix |
179 |
``` |
180 |
#> Downloading TCGA.LUAD.sampleMap/LUAD_clinicalMatrix |
180 |
|
181 |
#> Downloading TCGA.LUSC.sampleMap/LUSC_clinicalMatrix |
181 |
## How to contribute |
182 |
``` |
182 |
|
183 |
|
183 |
For anyone who wants to contribute, please follow these guidelines: |
184 |
Load the data into R for analysis. |
184 |
|
185 |
|
185 |
* Clone project from GitHub |
186 |
``` r |
186 |
* Open `UCSCXenaTools.Rproj` with RStudio |
187 |
cli = XenaPrepare(xe_download) |
187 |
* Modify source code |
188 |
class(cli) |
188 |
* Run `devtools::check()`, and fix all errors, warnings and notes |
189 |
#> [1] "list" |
189 |
* Create a pull request |
190 |
names(cli) |
190 |
|
191 |
#> [1] "LUNG_clinicalMatrix" "LUAD_clinicalMatrix" "LUSC_clinicalMatrix" |
191 |
## Acknowledgment |
192 |
``` |
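
The output above shows that `XenaPrepare()` returned a named list with one element per dataset. As a minimal sketch of how such a list can be inspected, the code below uses a hypothetical `cli_mock` object: only the list class and the three element names come from this README, while the data frames and their `sampleID` column are invented placeholders, and it is an assumption that each element is a data frame.

```r
# Hypothetical stand-in for the list returned by XenaPrepare().
# Element names follow the README output; the contents are invented.
cli_mock <- list(
  LUNG_clinicalMatrix = data.frame(sampleID = c("s1", "s2", "s3"),
                                   stringsAsFactors = FALSE),
  LUAD_clinicalMatrix = data.frame(sampleID = c("s1", "s2"),
                                   stringsAsFactors = FALSE),
  LUSC_clinicalMatrix = data.frame(sampleID = "s3",
                                   stringsAsFactors = FALSE)
)

# Number of rows in each element, returned as a named integer vector
n_rows <- vapply(cli_mock, nrow, integer(1))

# Extract a single dataset by name for further analysis
luad <- cli_mock[["LUAD_clinicalMatrix"]]

print(n_rows)
print(head(luad))
```

Extracting elements by name rather than by position keeps the code readable and independent of the download order.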
192 |
|
193 |
|
193 |
This package is based on [XenaR](https://github.com/mtmorgan/XenaR); thanks to [Martin Morgan](https://github.com/mtmorgan) for his work. |
194 |
## More to read |
194 |
|
195 |
|
195 |
[](https://ropensci.org)
|
196 |
- [Introduction and basic usage of |
|
|
197 |
UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro/) |
|
|
198 |
- [UCSCXenaTools: Retrieve Gene Expression and Clinical Information from |
|
|
199 |
UCSC Xena for Survival |
|
|
200 |
Analysis](https://shixiangwang.github.io/home/en/post/ucscxenatools-201908/) |
|
|
201 |
- [Obtain RNAseq Values for a Specific Gene in Xena |
|
|
202 |
Database](https://shixiangwang.github.io/home/en/post/2020-07-22-ucscxenatools-single-gene/) |
|
|
203 |
- [UCSC Xena Access APIs in |
|
|
204 |
UCSCXenaTools](https://shixiangwang.github.io/home/en/tools/ucscxenatools-api/) |
|
|
205 |
|
|
|
206 |
## Citation |
|
|
207 |
|
|
|
208 |
Please cite the following paper. |
|
|
209 |
|
|
|
210 |
Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data |
|
|
211 |
from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. |
|
|
212 |
Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627 |
|
|
213 |
|
|
|
214 |
# For BibTex |
|
|
215 |
|
|
|
216 |
@article{Wang2019UCSCXenaTools, |
|
|
217 |
journal = {Journal of Open Source Software}, |
|
|
218 |
doi = {10.21105/joss.01627}, |
|
|
219 |
issn = {2475-9066}, |
|
|
220 |
number = {40}, |
|
|
221 |
publisher = {The Open Journal}, |
|
|
222 |
title = {The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq}, |
|
|
223 |
url = {https://dx.doi.org/10.21105/joss.01627}, |
|
|
224 |
volume = {4}, |
|
|
225 |
author = {Wang, Shixiang and Liu, Xuesong}, |
|
|
226 |
pages = {1627}, |
|
|
227 |
date = {2019-08-05}, |
|
|
228 |
year = {2019}, |
|
|
229 |
month = {8}, |
|
|
230 |
day = {5}, |
|
|
231 |
} |
|
|
232 |
|
|
|
233 |
Cite UCSC Xena with the following paper. |
|
|
234 |
|
|
|
235 |
Goldman, Mary, et al. "The UCSC Xena Platform for cancer genomics data |
|
|
236 |
visualization and interpretation." BioRxiv (2019): 326470. |
|
|
237 |
|
|
|
238 |
## How to contribute |
|
|
239 |
|
|
|
240 |
For anyone who wants to contribute, please follow these guidelines: |
|
|
241 |
|
|
|
242 |
- Clone project from GitHub |
|
|
243 |
- Open `UCSCXenaTools.Rproj` with RStudio |
|
|
244 |
- Modify source code |
|
|
245 |
- Run `devtools::check()`, and fix all errors, warnings and notes |
|
|
246 |
- Create a pull request |
|
|
247 |
|
|
|
248 |
## Acknowledgment |
|
|
249 |
|
|
|
250 |
This package is based on [XenaR](https://github.com/mtmorgan/XenaR), |
|
|
251 |
thanks to [Martin Morgan](https://github.com/mtmorgan) for his work. |
|
|
252 |
|
|
|
253 |
[](https://ropensci.org) |
|
|