---
title: 'The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq'
authors:
  - affiliation: '1, 2, 3'
    name: Shixiang Wang
    orcid: 0000-0001-9855-7357
  - affiliation: 1
    name: Xuesong Liu
    orcid: 0000-0002-7736-0077
date: "24 July 2019"
bibliography: paper.bib
tags:
  - R
  - cancer genomics
  - data access
affiliations:
  - index: 1
    name: School of Life Science and Technology, ShanghaiTech University
  - index: 2
    name: Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences
  - index: 3
    name: University of Chinese Academy of Sciences
---

# Summary

The UCSC Xena platform (https://xenabrowser.net/) provides an unprecedented resource for public omics data [@goldman2019ucsc]
from large projects such as The Cancer Genome Atlas (TCGA) [@weinstein2013cancer],
the International Cancer Genome Consortium Data Portal (ICGC) [@zhang2011international],
and the Cancer Cell Line Encyclopedia (CCLE) [@barretina2012cancer], as well as from individual research groups such as @mullighan2008genomic and @puram2017single.
Available data types include single-nucleotide variants (SNVs), small insertions and deletions (INDELs), large structural variants, copy number variation (CNV), expression, DNA methylation, ATAC-seq signals, and phenotypic annotations.

Although the UCSC Xena platform itself allows users to explore and analyze data, it is hard
for users to incorporate multiple datasets or data types, integrate the selected data with
popular analysis tools or homebrewed code, and reproduce analysis procedures.
The R language is a well-established and extensively used standard in statistical and bioinformatics research.
Here, we introduce UCSCXenaTools, an R package that enables data retrieval, analysis integration, and
reproducible research for omics data from the UCSC Xena platform.

Currently, UCSCXenaTools supports downloading over 1600 datasets from 10 data hubs of the UCSC Xena platform,
as listed in the table below. Typically, downloading UCSC Xena datasets and loading them into R with UCSCXenaTools
is a five-step workflow: generate, filter, query, download, and prepare. Each step is implemented as a function.
These functions are easy to use and combine well with other packages such as dplyr [@wickham2015dplyr].
In addition, UCSCXenaTools can query and download a subset of a target dataset,
which is particularly useful when
users focus on a single object such as a gene or protein. The key features are summarized in Figure 1.

|Data hub       | Dataset count|URL                                |
|:--------------|-------------:|:----------------------------------|
|tcgaHub        |           879|https://tcga.xenahubs.net          |
|gdcHub         |           449|https://gdc.xenahubs.net           |
|publicHub      |           104|https://ucscpublic.xenahubs.net    |
|pcawgHub       |            53|https://pcawg.xenahubs.net         |
|toilHub        |            50|https://toil.xenahubs.net          |
|singlecellHub  |            45|https://singlecell.xenahubs.net    |
|icgcHub        |            23|https://icgc.xenahubs.net          |
|pancanAtlasHub |            19|https://pancanatlas.xenahubs.net   |
|treehouseHub   |            15|https://xena.treehouse.gi.ucsc.edu |
|atacseqHub     |             9|https://atacseq.xenahubs.net       |
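
The five-step workflow described above can be sketched roughly as follows. This is a minimal illustration rather than a canonical recipe: the choice of `tcgaHub` and the `"clinical"` and `"LUAD"` filter patterns are assumptions made for the example, and the code needs network access to the hub.

```r
library(UCSCXenaTools)

# XenaData is the dataset catalogue bundled with the package.
data(XenaData)

# 1. Generate: build a XenaHub object restricted to one data hub.
hub <- XenaGenerate(subset = XenaHostNames == "tcgaHub")

# 2. Filter: keep datasets whose names match a regular expression.
#    The two patterns below are illustrative choices, not requirements.
hub <- XenaFilter(hub, filterDatasets = "clinical")
hub <- XenaFilter(hub, filterDatasets = "LUAD")

# 3. Query: resolve the download URLs for the selected datasets.
query <- XenaQuery(hub)

# 4. Download: fetch the files into a local directory.
files <- XenaDownload(query, destdir = tempdir())

# 5. Prepare: read the downloaded files into R.
clinical <- XenaPrepare(files)
```

Because each step takes the previous step's result as its first argument, the same calls can also be chained with a pipe operator.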
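
The subset query mentioned above might look like the sketch below, which requests expression values for a few genes in a few samples instead of downloading a whole dataset. It assumes that the installed version of UCSCXenaTools provides `fetch_dense_values()`, and that the dataset name and sample identifiers, chosen here only for illustration, are still hosted on the toil hub.

```r
library(UCSCXenaTools)

# Illustrative target: a gene expression dataset on the toil hub.
host <- "https://toil.xenahubs.net"
dataset <- "tcga_RSEM_gene_tpm"

# Illustrative genes and sample identifiers; replace with your own.
genes <- c("TP53", "RB1", "PIK3CA")
sample_ids <- c("TCGA-02-0047-01", "TCGA-02-0055-01")

# use_probeMap = TRUE asks the hub to map gene symbols to the
# dataset's own probe identifiers before fetching values.
expr <- fetch_dense_values(
  host, dataset,
  identifiers = genes,
  samples = sample_ids,
  use_probeMap = TRUE
)
```

Only the requested values are transferred, which keeps the request small when the analysis concerns a single gene or protein.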

# Acknowledgements

We thank Christine Stawitz and Carl Ganz for their constructive comments.
This package is based on the R package [XenaR](https://github.com/mtmorgan/XenaR); we thank [Martin Morgan](https://github.com/mtmorgan) for his work.

# References