Diff of /paper/paper.md [000000] .. [0bdad5]

---
title: 'The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform,
  from cancer multi-omics to single-cell RNA-seq'
authors:
- affiliation: '1, 2, 3'
  name: Shixiang Wang
  orcid: 0000-0001-9855-7357
- affiliation: 1
  name: Xuesong Liu
  orcid: 0000-0002-7736-0077
date: "24 July 2019"
bibliography: paper.bib
tags:
- R
- cancer genomics
- data access
affiliations:
- index: 1
  name: School of Life Science and Technology, ShanghaiTech University
- index: 2
  name: Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences
- index: 3
  name: University of Chinese Academy of Sciences
---

# Summary

The UCSC Xena platform (https://xenabrowser.net/) provides an unprecedented resource for public omics data [@goldman2019ucsc]
from large projects such as The Cancer Genome Atlas (TCGA) [@weinstein2013cancer],
the International Cancer Genome Consortium Data Portal (ICGC) [@zhang2011international], and
the Cancer Cell Line Encyclopedia (CCLE) [@barretina2012cancer], as well as from individual research groups such as @mullighan2008genomic and @puram2017single.
Available data types include single-nucleotide variants (SNVs), small insertions and deletions (INDELs), large structural variants, copy number variation (CNV), expression, DNA methylation, ATAC-seq signals, and phenotypic annotations.

Although the UCSC Xena platform itself allows users to explore and analyze data, it is hard
for users to incorporate multiple datasets or data types, integrate the selected data with
popular analysis tools or homebrewed code, and reproduce analysis procedures.
The R language is a well-established and extensively used standard in statistical and bioinformatics research.
Here, we introduce UCSCXenaTools, an R package that enables data retrieval, analysis integration, and
reproducible research for omics data from the UCSC Xena platform.

Currently, UCSCXenaTools supports downloading over 1600 datasets from 10 data hubs of the UCSC Xena platform,
as shown in the following table. Typically, downloading UCSC Xena datasets and loading them into R with UCSCXenaTools
is a five-step workflow (generate, filter, query, download, and prepare), with each step implemented as a function.
These functions are straightforward to use and combine well with other packages such as dplyr [@wickham2015dplyr].
In addition, UCSCXenaTools can query and download a subset of a target dataset,
which is particularly useful when
users focus on a single object such as a gene or protein. The key features are summarized in Figure 1.
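As a minimal sketch of the five-step workflow, the snippet below chains the corresponding functions inside a helper named `fetch_tcga_clinical()`. The helper is hypothetical and not part of the package. It assumes UCSCXenaTools is installed and that a network connection is available when the helper is called. The hub name `tcgaHub` is taken from the table in this paper, while the dataset patterns `"clinical"` and `"LUAD"` and the use of `tempdir()` are illustrative choices only.

```r
# Hypothetical helper (not part of UCSCXenaTools) chaining the five steps.
# Defining it has no side effects; calling it needs the package and network access.
fetch_tcga_clinical <- function(cohort_pattern = "LUAD", destdir = tempdir()) {
  # 1. generate: build a XenaHub object from the bundled XenaData metadata
  hub <- UCSCXenaTools::XenaGenerate(subset = XenaHostNames == "tcgaHub")

  # 2. filter: keep clinical datasets whose names match the cohort pattern
  hub <- UCSCXenaTools::XenaFilter(hub, filterDatasets = "clinical")
  hub <- UCSCXenaTools::XenaFilter(hub, filterDatasets = cohort_pattern)

  # 3. query: resolve the download URLs of the selected datasets
  query <- UCSCXenaTools::XenaQuery(hub)

  # 4. download: fetch the files into destdir
  files <- UCSCXenaTools::XenaDownload(query, destdir = destdir)

  # 5. prepare: read the downloaded files into R
  UCSCXenaTools::XenaPrepare(files)
}

# Example call (commented out because it downloads data):
# clinical <- fetch_tcga_clinical("LUAD")
```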

|Data hub       | Dataset count|URL                                |
|:--------------|-------------:|:----------------------------------|
|tcgaHub        |           879|https://tcga.xenahubs.net          |
|gdcHub         |           449|https://gdc.xenahubs.net           |
|publicHub      |           104|https://ucscpublic.xenahubs.net    |
|pcawgHub       |            53|https://pcawg.xenahubs.net         |
|toilHub        |            50|https://toil.xenahubs.net          |
|singlecellHub  |            45|https://singlecell.xenahubs.net    |
|icgcHub        |            23|https://icgc.xenahubs.net          |
|pancanAtlasHub |            19|https://pancanatlas.xenahubs.net   |
|treehouseHub   |            15|https://xena.treehouse.gi.ucsc.edu |
|atacseqHub     |             9|https://atacseq.xenahubs.net       |

![Overview of UCSCXenaTools](overview.png)

# Acknowledgements

We thank Christine Stawitz and Carl Ganz for their constructive comments.
This package is based on the R package [XenaR](https://github.com/mtmorgan/XenaR); we thank [Martin Morgan](https://github.com/mtmorgan) for his work.

# References