======
edgepy
======

.. currentmodule:: inmoose.edgepy

This module is a partial Python port of the R Bioconductor `edgeR package
<https://bioconductor.org/packages/release/bioc/html/edgeR.html>`_.
Only the functionality required by :func:`inmoose.pycombat.pycombat_seq` and by
differential expression analysis has been ported so far.

Differential Expression Analysis Example
========================================

The example below shows how to use :code:`edgepy` to perform a differential
expression analysis on the pasilla dataset.

.. repl::
   from inmoose.data.pasilla import pasilla
   from inmoose.edgepy import DGEList, glmLRT, topTags
   from patsy import dmatrix

   # load the pasilla dataset as an AnnData object
   pas = pasilla()

   # extract the count matrix (transposed to genes x samples) and the
   # annotation dataframe from the AnnData object
   counts = pas.X.T
   anno = pas.obs
   # build the design matrix
   design = dmatrix("~condition", data=anno)

   # build a DGEList object
   dge_list = DGEList(counts=counts, samples=anno, group_col="condition", genes=pas.var)
   # estimate the dispersions
   dge_list.estimateGLMCommonDisp(design=design)

   # fit the GLM
   fit = dge_list.glmFit(design=design)

   # run a differential expression analysis based on a likelihood-ratio test (LRT)
   lrt = glmLRT(fit)

   # display the top-ranked genes
   topTags(lrt)


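
The likelihood-ratio test step of the example can be illustrated in isolation.
The sketch below is not inmoose code: it assumes a single tested coefficient
(one degree of freedom), uses made-up deviance values, and introduces a
hypothetical helper named ``lrt_pvalue_1df``. It follows the usual recipe, as
documented for edgeR's ``glmLRT``, in which the test statistic is the drop in
deviance from the null model to the full model, compared against a chi-square
distribution. Whether the port matches this in every detail is an assumption.

.. code-block:: python

   import math


   def lrt_pvalue_1df(deviance_null, deviance_full):
       """P-value of a likelihood-ratio test with one degree of freedom.

       Hypothetical helper for illustration only; not part of inmoose.
       """
       # LR statistic: drop in deviance when the tested coefficient is added.
       # Clamp at zero to absorb tiny negative values caused by rounding.
       lr = max(deviance_null - deviance_full, 0.0)
       # For a chi-square variable with 1 degree of freedom, the survival
       # function P(X > lr) equals erfc(sqrt(lr / 2)).
       return math.erfc(math.sqrt(lr / 2.0))


   # Made-up deviances for three genes: (null model, full model)
   toy_deviances = {
       "gene_a": (25.0, 10.0),  # large drop in deviance
       "gene_b": (12.5, 10.0),  # moderate drop
       "gene_c": (10.0, 10.0),  # no drop at all
   }

   for gene, (dev_null, dev_full) in toy_deviances.items():
       print(gene, lrt_pvalue_1df(dev_null, dev_full))

A larger drop in deviance gives a smaller p-value, and a gene whose fit does
not improve at all (``gene_c``) gets a p-value of exactly 1.
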
References
==========

.. [Chen2016] Y. Chen, A.T.L. Lun, G.K. Smyth. 2016. From reads to genes to
   pathways: differential expression analysis of RNA-Seq experiments using
   Rsubread and the edgeR quasi-likelihood pipeline. *F1000Research* 5, 1438.
   :doi:`10.12688/f1000research.8987.2`

.. [Gibbons1975] J.D. Gibbons, J.W. Pratt. 1975. P-values: interpretation and
   methodology. *The American Statistician* 29, 20-25.
   :doi:`10.1080/00031305.1975.10479106`

.. [Lun2016] A.T.L. Lun, Y. Chen, G.K. Smyth. 2016. It's DE-licious: a recipe
   for differential expression analyses of RNA-seq experiments using
   quasi-likelihood methods in edgeR. *Methods in Molecular Biology* 1418,
   391-416. :doi:`10.1007/978-1-4939-3578-9_19`

.. [Lund2012] S.P. Lund, D. Nettleton, D.J. McCarthy, G.K. Smyth. 2012.
   Detecting differential expression in RNA-sequence data using quasi-likelihood
   with shrunken dispersion estimates. *Statistical Applications in Genetics and
   Molecular Biology* 11(5), Article 8.
   :doi:`10.1515/1544-6115.1826`

.. [Lun2017] A.T.L. Lun, G.K. Smyth. 2017. No counts, no variance: allowing for
   loss of degrees of freedom when assessing biological variability from RNA-seq
   data. *Statistical Applications in Genetics and Molecular Biology* 16(2),
   83-93. :doi:`10.1515/sagmb-2017-0010`

.. [McCarthy2012] D.J. McCarthy, Y. Chen, G.K. Smyth. 2012. Differential
   expression analysis of multifactor RNA-Seq experiments with respect to
   biological variation. *Nucleic Acids Research* 40, 4288-4297.
   :doi:`10.1093/nar/gks042`

.. [Phipson2016] B. Phipson, S. Lee, I.J. Majewski, W.S. Alexander, G.K. Smyth.
   2016. Robust hyperparameter estimation protects against hypervariable genes
   and improves power to detect differential expression. *Annals of Applied
   Statistics* 10, 946-963. :doi:`10.1214/16-AOAS920`

.. [Robinson2008] M.D. Robinson, G.K. Smyth. 2008. Small-sample estimation of
   negative binomial dispersion, with applications to SAGE data.
   *Biostatistics* 9, 321-332. :doi:`10.1093/biostatistics/kxm030`


Code documentation
==================

.. autosummary::
   :toctree: generated/

   DGEList

   addPriorCount
   adjustedProfileLik
   aveLogCPM
   binomTest
   designAsFactor
   dispCoxReid
   dispCoxReidInterpolateTagwise
   estimateGLMCommonDisp
   estimateGLMTagwiseDisp
   exactTest
   exactTestBetaApprox
   exactTestByDeviance
   exactTestBySmallP
   exactTestDoubleTail
   glmFit
   glmLRT
   glmQLFit
   glmQLFTest
   mglmLevenberg
   mglmOneGroup
   mglmOneWay
   movingAverageByCol
   nbinomDeviance
   plotQLDisp
   predFC
   splitIntoGroups
   systematicSubset
   topTags
   validDGEList