Diff of /docs/source/limma.rst [000000] .. [ea0fd6]

Switch to unified view

a b/docs/source/limma.rst
1
=====
2
limma
3
=====
4
5
.. currentmodule:: inmoose.limma
6
7
This module is a partial port in Python of the R Bioconductor `limma package
8
<https://bioconductor.org/packages/release/bioc/html/limma.html>`_.
9
10
Introduction
11
============
12
13
:mod:`limma` is a package for the analysis of gene expression microarray data,
14
especially the use of linear models for analysing designed experiments and the
15
assessment of differential expression. :mod:`limma` provides the ability to
16
analyse comparisons between many RNA targets simultaneously in arbitrarily
17
complicated designed experiments. Empirical Bayesian methods are used to provide
18
stable results even when the number of arrays is small. The linear model and
19
differential expression functions apply to all gene expression technologies,
20
including microarrays, RNA-Seq and quantitative PCR.
21
22
There are three types of documentation available:
23
24
  1. the *LIMMA User's Guide* can be reached through the "User Guides and
25
     Package Vignettes" links at the top of the LIMMA contents page.
26
  2. an overview of limma functions grouped by purpose is contained in the
27
     chapters of the present page
28
  3. the Code documentation section gives an alphabetical index of detailed help
29
     topics
30
31
Classes Defined in this Module
32
==============================
33
34
This module defines the following data classes:
35
36
- :class:`MArrayLM`: store the result of fitting gene-wise linear models to the
37
  normalized intensities or log-ratios. Usually created by :func:`lmFit`.
38
  Objects of this class normally contain only one row for each unique probe.
39
- :class:`TestResults`: store the results of testing a set of contrasts equal to
40
  zero for each probe. Usually created by :func:`decideTests`. Objects of this
41
  class normally contain one row for each unique probe.
42
43
.. _linearmodels:
44
45
Linear Models for Microarrays
46
=============================
47
48
This section gives an overview of the LIMMA functions available to fit linear
49
models and to interpret the results. This section covers models for two color
50
arrays in terms of log-ratios or for single-channel arrays in terms of
51
log-intensities. If you wish to fit models to the individual channel
52
log-intensities from two color arrays, see :ref:`singlechannel`.
53
54
The core of this module is the fitting of gene-wise linear models to microarray
55
data. The basic idea is to estimate log-ratios between two or more target RNA
56
samples simultaneously. See the LIMMA User's Guide for several case studies.
57
58
Fitting Models
59
--------------
60
61
The main function for model fitting is :func:`lmFit`. This is the recommended
62
interface for most users. :func:`lmFit` produces a fitted model object of class
63
:class:`MArrayLM` containing coefficients, standard errors and residual standard
64
errors for each gene. :func:`lmFit` calls one of the following three functions
65
to do the actual computations:
66
67
- :func:`lm_series` Straightforward least squares fitting of a linear model for
68
  each gene.
69
- :func:`mrlm` An alternative to :func:`lm_series` using robust regression as
70
  implemented by the :code:`rlm` function in the MASS package.
71
- :func:`gls_series` Generalized least squares taking into account correlations
72
  between duplicate spots (*i.e.* replicate spots on the same array) or related
73
  arrays. The function :func:`duplicateCorrelation` is used to estimate the
74
  inter-duplicate or inter-block correlation before using :func:`gls_series`.
75
76
All the function which fit linear models use :func:`getEAWP` to extract data
77
from microarray data objects, and :func:`unwrapdups` which provides a unified
78
method for handling duplicate spots.
79
80
Forming the Design Matrix
81
-------------------------
82
83
:func:`lmFit` has two main arguments: the expression data and the design matrix.
84
The design matrix is essentially an indicator matrix which specifies which
85
target RNA samples were applied to each channel on each array. There is
86
considerable freedom in choosing the design matrix -- there is always more than
87
one choice which is correct provided it is interpreted correctly.
88
89
Design matrices for Affymetrix or single-color arrays can be created using the
90
function :func:`patsy.dmatrix`. The function :func:`modelMatrix` is provided to
91
assist with the creation of an appropriate design matrix for two-color
92
microarray experiments. For direct two-color designs, without a common
93
reference, the design matrix often needs to be created by hand.
94
95
Making Comparisons of Interest
96
------------------------------
97
98
Once a linear model has been fit using an appropriate design matrix, the
99
function :func:`makeContrasts` may be used to form a contrast matrix to make
100
comparisons of interest. The fit and the contrast matrix are used by
101
:func:`contrasts_fit` to compute fold changes and *t*-statistics for the
102
contrasts of interest. This is a way to compute all possible pairwise
103
comparisons between treatments, for example in an experiment which compares many
104
treatments to a common reference.
105
106
Assessing Differential Expression
107
---------------------------------
108
109
After fitting a linear model, the standard errors are moderated using a simple
110
empirical Bayes model using :func:`eBayes` or :func:`treat`. A moderated
111
*t*-statistic and a log-odds of differential expression is computed for each
112
contrast for each gene. :func:`treat` tests whether log-fold-changes are greater
113
than a threshold rather than merely different to zero.
114
115
:func:`eBayes` and :func:`treat` use internal functions :func:`squeezeVar`,
116
:func:`fitFDist`, :func:`tmixture_matrix` and :func:`tmixture_vector`.
117
118
Summarizing Model Fits
119
----------------------
120
121
After the above steps, the results may be displayed or further processed using:
122
123
- :func:`topTable` Presents a list of the genes most likely to be differentially
124
  expressed for a given contrast or set of contrasts.
125
- :func:`topTableF` Presents a list of the genes most likely to be
126
  differentially expressed for a given set of contrasts. Equivalent to
127
  :func:`topTable` with :code:`coef` set to all the coefficients,
128
  :code:`coef=range(fit.shape[1])`.
129
- :func:`volcanoplot` Volcano plot of fold change versus the *B*-statistic for
130
  any fitted coefficient.
131
- :func:`plotlines` Plots fitted coefficients or log-intensity values for
132
  time-course data.
133
- :func:`genas` Estimates biological correlation between two coefficients.
134
- :func:`write_fit` Writes a :class:`MArrayLM` object to a file. Note that if
135
  :code:`fit` is a :class:`MArrayLM` object, either :func:`write_fit` or
136
  :func:`write_table` can be used to write the results to a delimited text file.
137
138
For multiple testing functions which operate on linear model fits, see
139
:ref:`tests`.
140
141
Model Selection
142
---------------
143
144
:func:`selectModel` provides a means to choose between alternative linear models
145
using AIC or BIC information criteria.
146
147
148
.. _singlechannel:
149
150
Individual Channel Analysis of Two-Color Microarrays
151
====================================================
152
153
This section gives an overview of the LIMMA functions fit linear models to
154
two-color microarray data in terms of the log-intensities rather than
155
log-ratios.
156
157
The function :func:`intrapotCorrelation` estimates the intra-spot correlation
158
between the two channels. The regression function :func:`lmscFit` takes the
159
correlation as an argument and fits linear models to the two-color data in terms
160
of the individual log-intensities. The output of :func:`lmscFit` is a
161
:class:`MArrayLM` object just the same as from :func:`lmFit`, so inference
162
proceeds the same way as for log-ratios once the linear model is fitted. See
163
:ref:`linearmodels`.
164
165
The function :func:`targetsA2C` converts two-color format target data frames to
166
single channel format, *i.e.* converts from array-per-line to channel-per-line,
167
to facilitate the formulation of the design matrix.
168
169
.. _tests:
170
171
Hypothesis Testing for Linear Models
172
====================================
173
174
LIMMA provides a number of functions for multiple testing across both contrasts
175
and genes. The starting point is a :class:`MArrayLM` object, called :code:`fit`
176
say, resulting from fitting a linear model and running :func:`eBayes` and,
177
optionally, :func:`contrasts_fit`. See :ref:`linearmodels` or
178
:ref:`singlechannel` for details.
179
180
Multiple Testing across Genes and Contrasts
181
-------------------------------------------
182
183
The key function is :func:`decideTests`. This function writes an object of class
184
:class:`TestResults`, which is basically a matrix of :math:`-1`, :math:`0` or
185
:math:`1` elements, of the same dimension as :code:`fit_coefficients`,
186
indicating whether each coefficient is significantly different from zero. A
187
number of multiple testing strategies are provided. :func:`decideTests` calls
188
:func:`classifyTestsF` to implement the nested *F*-test strategy.
189
190
:func:`selectModel` chooses between linear models for each probe using AIC or
191
BIC criteria. This is an alternative to hypothesis testing and can choose
192
between non-nested models.
193
194
A number of other functions are provided to display the results of
195
:func:`decideTests`. The function :func:`heatDiagram` displays the results in a
196
heat-map style display. This allows visual comparison of the results across many
197
different conditions in the linear model.
198
199
The functions :func:`vennCounts` and :func:`vennDiagram` provide Venn diagrams
200
style summaries of the results.
201
202
Summary and :func:`show` method exists for objects of class
203
:class:`TestResults`.
204
205
The results from :func:`decideTests` can also be included when the results of a
206
linear model fit are written to a file using :func:`write_fit`.
207
208
..
209
  Gene Set Tests
210
  --------------
211
212
  Competitive gene set testing for an individual gene set is provided by
213
  :func:`wilcoxGST` or :func:`geneSetTest`, which permute genes. The gene set can
214
  be displayed using :func:`barcodeplot`.
215
216
  Self-contained gene set testing for an individual set is provided by
217
  :func:`roast`, which uses rotation technology, analogous to permuting arrays.
218
219
  Gene set enrichment analysis for a large database of gene sets is provided by
220
  :func:`romer`. :func:`topRomer` is used to rank results from :func:`romer`.
221
222
  The functions :func:`alias2Symbol`, :func:`alias2SymbolTable` and
223
  :func:`alias2SymbolUsingNCBI` are provided to help match gene sets with
224
  microarray probes by way of official gene symbols.
225
226
  Global Tests
227
  ------------
228
229
  The function :func:`genas` can test for associations between two contrasts in a
230
  linear model.
231
232
  Given a set of *p*-values, the function :func:`propTrueNull` can be used to
233
  estimate the proportion of true null hypothesis.
234
235
  When evaluating test procedures with simulated or known results, the utility
236
  function :func:`auROC` can be used to compute the area under the Receiver
237
  Operating Curve for the test results for a given probe.
238
239
240
References
241
==========
242
243
.. [Law2014] C.W. Law, Y. Chen, W. Shi, G.K. Smyth. 2014. Voom: precision
244
   weights unlock linear model analysis tools for RNA-seq read counts. *Genome
245
   Biology* 15(R29). :doi:`10.1186/gb-2014-15-2-r29`
246
247
.. [Loennstedt2002] I. Loennstedt, T.P. Speed. 2002. Replicated microarray data.
248
   *Statistica Sinica* 12(31-46).
249
250
.. [Michaud2008] J. Michaud, K.M. Simpson, R. Escher, K. Buchet-Poyau, R.
251
   Beissbarth, C. Carmichael, M.E. Ritchie, F. Schutz, P. Cannon, M. Liu, X.
252
   Shen, Y. Ito, W.H. Raskind, M.S. Horwitz, M. Osato, D.R. Turner, T.P. Speed,
253
   M. Kavallaris, G.K. Smyth, H.S. Scott. 2008. Integrative analysis of RUNX1
254
   downstream pathways and target genes. *BMC Genomics* 9(363).
255
   :doi:`10.1186/1471-2164-9-363`
256
257
.. [Ritchie2015] M.E. Ritchie, B. Phipson, D. Wu, Y. Hu, C.W. Law, W. Shi, G.K.
258
   Smyth. 2015. Limma powers differential expression analyses for RNA-sequencing
259
   and microarray studies. *Nucleic Acids Research* 43(7).
260
   :doi:`10.1093/nar/gkv007`
261
262
.. [Sartor2006] M.A. Sartor, C.R. Tomlinson, S.C. Wesselkamper, S. Sivaganesan,
263
   G.D. Leikauf, M. Medvedovic. 2006. Intensity-based hierarchical Bayes method
264
   improves testing for differentially expressed genes in microarray
265
   experiments. *BMC Bioinformatics* 7(538). :doi:`10.1186/1471-2105-7-538`
266
267
.. [Smyth2004] G.K. Smyth. 2004. Linear models and empirical Bayes methods for
268
   assessing differential expression in microarray experiments. *Statistical
269
   Applications in Genetics and Molecular Biology* 3(1).
270
   :doi:`10.2202/1544-6115.1027`
271
272
273
Code documentation
274
==================
275
276
.. autosummary::
277
   :toctree: generated/
278
279
   MArrayLM
280
   TestResults
281
282
   classifyTestsF
283
   contrasts_fit
284
   eBayes
285
   fitFDist
286
   lmFit
287
   lm_series
288
   makeContrasts
289
   squeezeVar
290
   topTable
291
   tmixture_matrix
292
   tmixture_vector