[ea0fd6]: / docs / source / limma.rst

Download this file

293 lines (223 with data), 12.5 kB

limma

This module is a partial port in Python of the R Bioconductor limma package.

Introduction

:mod:`limma` is a package for the analysis of gene expression microarray data, especially the use of linear models for analysing designed experiments and the assessment of differential expression. :mod:`limma` provides the ability to analyse comparisons between many RNA targets simultaneously in arbitrarily complicated designed experiments. Empirical Bayesian methods are used to provide stable results even when the number of arrays is small. The linear model and differential expression functions apply to all gene expression technologies, including microarrays, RNA-Seq and quantitative PCR.

There are three types of documentation available:

  1. the LIMMA User's Guide can be reached through the "User Guides and Package Vignettes" links at the top of the LIMMA contents page.
  2. an overview of limma functions grouped by purpose is contained in the chapters of the present page
  3. the Code documentation section gives an alphabetical index of detailed help topics

Classes Defined in this Module

This module defines the following data classes:

  • :class:`MArrayLM`: store the result of fitting gene-wise linear models to the normalized intensities or log-ratios. Usually created by :func:`lmFit`. Objects of this class normally contain only one row for each unique probe.
  • :class:`TestResults`: store the results of testing a set of contrasts equal to zero for each probe. Usually created by :func:`decideTests`. Objects of this class normally contain one row for each unique probe.

Linear Models for Microarrays

This section gives an overview of the LIMMA functions available to fit linear models and to interpret the results. This section covers models for two color arrays in terms of log-ratios or for single-channel arrays in terms of log-intensities. If you wish to fit models to the individual channel log-intensities from two color arrays, see :ref:`singlechannel`.

The core of this module is the fitting of gene-wise linear models to microarray data. The basic idea is to estimate log-ratios between two or more target RNA samples simultaneously. See the LIMMA User's Guide for several case studies.

Fitting Models

The main function for model fitting is :func:`lmFit`. This is the recommended interface for most users. :func:`lmFit` produces a fitted model object of class :class:`MArrayLM` containing coefficients, standard errors and residual standard errors for each gene. :func:`lmFit` calls one of the following three functions to do the actual computations:

  • :func:`lm_series` Straightforward least squares fitting of a linear model for each gene.
  • :func:`mrlm` An alternative to :func:`lm_series` using robust regression as implemented by the rlm function in the MASS package.
  • :func:`gls_series` Generalized least squares taking into account correlations between duplicate spots (i.e. replicate spots on the same array) or related arrays. The function :func:`duplicateCorrelation` is used to estimate the inter-duplicate or inter-block correlation before using :func:`gls_series`.

All the function which fit linear models use :func:`getEAWP` to extract data from microarray data objects, and :func:`unwrapdups` which provides a unified method for handling duplicate spots.

Forming the Design Matrix

:func:`lmFit` has two main arguments: the expression data and the design matrix. The design matrix is essentially an indicator matrix which specifies which target RNA samples were applied to each channel on each array. There is considerable freedom in choosing the design matrix -- there is always more than one choice which is correct provided it is interpreted correctly.

Design matrices for Affymetrix or single-color arrays can be created using the function :func:`patsy.dmatrix`. The function :func:`modelMatrix` is provided to assist with the creation of an appropriate design matrix for two-color microarray experiments. For direct two-color designs, without a common reference, the design matrix often needs to be created by hand.

Making Comparisons of Interest

Once a linear model has been fit using an appropriate design matrix, the function :func:`makeContrasts` may be used to form a contrast matrix to make comparisons of interest. The fit and the contrast matrix are used by :func:`contrasts_fit` to compute fold changes and t-statistics for the contrasts of interest. This is a way to compute all possible pairwise comparisons between treatments, for example in an experiment which compares many treatments to a common reference.

Assessing Differential Expression

After fitting a linear model, the standard errors are moderated using a simple empirical Bayes model using :func:`eBayes` or :func:`treat`. A moderated t-statistic and a log-odds of differential expression is computed for each contrast for each gene. :func:`treat` tests whether log-fold-changes are greater than a threshold rather than merely different to zero.

:func:`eBayes` and :func:`treat` use internal functions :func:`squeezeVar`, :func:`fitFDist`, :func:`tmixture_matrix` and :func:`tmixture_vector`.

Summarizing Model Fits

After the above steps, the results may be displayed or further processed using:

  • :func:`topTable` Presents a list of the genes most likely to be differentially expressed for a given contrast or set of contrasts.
  • :func:`topTableF` Presents a list of the genes most likely to be differentially expressed for a given set of contrasts. Equivalent to :func:`topTable` with coef set to all the coefficients, coef=range(fit.shape[1]).
  • :func:`volcanoplot` Volcano plot of fold change versus the B-statistic for any fitted coefficient.
  • :func:`plotlines` Plots fitted coefficients or log-intensity values for time-course data.
  • :func:`genas` Estimates biological correlation between two coefficients.
  • :func:`write_fit` Writes a :class:`MArrayLM` object to a file. Note that if fit is a :class:`MArrayLM` object, either :func:`write_fit` or :func:`write_table` can be used to write the results to a delimited text file.

For multiple testing functions which operate on linear model fits, see :ref:`tests`.

Model Selection

:func:`selectModel` provides a means to choose between alternative linear models using AIC or BIC information criteria.

Individual Channel Analysis of Two-Color Microarrays

This section gives an overview of the LIMMA functions fit linear models to two-color microarray data in terms of the log-intensities rather than log-ratios.

The function :func:`intrapotCorrelation` estimates the intra-spot correlation between the two channels. The regression function :func:`lmscFit` takes the correlation as an argument and fits linear models to the two-color data in terms of the individual log-intensities. The output of :func:`lmscFit` is a :class:`MArrayLM` object just the same as from :func:`lmFit`, so inference proceeds the same way as for log-ratios once the linear model is fitted. See :ref:`linearmodels`.

The function :func:`targetsA2C` converts two-color format target data frames to single channel format, i.e. converts from array-per-line to channel-per-line, to facilitate the formulation of the design matrix.

Hypothesis Testing for Linear Models

LIMMA provides a number of functions for multiple testing across both contrasts and genes. The starting point is a :class:`MArrayLM` object, called fit say, resulting from fitting a linear model and running :func:`eBayes` and, optionally, :func:`contrasts_fit`. See :ref:`linearmodels` or :ref:`singlechannel` for details.

Multiple Testing across Genes and Contrasts

The key function is :func:`decideTests`. This function writes an object of class :class:`TestResults`, which is basically a matrix of  − 1, 0 or 1 elements, of the same dimension as fit_coefficients, indicating whether each coefficient is significantly different from zero. A number of multiple testing strategies are provided. :func:`decideTests` calls :func:`classifyTestsF` to implement the nested F-test strategy.

:func:`selectModel` chooses between linear models for each probe using AIC or BIC criteria. This is an alternative to hypothesis testing and can choose between non-nested models.

A number of other functions are provided to display the results of :func:`decideTests`. The function :func:`heatDiagram` displays the results in a heat-map style display. This allows visual comparison of the results across many different conditions in the linear model.

The functions :func:`vennCounts` and :func:`vennDiagram` provide Venn diagrams style summaries of the results.

Summary and :func:`show` method exists for objects of class :class:`TestResults`.

The results from :func:`decideTests` can also be included when the results of a linear model fit are written to a file using :func:`write_fit`.

References

[Law2014]C.W. Law, Y. Chen, W. Shi, G.K. Smyth. 2014. Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15(R29). :doi:`10.1186/gb-2014-15-2-r29`
[Loennstedt2002]I. Loennstedt, T.P. Speed. 2002. Replicated microarray data. Statistica Sinica 12(31-46).
[Michaud2008]J. Michaud, K.M. Simpson, R. Escher, K. Buchet-Poyau, R. Beissbarth, C. Carmichael, M.E. Ritchie, F. Schutz, P. Cannon, M. Liu, X. Shen, Y. Ito, W.H. Raskind, M.S. Horwitz, M. Osato, D.R. Turner, T.P. Speed, M. Kavallaris, G.K. Smyth, H.S. Scott. 2008. Integrative analysis of RUNX1 downstream pathways and target genes. BMC Genomics 9(363). :doi:`10.1186/1471-2164-9-363`
[Ritchie2015]M.E. Ritchie, B. Phipson, D. Wu, Y. Hu, C.W. Law, W. Shi, G.K. Smyth. 2015. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7). :doi:`10.1093/nar/gkv007`
[Sartor2006]M.A. Sartor, C.R. Tomlinson, S.C. Wesselkamper, S. Sivaganesan, G.D. Leikauf, M. Medvedovic. 2006. Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments. BMC Bioinformatics 7(538). :doi:`10.1186/1471-2105-7-538`
[Smyth2004]G.K. Smyth. 2004. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3(1). :doi:`10.2202/1544-6115.1027`

Code documentation