Diff of /docs/source/limma.rst [000000] .. [ea0fd6]

Switch to side-by-side view

--- a
+++ b/docs/source/limma.rst
@@ -0,0 +1,292 @@
+=====
+limma
+=====
+
+.. currentmodule:: inmoose.limma
+
+This module is a partial port in Python of the R Bioconductor `limma package
+<https://bioconductor.org/packages/release/bioc/html/limma.html>`_.
+
+Introduction
+============
+
+:mod:`limma` is a package for the analysis of gene expression microarray data,
+especially the use of linear models for analysing designed experiments and the
+assessment of differential expression. :mod:`limma` provides the ability to
+analyse comparisons between many RNA targets simultaneously in arbitrarily
+complicated designed experiments. Empirical Bayesian methods are used to provide
+stable results even when the number of arrays is small. The linear model and
+differential expression functions apply to all gene expression technologies,
+including microarrays, RNA-Seq and quantitative PCR.
+
+There are three types of documentation available:
+
+  1. the *LIMMA User's Guide* can be reached through the "User Guides and
+     Package Vignettes" links at the top of the LIMMA contents page.
+  2. an overview of limma functions grouped by purpose is contained in the
+     chapters of the present page
+  3. the Code documentation section gives an alphabetical index of detailed help
+     topics
+
+Classes Defined in this Module
+==============================
+
+This module defines the following data classes:
+
+- :class:`MArrayLM`: store the result of fitting gene-wise linear models to the
+  normalized intensities or log-ratios. Usually created by :func:`lmFit`.
+  Objects of this class normally contain only one row for each unique probe.
+- :class:`TestResults`: store the results of testing a set of contrasts equal to
+  zero for each probe. Usually created by :func:`decideTests`. Objects of this
+  class normally contain one row for each unique probe.
+
+.. _linearmodels:
+
+Linear Models for Microarrays
+=============================
+
+This section gives an overview of the LIMMA functions available to fit linear
+models and to interpret the results. This section covers models for two color
+arrays in terms of log-ratios or for single-channel arrays in terms of
+log-intensities. If you wish to fit models to the individual channel
+log-intensities from two color arrays, see :ref:`singlechannel`.
+
+The core of this module is the fitting of gene-wise linear models to microarray
+data. The basic idea is to estimate log-ratios between two or more target RNA
+samples simultaneously. See the LIMMA User's Guide for several case studies.
+
+Fitting Models
+--------------
+
+The main function for model fitting is :func:`lmFit`. This is the recommended
+interface for most users. :func:`lmFit` produces a fitted model object of class
+:class:`MArrayLM` containing coefficients, standard errors and residual standard
+errors for each gene. :func:`lmFit` calls one of the following three functions
+to do the actual computations:
+
+- :func:`lm_series` Straightforward least squares fitting of a linear model for
+  each gene.
+- :func:`mrlm` An alternative to :func:`lm_series` using robust regression as
+  implemented by the :code:`rlm` function in the MASS package.
+- :func:`gls_series` Generalized least squares taking into account correlations
+  between duplicate spots (*i.e.* replicate spots on the same array) or related
+  arrays. The function :func:`duplicateCorrelation` is used to estimate the
+  inter-duplicate or inter-block correlation before using :func:`gls_series`.
+
+All the function which fit linear models use :func:`getEAWP` to extract data
+from microarray data objects, and :func:`unwrapdups` which provides a unified
+method for handling duplicate spots.
+
+Forming the Design Matrix
+-------------------------
+
+:func:`lmFit` has two main arguments: the expression data and the design matrix.
+The design matrix is essentially an indicator matrix which specifies which
+target RNA samples were applied to each channel on each array. There is
+considerable freedom in choosing the design matrix -- there is always more than
+one choice which is correct provided it is interpreted correctly.
+
+Design matrices for Affymetrix or single-color arrays can be created using the
+function :func:`patsy.dmatrix`. The function :func:`modelMatrix` is provided to
+assist with the creation of an appropriate design matrix for two-color
+microarray experiments. For direct two-color designs, without a common
+reference, the design matrix often needs to be created by hand.
+
+Making Comparisons of Interest
+------------------------------
+
+Once a linear model has been fit using an appropriate design matrix, the
+function :func:`makeContrasts` may be used to form a contrast matrix to make
+comparisons of interest. The fit and the contrast matrix are used by
+:func:`contrasts_fit` to compute fold changes and *t*-statistics for the
+contrasts of interest. This is a way to compute all possible pairwise
+comparisons between treatments, for example in an experiment which compares many
+treatments to a common reference.
+
+Assessing Differential Expression
+---------------------------------
+
+After fitting a linear model, the standard errors are moderated using a simple
+empirical Bayes model using :func:`eBayes` or :func:`treat`. A moderated
+*t*-statistic and a log-odds of differential expression is computed for each
+contrast for each gene. :func:`treat` tests whether log-fold-changes are greater
+than a threshold rather than merely different to zero.
+
+:func:`eBayes` and :func:`treat` use internal functions :func:`squeezeVar`,
+:func:`fitFDist`, :func:`tmixture_matrix` and :func:`tmixture_vector`.
+
+Summarizing Model Fits
+----------------------
+
+After the above steps, the results may be displayed or further processed using:
+
+- :func:`topTable` Presents a list of the genes most likely to be differentially
+  expressed for a given contrast or set of contrasts.
+- :func:`topTableF` Presents a list of the genes most likely to be
+  differentially expressed for a given set of contrasts. Equivalent to
+  :func:`topTable` with :code:`coef` set to all the coefficients,
+  :code:`coef=range(fit.shape[1])`.
+- :func:`volcanoplot` Volcano plot of fold change versus the *B*-statistic for
+  any fitted coefficient.
+- :func:`plotlines` Plots fitted coefficients or log-intensity values for
+  time-course data.
+- :func:`genas` Estimates biological correlation between two coefficients.
+- :func:`write_fit` Writes a :class:`MArrayLM` object to a file. Note that if
+  :code:`fit` is a :class:`MArrayLM` object, either :func:`write_fit` or
+  :func:`write_table` can be used to write the results to a delimited text file.
+
+For multiple testing functions which operate on linear model fits, see
+:ref:`tests`.
+
+Model Selection
+---------------
+
+:func:`selectModel` provides a means to choose between alternative linear models
+using AIC or BIC information criteria.
+
+
+.. _singlechannel:
+
+Individual Channel Analysis of Two-Color Microarrays
+====================================================
+
+This section gives an overview of the LIMMA functions fit linear models to
+two-color microarray data in terms of the log-intensities rather than
+log-ratios.
+
+The function :func:`intrapotCorrelation` estimates the intra-spot correlation
+between the two channels. The regression function :func:`lmscFit` takes the
+correlation as an argument and fits linear models to the two-color data in terms
+of the individual log-intensities. The output of :func:`lmscFit` is a
+:class:`MArrayLM` object just the same as from :func:`lmFit`, so inference
+proceeds the same way as for log-ratios once the linear model is fitted. See
+:ref:`linearmodels`.
+
+The function :func:`targetsA2C` converts two-color format target data frames to
+single channel format, *i.e.* converts from array-per-line to channel-per-line,
+to facilitate the formulation of the design matrix.
+
+.. _tests:
+
+Hypothesis Testing for Linear Models
+====================================
+
+LIMMA provides a number of functions for multiple testing across both contrasts
+and genes. The starting point is a :class:`MArrayLM` object, called :code:`fit`
+say, resulting from fitting a linear model and running :func:`eBayes` and,
+optionally, :func:`contrasts_fit`. See :ref:`linearmodels` or
+:ref:`singlechannel` for details.
+
+Multiple Testing across Genes and Contrasts
+-------------------------------------------
+
+The key function is :func:`decideTests`. This function writes an object of class
+:class:`TestResults`, which is basically a matrix of :math:`-1`, :math:`0` or
+:math:`1` elements, of the same dimension as :code:`fit_coefficients`,
+indicating whether each coefficient is significantly different from zero. A
+number of multiple testing strategies are provided. :func:`decideTests` calls
+:func:`classifyTestsF` to implement the nested *F*-test strategy.
+
+:func:`selectModel` chooses between linear models for each probe using AIC or
+BIC criteria. This is an alternative to hypothesis testing and can choose
+between non-nested models.
+
+A number of other functions are provided to display the results of
+:func:`decideTests`. The function :func:`heatDiagram` displays the results in a
+heat-map style display. This allows visual comparison of the results across many
+different conditions in the linear model.
+
+The functions :func:`vennCounts` and :func:`vennDiagram` provide Venn diagrams
+style summaries of the results.
+
+Summary and :func:`show` method exists for objects of class
+:class:`TestResults`.
+
+The results from :func:`decideTests` can also be included when the results of a
+linear model fit are written to a file using :func:`write_fit`.
+
+..
+  Gene Set Tests
+  --------------
+
+  Competitive gene set testing for an individual gene set is provided by
+  :func:`wilcoxGST` or :func:`geneSetTest`, which permute genes. The gene set can
+  be displayed using :func:`barcodeplot`.
+
+  Self-contained gene set testing for an individual set is provided by
+  :func:`roast`, which uses rotation technology, analogous to permuting arrays.
+
+  Gene set enrichment analysis for a large database of gene sets is provided by
+  :func:`romer`. :func:`topRomer` is used to rank results from :func:`romer`.
+
+  The functions :func:`alias2Symbol`, :func:`alias2SymbolTable` and
+  :func:`alias2SymbolUsingNCBI` are provided to help match gene sets with
+  microarray probes by way of official gene symbols.
+
+  Global Tests
+  ------------
+
+  The function :func:`genas` can test for associations between two contrasts in a
+  linear model.
+
+  Given a set of *p*-values, the function :func:`propTrueNull` can be used to
+  estimate the proportion of true null hypothesis.
+
+  When evaluating test procedures with simulated or known results, the utility
+  function :func:`auROC` can be used to compute the area under the Receiver
+  Operating Curve for the test results for a given probe.
+
+
+References
+==========
+
+.. [Law2014] C.W. Law, Y. Chen, W. Shi, G.K. Smyth. 2014. Voom: precision
+   weights unlock linear model analysis tools for RNA-seq read counts. *Genome
+   Biology* 15(R29). :doi:`10.1186/gb-2014-15-2-r29`
+
+.. [Loennstedt2002] I. Loennstedt, T.P. Speed. 2002. Replicated microarray data.
+   *Statistica Sinica* 12(31-46).
+
+.. [Michaud2008] J. Michaud, K.M. Simpson, R. Escher, K. Buchet-Poyau, R.
+   Beissbarth, C. Carmichael, M.E. Ritchie, F. Schutz, P. Cannon, M. Liu, X.
+   Shen, Y. Ito, W.H. Raskind, M.S. Horwitz, M. Osato, D.R. Turner, T.P. Speed,
+   M. Kavallaris, G.K. Smyth, H.S. Scott. 2008. Integrative analysis of RUNX1
+   downstream pathways and target genes. *BMC Genomics* 9(363).
+   :doi:`10.1186/1471-2164-9-363`
+
+.. [Ritchie2015] M.E. Ritchie, B. Phipson, D. Wu, Y. Hu, C.W. Law, W. Shi, G.K.
+   Smyth. 2015. Limma powers differential expression analyses for RNA-sequencing
+   and microarray studies. *Nucleic Acids Research* 43(7).
+   :doi:`10.1093/nar/gkv007`
+
+.. [Sartor2006] M.A. Sartor, C.R. Tomlinson, S.C. Wesselkamper, S. Sivaganesan,
+   G.D. Leikauf, M. Medvedovic. 2006. Intensity-based hierarchical Bayes method
+   improves testing for differentially expressed genes in microarray
+   experiments. *BMC Bioinformatics* 7(538). :doi:`10.1186/1471-2105-7-538`
+
+.. [Smyth2004] G.K. Smyth. 2004. Linear models and empirical Bayes methods for
+   assessing differential expression in microarray experiments. *Statistical
+   Applications in Genetics and Molecular Biology* 3(1).
+   :doi:`10.2202/1544-6115.1027`
+
+
+Code documentation
+==================
+
+.. autosummary::
+   :toctree: generated/
+
+   MArrayLM
+   TestResults
+
+   classifyTestsF
+   contrasts_fit
+   eBayes
+   fitFDist
+   lmFit
+   lm_series
+   makeContrasts
+   squeezeVar
+   topTable
+   tmixture_matrix
+   tmixture_vector