--- a +++ b/paper/paper.md @@ -0,0 +1,75 @@ +--- +title: 'biotmle: Targeted Learning for Biomarker Discovery' +tags: + - targeted learning + - variable importance + - causal inference + - bioinformatics + - genomics + - R +authors: + - name: Nima S. Hejazi + orcid: 0000-0002-7127-2789 + affiliation: 1 + - name: Weixin Cai + orcid: 0000-0003-2680-3066 + affiliation: 1 + - name: Alan E. Hubbard + orcid: 0000-0002-3769-0127 + affiliation: 1 +affiliations: + - name: Division of Biostatistics, University of California, Berkeley + index: 1 +date: 26 July 2017 +bibliography: paper.bib +--- + +# Summary + +The `biotmle` package provides an implementation of a biomarker discovery +methodology based on targeted minimum loss-based estimation (TMLE) +[@vdl2011targeted] and a generalization of the moderated t-statistic of +[@smyth2004linear], designed for use with biological sequencing data (e.g., +microarrays, RNA-seq). The statistical approach made available in this package +relies on the use of TMLE to rigorously evaluate the association between a set +of potential biomarkers and another variable of interest while adjusting for +potential confounding from another set of user-specified covariates. The +implementation is in the form of a package for the R language for statistical +computing [@R]. + +There are two principal ways in which the biomarker discovery techniques in +the `biotmle` R package can be used: to evaluate the association between (1) a +phenotypic measure (say, environmental exposure) and a biomarker of interest, +and (2) an outcome of interest (e.g., survival status at a given time) and a +biomarker measurement, both while controlling for background covariates (e.g., +BMI, age). By using an estimation procedure based on TMLE, the package produces +results based on the Average Treatment Effect (ATE), a statistical parameter +with a well-studied causal interpretation (see @vdl2011targeted for extended +discussions), making the `biotmle` R package well-suited for applications in +bioinformatics, epidemiology, and genomics. + +After adjusting our data set to be consistent with the expect input format -- +please consult the vignette accompanying the R package for details -- we would +call the principal function of this R package: `biomarkertmle`. + +We would perform a moderated test on the output of the `biomarkertmle` function +using the function `modtest_ic`. + +While the principal table of results produced by this R package matches those +produced by the well-known `limma` R package [@smyth2005limma], there are also +several plot methods made available for the `bioTMLE` S4 class -- subclassed +from the popular `SummarizedExperiment` class -- introduced by this package +[@huber2015orchestrating]. For illustrative purposes, we demonstrate the ouput +of two such functions on anonymized experimental data below: + + + + + +\newpage + +# References +