<!-- README.md is generated from README.Rmd. Please edit that file -->

# R/`biotmle`

[GitHub Actions](https://github.com/nhejazi/biotmle/actions) ·
[Coverage](https://codecov.io/github/nhejazi/biotmle?branch=master) ·
[Project Status: Active](http://www.repostatus.org/#active) ·
[Bioconductor Build Status](https://bioconductor.org/checkResults/release/bioc-LATEST/biotmle) ·
[Bioconductor](https://bioconductor.org/packages/release/bioc/html/biotmle.html) ·
[License: MIT](http://opensource.org/licenses/MIT) ·
[DOI](https://zenodo.org/badge/latestdoi/65854775) ·
[JOSS](http://joss.theoj.org/papers/02be843d9bab1b598187bfbb08ce3949)

> Targeted Learning with Moderated Statistics for Biomarker Discovery

**Authors:** [Nima Hejazi](https://nimahejazi.org), [Mark van der
Laan](https://vanderlaan-lab.org/about), and [Alan
Hubbard](https://hubbard.berkeley.edu)

-----

## What’s `biotmle`?

The `biotmle` R package facilitates biomarker discovery through a
generalization of the moderated t-statistic (Smyth 2004) that extends
the procedure to locally efficient estimators of asymptotically linear
target parameters (Tsiatis 2007). The implemented methods modify
targeted maximum likelihood (TML) estimators of statistical (or causal)
target parameters (e.g., the average treatment effect) by applying
variance moderation to the standard variance estimator based on the
efficient influence function (EIF) of the target parameter (van der Laan
and Rose 2011, 2018). The moderated hypothesis test pools the individual
probe-specific EIF-based variance estimates to construct a robust
variance estimator, which stabilizes the standard error estimates and
improves the performance of such estimators both in smaller samples and
in settings where the EIF is poorly estimated.
The resultant procedure allows for the construction of conservative
hypothesis tests that reduce the false discovery rate and/or the
family-wise error rate (Hejazi, van der Laan, and Hubbard 2021).
Improvements upon prior TML-based approaches to biomarker discovery
(e.g., Bembom et al. 2009) include both the moderated variance
estimator and the use of conservative reference distributions for the
corresponding moderated test statistics (e.g., the logistic
distribution), inspired by tail bounds based on concentration
inequalities (Rosenblum and van der Laan 2009); the latter proves
critical for obtaining robust inference when the finite-sample
distribution of the estimator deviates from normality.

-----

## Installation

For standard use, install from
[Bioconductor](https://bioconductor.org/packages/biotmle) using
[`BiocManager`](https://CRAN.R-project.org/package=BiocManager):

``` r
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("biotmle")
```

To contribute, install the bleeding-edge *development version* from
GitHub via [`remotes`](https://CRAN.R-project.org/package=remotes):

``` r
remotes::install_github("nhejazi/biotmle")
```

Current and prior [Bioconductor](https://bioconductor.org) releases are
available on branches named “RELEASE\_” followed by the Bioconductor
version number. For example, to install the version of this package
available via Bioconductor 3.6, use

``` r
remotes::install_github("nhejazi/biotmle", ref = "RELEASE_3_6")
```

-----

## Example

For details on how best to use the `biotmle` R package, please consult
the most recent [package
vignette](https://bioconductor.org/packages/release/bioc/vignettes/biotmle/inst/doc/exposureBiomarkers.html)
available through the [Bioconductor
project](https://bioconductor.org/packages/biotmle).
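
The variance moderation described in the overview above can also be
illustrated without the package. The following base-R sketch is not the
`biotmle` implementation. It assumes you already have point estimates
`psi` (one per biomarker) and an n × J matrix `eif` of estimated EIF
values (rows are observations, columns are biomarkers). It shrinks the
per-biomarker EIF-based variances toward a common prior using the
empirical Bayes moment estimator of Smyth (2004), then computes two-sided
p-values against a standard (unit-scale) logistic reference distribution,
followed by a Benjamini-Hochberg adjustment. The helper functions
`fit_variance_prior()` and `moderated_eif_test()`, the choice of the
unit-scale logistic, and the simulated data are hypothetical
illustrations; see the vignette for the package's actual interface.

```r
# Minimal sketch, NOT the biotmle implementation: moderate EIF-based
# variance estimates across biomarkers with the empirical Bayes approach of
# Smyth (2004), then test against a standard logistic reference.
# Function names and inputs below are hypothetical.

# Estimate the prior degrees of freedom (d0) and prior variance (s0_sq) from
# per-biomarker sample variances `s2`, each on `d` degrees of freedom, by
# matching the mean and variance of log(s2) (Smyth 2004).
fit_variance_prior <- function(s2, d) {
  g <- length(s2)
  e <- log(s2) - digamma(d / 2) + log(d / 2)
  e_bar <- mean(e)
  # Variability of e beyond what sampling error alone would produce
  target <- mean((e - e_bar)^2 * g / (g - 1) - trigamma(d / 2))
  if (target <= 0) {
    # No detectable heterogeneity: pool completely (infinite prior df)
    return(list(d0 = Inf, s0_sq = exp(e_bar)))
  }
  # trigamma() is strictly decreasing, so trigamma(d0 / 2) = target has a
  # unique root, which lies inside this bracket
  half_d0 <- uniroot(function(x) trigamma(x) - target,
                     interval = c(1e-8, 10 / target + 10),
                     tol = 1e-10)$root
  d0 <- 2 * half_d0
  list(d0 = d0, s0_sq = exp(e_bar + digamma(d0 / 2) - log(d0 / 2)))
}

# Moderated test statistics from point estimates `psi` and an n x J matrix
# `eif` of estimated EIF values (rows = observations, columns = biomarkers)
moderated_eif_test <- function(psi, eif) {
  n <- nrow(eif)
  d_resid <- n - 1
  s2 <- apply(eif, 2, var)               # per-biomarker EIF variance
  prior <- fit_variance_prior(s2, d_resid)
  # Moderated variance: df-weighted average of the pooled prior variance
  # and each biomarker's own EIF-based variance
  s2_mod <- if (is.finite(prior$d0)) {
    (prior$d0 * prior$s0_sq + d_resid * s2) / (prior$d0 + d_resid)
  } else {
    rep(prior$s0_sq, length(s2))
  }
  se_mod <- sqrt(s2_mod / n)             # moderated standard error of psi
  t_mod <- psi / se_mod
  # Two-sided p-values under a standard logistic reference distribution
  p_logis <- 2 * plogis(abs(t_mod), lower.tail = FALSE)
  data.frame(psi = psi,
             se_raw = sqrt(s2 / n),
             se_mod = se_mod,
             t_mod = t_mod,
             p_logis = p_logis,
             p_bh = p.adjust(p_logis, method = "BH"))
}

# Usage with simulated inputs (hypothetical data, for illustration only)
set.seed(34512)
n <- 50
J <- 200
sds <- sqrt(1 / rgamma(J, shape = 4, rate = 4))  # biomarker-specific scales
eif <- matrix(rnorm(n * J, sd = rep(sds, each = n)), nrow = n, ncol = J)
effect <- c(rep(0.8, 10), rep(0, J - 10))        # first 10 carry a signal
psi <- effect + rnorm(J, sd = sds / sqrt(n))
res <- moderated_eif_test(psi, eif)
head(res[order(res$p_bh), ])
```

Because the standard logistic distribution has heavier tails than the
standard normal, `2 * plogis(abs(t), lower.tail = FALSE)` exceeds the
corresponding normal p-value for every nonzero statistic, which is what
makes this reference distribution conservative. For a fuller version of
the shrinkage step (including a robust option), see `limma::squeezeVar()`.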

-----

## Issues

If you encounter any bugs or have any specific feature requests, please
[file an issue](https://github.com/nhejazi/biotmle/issues).

-----

## Contributions

Contributions are very welcome. Interested contributors should consult
our [contribution
guidelines](https://github.com/nhejazi/biotmle/blob/master/CONTRIBUTING.md)
prior to submitting a pull request.

-----

## Citation

After using the `biotmle` R package, please cite all of the following:

```
    @article{hejazi2017biotmle,
      author = {Hejazi, Nima S and Cai, Weixin and Hubbard, Alan E},
      title = {biotmle: Targeted Learning for Biomarker Discovery},
      journal = {The Journal of Open Source Software},
      volume = {2},
      number = {15},
      month = {July},
      year = {2017},
      publisher = {The Open Journal},
      doi = {10.21105/joss.00295},
      url = {https://doi.org/10.21105/joss.00295}
    }

    @article{hejazi2021generalization,
      author = {Hejazi, Nima S and Boileau, Philippe and {van der Laan},
        Mark J and Hubbard, Alan E},
      title = {A generalization of moderated statistics to data adaptive
        semiparametric estimation in high-dimensional biology},
      journal = {under review},
      volume = {},
      number = {},
      pages = {},
      year = {2021+},
      publisher = {},
      doi = {},
      url = {https://arxiv.org/abs/1710.05451}
    }

    @manual{hejazi2019biotmlebioc,
      author = {Hejazi, Nima S and {van der Laan}, Mark J and Hubbard, Alan E},
      title = {{biotmle}: {Targeted Learning} with moderated statistics for
        biomarker discovery},
      doi = {10.18129/B9.bioc.biotmle},
      url = {https://bioconductor.org/packages/biotmle},
      note = {R package version 1.10.0}
    }
```

-----

## Related

  - [R/`biotmleData`](https://github.com/nhejazi/biotmleData) - R
    package with example experimental data for use with this analysis
    package.

-----

## Funding

The development of this software was supported in part through grants
from the National Institutes of Health: [P42
ES004705-29](https://projectreporter.nih.gov/project_info_details.cfm?aid=9260357&map=y)
and [R01
ES021369-05](https://projectreporter.nih.gov/project_info_description.cfm?aid=9210551&icde=37849782&ddparam=&ddvalue=&ddsub=&cr=1&csb=default&cs=ASC&pball=).

-----

## License

© 2016-2021 [Nima S. Hejazi](https://nimahejazi.org)

The contents of this repository are distributed under the MIT license.
See the file `LICENSE` for details.

-----

## References

<div id="refs" class="references">

<div id="ref-bembom2009biomarker">

Bembom, Oliver, Maya L Petersen, Soo-Yon Rhee, W Jeffrey Fessel, Sandra
E Sinisi, Robert W Shafer, and Mark J van der Laan. 2009. “Biomarker
Discovery Using Targeted Maximum-Likelihood Estimation: Application to
the Treatment of Antiretroviral-Resistant HIV Infection.” *Statistics in
Medicine* 28 (1): 152–72.

</div>

<div id="ref-hejazi2021generalization">

Hejazi, Nima S, Mark J van der Laan, and Alan E Hubbard. 2021. “A
Generalization of Moderated Statistics to Data Adaptive Semiparametric
Estimation in High-Dimensional Biology.” *Under Review*.
<https://arxiv.org/abs/1710.05451>.

</div>

<div id="ref-rosenblum2009confidence">

Rosenblum, Michael A, and Mark J van der Laan. 2009. “Confidence
Intervals for the Population Mean Tailored to Small Sample Sizes, with
Applications to Survey Sampling.” *The International Journal of
Biostatistics* 5 (1).

</div>

<div id="ref-smyth2004linear">

Smyth, Gordon K. 2004. “Linear Models and Empirical Bayes Methods for
Assessing Differential Expression in Microarray Experiments.”
*Statistical Applications in Genetics and Molecular Biology* 3 (1):
1–25. <https://doi.org/10.2202/1544-6115.1027>.

</div>

<div id="ref-tsiatis2007semiparametric">

Tsiatis, Anastasios. 2007. *Semiparametric Theory and Missing Data*.
Springer Science & Business Media.

</div>

<div id="ref-vdl2011targeted">

van der Laan, Mark J, and Sherri Rose. 2011. *Targeted Learning: Causal
Inference for Observational and Experimental Data*. Springer Science &
Business Media.

</div>

<div id="ref-vdl2018targeted">

van der Laan, Mark J, and Sherri Rose. 2018. *Targeted Learning in Data
Science: Causal Inference for Complex Longitudinal Studies*. Springer
Science & Business Media.

</div>

</div>