outbreaker2 / Git / Diff of /vignettes/introduction.Rmd

Models:

MarcoTheBlack/

outbreaker2

Downloads: 1

Diff of /vignettes/introduction.Rmd [000000] .. [dfe06d]

Switch to unified view

 b/vignettes/introduction.Rmd
+---
+title: "Introduction to outbreaker2"
+author: "Thibaut Jombart"
+date: "`r Sys.Date()`"
+output:
+   rmarkdown::html_vignette:
+     toc: true
+     toc_depth: 2
+vignette: >
+  %\VignetteIndexEntry{Introduction to outbreaker2}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+```{r, echo = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>",
+  fig.width=8,
+  fig.height=5,
+  fig.path="figs-introduction/"
+)
+```
+This tutorial provides a worked example of outbreak reconstruction using
+*outbreaker2*. For installation guidelines, a general overview of the package's
+functionalities as well as other resources, see the 'overview' vignette:
+```{r, eval=FALSE}
+vignette("Overview", package = "outbreaker2")
+```
+We will be analysing a small simulated outbreak distributed with the package,
+`fake_outbreak`. This dataset contains simulated dates of onsets, partial
+contact tracing data and pathogen genome sequences for 30 cases:
+```{r, data}
+library(ape)
+library(outbreaker2)
+col <- "#6666cc"
+fake_outbreak
+```
+Here, we will use the dates of case isolation `$sample`, DNA sequences `$dna`, contact tracing data `$ctd` and the empirical distribution of the generation time `$w`, which can be visualised as:
+```{r, w}
+plot(fake_outbreak$w, type = "h", xlim = c(0, 5),
+     lwd = 30, col = col, lend = 2,
+     xlab = "Days after infection",
+     ylab = "p(new case)",
+     main = "Generation time distribution")
+```
+<br>
+# Running the analysis with defaults
+By default, *outbreaker2* uses the settings defined by `create_config()`; see
+the documentation of this function for details. Note that the main function of
+*outbreaker2* is called `outbreaker` (without number). The function's arguments are:
+```{r}
+args(outbreaker)
+```
+The only mandatory input really is the data. For most cases, customising the
+method will be done through `config` and the function `create_config()`, which
+creates default and alters settings such as prior parameters, length and rate of
+sampling from the MCMC, and definition of which parameters should be estimated
+('moved'). The last arguments of `outbreaker` are used to specify custom prior,
+likelihood, and movement functions, and are detailed in the '*Customisation*'
+vignette.
+Let us run the analysis with default settings:
+```{r, first_run, cache = TRUE}
+dna <- fake_outbreak$dna
+dates <- fake_outbreak$sample
+ctd <- fake_outbreak$ctd
+w <- fake_outbreak$w
+data <- outbreaker_data(dna = dna, dates = dates, ctd = ctd, w_dens = w)
+## we set the seed to ensure results won't change
+set.seed(1)
+res <- outbreaker(data = data)
+```
+This analysis will take around 40 seconds on a modern computer. Note that
+*outbreaker2* is slower than *outbreaker* for the same number of iterations, but
+the two implementations are actually different. In particular, *outbreaker2*
+performs many more moves than the original package for each iteration of the
+MCMC, resulting in more efficient mixing. In short: *outbreaker2* is slower, but
+it requires far less iterations.
+Results are stored in a `data.frame` with the special class `outbreaker_chains`:
+```{r}
+class(res)
+dim(res)
+res
+```
+Each row of `res` contains a sample from the MCMC. For each, informations about
+the step (iteration of the MCMC), log-values of posterior, likelihood and
+priors, and all parameters and augmented data are returned. Ancestries
+(i.e. indices of the most recent ancestral case for a given case), are indicated
+by `alpha_[index of the case]`, dates of infections by `t_inf_[index of the
+case]`, and number of generations between cases and their infector / ancestor by
+`kappa_[index of the case]`:
+```{r}
+names(res)
+```
+<br>
+# Analysing the results
+## Graphics
+Results can be visualised using `plot`, which has several options and can be
+used to derive various kinds of graphics (see `?plot.outbreaker_chains`).  The
+basic plot shows the trace of the log-posterior values, which is useful to
+assess mixing:
+```{r, basic_trace}
+plot(res)
+```
+The second argument of `plot` can be used to visualise traces of any
+other column in `res`:
+```{r, traces}
+plot(res, "prior")
+plot(res, "mu")
+plot(res, "t_inf_15")
+```
+`burnin` can be used to discard the first iterations prior to mixing:
+```{r, basic_trace_burn}
+## compare this to plot(res)
+plot(res, burnin = 2000)
+```
+`type` indicates the type of graphic to plot; roughly:
+- `trace` for traces of the MCMC (default)
+- `hist`, `density` to assess distributions of quantitative values
+- `alpha`, `network` to visualise ancestries / transmission tree; note that
+  `network` opens up an interactive plot and requires a web browser with
+  Javascript enabled; the argument `min_support` is useful to select only the
+  most supported ancestries and avoid displaying too many links
+- `kappa` to visualise the distributions generations between cases and their
+  ancestor / infector
+Here are a few examples:
+```{r, many_plots}
+plot(res, "mu", "hist", burnin = 2000)
+plot(res, "mu", "density", burnin = 2000)
+plot(res, type = "alpha", burnin = 2000)
+plot(res, type = "t_inf", burnin = 2000)
+plot(res, type = "kappa", burnin = 2000)
+plot(res, type = "network", burnin = 2000, min_support = 0.01)
+```
+## Using `summary`
+The summary of results derives various distributional statistics for posterior,
+likelihood and prior densities, as well as for the quantitative parameters. It
+also builds a consensus tree, by finding for each case the most frequent
+infector / ancestor in the posterior samples. The corresponding frequencies are
+reported as 'support'. The most frequent value of kappa is also reported as 'generations':
+```{r, summary}
+summary(res)
+```
+<br>
+# Customising settings and priors
+As said before, most customisation can be achieved via `create_config`.
+In the following, we make the following changes to the defaults:
+- increase the number of iterations to 30,000
+- set the sampling rate to 20
+- use a star-like initial tree
+- disable to movement of `kappa`, so that we assume that all cases have
+  observed
+- set a lower rate for the exponential prior of `mu` (10 instead of 1000)
+```{r, config2, cache = TRUE}
+config2 <- create_config(n_iter = 3e4,
+                         sample_every = 20,
+                 init_tree ="star",
+             move_kappa = FALSE,
+             prior_mu = 10)
+set.seed(1)
+res2 <- outbreaker(data, config2)
+plot(res2)
+plot(res2, burnin = 2000)
+```
+We can see that the burnin is around 2,500 iterations (i.e. after the initial
+step corresponding to a local optimum).  We get the consensus tree from the new
+results, and compare the inferred tree to the actual ancestries stored in the
+dataset (`fake_outbreak$ances`):
+```{r, res2}
+summary(res2, burnin = 3000)
+tree2 <- summary(res2, burnin = 3000)$tree
+comparison <- data.frame(case = 1:30,
+                         inferred = paste(tree2$from),
+             true = paste(fake_outbreak$ances),
+             stringsAsFactors = FALSE)
+comparison$correct <- comparison$inferred == comparison$true
+comparison
+mean(comparison$correct)
+```
+Let's visualise the posterior trees:
+```{r, net2}
+plot(res2, type = "network", burnin = 3000, min_support = 0.01)
+```