|
a |
|
b/vignettes/customisation.Rmd |
|
|
1 |
--- |
|
|
2 |
title: "Using custom priors, likelihood, or movements in outbreaker2" |
|
|
3 |
author: "Thibaut Jombart" |
|
|
4 |
date: "`r Sys.Date()`" |
|
|
5 |
output: |
|
|
6 |
rmarkdown::html_vignette: |
|
|
7 |
toc: true |
|
|
8 |
toc_depth: 2 |
|
|
9 |
vignette: > |
|
|
10 |
%\VignetteIndexEntry{Using custom priors, likelihood, or movements in outbreaker2} |
|
|
11 |
%\VignetteEngine{knitr::rmarkdown} |
|
|
12 |
%\VignetteEncoding{UTF-8} |
|
|
13 |
--- |
|
|
14 |
|
|
|
15 |
|
|
|
16 |
|
|
|
17 |
```{r, echo = FALSE} |
|
|
18 |
knitr::opts_chunk$set( |
|
|
19 |
collapse = TRUE, |
|
|
20 |
comment = "#>", |
|
|
21 |
fig.width=8, |
|
|
22 |
fig.height=5, |
|
|
23 |
fig.path="figs-customisation/" |
|
|
24 |
) |
|
|
25 |
``` |
|
|
26 |
|
|
|
27 |
|
|
|
28 |
In this vignette, we show how custom functions for priors, likelihood, or |
|
|
29 |
movement of parameters and augmented data can be used in *outbreaker2*. In all |
|
|
30 |
these functions, the process will be similar: |
|
|
31 |
|
|
|
32 |
1. write your own function with the right arguments |
|
|
33 |
2. pass this function as an argument to a `custom...` function |
|
|
34 |
3. pass the result to *outbreaker2* |
|
|
35 |
|
|
|
36 |
Note that 2-3 can be a single step if passing the function to the arguments of |
|
|
37 |
*outbreaker2* directly. Also note that **all priors and likelihoods are expected |
|
|
38 |
on a log scale**. Finally, also note that while the various `custom...` |
|
|
39 |
functions will try to some extent to check that the provided functions are |
|
|
40 |
valid, such tests are very difficult to implement. In short: you are using these |
|
|
41 |
custom features at your own risks - make sure these functions work before |
|
|
42 |
passing them to *outbreaker2*. |
|
|
43 |
|
|
|
44 |
|
|
|
45 |
|
|
|
46 |
|
|
|
47 |
<br> |
|
|
48 |
|
|
|
49 |
# Customising priors |
|
|
50 |
|
|
|
51 |
Priors of *outbreaker2* must be a function of an `outbreaker_param` list (see |
|
|
52 |
`?outbreaker_param`). Here, we decide to use a step function rather than the |
|
|
53 |
default Beta function as a prior for *pi*, the reporting probability, and a flat |
|
|
54 |
prior between 0 and 1 for the mutation rate (which is technically a probability |
|
|
55 |
in the basic genetic model used in *outbreaker2*). |
|
|
56 |
|
|
|
57 |
|
|
|
58 |
We start by defining two functions: an auxiliary function `f` which returns |
|
|
59 |
values on the natural scale, and which we can use for plotting the prior |
|
|
60 |
distribution, and then a function `f_pi` which will be used for the |
|
|
61 |
customisation. |
|
|
62 |
|
|
|
63 |
```{r, f_pi} |
|
|
64 |
f <- function(pi) { |
|
|
65 |
ifelse(pi < 0.8, 0, 5) |
|
|
66 |
} |
|
|
67 |
|
|
|
68 |
f_pi <- function(param) { |
|
|
69 |
log(f(param$pi)) |
|
|
70 |
} |
|
|
71 |
|
|
|
72 |
plot(f, type = "s", col = "blue", |
|
|
73 |
xlab = expression(pi), ylab = expression(p(pi)), |
|
|
74 |
main = expression(paste("New prior for ", pi))) |
|
|
75 |
|
|
|
76 |
``` |
|
|
77 |
|
|
|
78 |
|
|
|
79 |
While `f` is a useful function to visualise the prior, `f_pi` is the function |
|
|
80 |
which will be passed to `outbreaker`. To do so, we pass it to `custom_priors`: |
|
|
81 |
|
|
|
82 |
```{r, custom_prior} |
|
|
83 |
library(outbreaker2) |
|
|
84 |
|
|
|
85 |
f_mu <- function(param) { |
|
|
86 |
if (param$mu < 0 || param$mu > 1) { |
|
|
87 |
return(-Inf) |
|
|
88 |
} else { |
|
|
89 |
return(0.0) |
|
|
90 |
} |
|
|
91 |
|
|
|
92 |
} |
|
|
93 |
|
|
|
94 |
priors <- custom_priors(pi = f_pi, mu = f_mu) |
|
|
95 |
priors |
|
|
96 |
|
|
|
97 |
``` |
|
|
98 |
|
|
|
99 |
Note that `custom_priors` does more than just adding the custom function to a |
|
|
100 |
list. For instance, the following customisations are all wrong, and rightfully |
|
|
101 |
rejected: |
|
|
102 |
|
|
|
103 |
```{r, wrong_prior, error = TRUE} |
|
|
104 |
|
|
|
105 |
## wrong: not a function |
|
|
106 |
## should be pi = function(x){0.0} |
|
|
107 |
custom_priors(pi = 0.0) |
|
|
108 |
|
|
|
109 |
## wrong: two arguments |
|
|
110 |
custom_priors(pi = function(x, y){0.0}) |
|
|
111 |
|
|
|
112 |
``` |
|
|
113 |
|
|
|
114 |
We can now use the new priors to run `outbreaker` on the `fake_outbreak` data |
|
|
115 |
(see [*introduction vignette*](introduction.html)): |
|
|
116 |
|
|
|
117 |
```{r, run_custom_priors} |
|
|
118 |
|
|
|
119 |
dna <- fake_outbreak$dna |
|
|
120 |
dates <- fake_outbreak$sample |
|
|
121 |
w <- fake_outbreak$w |
|
|
122 |
data <- outbreaker_data(dna = dna, dates = dates, w_dens = w) |
|
|
123 |
|
|
|
124 |
## we set the seed to ensure results won't change |
|
|
125 |
set.seed(1) |
|
|
126 |
|
|
|
127 |
|
|
|
128 |
res <- outbreaker(data = data, priors = priors) |
|
|
129 |
|
|
|
130 |
``` |
|
|
131 |
|
|
|
132 |
We can check the results first by looking at the traces, and then by plotting |
|
|
133 |
the posterior distributions of `pi` and `mu`, respectively: |
|
|
134 |
|
|
|
135 |
```{r, traces_custom_priors} |
|
|
136 |
|
|
|
137 |
plot(res) |
|
|
138 |
plot(res, "pi", burnin = 500) |
|
|
139 |
plot(res, "mu", burnin = 500) |
|
|
140 |
plot(res, "pi", type = "density", burnin = 500) |
|
|
141 |
plot(res, "mu", type = "hist", burnin = 500) |
|
|
142 |
|
|
|
143 |
``` |
|
|
144 |
|
|
|
145 |
Note that we are using density and histograms here for illustrative purposes, |
|
|
146 |
but there is no reason to prefer one or the other for a specific |
|
|
147 |
parameter. |
|
|
148 |
|
|
|
149 |
|
|
|
150 |
Interestingly, the trace of `pi` suggests that the MCMC oscillates between two |
|
|
151 |
different states, on either bound of the interval on which the prior is positive |
|
|
152 |
(it is `-Inf` outside (0.8; 1)). This may be a consequence of the step function, |
|
|
153 |
which causes sharp 'cliffs' in the posterior landscape. What shall one do to |
|
|
154 |
derive good samples from the posterior distribution in this kind of situation? |
|
|
155 |
There are several options, which in fact apply to typical cases of multi-modal |
|
|
156 |
posterior distributions: |
|
|
157 |
|
|
|
158 |
- Avoid 'cliffs', i.e. sharp drops in the posterior landscape, typically created |
|
|
159 |
by using step-functions in likelihoods and in priors. |
|
|
160 |
|
|
|
161 |
- Use larger samples, i.e. run more MCMC iterations. |
|
|
162 |
|
|
|
163 |
- Use a different sampler, better than Metropolis-Hasting at deriving samples |
|
|
164 |
from multi-modal distributions. |
|
|
165 |
|
|
|
166 |
|
|
|
167 |
Because we know what the real transmission tree is for this dataset, we can |
|
|
168 |
assess how the new priors impacted the inference of the transmission tree. |
|
|
169 |
|
|
|
170 |
```{r, tree_custom_priors} |
|
|
171 |
|
|
|
172 |
summary(res, burnin = 500) |
|
|
173 |
tree <- summary(res, burnin = 500)$tree |
|
|
174 |
|
|
|
175 |
comparison <- data.frame(case = 1:30, |
|
|
176 |
inferred = paste(tree$from), |
|
|
177 |
true = paste(fake_outbreak$ances), |
|
|
178 |
stringsAsFactors = FALSE) |
|
|
179 |
|
|
|
180 |
comparison$correct <- comparison$inferred == comparison$true |
|
|
181 |
comparison |
|
|
182 |
mean(comparison$correct) |
|
|
183 |
|
|
|
184 |
``` |
|
|
185 |
|
|
|
186 |
|
|
|
187 |
|
|
|
188 |
|
|
|
189 |
|
|
|
190 |
<br> |
|
|
191 |
|
|
|
192 |
# Customizing likelihood |
|
|
193 |
|
|
|
194 |
Likelihood functions customisation works identically to prior functions. The |
|
|
195 |
only difference is that custom functions will take two arguments (`data` and |
|
|
196 |
`param`) instead of one in the prior functions. The function used to specify |
|
|
197 |
custom likelihood is `custom_likelihoods`. Each custom function will correspond |
|
|
198 |
to a specific likelihood component: |
|
|
199 |
|
|
|
200 |
```{r, likelihood_components} |
|
|
201 |
|
|
|
202 |
custom_likelihoods() |
|
|
203 |
|
|
|
204 |
``` |
|
|
205 |
|
|
|
206 |
see `?custom_likelihoods` for details of these components, and see the section |
|
|
207 |
'Extending the model' for new, other components. As for `custom_priors`, a few |
|
|
208 |
checks are performed by `custom_likelihoods`: |
|
|
209 |
|
|
|
210 |
```{r, wrong_likelihood, error = TRUE} |
|
|
211 |
|
|
|
212 |
## wrong: not a function |
|
|
213 |
custom_likelihoods(genetic = "fubar") |
|
|
214 |
|
|
|
215 |
## wrong: only one argument |
|
|
216 |
custom_likelihoods(genetic = function(x){ 0.0 }) |
|
|
217 |
|
|
|
218 |
``` |
|
|
219 |
|
|
|
220 |
A trivial customisation is to disable some or all of the likelihood components |
|
|
221 |
of the model by returning a finite constant. Here, we apply this to two cases: |
|
|
222 |
first, we will disable all likelihood components as a sanity check, making sure |
|
|
223 |
that the transmission tree landscape is explored freely by the MCMC. Second, we |
|
|
224 |
will recreate the [Wallinga & Teunis (1994)](http://dx.doi.org/10.1093/aje/kwh255) |
|
|
225 |
model, by disabling specific components. |
|
|
226 |
|
|
|
227 |
|
|
|
228 |
|
|
|
229 |
## A null model |
|
|
230 |
|
|
|
231 |
```{r, null_model} |
|
|
232 |
|
|
|
233 |
f_null <- function(data, param) { |
|
|
234 |
return(0.0) |
|
|
235 |
} |
|
|
236 |
|
|
|
237 |
null_model <- custom_likelihoods(genetic = f_null, |
|
|
238 |
timing_sampling = f_null, |
|
|
239 |
timing_infections = f_null, |
|
|
240 |
reporting = f_null, |
|
|
241 |
contact = f_null) |
|
|
242 |
|
|
|
243 |
null_model |
|
|
244 |
|
|
|
245 |
``` |
|
|
246 |
|
|
|
247 |
|
|
|
248 |
We also specify settings via the `config` argument to avoid detecting imported |
|
|
249 |
cases, reduce the number of iterations and sampling each of them: |
|
|
250 |
|
|
|
251 |
```{r, run_null_model, cache = TRUE} |
|
|
252 |
|
|
|
253 |
null_config <- list(find_import = FALSE, |
|
|
254 |
n_iter = 500, |
|
|
255 |
sample_every = 1) |
|
|
256 |
|
|
|
257 |
set.seed(1) |
|
|
258 |
|
|
|
259 |
res_null <- outbreaker(data = data, |
|
|
260 |
config = null_config, |
|
|
261 |
likelihoods = null_model) |
|
|
262 |
|
|
|
263 |
``` |
|
|
264 |
|
|
|
265 |
|
|
|
266 |
```{r, res_null_model} |
|
|
267 |
|
|
|
268 |
plot(res_null) |
|
|
269 |
plot(res_null, "pi") |
|
|
270 |
plot(res_null, "mu") |
|
|
271 |
|
|
|
272 |
``` |
|
|
273 |
|
|
|
274 |
By typical MCMC standards, these traces look appalling, as they haven't reach |
|
|
275 |
stationarity (i.e. same mean and variance over time), and are grossly |
|
|
276 |
autocorrelated in parts. Fair enough, as these are only the first 500 iterations |
|
|
277 |
of the MCMC, so that autocorrelation is expected. In fact, what we observe here |
|
|
278 |
literally is the random walk across the posterior landscape, which in this case |
|
|
279 |
is only impacted by the priors. |
|
|
280 |
|
|
|
281 |
|
|
|
282 |
We can check that transmission trees are indeed freely explored: |
|
|
283 |
|
|
|
284 |
```{r, null_trees} |
|
|
285 |
|
|
|
286 |
plot(res_null, type = "alpha") |
|
|
287 |
|
|
|
288 |
``` |
|
|
289 |
|
|
|
290 |
Do **not** try to render the corresponding network using `plot(..., type = |
|
|
291 |
"network")` as the force-direction algorithm will go insane. However, this |
|
|
292 |
network can be visualised using *igraph*, extracting the edges and nodes from |
|
|
293 |
the plot (without displaying it): |
|
|
294 |
|
|
|
295 |
```{r, null_net} |
|
|
296 |
|
|
|
297 |
## extract nodes and edges from the visNetwork object |
|
|
298 |
temp <- plot(res_null, type = "network", min_support = 0) |
|
|
299 |
class(temp) |
|
|
300 |
head(temp$x$edges) |
|
|
301 |
head(temp$x$nodes) |
|
|
302 |
|
|
|
303 |
## make an igraph object |
|
|
304 |
library(igraph) |
|
|
305 |
|
|
|
306 |
net_null <- graph.data.frame(temp$x$edges, |
|
|
307 |
vertices = temp$x$nodes[1:4]) |
|
|
308 |
|
|
|
309 |
plot(net_null, layout = layout.circle, |
|
|
310 |
main = "Null model, posterior trees") |
|
|
311 |
|
|
|
312 |
``` |
|
|
313 |
|
|
|
314 |
We can derive similar diagnostics for the number of generations between cases |
|
|
315 |
(`kappa`), only constrained by default settings to be between 1 and 5, and for |
|
|
316 |
the infection dates (*t_inf*): |
|
|
317 |
|
|
|
318 |
```{r, res_null_diag} |
|
|
319 |
|
|
|
320 |
plot(res_null, type = "kappa") |
|
|
321 |
plot(res_null, type = "t_inf") |
|
|
322 |
|
|
|
323 |
``` |
|
|
324 |
|
|
|
325 |
Finally, we can verify that the distributions of `mu` and `pi` match their |
|
|
326 |
priors, respectively an exponential distribution with rate 1000 and a beta with |
|
|
327 |
parameters 10 and 1. Here, we get a qualitative assessment by comparing the |
|
|
328 |
observed distribution (histograms) to the densities of similar sized random |
|
|
329 |
samples from the priors: |
|
|
330 |
|
|
|
331 |
```{r, res_null_priors} |
|
|
332 |
|
|
|
333 |
par(xpd=TRUE) |
|
|
334 |
hist(res_null$mu, prob = TRUE, col = "grey", |
|
|
335 |
border = "white", |
|
|
336 |
main = "Distribution of mu") |
|
|
337 |
|
|
|
338 |
invisible(replicate(30, |
|
|
339 |
points(density(rexp(500, 1000)), type = "l", col = "blue"))) |
|
|
340 |
|
|
|
341 |
|
|
|
342 |
hist(res_null$pi, prob = TRUE, col = "grey", |
|
|
343 |
border = "white", main = "Distribution of pi") |
|
|
344 |
|
|
|
345 |
invisible(replicate(30, |
|
|
346 |
points(density(rbeta(500, 10, 1)), type = "l", col = "blue"))) |
|
|
347 |
|
|
|
348 |
``` |
|
|
349 |
|
|
|
350 |
|
|
|
351 |
|
|
|
352 |
|
|
|
353 |
|
|
|
354 |
|
|
|
355 |
<br> |
|
|
356 |
|
|
|
357 |
## A model using symptom onset only |
|
|
358 |
|
|
|
359 |
We can use data and likelihood customisation to change the default *outbreaker2* |
|
|
360 |
model into a [Wallinga & Teunis (1994)](http://dx.doi.org/10.1093/aje/kwh255) |
|
|
361 |
model. To do so, we need to: |
|
|
362 |
|
|
|
363 |
- Remove the DNA sequences from the data; alternatively we could also specify a |
|
|
364 |
'null' function (i.e. returning a finite constant, as above) for the genetic |
|
|
365 |
likelihood. |
|
|
366 |
|
|
|
367 |
- Disable all likelihood components other than `timing_infections` using |
|
|
368 |
`custom_likelihoods`. This means that the dates provided will be treated as |
|
|
369 |
dates of symptom onset, and the timing distribution `w` will be taken as the |
|
|
370 |
serial interval. |
|
|
371 |
|
|
|
372 |
- Disable the detection of imported cases, and forcing all `kappa` values to be |
|
|
373 |
1. |
|
|
374 |
|
|
|
375 |
|
|
|
376 |
|
|
|
377 |
While these are fairly major changes, they are straightforward to implement. We |
|
|
378 |
first create the dataset and custom likelihood functions: |
|
|
379 |
|
|
|
380 |
```{r, wt_custom} |
|
|
381 |
|
|
|
382 |
onset_data <- outbreaker_data(dates = fake_outbreak$onset, |
|
|
383 |
w_dens = fake_outbreak$w) |
|
|
384 |
|
|
|
385 |
wt_model <- custom_likelihoods(timing_sampling = f_null, |
|
|
386 |
reporting = f_null) |
|
|
387 |
|
|
|
388 |
``` |
|
|
389 |
|
|
|
390 |
To fix parameters or augmented data (here, fix all `kappa` values to 1), we set |
|
|
391 |
the initial states to the desired values and disable the corresponding moves: |
|
|
392 |
|
|
|
393 |
```{r, wt_config} |
|
|
394 |
|
|
|
395 |
wt_config <- create_config(init_kappa = 1, move_kappa = FALSE, |
|
|
396 |
init_pi = 1, move_pi = FALSE, |
|
|
397 |
move_mu = FALSE) |
|
|
398 |
|
|
|
399 |
``` |
|
|
400 |
|
|
|
401 |
We can now run the analyses for this new model: |
|
|
402 |
```{r, run_wt, cache = TRUE} |
|
|
403 |
|
|
|
404 |
set.seed(1) |
|
|
405 |
res_wt <- outbreaker(data = onset_data, |
|
|
406 |
config = wt_config, |
|
|
407 |
likelihoods = wt_model) |
|
|
408 |
|
|
|
409 |
``` |
|
|
410 |
|
|
|
411 |
|
|
|
412 |
```{r, res_wt} |
|
|
413 |
|
|
|
414 |
plot(res_wt) |
|
|
415 |
plot(res_wt, burnin = 500) |
|
|
416 |
plot(res_wt, burnin = 500, type = "alpha") |
|
|
417 |
summary(res_wt) |
|
|
418 |
|
|
|
419 |
``` |
|
|
420 |
|
|
|
421 |
As before for the 'null' model, the transmission tree is very poorly resolved in |
|
|
422 |
this case. We use the same approach to visualise it: extract nodes and edges |
|
|
423 |
from the `visNetork` object, use this information to create an `igraph` object, |
|
|
424 |
and visualise the result using a circular layout: |
|
|
425 |
|
|
|
426 |
```{r, wt_net} |
|
|
427 |
|
|
|
428 |
## extract nodes and edges from the visNetwork object |
|
|
429 |
temp <- plot(res_wt, type = "network", min_support = 0.05) |
|
|
430 |
class(temp) |
|
|
431 |
head(temp$x$edges) |
|
|
432 |
head(temp$x$nodes) |
|
|
433 |
|
|
|
434 |
## make an igraph object |
|
|
435 |
|
|
|
436 |
net_wt <- graph.data.frame(temp$x$edges, |
|
|
437 |
vertices = temp$x$nodes[1:4]) |
|
|
438 |
|
|
|
439 |
plot(net_wt, layout = layout.circle, |
|
|
440 |
main = "WT model, posterior trees") |
|
|
441 |
|
|
|
442 |
``` |
|
|
443 |
|
|
|
444 |
|
|
|
445 |
|
|
|
446 |
|
|
|
447 |
|
|
|
448 |
|
|
|
449 |
<br> |
|
|
450 |
|
|
|
451 |
# Customising movements |
|
|
452 |
|
|
|
453 |
Customising movements works in similar ways to priors and likelihoods. In |
|
|
454 |
practice, this type of customisation is more complex as it most likely will |
|
|
455 |
require evaluation of likelihoods and priors, and therefore require the user to |
|
|
456 |
know which functions to all, and how. These are documented in the [API |
|
|
457 |
vignette](Rcpp_API.html). In the following, we provide two examples: |
|
|
458 |
|
|
|
459 |
- a (fake) Gibbs sampler for the movement of the mutation rate `mu` |
|
|
460 |
|
|
|
461 |
- a new 'naive' movement of ancestries in which infectors are picked at random |
|
|
462 |
from all cases |
|
|
463 |
|
|
|
464 |
But before getting into these, we need to explicit how movements are happening |
|
|
465 |
in *outbreaker2*. |
|
|
466 |
|
|
|
467 |
|
|
|
468 |
|
|
|
469 |
## Movements in *outbreaker2* |
|
|
470 |
|
|
|
471 |
At the core of the `outbreaker` function, movements are implemented as a list of |
|
|
472 |
functions, which are all evaluated in turn during every iteration of the |
|
|
473 |
MCMC. All movement functions must obey two rules: |
|
|
474 |
|
|
|
475 |
- The first argument must be an `outbreaker_param` object (typically called |
|
|
476 |
`param` in the original code); see `?create_param` for details. |
|
|
477 |
|
|
|
478 |
- All movement functions must return a valid, `outbreaker_param` object. |
|
|
479 |
|
|
|
480 |
|
|
|
481 |
However, a new difficulty compared to prior or likelihood customisation is that |
|
|
482 |
different movements may require different components of the model, and a |
|
|
483 |
different set of arguments after `param`. In fact, this can be seen by examining |
|
|
484 |
the arguments of all the default movement functions: |
|
|
485 |
|
|
|
486 |
```{r, move_defaults} |
|
|
487 |
|
|
|
488 |
lapply(custom_moves(), args) |
|
|
489 |
|
|
|
490 |
``` |
|
|
491 |
|
|
|
492 |
To handle this difficulty, *outbreaker2* transforms every movement function |
|
|
493 |
before running the MCMC into a new function with a single parameter `param`, |
|
|
494 |
attaching all other required argument to the function's environment. The |
|
|
495 |
function achieving this transformation is called `bind_moves`. This function |
|
|
496 |
'knows' what these components are for known moves listed above. For new, |
|
|
497 |
unknown moves, it attaches the following components, provided they are used as |
|
|
498 |
arguments of the new function: |
|
|
499 |
|
|
|
500 |
- `data`: the processed data; see `?outbreaker_data` |
|
|
501 |
|
|
|
502 |
- `config`: the configuration list; see `create_config` |
|
|
503 |
|
|
|
504 |
- `likelihoods`: a list of custom likelihood functions; see |
|
|
505 |
`?custom_likelihoods` |
|
|
506 |
|
|
|
507 |
- `priors`: a list of custom prior functions; see `?custom_priors` |
|
|
508 |
|
|
|
509 |
|
|
|
510 |
See examples in `?bind_moves` for details of how this process works. |
|
|
511 |
|
|
|
512 |
|
|
|
513 |
|
|
|
514 |
|
|
|
515 |
|
|
|
516 |
## A (fake) Gibbs sampler for `mu` |
|
|
517 |
|
|
|
518 |
A Gibbs sampler supposes that the conditional distribution of a parameter is |
|
|
519 |
known and can directly be sampled from. Here, we use this principle to provide a |
|
|
520 |
toy example of custom movement for `mu`, assuming that this conditional |
|
|
521 |
distribution is always an Exponential distribution with a rate of 1000. This is |
|
|
522 |
easy to implement; to make sure that the function is actually used, we set a |
|
|
523 |
global variable changed when the function is called. |
|
|
524 |
|
|
|
525 |
|
|
|
526 |
```{r, custom_move_mu} |
|
|
527 |
|
|
|
528 |
move_mu <- function(param, config) { |
|
|
529 |
|
|
|
530 |
NEW_MOVE_HAS_BEEN_USED <<- TRUE |
|
|
531 |
param$mu <- rexp(1, 1000) |
|
|
532 |
return(param) |
|
|
533 |
|
|
|
534 |
} |
|
|
535 |
|
|
|
536 |
moves <- custom_moves(mu = move_mu) |
|
|
537 |
|
|
|
538 |
quick_config <- list(n_iter = 500, sample_every = 1, find_import = FALSE) |
|
|
539 |
|
|
|
540 |
``` |
|
|
541 |
|
|
|
542 |
Note that the new movement function `move_mu` has two arguments, and that we do |
|
|
543 |
not specify `config`. Internally, what happens is: |
|
|
544 |
|
|
|
545 |
```{r, bind_moves} |
|
|
546 |
|
|
|
547 |
## bind quick_config to function |
|
|
548 |
move_mu_intern <- bind_to_function(move_mu, config = quick_config) |
|
|
549 |
|
|
|
550 |
## new function has just one argument |
|
|
551 |
move_mu_intern |
|
|
552 |
|
|
|
553 |
## 'config' is in the function's environment |
|
|
554 |
names(environment(move_mu_intern)) |
|
|
555 |
|
|
|
556 |
## 'config' is actually 'quick_config' |
|
|
557 |
identical(environment(move_mu_intern)$config, quick_config) |
|
|
558 |
|
|
|
559 |
``` |
|
|
560 |
|
|
|
561 |
|
|
|
562 |
We perform a quick run using this new movement: |
|
|
563 |
|
|
|
564 |
```{r, run_custom_move_mu} |
|
|
565 |
|
|
|
566 |
NEW_MOVE_HAS_BEEN_USED <- FALSE |
|
|
567 |
|
|
|
568 |
set.seed(1) |
|
|
569 |
res_move_mu <- outbreaker(data, quick_config, moves = moves) |
|
|
570 |
NEW_MOVE_HAS_BEEN_USED |
|
|
571 |
|
|
|
572 |
plot(res_move_mu) |
|
|
573 |
plot(res_move_mu, "pi") |
|
|
574 |
plot(res_move_mu, "mu") |
|
|
575 |
|
|
|
576 |
``` |
|
|
577 |
|
|
|
578 |
This short, full trace, clearly hasn't mixed well (as is to be expected). But |
|
|
579 |
while we see the effect of accept/reject sampling (Metropolis algorithm) for |
|
|
580 |
`pi` with a lot of autocorrelation, the trace of `mu` shows complete |
|
|
581 |
independence between successive values. While the Gibbs sampler used here is not |
|
|
582 |
correct, this result is: a Gibbs sampler will be more efficient than the |
|
|
583 |
classical Metropolis(-Hasting) algorithm for a given number a iterations. |
|
|
584 |
|
|
|
585 |
|
|
|
586 |
|
|
|
587 |
|
|
|
588 |
<br> |
|
|
589 |
|
|
|
590 |
## A new movement of ancestries |
|
|
591 |
|
|
|
592 |
Moves of ancestries are done in two ways in outbreaker: by picking ancestors at |
|
|
593 |
random from any prior case, and by swapping cases from a transmission |
|
|
594 |
link. Here, we implement a new move, which will propose infectors which have |
|
|
595 |
been infected on the same day of the current infector. As before, we will use |
|
|
596 |
global variables to keep track of the resulting movements (see `N_ACCEPT` and |
|
|
597 |
`N_REJECT`). |
|
|
598 |
|
|
|
599 |
|
|
|
600 |
```{r, new_move_ances} |
|
|
601 |
|
|
|
602 |
api <- get_cpp_api() |
|
|
603 |
|
|
|
604 |
new_move_ances <- function(param, data, custom_likelihoods = NULL) { |
|
|
605 |
|
|
|
606 |
for (i in 1:data$N) { |
|
|
607 |
current_ances <- param$alpha[i] |
|
|
608 |
if (!is.na(current_ances)) { |
|
|
609 |
## find cases infected on the same days |
|
|
610 |
current_t_inf <- param$t_inf[current_ances] |
|
|
611 |
pool <- which(param$t_inf == current_t_inf) |
|
|
612 |
pool <- setdiff(pool, i) |
|
|
613 |
|
|
|
614 |
if (length(pool) > 0) { |
|
|
615 |
## propose new ancestor |
|
|
616 |
current_ll <- api$cpp_ll_all(data, param, i = i, custom_likelihoods) |
|
|
617 |
param$alpha[i] <- sample(pool, 1) |
|
|
618 |
new_ll <- api$cpp_ll_all(data, param, i = i, custom_likelihoods) |
|
|
619 |
|
|
|
620 |
## likelihood ratio - no correction, move is symmetric |
|
|
621 |
ratio <- exp(new_ll - current_ll) |
|
|
622 |
|
|
|
623 |
## accept / reject |
|
|
624 |
if (runif(1) <= ratio) { # accept |
|
|
625 |
N_ACCEPT <<- N_ACCEPT + 1 |
|
|
626 |
} else { # reject |
|
|
627 |
N_REJECT <<- N_REJECT + 1 |
|
|
628 |
param$alpha[i] <- current_ances |
|
|
629 |
} |
|
|
630 |
} |
|
|
631 |
} |
|
|
632 |
} |
|
|
633 |
return(param) |
|
|
634 |
} |
|
|
635 |
|
|
|
636 |
moves <- custom_moves(new_move = new_move_ances) |
|
|
637 |
|
|
|
638 |
``` |
|
|
639 |
|
|
|
640 |
We can now use this new move in our transmission tree reconstruction. We will |
|
|
641 |
use a shorter chain than the defaults as this new move is likely to be computer |
|
|
642 |
intensive. |
|
|
643 |
|
|
|
644 |
|
|
|
645 |
|
|
|
646 |
```{r, run_new_move, cache = TRUE} |
|
|
647 |
|
|
|
648 |
N_ACCEPT <- 0 |
|
|
649 |
N_REJECT <- 0 |
|
|
650 |
|
|
|
651 |
set.seed(1) |
|
|
652 |
res_new_move <- outbreaker(data, list(move_kappa = FALSE), moves = moves) |
|
|
653 |
|
|
|
654 |
N_ACCEPT |
|
|
655 |
N_REJECT |
|
|
656 |
|
|
|
657 |
``` |
|
|
658 |
|
|
|
659 |
|
|
|
660 |
```{r, res_new_move} |
|
|
661 |
|
|
|
662 |
plot(res_new_move) |
|
|
663 |
plot(res_new_move, type = "alpha") |
|
|
664 |
summary(res_new_move) |
|
|
665 |
|
|
|
666 |
``` |
|
|
667 |
|
|
|
668 |
Results show a switch to a new mode at about 5000 iterations. Let us compare the |
|
|
669 |
consensus tree to the actual one (store in `fake_outbreak$ances`): |
|
|
670 |
|
|
|
671 |
```{r, check_new_move} |
|
|
672 |
|
|
|
673 |
summary(res_new_move, burnin = 5000) |
|
|
674 |
tree2 <- summary(res_new_move, burnin = 5000)$tree |
|
|
675 |
|
|
|
676 |
comparison <- data.frame(case = 1:30, |
|
|
677 |
inferred = paste(tree2$from), |
|
|
678 |
true = paste(fake_outbreak$ances), |
|
|
679 |
stringsAsFactors = FALSE) |
|
|
680 |
|
|
|
681 |
comparison$correct <- comparison$inferred == comparison$true |
|
|
682 |
comparison |
|
|
683 |
mean(comparison$correct) |
|
|
684 |
|
|
|
685 |
``` |