Switch to unified view

a b/vignettes/PredictFromData.Rnw
1
\documentclass[a4paper]{article}
2
\usepackage[utf8]{inputenc}
3
\usepackage{amsmath}
4
\usepackage{hyperref}
5
\usepackage{fullpage}
6
\title{Predict from data - Basic tutorial}
7
\author{Daniel Dalevi and Nik Burkoff}
8
% \VignetteIndexEntry{Predict from data - Basic tutorial}
9
%\VignetteEngine{knitr::knitr} 
10
11
\begin{document}
12
\sloppy
13
14
<<include=FALSE>>=
15
library(knitr)
16
opts_chunk$set(
17
concordance=TRUE
18
)
19
@
20
21
\maketitle
22
23
\section{Introduction}
24
This is a tutorial describing how to use the \emph{predict from data} functionality of the eventPrediction package (version $\ge 2.1.5$). The aim is to perform event predictions for time-to-event outcome clinical trials, specifically in oncology where the endpoint is either Overall Survival (OS) or Progression Free Survival (PFS), or Cardiovascular and Diabetes trials with composite endpoints. Accumulated data is analysed during an ongoing trial at an analysis date (a cutpoint in time where observations are censored). A survival model (by default Weibull) is fitted to the available data and is used to predict when patients (who have not had events) will have events. When studies are still recruiting, additional patients may be added stochastically or deterministically using a power law, a Poisson process or by defining your own sampling distribution. The aim is to predict when a required number of events will be reached (the target level), or the expected number of events at a given time.
25
26
\section{Preliminary steps}
27
Before starting this tutorial you will need to install the eventPrediction package and its dependencies. For all examples below, ensure you load the library first. 
28
<<load,message=FALSE>>=
29
library(eventPrediction)
30
@
31
32
\section{Loading the survival data}
33
A simulated OS data set is used in this example to illustrate the required data format. The format is the same for CRGI/Diabetes but differs slightly for PFS where dthDate and progDate are used instead of the eventDate column. The OS data can be obtained by using the data-command. 
34
35
<<>>=
36
data(event.data)
37
38
head(event.data)
39
@
40
41
The following three date formats are recognized by the package: `YYYY-MM-DD', `DD/MM/YYYY' and `DD Month YYYY'. The package also recognizes R's S3 Date class. The first step is to create an \texttt{EventData} object:
42
<<>>=
43
my.data <- EventData(data=event.data,
44
                     subject="subject",
45
                     rand.date="randDate",
46
                     has.event="hasEvent",
47
                     withdrawn="withdrawn",
48
                     time="time",
49
                     site="site", #optional
50
                     event.type="eventType") #optional
51
@
52
53
54
The \texttt{data} argument is a data frame containing time-to-event data and the remaining arguments are the column names for: the subject identifier (\texttt{subject}); date of randomization onto trial (\texttt{rand.date}); whether the subject has had an event (\texttt{has.event} -- 1 if had event 0 otherwise); whether the subject has withdrawn from the trial (\texttt{withdrawn} -- 1 if withdrawn 0 otherwise) and the number of days the subject has been on the trial until an event/subject withdrawal/censoring has occurred (\texttt{time}). The column name of the site, i.e.\ the centre/hospital where the subject enrolled is optional. The column name of the event type (for example `Progressed' or `Death') is also optional, if it is not included then all events are treated as of the same type. Note, PFS or composite endpoint predictions are still possible as the event type column is not used when fitting the model or performing the predictions. 
55
56
If the data set does not have a `time' column it can be derived from additional columns of the data frame by passing a list to the \texttt{time} argument, for example:
57
<<>>=
58
other.data <- EventData(data=event.data,
59
                     subject="subject",
60
                     rand.date="randDate",
61
                     has.event="hasEvent",
62
                     withdrawn="withdrawn",
63
                     time=list(event.date="eventDate",last.date="lastDate"))
64
65
#time has been calculated
66
head(other.data@subject.data$time)
67
@
68
69
There are 5 possible entries in the list and the following logic is applied:
70
\begin{itemize}
71
\item \texttt{last.date}: Time = last.date - rand.date + 1 for subjects who are censored
72
\item \texttt{event.date, prog.date, dth.date}: 
73
Time = min(event.date, prog.date, dth.date) - rand.date + 1 for subjects who have had an event. \texttt{last.date} will be used in place of the other dates if they are all missing.
74
\item \texttt{withdrawn.date}
75
Time = withdrawn.date - rand.date + 1 for subjects who have withdrawn \textit{and not had an event}. \texttt{last.date} will be used in place if \texttt{withdrawn.date} is missing.
76
\end{itemize}
77
78
If time = 0 or set to NA then subject will be ignored when fitting the model but they will be used in the event prediction; it is assumed that they were randomized onto the study and then no further information is known. However if the argument \texttt{remove.0.time} is set to \texttt{TRUE} when calling the \texttt{EventData} function then subjects with time = 0 or NA will be removed from the data set before creating the EventData object. 
79
80
A summary of the data can be displayed:
81
<<sum>>=
82
summary(my.data)
83
@
84
Note: The average recruitment assumes the recruitment period starts when the first subject was randomized and ended when the final subject was randomized onto the study (see Section \ref{recruitlab} for more information).
85
86
The total number of subject years `at risk' can also be calculated using the CalculateDaysAtRisk function:
87
<<atrisk>>=
88
daysinyear <- 365.25
89
90
CalculateDaysAtRisk(my.data)/daysinyear
91
@
92
93
Note for functions in the package which require conversion between days and months/years the option \texttt{eventPrediction.daysinyear} can be used to set the number of days in a year (e.g. to be 365 or 365.25) by default it is 365.25. 
94
95
Further, it is sometimes useful to plot the number of events versus time to see if there are any major changes to the event rate.
96
97
<<eventvstime,fig.height=5,fig.width=5>>=
98
EventVsTimePlot( my.data, timeunit = "month" )
99
@
100
The resolution on the x-axis, set by \texttt{timeunit}, can be either in \texttt{month}, \texttt{weeks} or \texttt{quarter} of a year.
101
102
\subsection{Diagnostics}
103
104
In the next section we are going to fit a Weibull model to this data and we can check whether this model is appropriate by plotting $\log(-\log \hat{S}(t))$ against $\log t$ where $\hat{S(t)}$ is the KM estimate for the survival function. A Weibull survival function would generate a straight line on this diagnostic plot\footnote{As we will be extrapolating the results of the model fit, it is not the case that if this graph suggests a Weibull model is appropriate then it is automatically the case that the event prediction will be accurate.}.
105
<<loglogS,fig.height=5,fig.width=5>>=
106
plot(my.data)
107
@
108
109
110
It is important to check that the data is up to date before running the prediction algorithm (see Section \ref{outofdatelab}: Dealing with out of Date Data Section for further details). If the site argument was used when creating the \texttt{EventData} object then the package can output, for each site, the number of subjects (who have not had an event or withdrawn) who have been censored more than \texttt{ndays} before a given \texttt{analysis.date}. If the \texttt{analysis.date} is not included, a default date of the latest date for which we have any information for any subject in the study, is used.
111
<<site>>=
112
siteInformation(my.data, ndays=7, analysis.date="2015-11-28")
113
@
114
115
It is also possible to extract the subjects who have been censored before a given date:
116
<<censor>>=
117
censor.df <- censorInformation(my.data,censor.date="2015-11-28")
118
head(censor.df)
119
@
120
121
We can visualize this lag between the date of the analysis and the time subjects are censored:
122
<<diagplot>>=
123
DiagnosticPlot(my.data,window.size=30,separate.events=TRUE)
124
@
125
This graph shows the number of days the subjects have been on the study until event/withdrawn/censor (y axis) and the number of days since randomization (x-axis). Ongoing patients, i.e. still in the trial, ought to be found close to the
126
line y=x. If they are censored a long way in the past they will be found far from this line and data may need to be updated before event predictions are performed. 
127
Two additional arguments can be passed to the \texttt{DiagnosticPlot} function:
128
\begin{itemize}
129
\item \texttt{analysis.date}: The reference date used for the x-axis.
130
\item \texttt{window.size}: Lines are drawn at y = x-window.size and y=x-2*window.size. These can be useful when the study has a visit schedule as they highlight subjects who have missed the last one or two visits (cycles). In the example above we use a window size of 30 days.
131
\item \texttt{separate.events}: Should the different event types be coloured individually, if so this should be \texttt{TRUE} otherwise should be \texttt{FALSE}. 
132
\end{itemize} 
133
134
Finally, it is possible to `cut' the data on a given date whereby all subjects who have left the study after a given date are censored on this given date. See the \texttt{CutData} function documentation for further details. This could also be useful if there is a lag in reporting and a more complete dataset can be obtained by cutting (censoring) the data, for example, a few weeks earlier. 
135
136
Further diagnostic checks could also be carried out:
137
\begin{itemize}
138
\item If we currently have $x$ events, cut the data at $y<x$ events and use the prediction tool to try and predict when $x$ events occur and see how close the prediction is to reality -- see below for cutting data.
139
\item Compare the results to the trial assumptions to sanity check the data/assumptions -- see below for simulating from trial assumptions.  
140
\item Fit other models (for example loglogistic, lognormal) and see if any fit the data better than the Weibull model (using AIC for example).
141
\end{itemize}
142
143
\section{Fit the model}
144
145
We next fit a Weibull survival model using the \texttt{fit} function (which calls the \texttt{survreg} function from the \texttt{survival} package). 
146
<<fit>>=
147
my.fit <- fit(my.data)
148
149
my.fit
150
@
151
152
We can compare the fit to the data by plotting it. The units of time can be \texttt{"Days"} (default), \texttt{"Months"} or \texttt{"Years"} and view the number patient still at risk at various time points, i.e. the risk table (see function documentation for further details):
153
<<weibullplot,fig.height=5,fig.width=7>>=
154
plot(my.fit,units="Months")
155
156
risk.table(my.fit,units="Days")
157
@
158
159
\section{Event Prediction}
160
161
We use the \texttt{simulate} procedure to generate when we expect events to occur. This function uses the parameters of the fitted Weibull model to simulate the time-to-event for all subjects who have not had an event. Conditional Weibull distributions are used to take into account the time since randomization for each subject. See \cite{Carroll:2003} for further details.     
162
163
<<res,fig.height=6,fig.width=6>>=
164
results <- simulate(my.fit, 
165
                    Nsim=500, #Number of simulations to perform  
166
                    seed = 20141015) #A random seed for reproducibility
167
168
summary(results)
169
170
plot(results)
171
@
172
173
The blue dotted line shows the last event date and how many events we have up until now. The fitted Weibull model is used to simulate new events (blue line) and confidence intervals (red dashed lines). Observed dropouts (withdrawn patients) will also be shown on the graph (although in this dataset there are none).
174
175
The following are arguments which can be used with the \texttt{simulate} function:
176
\begin{itemize}
177
\item \texttt{accrualGenerator}, \texttt{Naccrual}: See Patient Accrual section below 
178
\item \texttt{data}: An EventData object, if used then instead of simulating events for the data set which was used to fit the model, this data set is used instead. See below for further details. 
179
\item \texttt{Nsim}: The number of independent simulations to run 
180
\item \texttt{seed}: The random seed (default NULL, R choose seed)
181
\item \texttt{limit}: Controls the width of the confidence interval. Default = 0.05 which implies the CI is drawn at the 5th and 95th percentiles.
182
\item \texttt{longlagsettings}: See Dealing with out of Date Data Section below.
183
\item \texttt{HR}, \texttt{r}: Hazard ratio and randomization balance. Advanced options see later in the vignette for further details.
184
\item \texttt{SimParams}: \textbf{Not recommended;} instead of using the estimated parameters from the Weibull model, specific rate and shape parameters can be used to simulate event times. If both \texttt{data} and this argument are used then the model fit is not required. See below for further details.
185
\item \texttt{dropout} An option to include subject dropout. See below for further details.
186
\end{itemize}
187
188
\subsection{Perform predictions}
189
190
To predict when a target level, say $900$ events will be reached we use the \texttt{predict} method and plotting the graph now shows brown dotted lines at the target level. 
191
<<>>=
192
193
results <- predict(results,event.pred=900)
194
195
summary(results)
196
 
197
plot(results)
198
@
199
200
Similarly to the predict from parameters part of the package, it is possible to predict multiple target events and also the expected number of events at given times.  
201
<<pred>>=
202
results <- predict(results,event.pred=950,time.pred=c("2017-10-10","2017-12-12"))
203
204
#we do not output the number of years at risk in this summary
205
summary(results,round.method="toMonths",show.at.risk=FALSE)
206
207
results@event.pred.data
208
results@time.pred.data
209
@
210
211
The (median) number of days at risk at the given time (i.e. the time column of the event.pred.data and time.pred.data) is also calculated and by default output when displaying the summary, set \texttt{show.at.risk} to \texttt{FALSE} to suppress this output.
212
213
The results also stores the shape ($a$) and rate ($\lambda$) of the fitted Weibull distribution. \textbf{Note the unit of the rate is day$^{-1}$, unlike in predict from parameters where it is months$^{-1}$.} 
214
<<>>=
215
 results@simParams@parameters$shape
216
 results@simParams@parameters$rate
217
218
# Note scale = 1/rate 
219
@
220
221
which has a survival function:
222
\begin{equation}
223
S(t) = \exp(-(\lambda t)^a)
224
\end{equation}
225
226
227
Additional options which can be passed to the plot function:
228
\begin{itemize}
229
\item \texttt{show.title} If TRUE display title on graph
230
\item \texttt{title} The title to be displayed on the graph, by default it is the summary text for the results
231
\item \texttt{text.width} The text width to be used for the title (this option is only used if the default title is used)
232
\item \texttt{show.obs} If TRUE show the (actually occurred) events on the graph. 
233
\item \texttt{round.method} The rounding method used for the text of the title, if `toMonths' then dates are rounded (lower CI date is rounded to the month of 15 days earlier, median is rounded to the nearest month and upper CI date is rounded to the month of 15 days later)
234
\item \texttt{show.predictions} If TRUE show the brown lines for the prediction 
235
\item \texttt{pred.to.present} \textbf{Not recommended;} move the predictions which occur before the date of the last subject has an event/is censored to this date. 
236
\item \texttt{xlim} The x axis limits for the graph (in months from the first recruitment date)
237
\item \texttt{ylim} The y axis limits for the graph
238
\item \texttt{include.dropouts} Logical, should the dropouts be displayed
239
\item \texttt{legend.position} Character, position of legend for example ``bottomright" or ``top"
240
\item \texttt{custom.dates} Vector of Dates, the dates to be shown on the x axis of the graph.
241
\end{itemize}
242
243
\section{Subject Dropout}
244
245
If the study protocol has a fixed follow up period then subjects will censored if they have not had an event at the end of the follow period. This is implemented in the event Prediction package by including a followup argument when creating the eventData object. The followup argument is in units of days.
246
<<>>=
247
my.data.with.follow.up <- EventData(data=event.data,
248
                     subject="subject",
249
                     rand.date="randDate",
250
                     has.event="hasEvent",
251
                     withdrawn="withdrawn",
252
                     time="time",
253
                     site="site",
254
                     followup=1095) #Note followup period
255
256
fit.with.followup <- fit(my.data.with.follow.up) 
257
258
@
259
260
261
262
It is all possible to simulate subjects dropping out of the clinical trial using a \texttt{dropout} argument. This argument should contain a list with proportion and time and optionally shape i.e.
263
\texttt{dropout=list(proportion=0.03,time=365,shape=1.2)} means in the absence of events 3\% of subjects
264
will have dropped out after 365 days with a Weibull hazard rate with shape$=1.2$. If shape is not included then 
265
it defaults to 1 (exponential rate). 
266
267
<<>>=
268
results.with.dropout <- simulate(fit.with.followup, 
269
                    Nsim=500,  
270
                    seed = 20141015,
271
                    dropout=list(proportion=0.1,time=100,shape=1.2)) 
272
@
273
274
275
When plotting the graph we notice that subjects drop out (not all subjects recruited have events)
276
and that not events occur after three years after last subject randomization (as all subjects have been censored at the end of the follow up period by this time).
277
278
<<>>=
279
plot(results.with.dropout,legend.position="right")
280
@
281
282
When including subject dropout, a different number of subjects have an event each simulation. Therefore, for some values of the  \texttt{event.pred} argument the upper confidence interval (and even the median) are not defined. For example:
283
284
<<>>=
285
summary(predict(results.with.dropout,event.pred=900))
286
@
287
288
\clearpage
289
\section{Patient Accrual}
290
291
Sometimes the recruitment has not finished and it will be necessary to simulate more patients. This can be achieved using an accrual generator function. The package contains two accrual generators and additional accrual generators can easily be implemented. 
292
293
\subsection{Poisson process}
294
Subjects can be recruited using a Poisson process with rate $\lambda$. This is implemented using the \texttt{Generate.PoissonAccrual} function. If we are expecting $x$ subjects a year then the rate should be specified as $x/$daysinyear. \textbf{This is the  standard definition of the rate of a Poisson process, but the reciprocal of the value used in earlier versions of this package}
295
<<>>=
296
N <- 300
297
#start.date, the day after all known subjects have been censored/had event
298
start.date <- max(my.data@subject.data$rand.date+my.data@subject.data$time)
299
300
#Generate the accrual 
301
my.accrual <- Generate.PoissonAccrual(start.date,rate=300/daysinyear) 
302
@
303
304
In order to use the requested accrual method two additional arguments are passed to the simulate function:
305
306
<<accr,fig.width=5,fig.height=5>>=
307
results <- simulate(my.fit, 
308
                    accrualGenerator=my.accrual, #method of accrual
309
                    Naccrual=N, #how many subjects to accrue 
310
                    Nsim=500, seed=20141015)  
311
312
results <- predict(results,event.pred=1100)
313
summary(results)
314
315
plot(results)
316
@
317
318
Note the confidence intervals on the recruitment curve when plotting the results.
319
320
\subsection{Power Law Accrual $t^k/B^k$}
321
Subjects can be accrued according to the function $G(t)=t^k/B^k$ where $k$ is a parameter, $t$ is the time and $B$ is the recruitment period (for more details see the predict from parameters vignette) and the \texttt{Generate.Accrual} function is used. In this case we say that we want a fixed number of patients within a time period, e.g. in a year from now we will have recruited N additional patients and the recruitment times are evenly spaced so that their cumulative distribution function is $G(t)$. 
322
By setting \texttt{deterministic=FALSE} recruitment times will be stochastically sampled using G(t). 
323
324
<<det,fig.width=5,fig.height=5>>=
325
k <- 2
326
end.date <- as.Date(start.date + daysinyear)
327
328
my.accrual <- Generate.Accrual(start.date,end.date,k,deterministic=TRUE) 
329
results <- simulate(my.fit, accrualGenerator=my.accrual, Naccrual=N, 
330
                    Nsim=500, seed=20141015)
331
summary(results)
332
@
333
334
It is possible to recruit subjects between \texttt{start.date} and \texttt{end.date} but assume the recruitment period started at an earlier date using the argument \texttt{rec.start.date}. In this case it is assumed subjects are accrued using $G(t)$ between \texttt{rec.start.date} and \texttt{end.date} but we sample subjects conditional on being recruited after \texttt{start.date}. See \texttt{help("Generate.Accrual")} for further details.
335
This option is particularly useful if $k$ is known (or assumed from trial assumptions) and recruitment is not completed. For example if recruitment is from 1/1/15 until 1/1/16, $k=2$ and the analysis date is 1/6/15 then setting \texttt{rec.start.date} as 1/1/15, \texttt{start.date} as 1/6/15 and \texttt{end.date} as 1/1/16.
336
337
\subsubsection{Estimating $k$}
338
\label{recruitlab}
339
Given a data set \textbf{for which subject recruitment has been completed}, a maximum likelihood estimate of $k$ can be obtained using the \texttt{estimateAccrualParameter} function. By default the recruitment period is assumed to be the period between the first and last subjects who are recruited. These default options can be changed using the \texttt{start.date} and \texttt{end.date} arguments:
340
<<estimatek>>=
341
estimateAccrualParameter(my.data)
342
343
estimateAccrualParameter(my.data,start.date="2013-01-01",
344
                                 end.date="2015-12-10")
345
@
346
347
If the total recruitment period is $B$ and the individual subject recruitment times are $t_1,t_2,\ldots,t_N$ (in the same units as $B$) then $$k^* = \left.\frac{1}{\log B - \frac{1}{N}\sum_{i=1}^N \log t_i}\right. $$  
348
349
Internally this function works in units of days and if subject $i$ is recruited on start.date then we set $t_i=0.5$ in order to generate a meaningful estimate.
350
351
\subsection{Additional Accrual Distributions}
352
353
An AccrualGenerator S4 object can be created to allow additional accrual options. It has the following 3 slots:
354
\begin{itemize}
355
\item \texttt{f}: A function with a single argument (the number of subjects to be recruited which returns a vector of recruitment dates
356
\item \texttt{model}: A string containing the name of the accrual model 
357
\item \texttt{text}: The string to be used when outputting the summary of the results
358
\end{itemize}
359
360
For example, the following function allows subject accrual to follow a gamma distribution:
361
<<extraacc>>=
362
Generate.GammaAccrual <- function(start.date, rate, shape){
363
  #validate input arguments
364
  if(rate <= 0 || shape <= 0){
365
    stop("rate and shape must be positive")
366
  }
367
  
368
  #use the date validation routine from the eventPrediction package
369
  #to allow dates in the same format as in the rest of the package
370
  start.date <- eventPrediction:::FixDates(start.date)
371
  
372
  #function to generate the recruitment times
373
  f <- function(N){
374
    start.date + rgamma(n=N, shape=shape, rate=rate)
375
  }
376
  
377
  text <- paste("a gamma distribution with rate=",round(rate,2),
378
                ", shape=",round(shape,2),
379
                " and start date=",start.date,".",sep="")
380
  
381
  new("AccrualGenerator", f=f,
382
      model="Gamma distribution", text=text)
383
}
384
385
#Can use this function to recruit additional subjects
386
gamma.accrual <- Generate.GammaAccrual(start.date, rate=1.1, shape=1.1)
387
388
gamma.results <- simulate(my.fit, accrualGenerator=gamma.accrual,
389
                          Naccrual=N, Nsim=500)
390
391
@
392
393
394
395
\section{Dealing with out of Date Data}
396
\label{outofdatelab}
397
The package predicts when events will occur rather than the time at which the study team is informed of their occurrences. By default the \texttt{simulate} function does not take this lag into account which can cause problems; if there has been a large gap between the last time the status of a subject was known and the time when the analysis is performed the algorithm can predict events occurring in this lag time and therefore the expected date can be in the past.
398
399
400
Using a \texttt{LongLagSettings} object different methods of dealing with this problem can be investigated:
401
\begin{itemize}
402
\item \textbf{Treat Patients as Withdrawn from Study:} Assume if the time lag between the date of analysis (\texttt{analysis.date}) and the date at which a subject is censored is greater than a given number of days (\texttt{ndays}) then assume the subject has withdrawn.  
403
<<lag1>>=
404
lls <- LongLagSettings(analysis.date="2015-11-28",
405
                      ndays=10,
406
                      toWithdraw=TRUE)
407
408
results2 <- simulate(my.fit,longlagsettings=lls, 
409
                    Nsim=500, seed=20141015)
410
411
@
412
\item \textbf{Censor at Analysis Date:} Assume no subjects have had an event during the lag time and the study team have been informed on the analysis date:
413
<<lag2>>=
414
lls <- LongLagSettings(ndays=10, 
415
                       toWithdraw=FALSE)
416
                      #using default analysis.date (the lastest date we have
417
                      #information reported for any subject in the study)
418
419
results2 <- simulate(my.fit,longlagsettings=lls, 
420
                    Nsim=500, seed=20141015)
421
422
@
423
\item \textbf{Censor at their last visit, e.g. the most recent RECIST scan}: For progression free survival data with a visit schedule: assume no subjects have had an event during the lag time and the study team have been informed of no event at the last scheduled visit. For example, if a visit schedule is 42 days and the subject would have been on the study for 170 days we censor at 4*42=168 days.
424
<<lag3>>=
425
lls <- LongLagSettings(ndays=10, 
426
                       toWithdraw=FALSE,
427
                       visitschedule=25) #visit.schedule in days
428
                      
429
results3 <- simulate(my.fit,longlagsettings=lls, 
430
                    Nsim=500, seed=20141015)
431
432
@
433
\end{itemize}
434
Note, this may also be useful in other studies with visit schedules (e.g. CV studies).
435
436
\section{User-specified parameters}
437
438
It is possible to override the fitted model parameters when running a simulation by using the \texttt{SimParams} argument. This parameter requires a \texttt{FromDataSimParam} object which can be created by calling the \texttt{FromDataParam} with \texttt{type="weibull"} and the desired rate (in units of days$^{-1}$) and shape:    
439
440
<<>>=
441
sim <- simulate(my.fit,
442
                SimParams=FromDataParam(type="weibull",rate=0.00784812,shape=1),
443
                Nsim=500)
444
@
445
446
\section{Using only subject recruitment times}
447
448
By using the \texttt{data} argument with the \texttt{simulate} function it is possible to ignore all events and only use the study recruitment times to generate event predictions. This can be used to compare how the event predictions change when the recruitment pattern is different from the pattern expected in the study assumptions. The function \texttt{OnlyUseRecTimes} creates a new EventData object with time=0 for all subjects (and no subjects have withdrawn, had event or were censored at end of follow up period). 
449
450
<<>>=
451
#Make a copy of the EventData with time
452
#=0 for all subjects
453
my.data2 <- OnlyUseRecTimes(my.data)
454
455
#simulate only using the recruitment times
456
only.recruit.results <- simulate(my.fit,data=my.data2,Nsim=500)
457
@
458
459
When only using subject recruitment times two further arguments can be used with the \texttt{simulate} function, \texttt{HR} and \texttt{r}. These options allow studies with two arms to be investigated: if there are $N$ subjects in the trial then for each simulation $N/(r+1)$ subjects have event times simulated with the rate parameter from the model fit -- these are the control group. The remaining subjects, the active group, have an event time simulated using a rate parameter chosen such that the trial will have the hazard ratio \texttt{HR}.
460
461
In the example code below half of subjects will be sampled using rate and shape parameters derived from the model fit (say $\lambda$ and $a$) whereas the other half will be sampled with the rate $=\lambda(0.75)^{(1/a)}$ and shape$=a$.
462
<<>>=
463
test.study.assumptions <- simulate(my.fit,data=my.data2,Nsim=500,
464
                                   HR=0.75,r=1)
465
466
@
467
468
By combining the \texttt{data} argument and the \texttt{SimParams}, it is possible to combine protocol assumptions with recruitment data and in this case no model fit is needed. Suppose the control arm median time-to-event was 3 months and the Weibull shape parameter is 1.1:
469
<<>>=
470
shape <- 1.1
471
ctrl.median <- 3
472
473
#rate in units of months^-1
474
rate <- log(2)^(1/shape)/ctrl.median
475
476
#rate in unit of days^-1
477
rate <- rate*12/365
478
479
#perform prediction
480
test.study.assumptions <- simulate(data=my.data2,Nsim=500,
481
                            HR=0.75,r=1,
482
                            SimParams=FromDataParam(type="weibull",rate=rate,shape=shape))
483
484
#Note in this case we can skip the creation of my.fit and use simulate directly
485
#with the EventData object (see section below)
486
test.study.assumptions <- simulate(data=my.data2,Nsim=500,
487
                            HR=0.75,r=1,
488
                            SimParams=FromDataParam(type="weibull",rate=rate,shape=shape))
489
490
@
491
492
\section{Simulating without data}
493
494
By calling the \texttt{simulate} function with an empty \texttt{EventData} data frame it is possible 
495
to simulate a trial for which no subjects have currently been recruited, essentially simulating the 
496
theoretical results found in the from Parameters part of the package.
497
498
<<param>>=
499
#create single arm study see from Parameters vignette
500
study <- SingleArmStudy(N=800,
501
             study.duration=36,
502
             ctrl.median=3,
503
             k=0.5,
504
             shape=1.1,
505
             acc.period=10)
506
507
#perform prediction
508
from.param <- predict(study,step.size=0.1)
509
510
#calculate the rate of the Weibull distribution used  
511
rate <- log(2)^(1/1.1)/3
512
#convert rate into units used in from data part of package
513
rate <- rate*12/daysinyear
514
  
515
#create an empty EventData object
516
empty.event.data <- EmptyEventData()
517
518
#create appropriate accrual generator  
519
my.a <- Generate.Accrual(start.date="2015-01-01",
520
                         end.date="2015-11-01",
521
                         k=0.5,deterministic=FALSE)
522
523
#perform the simulation, with the first argument the
524
#EventData object (instead of applying my.fit <- fit(my.data) first)
525
sim <- simulate(data=empty.event.data,
526
                Naccrual=800,
527
                accrualGenerator=my.a,
528
                SimParams=FromDataParam(type="weibull",rate=rate,shape=shape),
529
                Nsim=500)
530
531
##Could compare the two methods:###########################
532
#rec <- as.numeric(sim@recQuantiles@median-as.Date("2015-01-01"))
533
#plot(rec,1:length(rec),type="l",xlim=c(0,900),xlab="Time",ylab="N")
534
#lines(from.param@grid$time*daysinyear/12,from.param@grid$recruit.tot,col="red")
535
  
536
#q <- as.numeric(sim@eventQuantiles@median-as.Date("2015-01-01"))
537
#lines(q,1:length(q),col="black")
538
#lines(from.param@grid$time*daysinyear/12,from.param@grid$events.tot,col="red")
539
540
@
541
542
\section{Loglogistic distribution}
543
544
It is also possible to use the event prediction package with loglogistic models. The survival function is given by
545
$$S(t) = 1 - \frac{1}{1+(x\lambda)^{-a}}$$ where $\lambda > 0$ is the rate parameter and $a > 0$ is the shape parameter.
546
547
First we fit the model:
548
<<>>=
549
#Fit the model
550
llog.fit <- fit(my.data,dist="loglogistic")
551
552
#Plot the KM curve, note for this example the fit is rather poor
553
plot(llog.fit)
554
@
555
556
Then we perform the simulation and prediction. Note it is not possible to use the \texttt{HR} argument with the loglogistic distribution as this is not a proportional hazards model.
557
<<>>=
558
#Perform the simulation
559
llog.result <- simulate(llog.fit,Nsim=500)
560
#could also use SimParams=FromDataParam(type="loglogistic",rate=rate,shape=shape) 
561
#argument to specify own parameters
562
563
#perform prediction
564
llog.result <- predict(llog.result,event.pred=900)
565
566
#Plot the results
567
plot(llog.result,xlim=c(0,100))
568
@
569
It is also possible to add other survival distributions as long as it is possible to derive a closed expression for the remaining life time conditioned on the analysis date. See \texttt{help(FromDataSimParam.class)} for further details. 
570
571
572
\section*{Appendix}
573
Additional documentation can be found at \url{http://www.seml.astrazeneca.net/~raac/event-prediction/Event-Predictions.pdf}
574
575
\bibliographystyle{plain}
576
\bibliography{eventPred}
577
578
\end{document}
579