Diff of /presentationGenerator.Rmd [000000] .. [37bd50]


--- a
+++ b/presentationGenerator.Rmd
@@ -0,0 +1,604 @@
+---
+title: "Penalized logistic regression on time-to-event data using casebase sampling"
+author: "Jesse Islam"
+date: "2/27/2020"
+mathspec: true
+theme: "metropolis"
+output:
+  beamer_presentation:
+    incremental: FALSE
+    fig_caption: false
+header-includes:
+  \usepackage{xcolor}
+---
+
+```{r setup, include=FALSE}
+source("packages.R")
+set.seed(1)
+knitr::opts_chunk$set( warning=FALSE,message=FALSE,echo=FALSE)
+knitr::opts_chunk$set(dev = 'pdf')
+```
+
+
+
+# Popular methods in time-to-event analysis 
+
+* In disease etiology, we tend to rely on the proportional hazards assumption.
+    * Cox Regression
+
+* When we want the absolute risk:
+    * Breslow estimator
+    * Parametric models
+    
+# Motivations for a new method
+
+* Hanley and Miettinen found that survival analysis rarely produces prognostic functions, even though Cox regression software is widely available. [2]
+* They attribute this to the stepwise nature of the resulting risk estimates, which reduces interpretability. [2]
+* They propose a streamlined approach for obtaining a **smooth absolute risk** curve. [2]
+
+# Dr. Cox's perspective
+
+![](coxquote.png)
+
+# Itinerary
+
+1. SUPPORT study
+2. Casebase sampling
+3. Penalized logistic regression on survival data
+4. Maximum likelihood with regularization
+5. Absolute risk comparison
+6. Conclusion
+7. Future work
+8. References
+
+# SUPPORT dataset [4]
+
+* **Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments**
+* Design: Prospective cohort study.
+* Setting: 5 academic care centers in the United States.
+* Participants: 9105 hospitalized adults.
+* Follow-up time: 5.56 years.
+* Incidence of death: 68%.
+
+# SUPPORT manual imputation [4]
+
+* Notorious for missing data
+\begin{table}[]
+\centering
+\resizebox{\textwidth}{!}{%
+\begin{tabular}{ll}
+\hline
+Baseline Variable      & Normal Fill-in Value \\ \hline
+Bilirubin              & 1.01                 \\
+BUN                    & 6.51                 \\
+Creatinine             & 1.01                 \\
+PaO2/FiO2 ratio (pafi) & 333.3                \\
+Serum albumin          & 3.5                  \\
+Urine output           & 2502                 \\
+White blood count      & 9 (thousands)        \\ \hline
+\end{tabular}%
+}
+\caption{Suggested imputation values. {[}3{]}}
+\label{tab:my-table}
+\end{table}
+
+# SUPPORT automated imputation
+
+* `mice` imputation package (R)
+
+1. pmm (predictive mean matching) – numeric variables
+2. logreg (logistic regression) – binary variables
+3. polyreg (Bayesian polytomous regression) – factor variables
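+
+A minimal sketch, assuming the pre-imputation data frame `supportMain` built later in this deck; `mice`'s defaults match the methods above:
+
+```{r, echo=TRUE, eval=FALSE}
+# pmm for numeric, logreg for binary factors, polyreg for factors
+imp <- mice::mice(supportMain, m = 1, printFlag = FALSE)
+imputedData <- mice::complete(imp, 1)
+```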
+
+# Removed variables [4]
+
+* **Response variables**
+  * Hospital charges.
+  * Patient ratio of costs to charges.
+  * Patient micro-costs.
+* **Ordinal covariates**
+  * Functional disability.
+  * Income.
+* **Sparse covariate**
+  * Surrogate activities of daily living.
+* Previous model results. (6)
+
+# Variable overview [4]
+
+* **Response variables**
+  * follow-up time, death.
+
+* **Covariates**
+  * Age, sex, race, education (6)
+  * Disease group/class, comorbidities. (3)
+  * Coma score, Therapeutic Intervention Scoring System (2)
+  * Physiological variables. (11)
+  * Activities of daily living. (2)
+
+
+
+
+# Original SUPPORT analysis [4]
+
+* Developed the SUPPORT prognostic model on Phase I patients.
+* Tested it on Phase II patients.
+* Both evaluated on the scale of 180 days.
+
+# Original SUPPORT analysis [4]
+
+SUPPORT physiology score (SPS) was developed.
+```{r, out.width = "80%",include=TRUE}
+knitr::include_graphics("SPS.png")
+```
+
+
+
+# SUPPORT: my question
+
+* How does their SPS perform over 5.56 years?
+* How does the APACHE III physiology score (APS) perform over 5.56 years?
+* How does a model with all the covariates, excluding SPS and APS, perform?
+
+# Analysis Process
+
+1. Impute.
+2. Compare SPS and APS over ~5.56 years using absolute risk.
+3. Compare to the Kaplan-Meier curve (unadjusted model).
+4. Compare to the full model (excluding SPS and APS).
+
+* All models are trained on 80% of the observations.
+* The remaining observations are used to generate comparative absolute risk curves (sketched below).
+  * The absolute risk curve for each individual is averaged.
+  * The averaged curve is expected to approach the Kaplan-Meier curve.
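+
+A minimal sketch of the check in step 3, assuming hypothetical `d.time` and `death` columns in the held-out `test` set:
+
+```{r, echo=TRUE, eval=FALSE}
+# Unadjusted Kaplan-Meier on the held-out 20%; fun = "event"
+# plots 1 - S(t), i.e., the absolute risk scale used below.
+km <- survival::survfit(survival::Surv(d.time, death) ~ 1, data = test)
+plot(km, fun = "event", xlab = "Days", ylab = "Probability of death")
+```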
+
+```{r data processing,include=FALSE}
+
+# Data retrieved from http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/support2csv.zip
+# read_csv() recovers the header; read.csv() then re-reads with
+# those column names, filling short rows.
+ew <- read_csv("support2.csv")
+data <- read.csv("support2.csv", col.names = colnames(ew), fill = TRUE, header = TRUE)
+rm(ew)
+```
+
+
+```{r cleaningImputation,include=FALSE}
+#manual imputation
+data$alb[is.na(data$alb)]=3.5
+data$pafi[is.na(data$pafi)]=333.3
+data$bili[is.na(data$bili)]=1.01
+data$crea[is.na(data$crea)]=1.01
+data$bun[is.na(data$bun)]=6.51
+data$wblc[is.na(data$wblc)]=9
+data$urine[is.na(data$urine)]=2502
+
+# automated imputation
+previousModelPositions <- 18:29  # columns holding previous model results
+
+supportMain<-data[,-previousModelPositions]
+supportPrevious<-data[,previousModelPositions]
+#keep 1,2 after imputation from previous usable models.
+
+#missingPattern<-md.pattern(supportMain)
+supportMainImpute <- mice(supportMain[,-c(11,34,13,14,15)], m = 1, printFlag = FALSE) # impute, excluding ordinal covariates and extra response variables
+imputedData <- cbind(complete(supportMainImpute,1), supportPrevious[,c(1,2)])
+# adls (~33% missing) and in-hospital death are dropped (columns 4 and 29)
+#md.pattern(completeData)
+completeData <- na.omit(imputedData[,-c(4,29)])
+workingCompleteData=model.matrix(death~ .-d.time,data=completeData)[,-c(1)]#remove intercept
+#Create u and xCox, which will be used when fitting Cox with glmnet.
+newData=workingCompleteData
+x=workingCompleteData
+#x and y will be used to fit the casebase model under glmnet.
+#x=as.matrix(sparse.model.matrix(death~ .-d.time+0,data=workingData))
+y=data.matrix(completeData[,c(2,5)])
+```
+
+```{r CoxExplorationModels,include=FALSE, eval = TRUE}
+fullDataCox<-as.data.frame(cbind(as.numeric(y[,2]),as.numeric(y[,1]),x))
+train_index <- sample(1:nrow(fullDataCox), 0.8 * nrow(fullDataCox))
+test_index <- setdiff(1:nrow(fullDataCox), train_index)
+
+train<-fullDataCox[train_index,]
+test<-fullDataCox[test_index,]
+coxSPS<- survival::coxph(Surv(time=V1,event=V2) ~ sps, data = train)
+abcoxSPS<-survival::survfit(coxSPS,newdata = test)
+
+coxAPS<- survival::coxph(Surv(time=V1,event=V2) ~ aps, data = train)
+abcoxAPS<-survival::survfit(coxAPS,newdata = test)
+
+coxFull<- survival::coxph(Surv(time=V1,event=V2) ~ .-sps -aps, data = train)
+abcoxFull<-survival::survfit(coxFull,newdata = test)
+
+KM<- survival::coxph(Surv(time=V1,event=V2) ~ 1, data = test)
+abKM<-survival::survfit(KM)
+abKMcompare<-cbind(abKM$time[abKM$time<=1631],1-abKM$surv[abKM$time<=1631])
+colnames(abKMcompare)<-c("Time","KM")
+
+baseCoxComparisonData<-cbind(abcoxSPS$time[abcoxSPS$time<=1631],
+                             1-rowMeans(abcoxSPS$surv[abcoxSPS$time<=1631,]),
+                             1-rowMeans(abcoxAPS$surv[abcoxSPS$time<=1631,]),
+                             1-abKM$surv,
+                             1-rowMeans(abcoxFull$surv[abcoxSPS$time<=1631,]))
+colnames(baseCoxComparisonData)<-c("Time","SPS","APS","Unadjusted","Full")
+myColors <- brewer.pal(5,"Dark2")
+FigureWidth='90%'
+```
+
+
+# SPS vs APS
+```{r SPSvsAPS,include=TRUE, out.width = FigureWidth}
+#Full
+ggplot(as.data.frame(baseCoxComparisonData), aes(Time)) + 
+  geom_line(aes(y = SPS, colour = "SPS"),lwd=2) +
+  geom_line(aes(y = APS, colour = "APS"),lwd=2) +
+  scale_colour_manual(name = "Models",values = myColors[c(3,4)])+
+  labs(y= "Probability of Death", x = "Survival time (Days)")
+```
+
+# SPS vs. Kaplan-Meier
+```{r SPSvsKM,include=TRUE, out.width = FigureWidth}
+ggplot(as.data.frame(baseCoxComparisonData), aes(Time)) + 
+  geom_line(aes(y = SPS, colour = "SPS"),lwd=2) +
+  geom_line(aes(y = APS, colour = "APS"),lwd=2) +
+  geom_step(data=as.data.frame(abKMcompare),mapping=aes(x=Time,y=KM,color="Unadjusted"),lwd=2)+
+  scale_colour_manual(name = "Models",values = myColors[c(3,4,2)])+
+  labs(y= "Probability of Death", x = "Survival time (Days)")
+```
+
+# All covariates vs. physiology scores vs unadjusted
+```{r FullvsKM,include=TRUE, out.width = FigureWidth}
+ggplot(as.data.frame(baseCoxComparisonData), aes(Time)) + 
+  geom_line(aes(y = SPS, colour = "SPS"),lwd=2) + 
+  geom_line(aes(y = APS, colour = "APS"),lwd=2) +
+  geom_step(data=as.data.frame(abKMcompare),mapping=aes(x=Time,y=KM,color="Unadjusted"),lwd=2)+
+  geom_line(aes(y = Full, col = "All Covariates without SPS APS"),lwd=2)+
+  scale_colour_manual(name = "Models",values = myColors[c(1,3,4,2)])+
+  labs(y= "Probability of Death", x = "Survival time (Days)")
+```
+
+
+# Chosen absolute risk comparisons
+```{r ,include=TRUE, out.width = FigureWidth}
+ggplot(as.data.frame(baseCoxComparisonData), aes(Time)) + 
+  geom_step(data=as.data.frame(abKMcompare),mapping=aes(x=Time,y=KM,color="Unadjusted"),lwd=2)+
+  geom_line(aes(y = Full, colour = "Full"),lwd=2)+
+  scale_colour_manual(name = "Models",values = myColors[c(1,2)])+
+  labs(y= "Probability of Death", x = "Survival time (Days)")
+```
+
+# Chosen absolute risk comparisons: conclusion
+
+* The model with all linear covariate associations, but no physiology scores, performs similarly to SPS or APS alone.
+* We therefore carry this model forward as the model of choice (the **Full** model).
+
+# Casebase sampling overview
+
+1. Clever sampling of person-moments from the study base.
+2. Implicitly deals with censoring.
+3. Allows a parametric fit using *logistic regression*.
+
+* Casebase is parametric: changing how time enters the model yields different parametric families.
+* The package also provides *population-time* plots (sketched below).
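+
+A minimal sketch of the package helper (the next slides build the same plot by hand):
+
+```{r, echo=TRUE, eval=FALSE}
+# Population-time plot for the SUPPORT data via casebase.
+pt <- casebase::popTime(data, time = "d.time", event = "death")
+plot(pt)
+```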
+
+# Casebase: Sampling
+
+```{r DataModel, echo=FALSE, warning=FALSE,message=FALSE,include=TRUE, out.width = FigureWidth}
+nobs <- nrow(data)
+ftime <- data$d.time
+ord <- order(ftime, decreasing = FALSE)
+#ordSample=order(sample(ord,1000))
+yCoords <- cbind(cumsum(data[ord, "death"] == 1),
+cumsum(data[ord, "death"] == 0))
+yCoords <- cbind(yCoords, nobs - rowSums(yCoords))
+aspectRatio <- 0.75
+height <- 8.5 * aspectRatio; width <- 11 * aspectRatio
+cases <- data$death == 1
+# death is 0/1 in SUPPORT, so there are no competing events here;
+# jitter the sampled points vertically within the grey base area
+moved_cases <- yCoords[cases[ord], 3] * runif(sum(cases))
+moved_bases <- yCoords[cases[ord], 3] * runif(sum(cases))
+plot(0, type = 'n', xlim = c(0, max(ftime)), ylim = c(0, nobs),
+xlab = 'Follow-up time (days)', ylab = 'Population',cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
+ptsz=0.5
+```
+
+# Casebase: Sampling
+```{r , echo=FALSE, warning=FALSE,message=FALSE,include=TRUE, out.width = FigureWidth}
+plot(0, type = 'n', xlim = c(0, max(ftime)), ylim = c(0, nobs),
+xlab = 'Follow-up time (days)', ylab = 'Population',cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
+polygon(c(0, 0, ftime[ord], max(ftime), 0),
+c(0, nobs, yCoords[,3], 0, 0),
+col = "grey60")
+legend("topright", legend = c("Base"),
+col = c("grey60"),
+pch = 19,cex=1.5)
+```
+
+# Casebase: Sampling
+```{r , echo=FALSE, warning=FALSE,message=FALSE,include=TRUE, out.width = FigureWidth}
+plot(0, type = 'n', xlim = c(0, max(ftime)), ylim = c(0, nobs),
+xlab = 'Follow-up time (days)', ylab = 'Population',cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
+polygon(c(0, 0, ftime[ord], max(ftime), 0),
+c(0, nobs, yCoords[,3], 0, 0),
+col = "grey60")
+points((ftime[ord])[cases[ord]], moved_bases, pch = 19,
+col = "blue", cex = ptsz)
+legend("topright", legend = c("Base","base-series"),
+col = c("grey60","blue"),
+pch = 19,cex=1.5)
+```
+
+# Casebase: Sampling
+```{r , echo=FALSE, warning=FALSE,message=FALSE,include=TRUE, out.width = FigureWidth}
+plot(0, type = 'n', xlim = c(0, max(ftime)), ylim = c(0, nobs),
+xlab = 'Follow-up time (days)', ylab = 'Population',cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
+polygon(c(0, 0, ftime[ord], max(ftime), 0),
+c(0, nobs, yCoords[,3], 0, 0),
+col = "grey60")
+points((ftime[ord])[cases[ord]], moved_bases, pch = 19,
+col = "blue", cex = ptsz)
+points((ftime[ord])[cases[ord]], yCoords[cases[ord],3], pch = 19,
+col = "firebrick3", cex = ptsz)
+legend("topright", legend = c("Base","base-series","case-series (Death)"),
+col = c("grey60","blue","firebrick3"),
+pch = 19,cex=1.5)
+```
+
+# Casebase: Sampling
+
+```{r , echo=FALSE, warning=FALSE,message=FALSE,include=TRUE, out.width = FigureWidth}
+
+plot(0, type = 'n', xlim = c(0, max(ftime)), ylim = c(0, nobs),
+xlab = 'Follow-up time (days)', ylab = 'Population',cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
+polygon(c(0, 0, ftime[ord], max(ftime), 0),
+c(0, nobs, yCoords[,3], 0, 0),
+col = "grey60")
+points((ftime[ord])[cases[ord]], moved_bases, pch = 19,
+col = "blue", cex = ptsz)
+points((ftime[ord])[cases[ord]], moved_cases, pch = 19,
+col = "firebrick3", cex = ptsz)
+#points((ftime[ord])[bases[ord]], moved_bases, pch = 19,
+#col = "blue", cex = ptsz)
+legend("topright", legend = c("Base","base-series","case-series (Death)"),
+col = c("grey60", "blue","firebrick3"),
+pch = 19,cex=1.5)
+
+
+```
+
+# Casebase: Sampling
+
+$$e^{L}=\frac{Pr(Y=1|x,t)}{Pr(Y=0|x,t)}=\frac{h(x,t)*B(x,t)}{b[B(x,t)/B]}=\frac{h(x,t)*B}{b}$$
+
+* $L=\beta X$, the linear predictor.
+* $b$ = size of the base-series.
+* $B$ = total person-time in the base.
+* $B(x,t)$ = risk set at survival time $t$.
+
+Case-series points arise at rate $h(x,t)$ within the risk set, while base-series points are sampled uniformly over the base, so the risk-set terms cancel in the odds.
+
+
+# Casebase: Sampling
+
+$e^{L}=\frac{Pr(Y=1|x,t)}{Pr(Y=0|x,t)}=\frac{h(x,t)*B(x,t)}{b[B(x,t)/B]}=\frac{h(x,t)*B}{b}$
+
+```{r , echo=FALSE, warning=FALSE,message=FALSE,include=TRUE, out.width = FigureWidth}
+
+plot(0, type = 'n', xlim = c(0, max(ftime)), ylim = c(0, nobs),
+xlab = 'Follow-up time (days)', ylab = 'Population',cex.lab=1.5, cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
+polygon(c(0, 0, ftime[ord], max(ftime), 0),
+c(0, nobs, yCoords[,3], 0, 0),
+col = "grey60")
+points((ftime[ord])[cases[ord]], moved_bases, pch = 19,
+col = "blue", cex = ptsz)
+points((ftime[ord])[cases[ord]], moved_cases, pch = 19,
+col = "firebrick3", cex = ptsz)
+#points((ftime[ord])[bases[ord]], moved_bases, pch = 19,
+#col = "blue", cex = ptsz)
+abline(v=990,lwd=2)
+abline(v=1010,lwd=2)
+legend("topright", legend = c("Base","base-series","case-series (Death)"),
+col = c("grey60", "blue","firebrick3"),
+pch = 19,cex=1.5)
+```
+
+
+
+# log-odds = log hazard
+
+ $$e^{L}=\frac{\hat{h}(x,t)*B}{b} $$
+ $$ \hat{h}(x,t)= \frac{b*e^{L}}{B}$$
+ $$log(\hat{h}(x,t))= L+log(\frac{b}{B})$$
+
+
+
+
+
+# Maximum log-likelihood [1]
+
+ $$  log(l(\beta_{0},\beta))=\frac{1}{N}\sum^{N}_{i=1}\{y_{i}(\beta_{0}+x_{i}^{T}\beta)-log(1+e^{\beta_{0}+x_{i}^{T}\beta})\}  $$
+ 
+# Maximum log-likelihood, with offset
+ $$  log(l(\textcolor{blue}{log(\frac{b}{B})},\beta))=\frac{1}{N}\sum^{N}_{i=1}\{y_{i}(\textcolor{blue}{log(\frac{b}{B})}+x_{i}^{T}\beta)-log(1+e^{\textcolor{blue}{log(\frac{b}{B})}+x_{i}^{T}\beta})\}  $$
+
+# Maximum log-likelihood, with offset and lasso
+
+
+$\frac{1}{N}\sum^{N}_{i=1}\{y_{i}(\textcolor{blue}{log(\frac{b}{B})}+x_{i}^{T}\beta)-log(1+e^{\textcolor{blue}{log(\frac{b}{B})}+x_{i}^{T}\beta})\}  \textcolor{red}{-\lambda||\beta||_{1}}$
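+
+A hedged sketch of this objective in `glmnet`, assuming a case-base sampled design matrix `x`, case indicator `y`, and offset `off = log(b/B)` (in practice `casebase` constructs the offset internally):
+
+```{r, echo=TRUE, eval=FALSE}
+# Lasso-penalized (alpha = 1) logistic likelihood with the
+# sampling offset entering as a fixed term in the linear predictor.
+fit <- glmnet::cv.glmnet(x, y, family = "binomial",
+                         offset = rep(off, nrow(x)), alpha = 1)
+```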
+ 
+# Casebase: Parametric families
+* We can now fit models of the form:
+$$log(h(t;\alpha,\beta))=g(t;\alpha)+\beta X$$
+* By changing the function $g(t;\alpha)$, we can model different parametric families easily:
+
+
+# Casebase: Parametric models
+*Exponential*: $g(t;\alpha)$ is a constant (here `dat` is a hypothetical data frame with columns `status`, `time`, `X1`, `X2`)
+```{r,echo=TRUE,eval=FALSE}
+casebase::fitSmoothHazard(status ~ X1 + X2, data = dat, time = "time")
+```
+*Gompertz*: $g(t;\alpha)=\alpha t$
+```{r,echo=TRUE,eval=FALSE}
+casebase::fitSmoothHazard(status ~ time + X1 + X2, data = dat)
+```
+*Weibull*: $g(t;\alpha) = \alpha log(t)$
+```{r,echo=TRUE,eval=FALSE}
+casebase::fitSmoothHazard(status ~ log(time) + X1 + X2, data = dat)
+```
+
+
+# Absolute Risk
+* We now have several parametric hazard models.
+* To get the absolute risk, we evaluate the cumulative incidence implied by the hazard:
+$$CI(x,t)=1-e^{-\int^{t}_{0}h(x,u)du}$$
+* $CI(x,t)$ = cumulative incidence (absolute risk)
+* $h(x,u)$ = hazard function
+
+* Let's use the Weibull hazard (see the sketch below).
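+
+A minimal sketch, reusing the hypothetical `dat` from the previous slide and a hypothetical `newdat` of new covariate profiles:
+
+```{r, echo=TRUE, eval=FALSE}
+# absoluteRisk numerically integrates the fitted hazard to give CI(x, t).
+fit <- casebase::fitSmoothHazard(status ~ log(time) + X1 + X2, data = dat)
+risk <- casebase::absoluteRisk(fit, time = c(180, 365), newdata = newdat)
+```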
+
+
+# Models to be compared
+
+* Recall: this case study demonstrates regularization using our method.
+
+* **Unadjusted**: (death, time) ~ 1
+* **Cox**: (death, time) ~ $\beta X$
+* **Coxnet**: (death, time) ~ $\beta X$, with lasso penalty
+* **Penalized logistic**: death ~ log(time) + $\beta X$, with lasso penalty
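+
+One-line sketches of the four fits, assuming `train`, `x`, `y`, and `u` as built in this deck's hidden chunks (with `d.time`/`death` column names):
+
+```{r, echo=TRUE, eval=FALSE}
+survival::survfit(Surv(d.time, death) ~ 1, data = train)  # Unadjusted
+survival::coxph(Surv(d.time, death) ~ ., data = train)    # Cox
+glmnet::cv.glmnet(x, u, family = "cox", alpha = 1)        # Coxnet
+casebase::fitSmoothHazard.fit(x, y, formula_time = ~ log(d.time),
+  time = "d.time", event = "death", family = "glmnet",
+  alpha = 1, ratio = 10)                                  # Penalized logistic
+```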
+
+
+
+```{r supportSetUp, echo = FALSE, eval = TRUE}
+
+ratio=10  # base-series:case-series sampling ratio used throughout
+#keep one individual out, at random, for which we will develop an 
+#absolute risk curve.
+#sam=sample(1:nrow(completeData), )
+#Set workingData to the remaining individuals in the data
+#workingData=completeData[-c(sam),]
+
+#x and y will be used to fit the casebase model under glmnet.
+#x=as.matrix(sparse.model.matrix(death~ .-d.time+0,data=workingData))
+y=data.matrix(train[,c(2,1)])
+colnames(y)<-c("death","d.time")
+x=data.matrix(train[,-c(2,1)])
+newData=data.matrix(test[,-c(2,1)])
+```
+
+```{r coxHazAbsolute, echo = FALSE }
+#Create u and xCox, which will be used when fitting Cox with glmnet.
+
+u=survival::Surv(time = as.numeric(y[,2]), event = as.factor(y[,1]))
+
+xCox=as.matrix(cbind(data.frame("(Intercept)"=rep(1,each=length(x[,1]))),x))
+colnames(xCox)[1]="(Intercept)"  # restore the intercept name mangled by data.frame()
+
+#hazard for cox using glmnet
+coxNet=glmnet::cv.glmnet(x=x,y=u, family="cox",alpha=1,standardize = TRUE)
+#convergence demonstrated in plot
+#plot(coxNet)
+#taking the coefficient estimates for later use
+nonzero_covariate_cox <- predict(coxNet, type = "nonzero", s = "lambda.1se")
+nonzero_coef_cox <- coef(coxNet, s = "lambda.1se")
+#creating a new dataset that only contains the covariates chosen through glmnet.
+#cleanCoxData<- as.data.frame(cbind(as.numeric(workingData$d.time),as.factor(workingData$death), xCox[,nonzero_covariate_cox$X1]))
+cleanCoxData<-as.data.frame(cbind(d.time=as.numeric(y[,2]),death=as.numeric(y[,1]),x[,nonzero_covariate_cox$X1]))
+#newDataCox<-xCox[sam,nonzero_covariate_cox$X1]
+#fitting a cox model using regular estimation, however we will not keep it.
+#this is used more as an object place holder.
+coxNet <- survival::coxph(Surv(time=d.time,event=death) ~ ., data = cleanCoxData)
+#The coefficients of this object will be replaced with the estimates from coxNet.
+#Doing so makes it so that everything is invalid aside from the coefficients.
+#In this case, all we need to estimate the absolute risk is the coefficients.
+#Std. error would be incorrect here, if we were to draw error bars.
+coxNet$coefficients<-nonzero_coef_cox@x
+#Fitting absolute risk curve for cox+glmnet. -1 to remove intercept
+#posCox=nonzero_covariate_cox$X1
+#newDataCoxI=cleanCoxDataI[,posCox]
+
+abcoxNet<-survival::survfit(coxNet,type="breslow",newdata=as.data.frame(newData[,nonzero_covariate_cox$X1]))
+
+
+abCN=as.data.frame(cbind(abcoxNet$time, 1-rowMeans(abcoxNet$surv)))
+colnames(abCN)<-c("Time","Absolute risk")
+abCN$Model="Coxnet"
+abCN=abCN[abCN$Time<=max(abKMcompare[,1]),]
+#plot(abCN[,1],1-abCN[,2])
+```
+
+```{r cbHazard, echo = FALSE, eval = TRUE}
+
+cbFull <- casebase::fitSmoothHazard.fit(x, y, family = "glmnet",
+  time = "d.time", event = "death", formula_time = ~log(d.time),
+  alpha = 1, ratio = ratio, standardize = TRUE)
+```
+
+```{r CBabsolute, echo = FALSE, eval = TRUE}
+#Estimating the absolute risk curve using the newData parameter.
+abCbFull=casebase::absoluteRisk(cbFull,time = seq(1,max(abKMcompare[,1]), 25),newdata = newData , s="lambda.1se",method=c("numerical"))
+abCB=data.frame(abCbFull[,1],rowMeans(abCbFull[,-c(1)]))
+
+colnames(abCB)<-c("Time","Absolute risk")
+abCB$Model="Penalized logistic"
+
+ab=rbind(abCB,abCN)
+
+```
+
+# Survival comparison
+
+```{r include=TRUE}
+ggplot(as.data.frame(ab), aes(Time)) + 
+    geom_line(data=as.data.frame(baseCoxComparisonData),mapping=aes(y = Full, colour = "Cox"),lwd=2)+
+  geom_line(aes(y = `Absolute risk`, colour = Model),lwd=2)+
+  geom_step(data=as.data.frame(abKMcompare),mapping=aes(x=Time,y=KM,color="Unadjusted"),lwd=2)+
+  scale_colour_manual(name = "Models",values = myColors[c(4,1,3,2)])+
+  labs(y= "Probability of Death", x = "Survival time (Days)")
+```
+
+# Conclusions / take homes
+
+* Classical survival analysis requires methods that explicitly incorporate censoring.
+* Case-base sampling is a technique that incorporates censoring implicitly.
+* Penalized logistic regression on the SUPPORT dataset gave slightly different results near the end of follow-up time.
+
+# Future work
+
+* A formal comparative performance measure.
+* Survival GWAS.
+
+
+# References 1
+
+1. Czepiel, S. A. (2002). Maximum likelihood estimation of logistic regression models: theory and implementation. Available at czep.net/stat/mlelr.pdf.
+
+2. Hanley, James A., and Olli S. Miettinen. 2009. "Fitting Smooth-in-Time Prognostic Risk Functions via Logistic Regression." *The International Journal of Biostatistics* 5 (1).
+
+3. Harrell, F. (2020). SupportDesc. Vanderbilt Biostatistics Wiki. Available at: http://biostat.mc.vanderbilt.edu/wiki/Main/SupportDesc [Accessed 25 Feb. 2020].
+
+# References 2
+
+4. Knaus, W. A., Harrell, F. E., Lynn, J., Goldman, L., Phillips, R. S., Connors, A. F., ... & Layde, P. (1995). The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. *Annals of Internal Medicine*, 122(3), 191-203.
+
+5. Saarela, Olli, and Elja Arjas. 2015. "Non-Parametric Bayesian Hazard Regression for Chronic Disease Risk Assessment." *Scandinavian Journal of Statistics* 42 (2): 609–26.
+
+6. Saarela, Olli. 2015. "A Case-Base Sampling Method for Estimating Recurrent Event Intensities." *Lifetime Data Analysis*. Springer, 1–17.
+
+# References 3
+
+7. Scrucca, L., Santucci, A., & Aversa, F. (2007). Competing risk analysis using R: an easy guide for clinicians. *Bone Marrow Transplant*, 40(4), 381-7. doi: 10.1038/sj.bmt.1705727.
+
+8. Turgeon, M. (2017, June 10). Retrieved May 5, 2019, from https://www.maxturgeon.ca/slides/MTurgeon-2017-Student-Conference.pdf
+
+
+
+# Tutorial and slides
+
+<center>
+
+**Tutorial:**
+
+http://sahirbhatnagar.com/casebase/
+
+**Slides:**
+
+https://github.com/Jesse-Islam/ATGC_survival_presentation_Feb.27.2020
+
+**Questions?**
+
+</center>
+
+