Diff of /partyMod/man/varimp.Rd [000000] .. [fbf06f]

--- a
+++ b/partyMod/man/varimp.Rd
@@ -0,0 +1,127 @@
+\name{varimp}
+\alias{varimp}
+\alias{varimpAUC}
+\title{ Variable Importance }
+\description{
+    Standard and conditional variable importance for `cforest', following the permutation
+    principle of the `mean decrease in accuracy' importance in `randomForest'.
+}
+\usage{
+varimp(object, mincriterion = 0, conditional = FALSE, 
+       threshold = 0.2, nperm = 1, OOB = TRUE, pre1.0_0 = conditional)
+varimpAUC(object, mincriterion = 0, conditional = FALSE, 
+       threshold = 0.2, nperm = 1, OOB = TRUE, pre1.0_0 = conditional)
+}
+\arguments{
+  \item{object}{ an object as returned by \code{cforest}.}
+  \item{mincriterion}{ the value of the test statistic or 1 - p-value that
+                       must be exceeded in order to include a split in the 
+                       computation of the importance. The default \code{mincriterion = 0}
+                       guarantees that all splits are included.}
+  \item{conditional}{ a logical determining whether unconditional or conditional 
+                      computation of the importance is performed. }
+  \item{threshold}{ the value of the test statistic or 1 - p-value of the association 
+                    between the variable of interest and a covariate that must be 
+                    exceeded in order to include the covariate in the conditioning 
+                    scheme for the variable of interest (only relevant if 
+                    \code{conditional = TRUE}). }
+  \item{nperm}{ the number of permutations performed for each tree; the
+                resulting importance scores are averaged over all permutations
+                and trees.}
+  \item{OOB}{ a logical determining whether the importance is computed from the out-of-bag 
+              sample or the learning sample (not suggested).}
+  \item{pre1.0_0}{ Prior to party version 1.0-0, the actual data values
+                   were permuted according to the original permutation
+                   importance suggested by Breiman (2001). Now the assignments
+                   to child nodes of splits in the variable of interest
+                   are permuted as described by Hapfelmeier et al. (2012),
+                   which allows for missing values in the explanatory
+                   variables and is more efficient with respect to memory
+                   consumption and computing time. Setting
+                   \code{pre1.0_0 = TRUE} restores the former behaviour.
+                   The new method does not apply to conditional
+                   variable importances.}
+}
+\details{
+
+  Function \code{varimp} can be used to compute variable importance measures
+  similar to those computed by \code{\link[randomForest]{importance}}. Besides the
+  standard version, a conditional version is available that adjusts for correlations between
+  predictor variables. 
+  
+  If \code{conditional = TRUE}, the importance of each variable is computed by permuting 
+  within a grid defined by the covariates that are associated with the variable of
+  interest (with 1 - p-value greater than \code{threshold}).
+  The resulting variable importance score is conditional in the sense of beta coefficients in   
+  regression models, but represents the effect of a variable in both main effects and interactions.
+  See Strobl et al. (2008) for details.
+
+  Note, however, that all random forest results are subject to random variation. Thus, before
+  interpreting the importance ranking, check whether the same ranking is achieved with a
+  different random seed and, if it is not, increase the number of trees \code{ntree} in 
+  \code{\link{cforest_control}}.
+
+  Note that in the presence of missing values in the predictor variables, the procedure
+  described in Hapfelmeier et al. (2012) is performed (see \code{pre1.0_0}).
+
+  Function \code{varimpAUC} implements AUC-based variable importances for binary
+  responses as described by Janitza et al. (2013). Here, the area under the curve
+  instead of the accuracy is used to calculate the importance of each variable. 
+  This AUC-based variable importance measure is more robust towards class imbalance.
+
+  For right-censored responses, \code{varimp} uses the integrated Brier score as a 
+  risk measure for computing variable importances. This feature is extremely slow and
+  experimental; use at your own risk.
+
+}
+\value{
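+  A vector of importance scores, one per explanatory variable, averaged over
+  trees and permutations: the `mean decrease in accuracy' for \code{varimp}
+  and its AUC-based analogue for \code{varimpAUC}.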
+}
+\references{ 
+
+    Leo Breiman (2001). Random Forests. \emph{Machine Learning}, 45(1), 5--32.
+
+    Alexander Hapfelmeier, Torsten Hothorn, Kurt Ulm, and Carolin Strobl (2012).
+    A New Variable Importance Measure for Random Forests with Missing Data.
+    \emph{Statistics and Computing}, \url{http://dx.doi.org/10.1007/s11222-012-9349-1}
+
+    Torsten Hothorn, Kurt Hornik, and Achim Zeileis (2006). Unbiased
+    Recursive Partitioning: A Conditional Inference Framework.
+    \emph{Journal of Computational and Graphical Statistics}, \bold{15} (3),
+    651--674.  Preprint available from 
+    \url{http://statmath.wu-wien.ac.at/~zeileis/papers/Hothorn+Hornik+Zeileis-2006.pdf}
+
+    Silke Janitza, Carolin Strobl, and Anne-Laure Boulesteix (2013). An AUC-based Permutation 
+    Variable Importance Measure for Random Forests. \emph{BMC Bioinformatics}, \bold{14}, 119.
+    \url{http://www.biomedcentral.com/1471-2105/14/119}
+
+    Carolin Strobl, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, and Achim Zeileis (2008).
+    Conditional Variable Importance for Random Forests. \emph{BMC Bioinformatics}, \bold{9}, 307. 
+    \url{http://www.biomedcentral.com/1471-2105/9/307}
+}
+\examples{
+    
+   set.seed(290875)
+   readingSkills.cf <- cforest(score ~ ., data = readingSkills, 
+       control = cforest_unbiased(mtry = 2, ntree = 50))
+
+   # standard importance
+   varimp(readingSkills.cf)
+   # the same modulo random variation
+   varimp(readingSkills.cf, pre1.0_0 = TRUE)
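+
+   ### illustrative sketch: averaging over several permutations per tree
+   ### (nperm = 5 is an arbitrary choice) reduces the Monte Carlo noise of
+   ### the scores, at the cost of longer computing time
+   varimp(readingSkills.cf, nperm = 5)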
+
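+   ### illustrative sketch: refit the forest with two arbitrary seeds and
+   ### compare the importance rankings (1 = most important); if they
+   ### disagree, increase ntree. 'rankings' is an illustrative name.
+   rankings <- sapply(1:2, function(seed) {
+       set.seed(seed)
+       cf <- cforest(score ~ ., data = readingSkills,
+           control = cforest_unbiased(mtry = 2, ntree = 50))
+       rank(-varimp(cf))
+   })
+   rankings
+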
+   # conditional importance, may take a while...
+   varimp(readingSkills.cf, conditional = TRUE)
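+
+   ### illustrative sketch: a larger threshold (0.8 is an arbitrary value)
+   ### admits fewer covariates into the conditioning grid of each variable
+   varimp(readingSkills.cf, conditional = TRUE, threshold = 0.8)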
+
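+   ### illustrative sketch: AUC-based importance for a two-class response,
+   ### using the binary factor nativeSpeaker of readingSkills as outcome;
+   ### 'nativeSpeaker.cf' is an illustrative name
+   set.seed(290875)
+   nativeSpeaker.cf <- cforest(nativeSpeaker ~ ., data = readingSkills,
+       control = cforest_unbiased(mtry = 2, ntree = 50))
+   varimpAUC(nativeSpeaker.cf)
+   # accuracy-based importance of the same forest, for comparison
+   varimp(nativeSpeaker.cf)
+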
+   \dontrun{
+   data("GBSG2", package = "TH.data")
+   ### add a random covariate for sanity check
+   set.seed(29)
+   GBSG2$rand <- runif(nrow(GBSG2))
+   object <- cforest(Surv(time, cens) ~ ., data = GBSG2, 
+                     control = cforest_unbiased(ntree = 20)) 
+   vi <- varimp(object)
+   ### compare variable importances and absolute z-statistics
+   layout(matrix(1:2))
+   barplot(vi)
+   barplot(abs(summary(coxph(Surv(time, cens) ~ ., data = GBSG2))$coeff[,"z"]))
+   ### looks more or less the same
+   }
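+
+   ### illustrative sketch: only splits whose criterion exceeds mincriterion
+   ### enter the computation; 0.95 is an arbitrary cut-off, assumed here to
+   ### be on the 1 - p-value scale used by cforest_unbiased
+   varimp(readingSkills.cf, mincriterion = 0.95)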
+}
+\keyword{tree}