--- a +++ b/partyMod/man/cforest_control.Rd @@ -0,0 +1,103 @@ +\name{Control Forest Hyper Parameters} +\alias{cforest_control} +\alias{cforest_classical} +\alias{cforest_unbiased} +\title{ Control for Conditional Tree Forests } +\description{ + + Various parameters that control aspects of the `cforest' fit via + its `control' argument. + +} +\usage{ +cforest_unbiased(\dots) +cforest_classical(\dots) +cforest_control(teststat = "max", + testtype = "Teststatistic", + mincriterion = qnorm(0.9), + savesplitstats = FALSE, + ntree = 500, mtry = 5, replace = TRUE, + fraction = 0.632, trace = FALSE, \dots) +} +\arguments{ + \item{teststat}{ a character specifying the type of the test statistic + to be applied. } + \item{testtype}{ a character specifying how to compute the distribution of + the test statistic. } + \item{mincriterion}{ the value of the test statistic (for \code{testtype == "Teststatistic"}), + or 1 - p-value (for other values of \code{testtype}) that + must be exceeded in order to implement a split. } + \item{mtry}{ number of input variables randomly sampled as candidates + at each node for random forest like algorithms. Bagging, as special case + of a random forest without random input variable sampling, can + be performed by setting \code{mtry} either equal to \code{NULL} or + manually equal to the number of input variables.} + \item{savesplitstats}{ a logical determining whether the process of standardized + two-sample statistics for split point estimate + is saved for each primary split.} + \item{ntree}{ number of trees to grow in a forest.} + \item{replace}{ a logical indicating whether sampling of observations is + done with or without replacement.} + \item{fraction}{ fraction of number of observations to draw without + replacement (only relevant if \code{replace = FALSE}).} + \item{trace}{ a logical indicating if a progress bar shall be printed + while the forest grows.} + \item{\dots}{ additional arguments to be passed to + \code{\link{ctree_control}}.} +} +\details{ + + All three functions return an object of class \code{\link{ForestControl-class}} + defining hyper parameters to be specified via the \code{control} argument + of \code{\link{cforest}}. + + The arguments \code{teststat}, \code{testtype} and \code{mincriterion} + determine how the global null hypothesis of independence between all input + variables and the response is tested (see \code{\link{ctree}}). The + argument \code{nresample} is the number of Monte-Carlo replications to be + used when \code{testtype = "MonteCarlo"}. + + A split is established when the sum of the weights in both daugther nodes + is larger than \code{minsplit}, this avoids pathological splits at the + borders. When \code{stump = TRUE}, a tree with at most two terminal nodes + is computed. + + The \code{mtry} argument regulates a random selection of \code{mtry} input + variables in each node. Note that here \code{mtry} is fixed to the value 5 by + default for merely technical reasons, while in \code{\link[randomForest]{randomForest}} + the default values for classification and regression vary with the number of input + variables. Make sure that \code{mtry} is defined properly before using \code{cforest}. + + It might be informative to look at scatterplots of input variables against + the standardized two-sample split statistics, those are available when + \code{savesplitstats = TRUE}. Each node is then associated with a vector + whose length is determined by the number of observations in the learning + sample and thus much more memory is required. + + The number of trees \code{ntree} can be increased for large numbers of input variables. + + Function \code{cforest_unbiased} returns the settings suggested + for the construction of unbiased random forests (\code{teststat = "quad", testtype = "Univ", + replace = FALSE}) by Strobl et al. (2007) + and is the default since version 0.9-90. + Hyper parameter settings mimicing the behaviour of + \code{\link[randomForest]{randomForest}} are available in + \code{cforest_classical} which have been used as default up to + version 0.9-14. + + Please note that \code{\link{cforest}}, in contrast to + \code{\link[randomForest]{randomForest}}, doesn't grow trees of + maximal depth. To grow large trees, set \code{mincriterion = 0}. + +} +\value{ + An object of class \code{\link{ForestControl-class}}. +} +\references{ + + Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis and Torsten Hothorn (2007). + Bias in Random Forest Variable Importance Measures: Illustrations, Sources and + a Solution. \emph{BMC Bioinformatics}, \bold{8}, 25. + \url{http://www.BioMedCentral.com/1471-2105/8/25/} +} +\keyword{misc}