atlantis / Git / [fbf06f] /partyMod/man/cforest

Models:
DanielG/
atlantis
Downloads: 1
[fbf06f]: / partyMod / man / cforest_control.Rd
History
Download this file
104 lines (91 with data), 4.9 kB

\name{Control Forest Hyper Parameters}
\alias{cforest_control}
\alias{cforest_classical}
\alias{cforest_unbiased}
\title{ Control for Conditional Tree Forests }
\description{

  Various parameters that control aspects of the `cforest' fit via
  its `control' argument.

}
\usage{
cforest_unbiased(\dots)
cforest_classical(\dots)
cforest_control(teststat = "max",
                testtype = "Teststatistic",
                mincriterion = qnorm(0.9),
                savesplitstats = FALSE,
                ntree = 500, mtry = 5, replace = TRUE,
                fraction = 0.632, trace = FALSE, \dots)
}
\arguments{
  \item{teststat}{ a character specifying the type of the test statistic
                       to be applied. }
  \item{testtype}{ a character specifying how to compute the distribution of
                   the test statistic. }
  \item{mincriterion}{ the value of the test statistic (for \code{testtype == "Teststatistic"}),
                       or 1 - p-value (for other values of \code{testtype}) that
                       must be exceeded in order to implement a split. }
  \item{mtry}{ number of input variables randomly sampled as candidates 
               at each node for random forest like algorithms. Bagging, as special case 
               of a random forest without random input variable sampling, can 
               be performed by setting \code{mtry} either equal to \code{NULL} or 
               manually equal to the number of input variables.}
  \item{savesplitstats}{ a logical determining whether the process of standardized
                         two-sample statistics for split point estimate
                         is saved for each primary split.}
  \item{ntree}{ number of trees to grow in a forest.}
  \item{replace}{ a logical indicating whether sampling of observations is 
                 done with or without replacement.}
  \item{fraction}{ fraction of number of observations to draw without 
                   replacement (only relevant if \code{replace = FALSE}).}
  \item{trace}{ a logical indicating if a progress bar shall be printed
                while the forest grows.}
  \item{\dots}{ additional arguments to be passed to 
                \code{\link{ctree_control}}.}
}
\details{

  All three functions return an object of class \code{\link{ForestControl-class}}
  defining hyper parameters to be specified via the \code{control} argument
  of \code{\link{cforest}}.

  The arguments \code{teststat}, \code{testtype} and \code{mincriterion}
  determine how the global null hypothesis of independence between all input
  variables and the response is tested (see \code{\link{ctree}}). The 
  argument \code{nresample} is the number of Monte-Carlo replications to be
  used when \code{testtype = "MonteCarlo"}.

  A split is established when the sum of the weights in both daugther nodes
  is larger than \code{minsplit}, this avoids pathological splits at the
  borders. When \code{stump = TRUE}, a tree with at most two terminal nodes
  is computed.

  The \code{mtry} argument regulates a random selection of \code{mtry} input 
  variables in each node. Note that here \code{mtry} is fixed to the value 5 by 
  default for merely technical reasons, while in \code{\link[randomForest]{randomForest}} 
  the default values for classification and regression vary with the number of input 
  variables. Make sure that \code{mtry} is defined properly before using \code{cforest}.

  It might be informative to look at scatterplots of input variables against
  the standardized two-sample split statistics, those are available when
  \code{savesplitstats = TRUE}. Each node is then associated with a vector
  whose length is determined by the number of observations in the learning
  sample and thus much more memory is required.

  The number of trees \code{ntree} can be increased for large numbers of input variables.

  Function \code{cforest_unbiased} returns the settings suggested 
  for the construction of unbiased random forests (\code{teststat = "quad", testtype = "Univ", 
    replace = FALSE}) by Strobl et al. (2007)
  and is the default since version 0.9-90.
  Hyper parameter settings mimicing the behaviour of
  \code{\link[randomForest]{randomForest}} are available in
  \code{cforest_classical} which have been used as default up to
  version 0.9-14. 

  Please note that \code{\link{cforest}}, in contrast to 
  \code{\link[randomForest]{randomForest}}, doesn't grow trees of
  maximal depth. To grow large trees, set \code{mincriterion = 0}.

}
\value{
  An object of class \code{\link{ForestControl-class}}.
}
\references{

    Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis and Torsten Hothorn (2007).
    Bias in Random Forest Variable Importance Measures: Illustrations, Sources and  
    a Solution. \emph{BMC Bioinformatics}, \bold{8}, 25. 
    \url{http://www.BioMedCentral.com/1471-2105/8/25/}
}
\keyword{misc}