atlantis / Git / Diff of /partyMod/man/cforest

Models:
DanielG/
atlantis
Downloads: 1
Diff of /partyMod/man/cforest_control.Rd [000000] .. [fbf06f]
Switch to side-by-side view

--- a
+++ b/partyMod/man/cforest_control.Rd
@@ -0,0 +1,103 @@
+\name{Control Forest Hyper Parameters}
+\alias{cforest_control}
+\alias{cforest_classical}
+\alias{cforest_unbiased}
+\title{ Control for Conditional Tree Forests }
+\description{
+
+  Various parameters that control aspects of the `cforest' fit via
+  its `control' argument.
+
+}
+\usage{
+cforest_unbiased(\dots)
+cforest_classical(\dots)
+cforest_control(teststat = "max",
+                testtype = "Teststatistic",
+                mincriterion = qnorm(0.9),
+                savesplitstats = FALSE,
+                ntree = 500, mtry = 5, replace = TRUE,
+                fraction = 0.632, trace = FALSE, \dots)
+}
+\arguments{
+  \item{teststat}{ a character specifying the type of the test statistic
+                       to be applied. }
+  \item{testtype}{ a character specifying how to compute the distribution of
+                   the test statistic. }
+  \item{mincriterion}{ the value of the test statistic (for \code{testtype == "Teststatistic"}),
+                       or 1 - p-value (for other values of \code{testtype}) that
+                       must be exceeded in order to implement a split. }
+  \item{mtry}{ number of input variables randomly sampled as candidates 
+               at each node for random forest like algorithms. Bagging, as special case 
+               of a random forest without random input variable sampling, can 
+               be performed by setting \code{mtry} either equal to \code{NULL} or 
+               manually equal to the number of input variables.}
+  \item{savesplitstats}{ a logical determining whether the process of standardized
+                         two-sample statistics for split point estimate
+                         is saved for each primary split.}
+  \item{ntree}{ number of trees to grow in a forest.}
+  \item{replace}{ a logical indicating whether sampling of observations is 
+                 done with or without replacement.}
+  \item{fraction}{ fraction of number of observations to draw without 
+                   replacement (only relevant if \code{replace = FALSE}).}
+  \item{trace}{ a logical indicating if a progress bar shall be printed
+                while the forest grows.}
+  \item{\dots}{ additional arguments to be passed to 
+                \code{\link{ctree_control}}.}
+}
+\details{
+
+  All three functions return an object of class \code{\link{ForestControl-class}}
+  defining hyper parameters to be specified via the \code{control} argument
+  of \code{\link{cforest}}.
+
+  The arguments \code{teststat}, \code{testtype} and \code{mincriterion}
+  determine how the global null hypothesis of independence between all input
+  variables and the response is tested (see \code{\link{ctree}}). The 
+  argument \code{nresample} is the number of Monte-Carlo replications to be
+  used when \code{testtype = "MonteCarlo"}.
+
+  A split is established when the sum of the weights in both daugther nodes
+  is larger than \code{minsplit}, this avoids pathological splits at the
+  borders. When \code{stump = TRUE}, a tree with at most two terminal nodes
+  is computed.
+
+  The \code{mtry} argument regulates a random selection of \code{mtry} input 
+  variables in each node. Note that here \code{mtry} is fixed to the value 5 by 
+  default for merely technical reasons, while in \code{\link[randomForest]{randomForest}} 
+  the default values for classification and regression vary with the number of input 
+  variables. Make sure that \code{mtry} is defined properly before using \code{cforest}.
+
+  It might be informative to look at scatterplots of input variables against
+  the standardized two-sample split statistics, those are available when
+  \code{savesplitstats = TRUE}. Each node is then associated with a vector
+  whose length is determined by the number of observations in the learning
+  sample and thus much more memory is required.
+
+  The number of trees \code{ntree} can be increased for large numbers of input variables.
+
+  Function \code{cforest_unbiased} returns the settings suggested 
+  for the construction of unbiased random forests (\code{teststat = "quad", testtype = "Univ", 
+    replace = FALSE}) by Strobl et al. (2007)
+  and is the default since version 0.9-90.
+  Hyper parameter settings mimicing the behaviour of
+  \code{\link[randomForest]{randomForest}} are available in
+  \code{cforest_classical} which have been used as default up to
+  version 0.9-14. 
+
+  Please note that \code{\link{cforest}}, in contrast to 
+  \code{\link[randomForest]{randomForest}}, doesn't grow trees of
+  maximal depth. To grow large trees, set \code{mincriterion = 0}.
+
+}
+\value{
+  An object of class \code{\link{ForestControl-class}}.
+}
+\references{
+
+    Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis and Torsten Hothorn (2007).
+    Bias in Random Forest Variable Importance Measures: Illustrations, Sources and  
+    a Solution. \emph{BMC Bioinformatics}, \bold{8}, 25. 
+    \url{http://www.BioMedCentral.com/1471-2105/8/25/}
+}
+\keyword{misc}