partyMod/man/cforest_control.Rd
\name{Control Forest Hyper Parameters}
\alias{cforest_control}
\alias{cforest_classical}
\alias{cforest_unbiased}
\title{ Control for Conditional Tree Forests }
\description{

  Various parameters that control aspects of the `cforest' fit via
  its `control' argument.

}
\usage{
cforest_unbiased(\dots)
cforest_classical(\dots)
cforest_control(teststat = "max",
                testtype = "Teststatistic",
                mincriterion = qnorm(0.9),
                savesplitstats = FALSE,
                ntree = 500, mtry = 5, replace = TRUE,
                fraction = 0.632, trace = FALSE, \dots)
}
\arguments{
  \item{teststat}{ a character specifying the type of test statistic
                   to be applied. }
  \item{testtype}{ a character specifying how to compute the distribution of
                   the test statistic. }
  \item{mincriterion}{ the value of the test statistic (for \code{testtype == "Teststatistic"}),
                       or 1 - p-value (for other values of \code{testtype}) that
                       must be exceeded in order to implement a split. }
  \item{mtry}{ number of input variables randomly sampled as candidates
               at each node for random forest-like algorithms. Bagging, as a special case
               of a random forest without random sampling of input variables, can
               be performed by setting \code{mtry} either to \code{NULL} or
               manually to the number of input variables.}
  \item{savesplitstats}{ a logical determining whether the standardized
                         two-sample statistics computed during split point estimation
                         are saved for each primary split.}
  \item{ntree}{ number of trees to grow in a forest.}
  \item{replace}{ a logical indicating whether sampling of observations is
                  done with or without replacement.}
  \item{fraction}{ fraction of the number of observations to draw without
                   replacement (only relevant if \code{replace = FALSE}).}
  \item{trace}{ a logical indicating whether a progress bar shall be printed
                while the forest grows.}
  \item{\dots}{ additional arguments to be passed to
                \code{\link{ctree_control}}.}
}
\details{

  All three functions return an object of class \code{\link{ForestControl-class}}
  defining hyper parameters to be specified via the \code{control} argument
  of \code{\link{cforest}}.

  The arguments \code{teststat}, \code{testtype} and \code{mincriterion}
  determine how the global null hypothesis of independence between all input
  variables and the response is tested (see \code{\link{ctree}}). The
  argument \code{nresample} is the number of Monte-Carlo replications to be
  used when \code{testtype = "MonteCarlo"}.
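
  For illustration, a control object using Monte-Carlo resampling of the null
  distribution might be set up as sketched below (the chosen values are
  arbitrary; \code{nresample} is passed on to \code{\link{ctree_control}}
  via \code{\dots}):
\preformatted{
## sketch: resample the null distribution; mincriterion is then
## interpreted as 1 - p-value
mc_ctrl <- cforest_control(teststat = "quad",
                           testtype = "MonteCarlo",
                           mincriterion = 0.95,
                           nresample = 9999,
                           ntree = 100, mtry = 3)
}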

  A split is established when the sum of the weights in both daughter nodes
  is larger than \code{minsplit}; this avoids pathological splits at the
  borders. When \code{stump = TRUE}, a tree with at most two terminal nodes
  is computed.
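
  A minimal sketch, assuming that \code{minsplit} and \code{stump} are
  arguments of \code{\link{ctree_control}} and can therefore be supplied
  via \code{\dots}:
\preformatted{
## sketch: a forest of small trees; require a weight sum larger than 20
## in the daughter nodes and restrict each tree to a stump
stump_ctrl <- cforest_unbiased(ntree = 50, mtry = 3,
                               stump = TRUE, minsplit = 20)
}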

  The \code{mtry} argument regulates a random selection of \code{mtry} input
  variables in each node. Note that here \code{mtry} is fixed to the value 5 by
  default for merely technical reasons, while in \code{\link[randomForest]{randomForest}}
  the default values for classification and regression vary with the number of input
  variables. Make sure that \code{mtry} is defined properly before using \code{cforest}.
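
  A sketch of how bagging could be requested, assuming a hypothetical data
  frame \code{d} whose first column is the response:
\preformatted{
## sketch: bagging is a random forest without random input sampling,
## i.e. mtry equal to the number of input variables (or NULL)
p <- ncol(d) - 1                     # number of input variables in 'd'
bag_ctrl <- cforest_unbiased(ntree = 500, mtry = p)
}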

  It might be informative to look at scatterplots of input variables against
  the standardized two-sample split statistics; those are available when
  \code{savesplitstats = TRUE}. Each node is then associated with a vector
  whose length is determined by the number of observations in the learning
  sample and thus much more memory is required.

  The number of trees \code{ntree} can be increased for large numbers of input variables.

  Function \code{cforest_unbiased} returns the settings suggested
  for the construction of unbiased random forests (\code{teststat = "quad", testtype = "Univ",
  replace = FALSE}) by Strobl et al. (2007)
  and is the default since version 0.9-90.
  Hyper parameter settings mimicking the behaviour of
  \code{\link[randomForest]{randomForest}} are available in
  \code{cforest_classical}, which were used as the default up to
  version 0.9-14.
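
  A brief sketch of the two presets (the values of \code{ntree} and
  \code{mtry} are arbitrary and passed on via \code{\dots}):
\preformatted{
## sketch: the two suggested hyper parameter presets
ctrl_unbiased  <- cforest_unbiased(ntree = 500, mtry = 5)   # default since 0.9-90
ctrl_classical <- cforest_classical(ntree = 500, mtry = 5)  # former default
}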

  Please note that \code{\link{cforest}}, in contrast to
  \code{\link[randomForest]{randomForest}}, does not grow trees of
  maximal depth. To grow large trees, set \code{mincriterion = 0}.
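
  A minimal sketch of such a setting, using only the arguments documented
  above:
\preformatted{
## sketch: disable the stopping criterion so that large trees are grown
large_ctrl <- cforest_control(teststat = "quad", testtype = "Univ",
                              replace = FALSE, mincriterion = 0,
                              ntree = 500, mtry = 5)
}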

}
\value{
  An object of class \code{\link{ForestControl-class}}.
}
\references{

    Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis and Torsten Hothorn (2007).
    Bias in Random Forest Variable Importance Measures: Illustrations, Sources and
    a Solution. \emph{BMC Bioinformatics}, \bold{8}, 25.
    \url{http://www.BioMedCentral.com/1471-2105/8/25/}
}
\keyword{misc}