b/partyMod/man/cforest_control.Rd
\name{Control Forest Hyper Parameters}
\alias{cforest_control}
\alias{cforest_classical}
\alias{cforest_unbiased}
\title{ Control for Conditional Tree Forests }
\description{

  Various parameters that control aspects of the `cforest' fit via
  its `control' argument.

}
\usage{
cforest_unbiased(\dots)
cforest_classical(\dots)
cforest_control(teststat = "max",
                testtype = "Teststatistic",
                mincriterion = qnorm(0.9),
                savesplitstats = FALSE,
                ntree = 500, mtry = 5, replace = TRUE,
                fraction = 0.632, trace = FALSE, \dots)
}
\arguments{
  \item{teststat}{ a character specifying the type of the test statistic
                   to be applied. }
  \item{testtype}{ a character specifying how to compute the distribution of
                   the test statistic. }
  \item{mincriterion}{ the value of the test statistic (for \code{testtype == "Teststatistic"})
                       or 1 - p-value (for other values of \code{testtype}) that
                       must be exceeded in order to implement a split. }
  \item{mtry}{ number of input variables randomly sampled as candidates
               at each node for random forest-like algorithms. Bagging, as a special
               case of a random forest without random input variable sampling, can
               be performed by setting \code{mtry} either to \code{NULL} or
               manually to the number of input variables. }
  \item{savesplitstats}{ a logical determining whether the process of standardized
                         two-sample statistics for split point estimation
                         is saved for each primary split. }
  \item{ntree}{ number of trees to grow in a forest. }
  \item{replace}{ a logical indicating whether sampling of observations is
                  done with or without replacement. }
  \item{fraction}{ fraction of the number of observations to draw without
                   replacement (only relevant if \code{replace = FALSE}). }
  \item{trace}{ a logical indicating whether a progress bar shall be printed
                while the forest grows. }
  \item{\dots}{ additional arguments to be passed to
                \code{\link{ctree_control}}. }
}
\details{

  All three functions return an object of class \code{\link{ForestControl-class}}
  defining hyper parameters to be specified via the \code{control} argument
  of \code{\link{cforest}}.

  The arguments \code{teststat}, \code{testtype} and \code{mincriterion}
  determine how the global null hypothesis of independence between all input
  variables and the response is tested (see \code{\link{ctree}}). The
  argument \code{nresample} is the number of Monte-Carlo replications to be
  used when \code{testtype = "MonteCarlo"}.
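
  As an illustrative sketch (the numbers are arbitrary), a control object
  requesting Monte-Carlo resampling can be set up as follows; \code{nresample}
  is simply passed on to \code{\link{ctree_control}} via \code{\dots}:

\preformatted{
## a split is implemented once 1 - p-value exceeds mincriterion,
## with p-values approximated by 9999 Monte-Carlo replications
cc_mc <- cforest_control(teststat = "quad",
                         testtype = "MonteCarlo",
                         nresample = 9999,
                         mincriterion = 0.95)
}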

  A split is established when the sum of the weights in both daughter nodes
  is larger than \code{minsplit}; this avoids pathological splits at the
  borders. When \code{stump = TRUE}, a tree with at most two terminal nodes
  is computed.

  The \code{mtry} argument regulates a random selection of \code{mtry} input
  variables in each node. Note that here \code{mtry} is fixed to the value 5 by
  default for merely technical reasons, while in \code{\link[randomForest]{randomForest}}
  the default values for classification and regression vary with the number of input
  variables. Make sure that \code{mtry} is defined properly before using \code{cforest}.
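
  For example (a sketch only; the number of input variables \code{p} is made up
  for illustration), \code{mtry} can be set explicitly when building the control
  object, including the bagging case described for the \code{mtry} argument above:

\preformatted{
p <- 10                                            # number of input variables
cc_rf  <- cforest_unbiased(mtry = floor(sqrt(p)))  # square-root rule of thumb
cc_bag <- cforest_unbiased(mtry = p)               # bagging: no input sampling
}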

  It might be informative to look at scatterplots of input variables against
  the standardized two-sample split statistics; those are available when
  \code{savesplitstats = TRUE}. Each node is then associated with a vector
  whose length is determined by the number of observations in the learning
  sample, and thus much more memory is required.

  The number of trees \code{ntree} can be increased for large numbers of input variables.

  Function \code{cforest_unbiased} returns the settings suggested
  for the construction of unbiased random forests (\code{teststat = "quad", testtype = "Univ",
  replace = FALSE}) by Strobl et al. (2007)
  and has been the default since version 0.9-90.
  Hyper parameter settings mimicking the behaviour of
  \code{\link[randomForest]{randomForest}} are available in
  \code{cforest_classical}, which was the default up to
  version 0.9-14.

  Please note that \code{\link{cforest}}, in contrast to
  \code{\link[randomForest]{randomForest}}, does not grow trees of
  maximal depth. To grow large trees, set \code{mincriterion = 0}.
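
  A short sketch of the two presets next to a hand-rolled alternative that
  grows large trees (using only the functions documented on this page):

\preformatted{
cc_unb <- cforest_unbiased()    # default since version 0.9-90
cc_cls <- cforest_classical()   # mimics randomForest; default up to 0.9-14
## large trees: no threshold has to be exceeded before a split is implemented
cc_big <- cforest_control(teststat = "quad", testtype = "Univ",
                          replace = FALSE, mincriterion = 0)
}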

}
\value{
  An object of class \code{\link{ForestControl-class}}.
}
\references{

  Carolin Strobl, Anne-Laure Boulesteix, Achim Zeileis and Torsten Hothorn (2007).
  Bias in Random Forest Variable Importance Measures: Illustrations, Sources and
  a Solution. \emph{BMC Bioinformatics}, \bold{8}, 25.
  \url{http://www.BioMedCentral.com/1471-2105/8/25/}

}
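\examples{

  ## A minimal, hedged sketch of passing a control object to cforest();
  ## the data are simulated for illustration only, and the `control'
  ## argument name follows the description on this page.
  \dontrun{
  set.seed(290875)
  dat <- data.frame(y = gl(2, 50), x1 = rnorm(100), x2 = rnorm(100))
  cf <- cforest(y ~ x1 + x2, data = dat,
                control = cforest_unbiased(ntree = 50, mtry = 2))
  }
}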
\keyword{misc}