% b/partyMod/man/ctree_control.Rd
\name{Control ctree Hyper Parameters}
\alias{ctree_control}
\title{ Control for Conditional Inference Trees }
\description{

  Various parameters that control aspects of the `ctree' fit.

}
\usage{
ctree_control(teststat = c("quad", "max"),
              testtype = c("Bonferroni", "MonteCarlo",
                           "Univariate", "Teststatistic"),
              mincriterion = 0.95, minsplit = 20, minbucket = 7,
              stump = FALSE, nresample = 9999, maxsurrogate = 0,
              mtry = 0, savesplitstats = TRUE, maxdepth = 0)
}
\arguments{
  \item{teststat}{ a character specifying the type of the test statistic
                   to be applied. }
  \item{testtype}{ a character specifying how to compute the distribution of
                   the test statistic. }
  \item{mincriterion}{ the value of the test statistic (for \code{testtype == "Teststatistic"}),
                       or 1 - p-value (for other values of \code{testtype}) that
                       must be exceeded in order to implement a split. }
  \item{minsplit}{ the minimum sum of weights in a node in order to be considered
                   for splitting. }
  \item{minbucket}{ the minimum sum of weights in a terminal node. }
  \item{stump}{ a logical determining whether a stump (a tree with three
                nodes only) is to be computed. }
  \item{nresample}{ number of Monte-Carlo replications to use when the
                    distribution of the test statistic is simulated. }
  \item{maxsurrogate}{ number of surrogate splits to evaluate. Note that
                       currently only surrogate splits in ordered
                       covariables are implemented. }
  \item{mtry}{ number of input variables randomly sampled as candidates
               at each node for random forest like algorithms. The default
               \code{mtry = 0} means that no random selection takes place. }
  \item{savesplitstats}{ a logical determining whether the process of standardized
                         two-sample statistics for the split point estimate
                         is saved for each primary split. }
  \item{maxdepth}{ maximum depth of the tree. The default \code{maxdepth = 0}
                   means that no restrictions are applied to tree sizes. }
}
\details{

  The arguments \code{teststat}, \code{testtype} and \code{mincriterion}
  determine how the global null hypothesis of independence between all input
  variables and the response is tested (see \code{\link{ctree}}). The
  argument \code{nresample} is the number of Monte-Carlo replications to be
  used when \code{testtype = "MonteCarlo"}.

  A split is established when the sum of the weights in both daughter nodes
  is larger than \code{minsplit}; this avoids pathological splits at the
  borders. When \code{stump = TRUE}, a tree with at most two terminal nodes
  is computed.

  Setting \code{mtry > 0} means that a random forest like `variable
  selection', i.e., a random selection of \code{mtry} input variables, is
  performed in each node.

  It might be informative to look at scatterplots of input variables against
  the standardized two-sample split statistics, which are available when
  \code{savesplitstats = TRUE}. Each node is then associated with a vector
  whose length is determined by the number of observations in the learning
  sample, and thus much more memory is required.

}
\value{
  An object of class \code{\link{TreeControl}}.
}
\keyword{misc}
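% The example below is a minimal sketch, not taken from the package's own
% examples. It assumes that ctree() accepts the control object through an
% argument named `controls' (as in the upstream party package) and uses the
% built-in iris data purely for illustration.
\examples{
## Illustrative sketch only; the `controls' argument name is an assumption
## carried over from the upstream party package.

## Simulate the distribution of the test statistic by Monte-Carlo
## resampling, relax the split criterion and limit the tree depth.
ctrl <- ctree_control(teststat = "max", testtype = "MonteCarlo",
                      nresample = 999, mincriterion = 0.9,
                      maxdepth = 3)

## Fit a conditional inference tree with these hyper parameters.
fit <- ctree(Species ~ ., data = iris, controls = ctrl)

## A stump: only the root node is split.
stump <- ctree(Species ~ ., data = iris,
               controls = ctree_control(stump = TRUE))
}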