\name{readingSkills}
\alias{readingSkills}
\docType{data}
\title{ Reading Skills }
\description{
  A toy data set illustrating the spurious correlation
  between reading skills and shoe size in schoolchildren.
}
\usage{data("readingSkills")}
\format{
  A data frame with 200 observations on the following 4 variables.
  \describe{
    \item{\code{nativeSpeaker}}{a factor with levels \code{no} and \code{yes},
                                where \code{yes} indicates that the child
                                is a native speaker of the language of the reading test.}
    \item{\code{age}}{age of the child in years.}
    \item{\code{shoeSize}}{shoe size of the child in cm.}
    \item{\code{score}}{raw score on the reading test.}
  }
}
\details{

  In this artificial data set, which was generated by means of a linear model,
  \code{age} and \code{nativeSpeaker} are the actual predictors of
  \code{score}, while the spurious correlation between \code{score} and
  \code{shoeSize} arises only because both depend on \code{age}.

  The true predictors can be identified, e.g., by means of partial correlations, 
  standardized beta coefficients in linear models or the conditional random 
  forest variable importance, but not by means of the standard random 
  forest variable importance (see example).

}
\examples{

   set.seed(290875)
   readingSkills.cf <- cforest(score ~ ., data = readingSkills,
       control = cforest_unbiased(mtry = 2, ntree = 50))

   # standard importance
   varimp(readingSkills.cf)
   # the same modulo random variation
   varimp(readingSkills.cf, pre1.0_0 = TRUE)

   # conditional importance, may take a while...
   varimp(readingSkills.cf, conditional = TRUE) 
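
   # The two linear-model checks mentioned in the details can be sketched
   # as follows. This is a minimal illustration using only base R (stats);
   # the object names readingSkills.std and readingSkills.lm are chosen
   # here for the example and are not part of the package.

   # standardized beta coefficients: scale the numeric variables first,
   # so that the coefficients of age and shoeSize are comparable
   readingSkills.std <- transform(readingSkills,
       score = as.numeric(scale(score)),
       age = as.numeric(scale(age)),
       shoeSize = as.numeric(scale(shoeSize)))
   readingSkills.lm <- lm(score ~ nativeSpeaker + age + shoeSize,
       data = readingSkills.std)
   coef(readingSkills.lm)

   # partial correlation of score and shoeSize given age and nativeSpeaker,
   # computed as the correlation of the two sets of residuals
   cor(resid(lm(score ~ age + nativeSpeaker, data = readingSkills)),
       resid(lm(shoeSize ~ age + nativeSpeaker, data = readingSkills)))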

}
\keyword{datasets}