Chapter 8 Tree-Based Classification and Model Validation
8.0.1 Summary of this document
Exploring datasets available in R
 - Iris data exploration
Decision tree algorithm
 - Using the rpart and ctree packages
Cross-validation and statistical hypothesis testing
8.0.2 Exploring the iris data set.
The iris dataset is preloaded in R, so no import step is needed. Its first six rows:
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
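Beyond the first six rows, a few other base R functions give a quick overview of the data set. This is an illustrative sketch, not the chapter's original code. iris has 150 rows and 5 columns, with 50 flowers of each species.

```r
# Quick structural overview of the built-in iris data (base R only)
dim(iris)             # number of rows and columns
str(iris)             # column types: four numeric measurements and a factor
summary(iris)         # min/quartiles/mean per measurement, counts per species
table(iris$Species)   # class balance across the three species
```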

Additional options in basic R plot
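The plot for this step is not reproduced here. The sketch below shows some commonly used base R `plot()` options on the petal measurements. The colours, plotting symbol, title and the name `species_cols` are illustrative choices, not taken from the original. Iris measurements are in centimetres.

```r
# Colour vector indexed by the Species factor: one colour per species
species_cols <- c("blue", "red", "purple")
plot(iris$Petal.Length, iris$Petal.Width,
     col  = species_cols[iris$Species],  # colour points by species
     pch  = 19,                          # filled circles
     main = "Iris petal measurements",   # plot title
     xlab = "Petal length (cm)",         # axis labels
     ylab = "Petal width (cm)")
# Legend mapping colours back to species names
legend("topleft", legend = levels(iris$Species),
       col = species_cols, pch = 19)
```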

8.1 Decision trees with the rpart package.
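A decision tree is a tree-based algorithm for classification and regression problems. It recursively splits the data on predictor values so that each resulting group is as homogeneous as possible in the response.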
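For classification, rpart's default splitting criterion is the Gini index, 1 minus the sum of squared class proportions. The hand calculation below is a sketch that scores one candidate split, Petal.Length < 2.45. It ignores rpart's priors and loss-matrix options, and the helper `gini()` is hypothetical, written only for this illustration.

```r
# Hypothetical helper: Gini impurity 1 - sum(p_k^2) of a vector of class labels
gini <- function(labels) {
  p <- table(labels) / length(labels)   # class proportions (zero-count levels add 0)
  1 - sum(p^2)
}

# Root node: 50 flowers of each species, so 1 - 3 * (1/3)^2 = 2/3
root_gini <- gini(iris$Species)

# Candidate split on petal length
left  <- iris$Species[iris$Petal.Length <  2.45]   # all 50 setosa: impurity 0
right <- iris$Species[iris$Petal.Length >= 2.45]   # 50 versicolor + 50 virginica: impurity 0.5

# Impurity after the split, weighted by node size: (50 * 0 + 100 * 0.5) / 150 = 1/3
split_gini <- (length(left) * gini(left) + length(right) * gini(right)) / nrow(iris)
c(root = root_gini, after_split = split_gini)
```

The split lowers impurity from 2/3 to 1/3. A tree algorithm picks, at each node, the split with the largest such decrease.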
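# fancyRpartPlot() (from rattle) also needs the rpart.plot package installed
library(rpart)
library(rattle)        # provides fancyRpartPlot()
library(RColorBrewer)  # colour palettes used by fancyRpartPlot()
# Fit a classification tree with every predictor named explicitly...
iristree=rpart(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width, data=iris)
# ...or equivalently with "." meaning "all other columns of iris"
iristree=rpart(Species~.,data=iris)
# Draw the fitted tree
fancyRpartPlot(iristree)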
A confusion matrix cross-tabulates the true classes against the predicted classes, showing how well the classification worked.
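# Predicted class for every flower in the training data
predSpecies=predict(iristree,newdata=iris,type="class")
# Rows: true species, columns: predicted species
confusionmatrix=table(iris$Species,predSpecies)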
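# confmatrix(): confusion matrix plus accuracy and error rate
confmatrix=function(y,predy){
  matrix=table(y,predy)                    # true vs. predicted counts
  accuracy=sum(diag(matrix))/sum(matrix)   # share of correct predictions (diagonal)
  return(list(matrix=matrix,accuracy=accuracy,error=1-accuracy))
}
A second look at the iris scatterplot.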
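# Petal measurements coloured by species; jitter() separates overlapping points
plot(jitter(iris$Petal.Length),jitter(iris$Petal.Width),col=c('blue','red','purple')[iris$Species])
# Horizontal line near the tree's Petal.Width split (1.75)
lines(1:7,rep(1.8,7),col='black')
# Vertical line near the tree's Petal.Length split (2.45)
lines(rep(2.4,4),0:3,col='black')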
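The two lines on the scatterplot correspond to the tree's two splits. The sketch below hand-codes them as nested `ifelse()` rules and checks that the rules reproduce the tree's predictions. It assumes the default fit splits at Petal.Length < 2.45 and then Petal.Width < 1.75, which is what `print(iristree)` reports for the default rpart fit. Compare these thresholds against your own output. The names `rule_pred` and `tree_pred` are hypothetical.

```r
library(rpart)

iristree <- rpart(Species ~ ., data = iris)

# Hand-coded version of the two splits drawn on the scatterplot
rule_pred <- ifelse(iris$Petal.Length < 2.45, "setosa",
             ifelse(iris$Petal.Width  < 1.75, "versicolor", "virginica"))
rule_pred <- factor(rule_pred, levels = levels(iris$Species))

# Class predictions from the fitted tree
tree_pred <- predict(iristree, newdata = iris, type = "class")

# TRUE if the two rules reproduce every prediction of the tree
all(rule_pred == tree_pred)
```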
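Accuracy of the rpart tree.
A model's accuracy is the proportion of observations whose predicted class equals the true class: the sum of the confusion matrix's diagonal divided by its total. Computed directly from the table above, and then through the confmatrix() function defined above, the decision-tree predictions give:
sum(diag(confusionmatrix))/sum(confusionmatrix)
## [1] 0.96
confmatrix(iris$Species,predSpecies)
## $matrix
##             predy
## y            setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         49         1
##   virginica       0          5        45
##
## $accuracy
## [1] 0.96
##
## $error
## [1] 0.04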
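Worked by hand, the accuracy is the sum of the diagonal of the confusion matrix (the correct predictions) divided by the total number of flowers. This is what confmatrix() computes with sum(diag(matrix))/sum(matrix). Writing $n_{jk}$ for the number of flowers of true class $j$ predicted as class $k$:

$$
\text{accuracy} = \frac{\sum_k n_{kk}}{\sum_{j,k} n_{jk}} = \frac{50 + 49 + 45}{150} = \frac{144}{150} = 0.96,
\qquad
\text{error} = 1 - \text{accuracy} = \frac{1 + 5}{150} = 0.04
$$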
The party package.

A simple plot of the decision tree fitted with ctree().
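The code that fits `iristree2` does not appear in this section. The sketch below assumes it is a conditional inference tree from `party::ctree()` fit on all four predictors. The object name comes from the code that follows, but the formula is an assumption. Unlike rpart, ctree chooses splits using permutation-test p-values rather than an impurity measure.

```r
library(party)

# Assumed fit: conditional inference tree on all predictors
iristree2 <- ctree(Species ~ ., data = iris)

# Default plot: inner nodes show the split variable and its p-value,
# terminal nodes show the class distribution of the flowers they contain
plot(iristree2)
```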

# Confusion matrix for the ctree model (predict() returns predicted classes by default)
predSpecies=predict(iristree2,newdata=iris)
confmatrix(iris$Species,predSpecies)
## $matrix
##             predy
## y            setosa versicolor virginica
##   setosa         50          0         0
##   versicolor      0         49         1
##   virginica       0          5        45
##
## $accuracy
## [1] 0.96
##
## $error
## [1] 0.04
Controlling the depth of the tree.
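One way to limit tree depth in rpart is the `maxdepth` argument of `rpart.control()`. The rpart documentation defines it as the maximum depth of any node in the final tree, with the root counted as depth 0. The sketch below caps the iris tree at one split. `minsplit` and `cp` are other `rpart.control()` arguments that also restrict growth. The object names `shallow` and `node_depth` are hypothetical.

```r
library(rpart)

# Allow only one level of splits below the root
shallow <- rpart(Species ~ ., data = iris,
                 control = rpart.control(maxdepth = 1))
print(shallow)

# rpart numbers nodes so that node n has children 2n and 2n + 1,
# so a node's depth is floor(log2(n)); the root (node 1) has depth 0
node_depth <- floor(log2(as.numeric(rownames(shallow$frame))))
max(node_depth)
```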
