
R Programming Assignment Help on Ensembles
- 26th May, 2022
- 15:03 PM
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(warn = -1)
```

## `Question 1`: Run a classification tree, using the default controls of rpart(). Looking at the validation set, what is the overall accuracy? What is the lift on the first decile?

```{r q1}
library(readr)
library(rpart.plot)
library(rpart)
library(caret)
library(lift)

# Load the data and convert the categorical predictors to factors
eBayAuctions <- read.csv("eBayAuctions.csv")
eBayAuctions$Category <- as.factor(eBayAuctions$Category)
eBayAuctions$currency <- as.factor(eBayAuctions$currency)
eBayAuctions$endDay <- as.factor(eBayAuctions$endDay)

# Rename the 8th column (the outcome) and treat it as a factor
colnames(eBayAuctions)[8] <- "Competitive"
eBayAuctions$Competitive <- as.factor(eBayAuctions$Competitive)

# 60/40 split into training and validation sets
n <- nrow(eBayAuctions)
set.seed(12345)
train_samp <- sample(1:n, floor(0.6 * n), replace = FALSE)
training <- eBayAuctions[train_samp, ]
validation <- eBayAuctions[-train_samp, ]

# Classification tree with the default rpart() controls
model.tree <- rpart(Competitive ~ ., data = training, method = "class")
rpart.plot(model.tree)

# Validation accuracy and first-decile lift
pred <- predict(model.tree, validation, type = "class")
confusionMatrix(pred, validation$Competitive)
# Note: TopDecileLift() is documented to take numeric scores and 0/1 labels;
# predicted classes are passed here, and the lift reported below comes from this call.
TopDecileLift(pred, validation$Competitive)
```

With the default controls, the accuracy on the validation set is 85%. The lift on the first decile is 1.668.

## `Question 2`: Run a boosted tree with the same predictors (use function boosting() in the adabag package). For the validation set, what is the overall accuracy? What is the lift on the first decile?

```{r q2}
library(adabag)

# Boosted trees with 10 iterations, resampling the training set by observation weight
boost_model <- boosting(Competitive ~ ., data = training, mfinal = 10, boos = TRUE)

# predict() returns a list; the predicted classes are in $class
pred2 <- predict(boost_model, newdata = validation)
confusionMatrix(as.factor(pred2$class), validation$Competitive)
TopDecileLift(as.factor(pred2$class), validation$Competitive)
```

The accuracy of the boosting model is 0.8796. The lift on the first decile is 1.761.

## `Question 3`: Run a bagged tree with the same predictors (use function bagging() in the adabag package). For the validation set, what is the overall accuracy? What is the lift on the first decile?
```{r}
# Bagged trees with 10 bootstrap replicates
model_bag <- bagging(Competitive ~ ., data = training, mfinal = 10)

# predict() returns a list; the predicted classes are in $class
pred_bag <- predict(model_bag, newdata = validation)
confusionMatrix(as.factor(pred_bag$class), validation$Competitive)
TopDecileLift(as.factor(pred_bag$class), validation$Competitive)
```

The accuracy of the bagging model is 0.8631. The lift on the first decile is 1.807.

## `Question 4`: Run a random forest (use function randomForest() in package randomForest with argument mtry = 4). Compare the bagged tree to the random forest in terms of validation accuracy and lift on first decile. How are the two methods conceptually different?

```{r}
library(randomForest)

# Random forest that tries 4 randomly chosen predictors at each split
model_rf <- randomForest(Competitive ~ ., data = training, mtry = 4)

# predict() on a classification forest returns a factor of predicted classes
pred_rf <- predict(model_rf, validation)
confusionMatrix(pred_rf, validation$Competitive)
TopDecileLift(pred_rf, validation$Competitive)
```

The accuracy of the random forest model is 0.8847. The lift on the first decile is 1.738. Compared with the bagged tree (accuracy 0.8631, lift 1.807), the random forest is more accurate on the validation set but has a slightly lower first-decile lift.

Both methods grow each tree on a bootstrap sample of the training data. The fundamental difference is that a random forest considers only a random subset of the predictors at each split (here `mtry = 4`) and uses the best split among that subset, whereas bagging considers all predictors at every split.
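The first-decile lift asked for in each question can also be checked by hand from predicted probabilities rather than predicted classes. The following is a minimal base-R sketch, not the `lift` package's implementation: the function name `first_decile_lift` and the toy `scores` and `labels` vectors are hypothetical, and it assumes the outcome is coded 0/1 with 1 as the class of interest.

```r
# Hypothetical helper: lift of the top 10% of cases ranked by predicted score.
# Lift = (share of positives in the top decile) / (share of positives overall).
first_decile_lift <- function(scores, labels) {
  stopifnot(length(scores) == length(labels),
            length(scores) >= 10,
            all(labels %in% c(0, 1)),
            mean(labels) > 0)
  n_top <- ceiling(length(scores) / 10)                  # size of the first decile
  top_idx <- order(scores, decreasing = TRUE)[seq_len(n_top)]
  mean(labels[top_idx]) / mean(labels)
}

# Toy data (made up for illustration): 20 cases, 8 of them positive
scores <- (20:1) / 20
labels <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1,
            0, 1, 0, 0, 0, 1, 0, 0, 0, 0)

# Both cases in the top decile are positive, so the lift is 1 / 0.4
toy_lift <- first_decile_lift(scores, labels)
print(toy_lift)
```

For the classification tree, the scores could be taken from `predict(model.tree, validation, type = "prob")[, "1"]` and the labels from `as.numeric(as.character(validation$Competitive))`, assuming the outcome levels are named "0" and "1". A lift computed this way need not match the figures reported above, which were obtained from predicted classes.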