## R Programming Assignment Help on Ensembles

• 26th May, 2022
• 15:03 PM
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(warn = -1)
```

## `Question 1`: Run a classification tree, using the default controls of rpart(). Looking at the validation set, what is the overall accuracy? What is the lift on the first decile?

```{r q1}
library(rpart)
library(rpart.plot)
library(caret)
library(lift)

# Assumes the eBayAuctions data frame is already loaded in the session.
# Treat the categorical predictors as factors.
eBayAuctions$Category <- as.factor(eBayAuctions$Category)
eBayAuctions$currency <- as.factor(eBayAuctions$currency)
eBayAuctions$endDay <- as.factor(eBayAuctions$endDay)

# Rename the outcome (column 8) and make it a factor for classification.
colnames(eBayAuctions)[8] <- "Competitive"
eBayAuctions$Competitive <- as.factor(eBayAuctions$Competitive)

# 60/40 split into training and validation sets.
n <- nrow(eBayAuctions)
set.seed(12345)
train_samp <- sample(1:n, floor(0.6 * n), replace = FALSE)
training <- eBayAuctions[train_samp, ]
validation <- eBayAuctions[-train_samp, ]

# Classification tree with the default rpart() controls.
model.tree <- rpart(Competitive ~ ., data = training, method = "class")
rpart.plot(model.tree)

# Validation accuracy and first-decile lift.
pred <- predict(model.tree, validation, type = "class")
confusionMatrix(pred, validation$Competitive)
TopDecileLift(pred, validation$Competitive)
```

With the default controls, the tree reaches an overall accuracy of about 85% on the validation set. The top-decile lift is 1.668.
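
To make the lift figure concrete, here is a minimal base-R sketch of a first-decile lift calculation. The helper `first_decile_lift()` and the toy `scores` and `actual` vectors are hypothetical and are not part of the assignment; the sketch assumes cases are ranked by a predicted probability of the positive class, which is not necessarily how the `lift` package computes its value internally.

```{r}
# Hypothetical helper: response rate in the top 10% of cases (ranked by
# predicted score) divided by the response rate over all cases.
first_decile_lift <- function(scores, actual) {
  stopifnot(length(scores) == length(actual))
  n_top <- ceiling(0.1 * length(scores))
  top <- order(scores, decreasing = TRUE)[seq_len(n_top)]
  mean(actual[top]) / mean(actual)
}

# Toy data (made up for illustration): 20 cases, 8 of them positive.
scores <- (20:1) / 20
actual <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 1,
            0, 0, 1, 0, 1, 0, 0, 0, 0, 0)

# The top 2 cases are both positive: 1.0 / 0.4
first_decile_lift(scores, actual)
```

For the tree above, one plausible source of scores would be the class-probability column returned by `predict(model.tree, validation, type = "prob")`, assuming the column for the positive class is the one of interest.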

## `Question 2`: Run a boosted tree with the same predictors (use function boosting() in the adabag package). For the validation set, what is the overall accuracy? What is the lift on the first decile?

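```{r q2}
library(adabag)

# AdaBoost with 10 trees; boos = TRUE resamples the training set using the case weights.
boost_model <- boosting(Competitive ~ ., data = training, mfinal = 10, boos = TRUE)

# predict() on a boosting object returns a list; $class holds the predicted labels.
pred2 <- predict(boost_model, newdata = validation)
pred2_class <- factor(pred2$class, levels = levels(validation$Competitive))
confusionMatrix(pred2_class, validation$Competitive)
TopDecileLift(pred2_class, validation$Competitive)
```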

The boosted model reaches an overall accuracy of 0.8796 on the validation set. The top-decile lift is 1.761.
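
As a reminder of where an overall accuracy figure comes from, the sketch below computes it by hand from a confusion table. The `predicted` and `actual` vectors are made-up toy labels, not output from the boosted model.

```{r}
# Toy labels (hypothetical): 10 cases with two misclassifications.
predicted <- factor(c("1", "1", "0", "1", "0", "0", "1", "0", "1", "0"),
                    levels = c("0", "1"))
actual    <- factor(c("1", "0", "0", "1", "0", "1", "1", "0", "1", "0"),
                    levels = c("0", "1"))

# Cross-tabulate predictions against the truth.
tab <- table(Predicted = predicted, Actual = actual)

# Overall accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(tab)) / sum(tab)
accuracy
```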

## `Question 3`: Run a bagged tree with the same predictors (use function bagging() in the adabag package). For the validation set, what is the overall accuracy? What is the lift on the first decile?

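```{r}
library(adabag)

# Bagging with 10 trees, each fitted to a bootstrap sample of the training set.
model_bag <- bagging(Competitive ~ ., data = training, mfinal = 10)

# predict() on a bagging object returns a list; $class holds the predicted labels.
pred_bag <- predict(model_bag, newdata = validation)
pred_bag_class <- factor(pred_bag$class, levels = levels(validation$Competitive))
confusionMatrix(pred_bag_class, validation$Competitive)
TopDecileLift(pred_bag_class, validation$Competitive)
```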
The bagged model reaches an overall accuracy of 0.8631 on the validation set. The top-decile lift is 1.807.

## `Question 4`: Run a random forest (use function randomForest() in package randomForest with argument mtry = 4). Compare the bagged tree to the random forest in terms of validation accuracy and lift on first decile. How are the two methods conceptually different?

```{r}
library(randomForest)

# Random forest: 4 randomly chosen candidate predictors at each split.
model_rf <- randomForest(Competitive ~ ., data = training, mtry = 4)

# predict() returns a factor of class labels by default.
pred_rf <- predict(model_rf, validation)

confusionMatrix(pred_rf, validation$Competitive)
TopDecileLift(pred_rf, validation$Competitive)
```
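The random forest reaches an overall accuracy of 0.8847 on the validation set, compared with 0.8631 for the bagged tree. Its top-decile lift is 1.738, compared with 1.807 for the bagged tree.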

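The fundamental difference lies in how each tree chooses its splits. Both methods fit every tree to a bootstrap sample of the training data, but a random forest draws a random subset of the predictors at each split (here `mtry = 4`) and picks the best split among only those, whereas bagging considers every predictor at every split. Restricting the candidates decorrelates the trees, which tends to reduce the variance of the combined prediction.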
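
The relationship between the two methods can be sketched with `randomForest()` alone: setting `mtry` to the total number of predictors makes every predictor a candidate at every split, which is bagging. The example below uses the built-in `iris` data rather than the auction data, and the object names `bag_like` and `rf_subset` are illustrative assumptions.

```{r}
library(randomForest)

set.seed(12345)
p <- ncol(iris) - 1  # number of predictors (4 for iris)

# Bagging as a special case: all p predictors are candidates at each split.
bag_like <- randomForest(Species ~ ., data = iris, mtry = p, ntree = 100)

# Random forest proper: only 2 randomly drawn predictors are candidates per split.
rf_subset <- randomForest(Species ~ ., data = iris, mtry = 2, ntree = 100)

# Number of candidate predictors per split used by each model.
c(bagging = bag_like$mtry, random_forest = rf_subset$mtry)
```

On the auction data, the same idea would mean comparing `mtry = 4` against `mtry` set to the full predictor count.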