Knn R, K-Nearest Neighbor Implementation In R Using Caret Package

  • 30th Jun, 2022
  • 15:43
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
options(warn = -1)
library(caret)  # train(), trainControl(), confusionMatrix()
```

## Step 1: Reading and partitioning of the data

We read the dataset, kept only the complete cases, converted `Survived` to a factor, and split the data into an 80% training set and a 20% test set, as instructed.

```{r}
titanicData <- read.csv("titanic.csv")
titanicData <- titanicData[complete.cases(titanicData), ]  # drop rows with missing values
titanicData$Survived <- as.factor(titanicData$Survived)

set.seed(1)  # make the random split reproducible
train <- sample(1:nrow(titanicData), size = nrow(titanicData) * 0.8)
test  <- dplyr::setdiff(1:nrow(titanicData), train)
titanicDataTrain <- titanicData[train, ]
titanicDataTest  <- titanicData[test, ]
```


## Training and Testing the k-NN models

We used `train()` from caret to fit a k-NN model, with `tuneGrid` covering every k from 2 to 30. The best k is chosen by 10-fold cross-validation, repeated 10 times.


```{r}
ctrl <- trainControl(method = "repeatedcv", repeats = 10)
knnFit <- train(Survived ~ Fare + Age,
                data = titanicDataTrain,
                method = "knn",
                trControl = ctrl,
                preProcess = c("center", "scale"),
                tuneGrid = expand.grid(k = 2:30))
```


The best cross-validated accuracy was obtained at k = 29 and k = 30. Since this is binary classification, we prefer the odd value k = 29, which rules out tied votes among the neighbors.
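The tie argument can be checked directly: with two classes, the k neighbor votes split into a and k - a, and a tie a = k - a is only possible for an even k. A minimal base-R sketch (the helper `can_tie` is our own illustration, not part of caret):

```{r}
# For a two-class vote among k neighbors, a tie means a = k - a for some
# integer split a, which requires k to be even.
can_tie <- function(k) any(0:k == k - 0:k)
can_tie(30)  # TRUE: an even k can tie (15 vs 15)
can_tie(29)  # FALSE: an odd k always produces a majority
```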

Now, we shall use the 29-NN model to measure accuracy on the test set.

## Accuracy on test dataset

```{r}
pred <- predict(knnFit, newdata = titanicDataTest)
confusionMatrix(pred, titanicDataTest$Survived)
```

The test accuracy is 0.6538. However, every prediction was class 0: because the data are skewed towards class 0, the model effectively learned the majority class. Accuracy is therefore not a good measure to evaluate this model on, as it favors the class with more data points.
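The point about accuracy can be seen on a toy example (synthetic labels, not the Titanic data): with a 65/35 class split, a classifier that always predicts the majority class "0" already reaches 65% accuracy while never identifying a single "1", and its balanced accuracy collapses to 0.5.

```{r}
# Synthetic illustration: 65 negatives, 35 positives, majority-class predictor.
truth <- factor(c(rep(0, 65), rep(1, 35)), levels = c(0, 1))
pred  <- factor(rep(0, 100), levels = c(0, 1))

accuracy <- mean(pred == truth)                                  # 0.65
sens_1   <- sum(pred == 1 & truth == 1) / sum(truth == 1)        # 0: no "1" is ever found
bal_acc  <- (mean(pred[truth == 0] == 0) +
             mean(pred[truth == 1] == 1)) / 2                    # 0.5: no better than chance
```

This is why the per-class statistics reported by `confusionMatrix()` (sensitivity, specificity, balanced accuracy) are more informative than raw accuracy on imbalanced data.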
