Perform bootstrapping of an ordinary linear regression model, using B=100 bootstrap resamples of my dataset, and getting RMSE

Question

Robbie Meaney

October 4, 2021, 05:06

I'm studying machine learning in R and working with the Boston data set from the MASS library. I am practicing bootstrapping. I have already carried out an analysis to determine how many distinct data points, on average, are drawn from the sample to make up a bootstrap resample, using B=100 resamples of the data set. Next I would like to do two things: bootstrap an ordinary linear regression model, again using B=100 resamples of the data set, and use the out-of-bag (OOB) samples to estimate the RMSE; and bootstrap a ridge regression model with 100 bootstrap samples to estimate the RMSE, then compare the two results. I am having difficulty adapting the code I have already written to do this. Does anyone have any ideas?

Topic bootstraping rmse r machine-learning

Category Data Science

Saurabh Kansal · Accepted Answer · April 22, 2020, 15:15

The following code fits an ordinary linear regression and a ridge regression on B=100 random 70/30 train/test splits of the Boston data set (training rows are drawn without replacement). It computes the RMSE on each of the 100 test sets and stores the values in olr_rmse_all for ordinary linear regression and in ridge_rmse_all for ridge regression.

You can then run any further analysis on the two RMSE vectors.

Here I compute the mean and standard deviation of the 100 RMSE values for ordinary linear regression and for ridge regression.


library(MASS)   # Boston data set
library(ridge)  # linearRidge()

data1 = Boston

RMSE = function(actual, predicted){
  sqrt(mean((actual - predicted)^2))
}


test_perc = 0.3
B = 100

olr_rmse_all = c()
ridge_rmse_all = c()


for (i in c(1:B)){

  cat("running for sample = ", i, '\n')
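  # Draw 70% of the row indices (without replacement) for training
  train_rows = sample(nrow(data1), floor((1-test_perc)*nrow(data1)))
  # The remaining rows form the test set
  test_rows = setdiff(c(1:nrow(data1)), train_rows)

  Boston.train <- data1[train_rows, ]
  Boston.test <- data1[test_rows, ]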

  olr_model  <- lm(medv ~ ., data = Boston.train)
  ridge_model <- linearRidge(medv ~ ., data = Boston.train)

  test_predicted_olr = predict(olr_model, newdata = Boston.test)
  test_predicted_ridge = predict(ridge_model, newdata = Boston.test)

  test_actual = Boston.test$medv

  rmse_test_olr = RMSE(test_actual, test_predicted_olr)
  rmse_test_ridge = RMSE(test_actual, test_predicted_ridge)

  cat(paste("olr_rmse: " , rmse_test_olr,  "ridge_rmse: ", rmse_test_ridge, sep = " "), '\n')

  olr_rmse_all = c(olr_rmse_all, rmse_test_olr)
  ridge_rmse_all = c(ridge_rmse_all, rmse_test_ridge)

}

#-- Mean of RMSE----

olr_rmse_all_mean = mean(olr_rmse_all)
ridge_rmse_all_mean = mean(ridge_rmse_all)

#-- Standard Deviation of RMSE----

olr_rmse_all_sd = sd(olr_rmse_all)
ridge_rmse_all_sd = sd(ridge_rmse_all)
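
The loop above splits the rows 70/30 without replacement. The question asks for bootstrap resamples scored on the out-of-bag (OOB) rows, so here is a minimal sketch of that variant for the ordinary linear regression only. It assumes that each resample draws n rows with replacement and that the OOB rows are the ones never drawn. The names rmse, oob_rmse and n_distinct and the seed are illustrative choices, not part of the code above; a ridge fit could be added inside the same loop in the same way.

```r
library(MASS)  # provides the Boston data set

set.seed(1)    # arbitrary seed, only so the run is reproducible

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

B <- 100
n <- nrow(Boston)
oob_rmse   <- numeric(B)  # OOB RMSE for each resample
n_distinct <- integer(B)  # number of distinct rows in each resample

for (b in seq_len(B)) {
  # Bootstrap resample: n row indices drawn WITH replacement
  boot_rows <- sample(n, n, replace = TRUE)
  # Out-of-bag rows: indices that were never drawn in this resample
  oob_rows <- setdiff(seq_len(n), boot_rows)

  # Fit on the resample, predict on the OOB rows only
  fit  <- lm(medv ~ ., data = Boston[boot_rows, ])
  pred <- predict(fit, newdata = Boston[oob_rows, ])

  oob_rmse[b]   <- rmse(Boston$medv[oob_rows], pred)
  n_distinct[b] <- length(unique(boot_rows))
}

mean(oob_rmse)        # average OOB RMSE over the B resamples
sd(oob_rmse)          # spread of the OOB RMSE
mean(n_distinct) / n  # average fraction of distinct rows per resample
```

The last line also reproduces the distinct-points check mentioned in the question: for large n the expected fraction of distinct rows in a bootstrap resample is about 1 - 1/e, roughly 0.632, so about 37% of the rows are left out of bag each time.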

Hope this answers your question.

There is also a package for bootstrapping in R, named "boot", which you can use instead. I did it without a package to make the steps easier to understand.
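
For comparison, here is a hedged sketch of how the same kind of OOB RMSE could be obtained with boot(). The statistic function oob_rmse_stat is a hypothetical helper written for this example, not something the boot package provides. boot() passes it the data and a vector of resampled row indices, and the function treats every row missing from that vector as out of bag.

```r
library(MASS)  # Boston data set
library(boot)  # boot() resampling driver

set.seed(1)    # arbitrary seed, only so the run is reproducible

# Statistic for boot(): fit on the resampled rows, score on the OOB rows.
oob_rmse_stat <- function(data, idx) {
  oob <- setdiff(seq_len(nrow(data)), idx)
  # boot() first calls the statistic with the original indices 1..n,
  # which leaves no OOB rows; return NA in that case.
  if (length(oob) == 0) return(NA_real_)
  fit  <- lm(medv ~ ., data = data[idx, ])
  pred <- predict(fit, newdata = data[oob, ])
  sqrt(mean((data$medv[oob] - pred)^2))
}

boot_out <- boot(data = Boston, statistic = oob_rmse_stat, R = 100)

mean(boot_out$t)  # average OOB RMSE over the 100 resamples
sd(boot_out$t)    # spread of the OOB RMSE
```

Because the statistic is undefined on the original sample, boot_out$t0 is NA here, so the bias and standard error columns printed by boot are not meaningful; only the replicate values in boot_out$t are used.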

reference: https://www.statmethods.net/advstats/bootstrapping.html
