K-Fold cross validation and data leakage
I want to do K-fold cross-validation and also apply normalization or feature scaling within each fold. Say we have k folds: at each step, one fold serves as the validation set and the remaining k-1 folds form the training set. I want to fit the feature scaling and data imputation on that training set only and then apply the same (fitted) transformation to the validation set, repeating this for every fold. I am trying to avoid data leakage as much as possible while still rescaling my validation sets to get better results.
How can I do this with a few lines of code?
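One way to sketch this, assuming scikit-learn is available: a `Pipeline` combined with `cross_val_score` refits the imputer and scaler on the training folds only at each CV step and applies the fitted transforms to the held-out fold, which is exactly the leak-free procedure described above. The dataset, seed, and model choice here are illustrative placeholders, not part of the question:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Inject a few missing values so the imputer has something to do (illustrative)
rng = np.random.default_rng(0)
X = X.copy()
X[rng.integers(0, X.shape[0], size=10), 0] = np.nan

# Each pipeline step is re-fit on the training folds only at every CV split,
# so validation-fold statistics never leak into the imputer or scaler.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv)
print(scores.mean())
```

Inside `cross_val_score`, each split calls `pipe.fit` on the k-1 training folds and `pipe.score` (which only transforms, never refits) on the validation fold, so no manual bookkeeping per fold is needed.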
Secondly, is this actually necessary? I don't see many people do it for k-fold cross-validation. I have often seen feature scaling and imputation done on the entire dataset first, followed by k-fold cross-validation. But doesn't that cause data leakage?
Topic data-leakage data-imputation feature-scaling cross-validation
Category Data Science