Should I encode the categorical data before making a training/validation split?
I am looking at some examples on Kaggle and I'm not sure what the correct approach is. If I split the data into training and validation sets and encode the categorical data using only the training part, some unique values that appear only in the validation part are sometimes left out of the encoding, and I'm not sure whether that is correct.
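To make the situation concrete, here is a minimal sketch of what I mean. The `color` values, the fixed 6/2 split, and the `-1` placeholder for unseen categories are all made up for illustration; this is plain Python, not the code from any particular Kaggle notebook.

```python
# Toy categorical column; the values are hypothetical.
colors = ["red", "blue", "red", "green", "blue", "red", "purple", "blue"]

# Fixed split so the result is reproducible: first 6 rows train, last 2 validation.
train, valid = colors[:6], colors[6:]

# "Fit" the encoding on the training part only: one integer per category seen there.
mapping = {cat: idx for idx, cat in enumerate(sorted(set(train)))}

# Categories that occur in validation but were never seen while fitting.
unseen = sorted(set(valid) - set(mapping))

# Encode the validation rows; -1 is an arbitrary placeholder for unseen categories.
encoded_valid = [mapping.get(cat, -1) for cat in valid]

print(mapping)        # categories learned from the training part
print(unseen)         # categories "left behind"
print(encoded_valid)  # validation rows after encoding
```

With this split, `purple` only occurs in the validation part, so it never gets a code of its own.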
Topic encoding
Category Data Science