Should I encode categorical data before making a train/validation split?

I am looking at some examples on Kaggle and I'm not sure what the correct approach is. If I split the training data into training and validation parts and fit the categorical encoding only on the training part, some unique values that appear only in the validation part are sometimes left out of the encoding, and I'm not sure whether that is correct.


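Yes, encode the data before the split. The point of the split is to obtain two i.i.d. samples from the same data-generating process, and encoding merely represents the data in a different form. Encoding first also guarantees that every category, including the rare ones, maps to the same columns in both the training and the validation part. Note that this reasoning applies to encodings that do not use the target, such as one-hot or ordinal encoding; an encoding computed from the target (for example target/mean encoding) should be fitted on the training part only, otherwise validation labels leak into the features.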
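To make the "encode first, then split" order concrete, here is a minimal sketch in plain Python. The toy `colors` column and the helpers `fit_categories` and `one_hot` are hypothetical names made up for illustration, not part of any library; in practice you would probably use something like pandas or scikit-learn for the same steps.

```python
import random

# Hypothetical toy data: one categorical feature with a rare value ("green").
colors = ["red", "blue", "red", "blue", "red", "green", "blue", "red"]


def fit_categories(values):
    """Return the sorted list of distinct categories seen in `values`."""
    return sorted(set(values))


def one_hot(values, categories):
    """One-hot encode `values` against a fixed list of categories."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # raises KeyError if v was not seen when fitting
        rows.append(row)
    return rows


# 1. Encode first, using every category present in the full data set.
categories = fit_categories(colors)
encoded = one_hot(colors, categories)

# 2. Then split the already-encoded rows (seeded shuffle, 75% / 25%).
rng = random.Random(0)
indices = list(range(len(encoded)))
rng.shuffle(indices)
cut = int(0.75 * len(indices))
train = [encoded[i] for i in indices[:cut]]
valid = [encoded[i] for i in indices[cut:]]

# Both parts share the same columns, whichever part the rare value landed in.
assert all(len(row) == len(categories) for row in train + valid)
```

If the categories were instead fitted on the training rows alone and "green" happened to fall into the validation part, `one_hot` would raise a `KeyError` on it, which is the "left behind" situation described in the question.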
