How to handle categorical feature engineering in ML production?
I have a classification dataset with a lot of categorical columns, which I one-hot encoded (i.e. converted to dummy variables) for training. How should this be handled on the production side of ML? Future datasets may drift and introduce new categories that were not among those seen while training the model.
What I did was, after one-hot encoding all the features, save the list of resulting columns to a pickle file. During deployment I load that pickle file, match the production set's features against it, and remove the extras.
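A minimal sketch of that approach, assuming pandas and `pd.get_dummies` were used for the encoding. The toy data, the column names, and the file name `train_columns.pkl` are hypothetical, and zero-filling training columns that are absent from the production batch is an assumption on top of what is described above.

```python
import os
import pickle
import tempfile

import pandas as pd

# Toy training data; column names and values are made up for illustration.
train = pd.DataFrame({
    "color": ["red", "green", "blue", "red"],
    "size": ["S", "M", "L", "M"],
    "price": [10.0, 12.5, 9.0, 11.0],
})

# Training side: one-hot encode, then save the resulting column list.
X_train = pd.get_dummies(train, columns=["color", "size"], dtype=int)
columns_path = os.path.join(tempfile.gettempdir(), "train_columns.pkl")
with open(columns_path, "wb") as f:
    pickle.dump(list(X_train.columns), f)

# Production side: this batch contains an unseen category ("purple")
# and lacks some categories that were present in training.
prod = pd.DataFrame({
    "color": ["purple", "red"],
    "size": ["M", "S"],
    "price": [8.0, 10.5],
})
with open(columns_path, "rb") as f:
    train_columns = pickle.load(f)

X_prod = pd.get_dummies(prod, columns=["color", "size"], dtype=int)
# reindex drops dummy columns never seen in training (color_purple) and
# adds training columns missing from this batch, filled with 0 (assumption).
X_prod = X_prod.reindex(columns=train_columns, fill_value=0)
```

With this, a row holding an unseen category ends up with zeros in every dummy column for that feature, so the model receives the same column layout it was trained on.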
What is the correct way to do this in production?
Topic deployment machine-learning
Category Data Science