sklearn serialize label encoder for multiple categorical columns
I have a model with several categorical features that need to be converted to numeric format. I am using a combination of LabelEncoder and OneHotEncoder to achieve this.
Once in production, I need to apply the same encoding to new incoming data before the model can be used. I have saved the model and the encoders to disk using pickle. The problem is that a LabelEncoder
keeps only the classes from its most recent fit (those of the last feature it encoded), so a single instance cannot encode all the categorical features of the new data. To work around this, I am saving a separate LabelEncoder
for each categorical feature, but this does not seem to scale well, especially with a large number of categorical features.
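To make the behaviour concrete, here is a minimal sketch of what I am seeing and of my current workaround. The column names and values are made up for illustration, and I keep the per-feature encoders in a dict so that they at least end up in one pickle.

```python
import pickle
from sklearn.preprocessing import LabelEncoder

# Hypothetical training columns (illustrative values only)
colors = ["red", "green", "blue", "green"]
sizes = ["S", "M", "L", "M"]

# Reusing one encoder: the second fit replaces the classes learned by the first
shared = LabelEncoder()
shared.fit(colors)
shared.fit(sizes)
print(shared.classes_)  # only the size labels are left

# Workaround: one encoder per feature, stored in a dict and pickled as one object
encoders = {
    "color": LabelEncoder().fit(colors),
    "size": LabelEncoder().fit(sizes),
}
payload = pickle.dumps(encoders)

# In production: restore the dict and look up the encoder by feature name
restored = pickle.loads(payload)
encoded_color = restored["color"].transform(["blue", "red"])
```

This works, but I have to keep the dict keys in sync with the feature names by hand.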
What is the common practice for this situation? Is it possible to serialize and save just one encoder for all the categorical features to be used in production?
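For reference, this is roughly the kind of single serializable object I have in mind. It is only a sketch, assuming a ColumnTransformer wrapping a OneHotEncoder would be an acceptable replacement for my LabelEncoder + OneHotEncoder combination; the data, the column indices, and the choice of `handle_unknown="ignore"` are my own assumptions.

```python
import pickle
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: two categorical columns and one numeric column
X_train = np.array(
    [["red", "S", 1.0], ["green", "M", 2.0], ["blue", "L", 3.0]],
    dtype=object,
)

# One transformer that encodes columns 0 and 1 and passes the rest through;
# sparse_threshold=0 forces a dense array so the result is easy to inspect
preprocess = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), [0, 1])],
    remainder="passthrough",
    sparse_threshold=0,
)
preprocess.fit(X_train)

# A single pickle now holds the encoding for every categorical feature
blob = pickle.dumps(preprocess)
restored = pickle.loads(blob)

# New data, including a category ("purple") that was not seen during fit
X_new = np.array([["blue", "M", 4.0], ["purple", "S", 5.0]], dtype=object)
encoded = restored.transform(X_new)
```

With `handle_unknown="ignore"`, the unseen category is encoded as all zeros for that feature instead of raising an error. I assume the fitted transformer and the model could also be chained in a `sklearn.pipeline.Pipeline` and pickled together, but I have not verified that this is the recommended practice.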
Topic encoder categorical-encoding labels scikit-learn categorical-data
Category Data Science