Transform test data when using a persistent model

I'm quite new to data science and am slowly working through the steps needed to get valid results with scikit-learn. As far as I understand, you fit and transform the training data, and only transform the test data (using the parameters learned during the earlier fit). My project needs a persistent model, so I export the trained model using joblib.

When I apply the model to test data later, is there a way to retrieve the transformation parameters that were learned during training?



In the same way that you use joblib to persist your trained model, you should also persist the fitted transformers that you use in preprocessing. For example, if you're using StandardScaler() and OneHotEncoder(), those fitted objects also need to be saved with joblib.dump() so you can load them with joblib.load() in your prediction script and call transform() on the new data.
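As a rough sketch of persisting a transformer alongside the model, the snippet below fits a StandardScaler and a classifier, dumps both, then reloads them as a prediction script would. The toy data, the LogisticRegression estimator, and the file names scaler.joblib and model.joblib are assumptions made for illustration, not part of the original question.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy training data (hypothetical values, for illustration only).
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])
y_train = np.array([0, 0, 1, 1])

# Training script: fit the scaler on the training data, then fit the model
# on the scaled features.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
model = LogisticRegression().fit(X_train_scaled, y_train)

# Persist BOTH fitted objects (file names are arbitrary choices).
out_dir = tempfile.mkdtemp()
scaler_path = os.path.join(out_dir, "scaler.joblib")
model_path = os.path.join(out_dir, "model.joblib")
joblib.dump(scaler, scaler_path)
joblib.dump(model, model_path)

# Prediction script: load both objects and only transform the new data.
loaded_scaler = joblib.load(scaler_path)
loaded_model = joblib.load(model_path)
X_new = np.array([[1.5, 250.0], [3.5, 450.0]])
predictions = loaded_model.predict(loaded_scaler.transform(X_new))

# The learned parameters are available on the loaded transformer.
print(loaded_scaler.mean_, loaded_scaler.scale_)
```

The loaded scaler carries the parameters learned during training (mean_ and scale_ for StandardScaler), so the prediction script never calls fit() again.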

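The simplest way to achieve this is to put both the transformers and the estimator into a scikit-learn Pipeline and use joblib to persist that single object. Calling predict() on the loaded pipeline then applies the fitted transformations automatically.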
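A minimal sketch of the pipeline approach might look like the following. The step names "scaler" and "clf", the LogisticRegression estimator, the toy data, and the file name pipeline.joblib are all assumptions chosen for the example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data (hypothetical values, for illustration only).
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])
y_train = np.array([0, 0, 1, 1])

# Chain preprocessing and the estimator; fit() fits the scaler on the
# training data and then fits the classifier on the scaled output.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

# Persist the whole pipeline as one file (file name is an arbitrary choice).
pipeline_path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipeline, pipeline_path)

# Prediction script: load once, then predict on raw, untransformed data.
loaded_pipeline = joblib.load(pipeline_path)
X_new = np.array([[1.5, 250.0], [3.5, 450.0]])
predictions = loaded_pipeline.predict(X_new)

# The fitted transformer's parameters remain accessible by step name.
print(loaded_pipeline.named_steps["scaler"].mean_)
```

With a single persisted object there is no risk of pairing a model with the wrong scaler. It is generally safest to load the file with the same scikit-learn version that was used to create it.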
