Transform test data when using a persistent model

Question

Daten_Raten

September 19, 2019 08:14

I'm quite new to data science and am slowly working through the steps needed to get valid results with scikit-learn. As far as I understand, you fit and transform the training data, and only transform the test data (using the parameters learned during the earlier fit). My project needs a persistent model, so I export the trained model using joblib.

When applying the model to test data later, is there a way to retrieve the transformation parameters that were learned during training?

Topic pickle preprocessing scikit-learn

Category Data Science

Dan Scally · Accepted Answer · September 19, 2019 08:14

In the same way that you use joblib to persist your model, you should also persist the fitted transformers you used for preprocessing. For example, if you're using StandardScaler() and OneHotEncoder(), those fitted objects also need to be saved with joblib.dump() so you can load them in your prediction script with joblib.load().
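As a rough sketch of that approach, the snippet below fits a StandardScaler and a model, saves each to its own file, and then reloads both to transform and predict on new data. The toy data, the LogisticRegression estimator, and the file names scaler.joblib and model.joblib are assumptions made up for illustration; they are not from the question.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data, invented for this example.
X_train = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 240.0], [4.0, 260.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[2.5, 210.0]])

# --- Training script ---
# Fit the scaler on the training data only, then train on the scaled data.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)

# Hypothetical file names; a temp directory keeps the example self-contained.
out_dir = tempfile.mkdtemp()
scaler_path = os.path.join(out_dir, "scaler.joblib")
model_path = os.path.join(out_dir, "model.joblib")
joblib.dump(scaler, scaler_path)
joblib.dump(model, model_path)

# --- Prediction script ---
# Load both objects and call transform (not fit) on the test data.
loaded_scaler = joblib.load(scaler_path)
loaded_model = joblib.load(model_path)
X_test_scaled = loaded_scaler.transform(X_test)
prediction = loaded_model.predict(X_test_scaled)

# The parameters learned during training travel with the fitted scaler.
print(loaded_scaler.mean_, loaded_scaler.scale_)
```

The loaded scaler exposes the training-time statistics through its mean_ and scale_ attributes, so there is no need to recompute anything from the training set at prediction time.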

The simplest way to achieve this is to put both the transformers and the estimator into a scikit-learn Pipeline and persist that single object with joblib.
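A minimal sketch of the Pipeline variant might look like the following. The toy data, the step names "scaler" and "clf", the LogisticRegression estimator, and the file name pipeline.joblib are all assumptions chosen for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, invented for this example.
X_train = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 240.0], [4.0, 260.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[2.5, 210.0]])

# Preprocessing and estimator live in one object.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression()),
])

# fit() fits the scaler on X_train, transforms it, then fits the classifier.
pipe.fit(X_train, y_train)

# Persist the whole pipeline to a single (hypothetical) file.
pipeline_path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipe, pipeline_path)

# Later, in the prediction script: predict() applies the stored
# transformation to the raw test data before calling the classifier.
loaded_pipe = joblib.load(pipeline_path)
prediction = loaded_pipe.predict(X_test)

# The fitted transformer is still reachable by its step name.
print(loaded_pipe.named_steps["scaler"].mean_)
```

With this layout the prediction script only ever handles raw features, which removes the risk of loading a model together with a mismatched scaler.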
