Reusing an LDA model on new data

I am working with the LDA (Latent Dirichlet Allocation) model from sklearn and have a question about reusing a trained model. After training the model on my data, how do I use it to make a prediction on new data? The goal is to read the content of an email.

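import joblib
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# corpus and stop_words are defined earlier in my script
countVectorizer = CountVectorizer(stop_words=stop_words)
termFrequency = countVectorizer.fit_transform(corpus)
featureNames = countVectorizer.get_feature_names_out()  # get_feature_names() before scikit-learn 1.0

model = LatentDirichletAllocation(n_components=3)
model.fit(termFrequency)
joblib.dump(model, 'lda.pkl')

# lda_from_joblib = joblib.load('lda.pkl')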

I save my model using joblib. Now I want to load the model in another file and use it on new data. Is there a way to do this? From the sklearn documentation I am not sure which function to call to make a new prediction.


I'm not familiar with the library, but this is certainly possible, and I would assume the way to do it is something like this:

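from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

countVectorizer = CountVectorizer(stop_words=stop_words)
termFrequency = countVectorizer.fit_transform(corpus)
featureNames = countVectorizer.get_feature_names_out()  # get_feature_names() before scikit-learn 1.0

model = LatentDirichletAllocation(n_components=3)
model.fit(termFrequency)

# myNewData: a list of new documents (strings), obtained here
# reuse the fitted vectorizer: transform, not fit_transform
newdataFreq = countVectorizer.transform(myNewData)
# one row per document, one column per topic
topicDistribution = model.transform(newdataFreq)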
  • I didn't test this code, but the LDA class does have a transform method, which applies an already fitted model to new data and returns the topic distribution of each document, exactly as you need.
  • Note that the new data must be represented with the same vocabulary indexing as the training data. If you save and load the model, you need to save not only the LDA model but also countVectorizer.
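
To make the save-and-load point concrete, here is a minimal sketch of the whole round trip, written as one script for brevity. The toy corpus, the new email, the temporary file location, the dictionary keys and the variable names are all assumptions made for illustration; in practice the second half would sit in the other file. It is one way to do it, not the only one.

```python
import os
import tempfile

import joblib
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus standing in for the training emails.
corpus = [
    "meeting schedule for the project review",
    "invoice payment due for your account",
    "project deadline and meeting agenda",
    "payment received for invoice number",
]

# --- training side ---
count_vectorizer = CountVectorizer(stop_words="english")
term_frequency = count_vectorizer.fit_transform(corpus)

model = LatentDirichletAllocation(n_components=3, random_state=0)
model.fit(term_frequency)

# Save both fitted objects together so the vocabulary stays in sync with the model.
# The path is a temporary location chosen for this sketch.
model_path = os.path.join(tempfile.mkdtemp(), "lda.pkl")
joblib.dump({"vectorizer": count_vectorizer, "lda": model}, model_path)

# --- prediction side (this part could live in another file) ---
saved = joblib.load(model_path)
loaded_vectorizer = saved["vectorizer"]
loaded_lda = saved["lda"]

# Hypothetical new email text.
new_emails = ["please send the invoice before the meeting"]

# Use transform, not fit_transform, so the training vocabulary is reused.
new_term_frequency = loaded_vectorizer.transform(new_emails)

# One row per document, one column per topic.
topic_distribution = loaded_lda.transform(new_term_frequency)

# Index of the most probable topic for each new document.
dominant_topic = topic_distribution.argmax(axis=1)
print(topic_distribution.shape, dominant_topic)
```

Since LDA is unsupervised, the "prediction" here is a distribution over topics rather than a class label. Taking the argmax is just one simple way to pick a single topic per email.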
