reuse of LDA model for new data

Question


El Pandario

May 22, 2022 10:56

I am working with the LDA (Latent Dirichlet Allocation) model from sklearn, and I have a question about reusing my trained model. After training the model, how do I use it to make predictions on new data? Basically, the goal is to read the content of an email.

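import joblib
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# corpus (list of email texts) and stop_words are defined elsewhere
countVectorizer = CountVectorizer(stop_words=stop_words)
termFrequency = countVectorizer.fit_transform(corpus)
featureNames = countVectorizer.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2

model = LatentDirichletAllocation(n_components=3)
model.fit(termFrequency)
joblib.dump(model, 'lda.pkl')

# lda_from_joblib = joblib.load('lda.pkl')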

I save my model using joblib. Now I want to load the model in another file and use it on new data. Is there a way to do this? I am not sure which function in the sklearn documentation I should call to make a new prediction.

Topic unsupervised-learning scikit-learn lda machine-learning

Category Data Science

Erwan · Accepted Answer · May 22, 2022 10:56

I'm not familiar with the library, but this is certainly possible, and I would assume that the way to do it is something like this:

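from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

countVectorizer = CountVectorizer(stop_words=stop_words)
termFrequency = countVectorizer.fit_transform(corpus)
featureNames = countVectorizer.get_feature_names_out()

model = LatentDirichletAllocation(n_components=3)
model.fit(termFrequency)

# obtain myNewData here (a list of new email texts)
# reuse the fitted vectorizer: transform(), not fit_transform()
newdataFreq = countVectorizer.transform(myNewData)
docTopics = model.transform(newdataFreq)  # shape: (n_new_docs, n_components)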

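I didn't test this code, but the LDA class does have a transform method that applies an already-fitted model to new data, which is exactly what you need. It returns the topic distribution of each new document as an array of shape (n_samples, n_components).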
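A minimal runnable sketch of that idea, assuming scikit-learn 1.0 or later. The toy corpus and the new email are invented for illustration, `stop_words="english"` stands in for the asker's own stop-word list, and `random_state=0` is added only to make the run reproducible. Each row of the result is a topic distribution that sums to 1, so `argmax` gives the dominant topic of each new document.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for the real training emails (hypothetical data).
corpus = [
    "meeting agenda project deadline schedule",
    "invoice payment due account balance",
    "project schedule meeting team update",
    "payment received invoice account",
]

count_vectorizer = CountVectorizer(stop_words="english")
term_frequency = count_vectorizer.fit_transform(corpus)

model = LatentDirichletAllocation(n_components=3, random_state=0)
model.fit(term_frequency)

# New email: vectorize with the already-fitted vectorizer (transform, not fit_transform).
new_emails = ["please send the invoice before the deadline"]
new_term_frequency = count_vectorizer.transform(new_emails)

# One row per new document, one column per topic.
doc_topic = model.transform(new_term_frequency)
print(doc_topic.shape)        # (1, 3)
print(doc_topic.sum(axis=1))  # each row sums to 1

# Index of the most probable topic for each new document.
dominant_topic = doc_topic.argmax(axis=1)
print(dominant_topic)
```

Which topic index comes out depends on the fitted model; mapping indices to human-readable labels (for example by inspecting the top words in `model.components_`) is left to the reader.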
Note that the new data must be represented with the same vocabulary indexing, so call transform (not fit_transform) on the vectorizer that was fitted on the training corpus. If you save and load the model, you need to save not only the LDA model but also countVectorizer.
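One way to keep the vectorizer and the LDA model together is to chain them in a scikit-learn `Pipeline` and persist that single object with joblib. This is a sketch of an alternative to saving the two objects separately, not the only option. The toy corpus, the file name `lda_pipeline.pkl` and the temporary directory are hypothetical choices to keep the example self-contained.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# Toy training emails (hypothetical data).
corpus = [
    "meeting agenda project deadline schedule",
    "invoice payment due account balance",
    "project schedule meeting team update",
    "payment received invoice account",
]

# --- training script ---
# Chaining both steps means the fitted vocabulary is saved together with the LDA model.
pipeline = Pipeline([
    ("vectorizer", CountVectorizer(stop_words="english")),
    ("lda", LatentDirichletAllocation(n_components=3, random_state=0)),
])
pipeline.fit(corpus)

# Hypothetical file name inside a temp directory.
path = os.path.join(tempfile.mkdtemp(), "lda_pipeline.pkl")
joblib.dump(pipeline, path)

# --- prediction script (e.g. a separate file) ---
loaded = joblib.load(path)
new_emails = ["please send the invoice before the deadline"]
# Raw text goes in; the pipeline applies vectorizer.transform, then lda.transform.
doc_topic = loaded.transform(new_emails)
print(doc_topic.argmax(axis=1))

# The reloaded pipeline reproduces the in-memory results.
print(np.allclose(doc_topic, pipeline.transform(new_emails)))  # True
```

In the separate prediction file, only the `joblib.load` call and the scikit-learn installation are needed; the training corpus is not required there.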
