Topic Modelling in an existing dataframe in Python
I am trying to perform topic extraction on a pandas dataframe. I am using LDA topic modeling to extract the topics from the dataframe as a whole, and that part works without problems.
However, I would like to apply LDA topic modeling to each row in my dataframe.
Current dataframe:
date | cust_id | words |
---|---|---|
3/14/2019 | 100001 | samantha slip skirt pi ski |
1/21/2020 | 10002 | steel skirt solid greenish |
5/19/2020 | 10003 | arizona denim blouse d |
The dataframe I am looking for:
date | cust_id | words | topic 0 words | topic 0 weights |
---|---|---|---|---|
3/14/2019 | 100001 | samantha slip skirt pi ski | skirt | 0.5 |
1/21/2020 | 10002 | skirt solid greenish | greenish | 0.2 |
5/19/2020 | 10003 | arizona denim blouse | denim | 0.1 |
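from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

vectorizer = CountVectorizer(max_df=0.9, min_df=20, token_pattern=r'\w+|\$[\d.]+|\S+')
tf = vectorizer.fit_transform(features['words']).toarray()
tf_feature_names = vectorizer.get_feature_names_out()  # get_feature_names() on scikit-learn < 1.0
number_of_topics = 6
model = LatentDirichletAllocation(n_components=number_of_topics, random_state=1111)
model.fit(tf)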
I tried to merge the two dataframes together, but it does not work.
How can I add each topic's words as a column and each topic's weights as another column, for every row in my dataframe?
I also posted this question on Stack Overflow: https://stackoverflow.com/questions/71476309/topic-modelling-in-an-existing-dataframe-in-python