Get row wise frequency count of words from list in text column pandas

Question

shivanshu dhawan

April 24, 2022, 21:02

I have a data frame with an Audio Transcript column from customer care phone conversations. I have created a list of words and phrases:

words = ["rain", "buy new house", "tornado"]

What I need to do is create a column in the data frame that checks the text column row by row for these words and, if any are present, records each word and its frequency. For example, for this first-row text:

"I was going to buy new house last week but it was raining since then. Once the rain stops I'll go and buy new house"

the column should read

{"buy new house",2}, {"rain",2}

Alternatively, maybe create a duplicate row and put the part after the comma in the next row.

How should I proceed? I am fairly new to this.

Topic python-3.x word-embeddings text-mining nlp

Category Data Science

Brian Spiering · Accepted Answer · November 20, 2021, 14:30

Here is one way to approach the core logic:

def count_phrases(string: str, phrases: list[str]) -> dict[str, int]:
    "Find the number of occurrences of each phrase in a string."
    # str.count matches substrings, so "raining" also counts as a hit for "rain".
    return {phrase: string.count(phrase) for phrase in phrases}

string = "I was going to buy new house last week but it was raining since then. Once the rain stops I'll go and buy new house"
phrases = ["rain", "buy new house", "tornado"]

assert count_phrases(string, phrases) == {'rain': 2, 'buy new house': 2, 'tornado': 0}

The function could then be applied to a pandas DataFrame column with .apply.
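As a rough sketch of that last step, the snippet below applies the function to each row and keeps only the phrases that actually occur, which is close to the output asked for in the question. The column name "Audio Transcript" is taken from the question; the second row of data, the helper nonzero_counts, and the output column phrase_counts are made up for illustration.

```python
import pandas as pd


def count_phrases(string: str, phrases: list[str]) -> dict[str, int]:
    """Count occurrences of each phrase in a string (substring match)."""
    return {phrase: string.count(phrase) for phrase in phrases}


phrases = ["rain", "buy new house", "tornado"]

# Illustrative data: the first row is the example from the question,
# the second row is invented for this sketch.
df = pd.DataFrame({
    "Audio Transcript": [
        "I was going to buy new house last week but it was raining since then. "
        "Once the rain stops I'll go and buy new house",
        "There was a tornado warning yesterday",
    ]
})


def nonzero_counts(text: str) -> dict[str, int]:
    """Hypothetical helper: drop phrases that do not occur in the text."""
    counts = count_phrases(text, phrases)
    return {phrase: n for phrase, n in counts.items() if n > 0}


# Each cell of the new column holds a dict of phrase -> count for that row.
df["phrase_counts"] = df["Audio Transcript"].apply(nonzero_counts)
print(df["phrase_counts"].tolist())
```

Matching here is case-sensitive and substring-based, so lowercasing the text first may be worth considering. Missing values in the column would need handling (for example with .fillna("")) before calling .apply, since str.count only works on strings.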
