Get row-wise frequency count of words from a list in a text column (pandas)

I have a data frame with an Audio Transcript column containing customer care phone conversations. I have created a list of words and phrases:

words = ["rain", "buy new house", "tornado"]

What I need to do is create a column in the data frame that checks the text column row by row for these words and, if any are present, fills the new column with each word and its frequency. For example, the text in the first row is

"I was going to buy new house last week but it was raining since then. Once the rain stops I'll go and buy new house"

the column should read

{"buy new house",2}, {"rain",2}

Alternatively, I could create a duplicate row and move the part after the comma into the next row.
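Once the counts for a row are available as a dictionary, producing the single-cell string shown above is mostly formatting. A minimal sketch follows; format_counts is a hypothetical helper name, and sorting phrases alphabetically is an assumption (it happens to reproduce the order in the example).

```python
def format_counts(counts):
    """Render {phrase: count} as '{"phrase",count}, ...', skipping zero counts."""
    # Keep only phrases that actually occur, sorted alphabetically (assumed order).
    found = sorted((phrase, n) for phrase, n in counts.items() if n > 0)
    # Doubled braces in the f-string produce literal { and } characters.
    return ", ".join(f'{{"{phrase}",{n}}}' for phrase, n in found)


print(format_counts({"rain": 2, "buy new house": 2, "tornado": 0}))
# {"buy new house",2}, {"rain",2}
```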

How should I proceed? I am fairly new to this.

Topic python-3.x word-embeddings text-mining nlp

Category Data Science


Here is one way to approach the core logic:

from typing import Dict, Iterable

def count_phrases(string: str, phrases: Iterable[str]) -> Dict[str, int]:
    "Count the non-overlapping occurrences of each phrase in a string."
    return {phrase: string.count(phrase) for phrase in phrases}

string = "I was going to buy new house last week but it was raining since then. Once the rain stops I'll go and buy new house"
phrases = ["rain", "buy new house", "tornado"]

assert count_phrases(string, phrases) == {'rain': 2, 'buy new house': 2, 'tornado': 0}
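Note that str.count is case-sensitive and matches substrings, so "rain" is also counted inside "raining"; that is what gives the count of 2 the question expects. If whole-word, case-insensitive matches are wanted instead, a regular expression with \b word boundaries is one option. This is a sketch under that assumption, and count_whole_phrases is a hypothetical name:

```python
import re


def count_whole_phrases(string, phrases, ignore_case=True):
    """Count whole-word occurrences of each phrase ('rain' does not match 'raining')."""
    flags = re.IGNORECASE if ignore_case else 0
    return {
        # re.escape guards against regex metacharacters inside a phrase;
        # \b requires a word boundary on both sides of the phrase.
        phrase: len(re.findall(r"\b" + re.escape(phrase) + r"\b", string, flags))
        for phrase in phrases
    }


string = ("I was going to buy new house last week but it was raining since then. "
          "Once the rain stops I'll go and buy new house")
phrases = ["rain", "buy new house", "tornado"]
print(count_whole_phrases(string, phrases))
# {'rain': 1, 'buy new house': 2, 'tornado': 0}
```

Which behaviour is right depends on whether words like "raining" should count as a hit for "rain".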

The function can then be applied to the text column of a pandas DataFrame with Series.apply.
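A minimal sketch of that step, assuming the text lives in a column named "Audio Transcript" and using made-up sample rows. It stores only the phrases that occur in each row, then builds a long-format table with one row per matched phrase, which is one way to get the "duplicate row" layout mentioned in the question. The column names phrase_counts, row, phrase and count are hypothetical.

```python
import pandas as pd


def count_phrases(string, phrases):
    """Count non-overlapping, case-sensitive substring occurrences of each phrase."""
    return {phrase: string.count(phrase) for phrase in phrases}


phrases = ["rain", "buy new house", "tornado"]

# Hypothetical sample data; the real frame comes from the call transcripts.
df = pd.DataFrame({
    "Audio Transcript": [
        "I was going to buy new house last week but it was raining since then. "
        "Once the rain stops I'll go and buy new house",
        "The tornado warning was lifted",
        "No matching words here",
    ]
})

# One dict per row, keeping only phrases with a non-zero count.
df["phrase_counts"] = df["Audio Transcript"].apply(
    lambda text: {p: n for p, n in count_phrases(text, phrases).items() if n > 0}
)

# Long format: one output row per (original row, phrase) pair.
rows = [
    {"row": idx, "phrase": phrase, "count": n}
    for idx, counts in df["phrase_counts"].items()
    for phrase, n in counts.items()
]
long_df = pd.DataFrame(rows, columns=["row", "phrase", "count"])
print(long_df)
```

Rows with no matches end up with an empty dict in phrase_counts and contribute nothing to long_df.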
