Jargon extraction in a text

Question

n.mathfreak

July 27, 2021, 07:49

I have a large text corpus (a company's documentation) and I want to extract the terms that are specific to its field or business. I can rank words with TF or TF-IDF and use their frequency as a guide, but that isn't always reliable.

I also want to do this for single, shorter sentences, which I expect to be harder. I was also thinking of training a model on Wikipedia articles and then applying it to my documentation texts.

Is there any way of identifying words that are related to a specific field?

Topic corpus nlp python

Category Data Science

Shrinidhi M · Accepted Answer · July 27, 2021, 07:49

Shrinidhi M answered on July 27, 2021, 07:49

You can use TF-IDF, TextRank, TopicRank, YAKE!, and KeyBERT for keyword extraction.

Check this article: https://towardsdatascience.com/keyword-extraction-python-tf-idf-textrank-topicrank-yake-bert-7405d51cd839
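As a rough illustration of the first method in that list, here is a minimal TF-IDF sketch in plain Python. The function names (`tokenize`, `tfidf_top_terms`), the regex tokenizer, and the smoothed IDF formula are my own assumptions for the example, not taken from the linked article; a library such as scikit-learn would normally do this for you.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_top_terms(docs, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms
    appearing in every document still get a small positive weight.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_n])
    return results


docs = [
    "the ledger reconciles the ledger entries",
    "the invoice lists the invoice totals",
    "the meeting is on the calendar",
]
print(tfidf_top_terms(docs, top_n=2))
```

The sketch does no stop-word filtering, so on short documents a common word can still outrank real terms. That is one reason raw frequency scores need some extra filtering in practice.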

Palak Bansal · Accepted Answer · July 26, 2021, 10:31

I built a similar application some time ago. I extracted the features (important defining terms) from the corpus using TF-IDF, then calculated the word similarity between those terms and my input words, and aggregated the results.

You could use word embeddings like GloVe if you want to compare these words semantically.
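A minimal sketch of that idea, under some loud assumptions: the three-dimensional vectors in `TOY_VECTORS` are invented for illustration (real GloVe vectors have 50 to 300 dimensions and would be loaded from a file), `domain_score` is a hypothetical helper name, and averaging the cosine similarities is just one possible way to aggregate.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def domain_score(word, domain_terms, vectors):
    """Average similarity between `word` and the domain terms found by TF-IDF.

    Assumption: the mean is used as the aggregate; max or a weighted
    mean would be equally reasonable choices.
    """
    if word not in vectors:
        return 0.0  # out-of-vocabulary words cannot be scored
    sims = [cosine(vectors[word], vectors[t]) for t in domain_terms if t in vectors]
    return sum(sims) / len(sims) if sims else 0.0


# Toy vectors, invented for this example only (not real GloVe values).
TOY_VECTORS = {
    "ledger": [0.9, 0.1, 0.0],
    "invoice": [0.8, 0.2, 0.1],
    "audit": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.1, 0.9],
}

# Pretend these came out of the TF-IDF step on the company corpus.
domain_terms = ["ledger", "invoice"]

for candidate in ["audit", "banana"]:
    print(candidate, round(domain_score(candidate, domain_terms, TOY_VECTORS), 3))
```

Because it scores one word at a time, the same function can be applied to the words of a single short sentence, which is the case the question found hard to handle with frequency alone.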
