Extracting domain-specific terms from a text using a huge hard-coded list

Question

BaldML

April 29, 2020, 18:16

I know. The title sounds like I haven't googled my problem, but trust me, I did. Maybe my problem has a name and I haven't found it yet. Hoping you can help me wrap my head around it.

What I want to do is, given a text, extract all terms from a specific domain. For simplicity, let's say that, given a hard-coded list of animals, I want my model to extract from an input text all of the animals that are present in the list. Why would I use AI for this? Well, for example, I want to handle negations, so I don't want to extract "lion" if the text says "no it ain't a lion ma'am". I also want it to extract "lion" if the text says "lioness", without having to make a huge list of synonyms.
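To make the expected behaviour concrete, here is a rough sketch of the naive rule-based baseline I'm trying to get away from. Everything in it is made up for illustration: the tiny `ANIMALS` list, the hand-written `VARIANTS` map, the `NEGATION_CUES` set and the `extract_animals` helper are all hypothetical, and the negation check is just "is there a cue word within the few preceding tokens".

```python
import re

# Hypothetical toy data: canonical terms plus a hand-written variant map.
# The real list would have 100k+ entries, which is why this does not scale.
ANIMALS = {"lion", "tiger", "zebra"}
VARIANTS = {"lioness": "lion", "lions": "lion", "tigress": "tiger"}

# Crude negation handling: a cue word shortly before the term blocks it.
NEGATION_CUES = {"no", "not", "ain't", "isn't", "never"}
NEGATION_WINDOW = 3  # number of preceding tokens to inspect


def extract_animals(text):
    """Return canonical animals mentioned in `text` that are not negated."""
    tokens = re.findall(r"[a-z']+", text.lower())
    found = []
    for i, token in enumerate(tokens):
        # Map the surface form to its canonical label, if it has one.
        canonical = token if token in ANIMALS else VARIANTS.get(token)
        if canonical is None:
            continue
        # Skip the mention if a negation cue appears just before it.
        window = tokens[max(0, i - NEGATION_WINDOW):i]
        if any(cue in NEGATION_CUES for cue in window):
            continue
        if canonical not in found:
            found.append(canonical)
    return found


print(extract_animals("I saw a lioness near the river"))
print(extract_animals("no it ain't a lion ma'am"))
```

The first call should map "lioness" to "lion" and the second should return nothing because of the negation. This works on toy inputs, but both the variant map and the negation rule are exactly the kind of hand-maintained logic I'd like a model to learn instead.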

I think my problem is kind of like NER, but not exactly, right? I don't just want to tag "lioness" as an animal, I also want to map it to "lion". So this is basically a classification problem, right? I want to label every animal found in the text, e.g. label "lioness" = lion. But the problem is that I may have 100k+ possible labels, so should I just train a classification model with 100k+ possible outputs? That doesn't seem right.

Thanks in advance for any feedback.

Topic information-extraction nlp python

Category Data Science
