Classifying a free-text field to determine which product to select

I have a free-text field and I would like to use NLP to determine which product should be selected based on its content. The text involves concepts such as weight, speed, size and so on. Here is an example:

If the weight is over 50 kg, take product A.
If less than the upper weight, take product B. 

I also have the labelled data for the corresponding free text fields.

Free text   Product Weight Size
(see above) A       110    50x58x98

What is the best approach to solve such a problem? Is there any literature on this subject? Is there a blog post, paper, Jupyter notebook or anything else where someone has implemented a similar classification?


This is quite a complex problem, and how hard it gets depends on the types of constraints that can appear in the text.

As far as I know the problem would usually be decomposed into two parts:

  1. The extraction of the constraints from the text, which results in a set of constraints expressed in some predefined formal language.
  2. The application of the constraints to the database. This is the simple part: it's essentially building an SQL query or similar.
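
Part 2 can be sketched as follows. This is only a minimal illustration, assuming the constraints have already been extracted into a small `Constraint` record and that the products live in a table called `products` with numeric columns; the class, the table, the column names and the toy rows are all hypothetical and not taken from the question.

```python
import sqlite3
from dataclasses import dataclass


# Hypothetical formal representation of one extracted constraint.
@dataclass(frozen=True)
class Constraint:
    attribute: str  # column name, e.g. "weight_kg"
    operator: str   # comparison operator, e.g. ">"
    value: float


# Whitelists: column names and operators cannot be bound as SQL parameters,
# so they are checked explicitly before being placed in the query string.
ALLOWED_ATTRIBUTES = {"weight_kg", "speed_kmh"}
ALLOWED_OPERATORS = {">", ">=", "<", "<=", "="}


def build_query(constraints):
    """Return (sql, params) selecting the products that satisfy all constraints."""
    clauses, params = [], []
    for c in constraints:
        if c.attribute not in ALLOWED_ATTRIBUTES or c.operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported constraint: {c}")
        clauses.append(f"{c.attribute} {c.operator} ?")
        params.append(c.value)
    where = " AND ".join(clauses) if clauses else "1"
    return f"SELECT name FROM products WHERE {where} ORDER BY name", params


def find_products(conn, constraints):
    sql, params = build_query(constraints)
    return [row[0] for row in conn.execute(sql, params)]


# Toy in-memory table (assumed data, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, weight_kg REAL, speed_kmh REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("A", 110, 20), ("B", 30, 45)],
)

heavy = find_products(conn, [Constraint("weight_kg", ">", 50)])
```

Values are passed as bound parameters, while attribute names and operators are validated against a fixed set, since they come (indirectly) from free text.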

Part 1 is of course the hard NLP part. It can itself be decomposed into several steps:

  • Locating a constraint in the text and detecting the type of constraint. This could be designed as a sequence labeling task (e.g. named entity recognition).
  • Extracting the variable elements, for example the weight and the relation "over", and mapping them to a formal expression in the predefined language. Note that this can get difficult if the text refers back to earlier information, as in "the upper weight": resolving this involves coreference resolution, which as far as I know rarely works perfectly in general.
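
To make these two steps concrete, here is a rough sketch of what happens after a sequence labeler has run. It assumes the tagger outputs BIO tags with the labels `ATTR`, `REL`, `VAL` and `UNIT`; that label set, the `Constraint` record and the relation-to-operator table are assumptions made for illustration, and the tags below are written by hand rather than produced by a trained model.

```python
from dataclasses import dataclass


# Hypothetical target of the extraction: one constraint in a formal language.
@dataclass(frozen=True)
class Constraint:
    attribute: str
    operator: str
    value: float
    unit: str


# Assumed mapping from relation phrases to comparison operators.
RELATION_TO_OPERATOR = {
    "over": ">", "above": ">", "more than": ">",
    "under": "<", "below": "<", "less than": "<",
}


def spans_from_bio(tokens, tags):
    """Group BIO-tagged tokens into (label, text) spans."""
    spans, label, words = [], None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new or tag == "O":
            if label is not None:  # close the span that was open
                spans.append((label, " ".join(words)))
            label, words = (tag[2:], [token]) if starts_new else (None, [])
        else:  # "I-" tag continuing the current span
            words.append(token)
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def to_constraint(spans):
    """Map labeled spans to a Constraint; assumes one span per label."""
    fields = dict(spans)
    return Constraint(
        attribute=fields["ATTR"].lower(),
        operator=RELATION_TO_OPERATOR[fields["REL"].lower()],
        value=float(fields["VAL"]),
        unit=fields.get("UNIT", ""),
    )


# Hand-written tags standing in for the output of a trained tagger.
tokens = ["If", "the", "weight", "is", "over", "50", "kg", ",", "take", "product", "A", "."]
tags = ["O", "O", "B-ATTR", "O", "B-REL", "B-VAL", "B-UNIT", "O", "O", "O", "O", "O"]

constraint = to_constraint(spans_from_bio(tokens, tags))
```

A sentence such as "If less than the upper weight, take product B." would fail in `to_constraint` because it contains no `VAL` span: the value has to be recovered from the previous sentence, which is exactly the coreference problem mentioned above.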

The whole problem is similar to relation extraction, so there might be similar problems or even existing systems which could help, but they would certainly require adaptation to the specific case.

Also note that if the text descriptions don't vary too much, some simple pattern matching may suffice to capture most of the cases. For example, one can imagine designing this as a hybrid system which first tries a set of predefined simple patterns and, if it doesn't find a match, falls back to a more general method.
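
Such a hybrid system could look roughly like this. The regular expression, the list of attributes and units, and the `fallback` callable are all assumptions chosen for illustration; in practice the fallback would be a trained model, whereas here it is a stub that returns nothing.

```python
import re

# Assumed mapping from relation phrases to comparison operators.
RELATIONS = {
    "over": ">", "above": ">", "more than": ">",
    "under": "<", "below": "<", "less than": "<",
}

# One hand-written pattern: "<attribute> is <relation> <number> [unit]".
PATTERN = re.compile(
    r"(?P<attr>weight|speed|size)\s+is\s+"
    r"(?P<rel>" + "|".join(map(re.escape, RELATIONS)) + r")\s+"
    r"(?P<val>\d+(?:\.\d+)?)\s*(?P<unit>kg|g|km/h)?",
    re.IGNORECASE,
)


def match_patterns(text):
    """Try the predefined pattern; return a constraint dict or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return {
        "attribute": m["attr"].lower(),
        "operator": RELATIONS[m["rel"].lower()],
        "value": float(m["val"]),
        "unit": (m["unit"] or "").lower(),
    }


def extract_constraint(text, fallback):
    """Patterns first; only call the more general method when they fail."""
    result = match_patterns(text)
    if result is not None:
        return result, "pattern"
    return fallback(text), "fallback"


def no_model(text):
    """Stub for the general method (e.g. a trained extractor)."""
    return None


first = extract_constraint("If the weight is over 50 kg, take product A.", no_model)
second = extract_constraint("If less than the upper weight, take product B.", no_model)
```

With the two example sentences from the question, the first is handled by the pattern, while the second has no explicit number and is passed on to the fallback. Logging which branch handled each text also shows how much of the labelled data the simple patterns already cover.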
