Extract a numeric attribute from partially unstructured text for each word of a vocabulary

Given a vocabulary

v = {'sales', 'units', 'parts', 'operators', 'revenue'}

and strings such as

s1 = 'total of 1138 units, repaired 7710 parts, sales increased 588 (+34), decreasing of operator 413 (-14)'
s2 = 'part 7710 (repaired), units are 1138, revenue 1212, operators variation is -14, salles increment +34 (588 total)'

I have to associate each key of v with the corresponding number in s1 and s2. For sales and operators I need the variation (the number with a leading sign), i.e. +34 and -14. For s1 the expected result is

key attribute
sales +34
units 1138
parts 7710
operators -14
revenue none

For s2 the table is the same, except that revenue is 1212 instead of none.

Notice that:

  • the text has some structure: commas split each string into segments, and each segment contains one word of the vocabulary and one number (two numbers in the case of sales and operators).
  • keys in the strings may be misspelled because they are typed by hand: s1 has operator instead of operators, and s2 has part instead of parts and salles instead of sales.
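
To make these two observations concrete, here is a minimal rule-based sketch. It is only an illustration, not the actual script: it splits on commas, fuzzy-matches each word against the vocabulary with the standard-library difflib, and then picks the signed or unsigned number. The names VOCAB, SIGNED_KEYS and extract, and the cutoff of 0.75, are assumptions made for this example.

```python
import re
from difflib import get_close_matches

VOCAB = ['sales', 'units', 'parts', 'operators', 'revenue']
# Keys whose attribute is the signed variation rather than the plain number.
SIGNED_KEYS = {'sales', 'operators'}

SIGNED = re.compile(r'[+-]\d+')            # e.g. +34, -14
UNSIGNED = re.compile(r'(?<![+\-\d])\d+')  # digits not preceded by a sign

def extract(text, cutoff=0.75):
    """Return {key: number string or None} for one input string."""
    result = {key: None for key in VOCAB}
    for segment in text.split(','):
        # Find the first word in the segment that is close to a vocabulary key.
        key = None
        for word in re.findall(r'[a-z]+', segment.lower()):
            match = get_close_matches(word, VOCAB, n=1, cutoff=cutoff)
            if match:
                key = match[0]
                break
        if key is None:
            continue  # no recognisable key in this segment
        pattern = SIGNED if key in SIGNED_KEYS else UNSIGNED
        number = pattern.search(segment)
        if number:
            result[key] = number.group()
    return result

s1 = 'total of 1138 units, repaired 7710 parts, sales increased 588 (+34), decreasing of operator 413 (-14)'
s2 = 'part 7710 (repaired), units are 1138, revenue 1212, operators variation is -14, salles increment +34 (588 total)'

print(extract(s1))
print(extract(s2))
```

On these two strings the sketch reproduces the tables above, but it relies entirely on the comma structure and on the misspellings staying close to the real keys, so it would fail wherever those assumptions break.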

I wrote a simple Python script (mainly regex) that does the job in most cases. Now I'd like to try a machine learning algorithm, both to learn how it works and to compare the results. I have many manually labelled examples (string + table) that I could use to train a neural network, but as a novice I don't know where to start.
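
To show what one labelled example could look like as training data, here is a small sketch that turns a (string, table) pair into per-token labels: each number token gets the key it belongs to, and everything else gets 'O'. This is just one possible framing (token classification), and the function name to_token_labels and the tokenisation regex are assumptions made for this illustration.

```python
import re

def to_token_labels(text, table):
    """Tokenise text and tag each number token with its key, else 'O'."""
    # Map attribute string -> key, skipping missing attributes (None).
    # Assumption: no two keys share the same attribute value in one string.
    value_to_key = {value: key for key, value in table.items() if value is not None}
    # Tokens: optionally signed integers, alphabetic words, single punctuation marks.
    tokens = re.findall(r'[+-]?\d+|[A-Za-z]+|[^\sA-Za-z\d]', text)
    return [(tok, value_to_key.get(tok, 'O')) for tok in tokens]

text = 'total of 1138 units, repaired 7710 parts, sales increased 588 (+34), decreasing of operator 413 (-14)'
table = {'sales': '+34', 'units': '1138', 'parts': '7710', 'operators': '-14', 'revenue': None}

for token, label in to_token_labels(text, table):
    print(token, label)
```

With this representation, 588 and 413 are labelled 'O' while +34 and -14 carry the sales and operators labels, which is the distinction a model would have to learn.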

Which model is most suitable for this task? NER? BERT? I searched for examples on the Keras site, here, and on Google to see whether somebody had already tackled this kind of task, but found none, probably because I don't know which terms to search for. I tried queries such as "data mining unstructured data", "example of supervised NER" and "keras example text extraction", but the results are all rather vague.

Is this a multiclass classification problem? Text mining? Feature generation? I don't think it is NLP, since the algorithm doesn't have to learn the meaning of a sentence.

Topic text-mining neural-network

Category Data Science
