NLP approaches to infer Processes from Text

I would like to use NLP techniques to infer a process out of raw text. For example, if I have a sentence like:

Recruitment is about attracting and selecting the right person for the job.

To get the following process:

Attracting the right person.

I noticed that a good first step is to use spaCy: tokenizing the text and keeping only the nouns. Beyond that point, though, I'm completely blank. Someone suggested Semantic Role Labelling (SRL), and from what I have read about it I think it could work well here, perhaps using the AllenNLP library.

What I would like to know, though (and the whole purpose of this post), are the alternatives. Besides SRL, what other approaches could suit this problem? I obviously don't expect anyone who replies to solve the problem, only to suggest approaches that might work so I can dig into them (as mentioned, I have very little experience with this topic).

Thanks in advance!

Topic allennlp spacy nlp

Category Data Science


Following up on the idea of building a classifier: once you have a labeled dataset with the desired process categories, one option is to use the nltk library together with Keras/TensorFlow. There are two main approaches:

  • bag-of-words
  • sequence-modeling
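To make the difference between the two approaches concrete, here is a minimal pure-Python sketch. The toy vocabulary, the function names `bag_of_words` and `index_sequence`, and the choice of index 0 for padding and unknown words are all assumptions made for illustration, not part of nltk or Keras.

```python
from collections import Counter

# Hypothetical toy vocabulary; index 0 is reserved for padding and unknown words.
vocab = {"attract": 1, "select": 2, "the": 3, "right": 4, "person": 5}

def bag_of_words(tokens, vocab):
    """Count vector: one slot per vocabulary index, word order is discarded."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return [counts.get(i, 0) for i in range(1, len(vocab) + 1)]

def index_sequence(tokens, vocab, max_len):
    """Index sequence: word order is kept, padded or truncated to max_len."""
    ids = [vocab.get(t, 0) for t in tokens][:max_len]
    return ids + [0] * (max_len - len(ids))

# Same words, different order.
tokens_a = ["attract", "the", "right", "person"]
tokens_b = ["the", "right", "person", "attract"]

bow_a = bag_of_words(tokens_a, vocab)
bow_b = bag_of_words(tokens_b, vocab)
seq_a = index_sequence(tokens_a, vocab, max_len=6)
seq_b = index_sequence(tokens_b, vocab, max_len=6)

print(bow_a, bow_b)  # identical vectors: order is lost
print(seq_a, seq_b)  # different sequences: order is preserved
```

The bag-of-words vectors are the same for both token lists, while the index sequences differ, which is why sequence models can pick up on word order and bag-of-words models cannot.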

As a quick summary, the steps to implement a text classifier with the first approach are the ones below (you can find a worked-out example here):

  • Read your raw input sentences (to be used for training, validation...) and check that they have the right format and the correct label.
  • Preprocess your sentences as needed, which could be these steps:
    - lowercase all your words
    - remove punctuation characters
    - tokenize your words (here, you can define whether you want 1-gram tokens, 2-gram tokens...)
    - stem your words, so as to collapse singular/plural forms, verb tenses... (this step is not always straightforward, because stemmers such as PorterStemmer and SnowballStemmer may perform differently depending on the selected language); more info here
    - add further steps specific to your use case; the ones above are the standard ones, but you can also filter out sentences which you know offer no value for your use case
  • Once you have preprocessed your input data, you should be able to access your vocabulary, and you are then ready to vectorize your sentences.

  • Build your classifier, trying out different models such as a convolutional neural network, a bi-directional LSTM, a transformer model...
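The preprocessing, vocabulary and vectorization steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper names `preprocess`, `build_vocabulary` and `vectorize` are hypothetical, whitespace splitting stands in for a real tokenizer, and stemming is left as an optional callable (for instance, one could pass `nltk.stem.PorterStemmer().stem` if nltk is installed). The two sentences are the ones from the question.

```python
import string
from collections import Counter

def preprocess(sentence, n=1, stem=None):
    """Lowercase, strip punctuation, tokenize, optionally stem, build n-grams."""
    text = sentence.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()  # assumption: whitespace split instead of a real tokenizer
    if stem is not None:
        tokens = [stem(t) for t in tokens]
    # Join each window of n consecutive tokens into one n-gram token.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    distinct = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(distinct)}

def vectorize(tokens, vocab):
    """Bag-of-words count vector with one column per vocabulary entry."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocab]

sentences = [
    "Recruitment is about attracting and selecting the right person for the job.",
    "Attracting the right person.",
]

token_lists = [preprocess(s) for s in sentences]
vocab = build_vocabulary(token_lists)
vectors = [vectorize(tokens, vocab) for tokens in token_lists]

for sentence, vector in zip(sentences, vectors):
    print(vector, "<-", sentence)
```

Calling `preprocess(sentence, n=2)` yields 2-gram tokens instead. The resulting count vectors, paired with your labels, are what a bag-of-words classifier would be trained on.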
