How to find possible subjects for a given verb in the domain of everyday objects

I am looking for tools (possibly in NLTK) or papers that address the following problem:

For example, input: vase (subject), put (verb)

Answer I am looking for: flower, water

Is there a tool that can output subjects (objects) that can be associated with this verb? (I went through VerbNet but didn't find anything.)



If you want something quick, I think pattern is the best tool for the job. It provides a ready-to-use multilingual parser that you can use in the following way:

from pattern.en import parse

s = 'I put water in the vase'
s = parse(s)  # tokenize, POS-tag and chunk the sentence
print(s)      # each token is word/POS-tag/chunk-tag/PNP-tag
# output = I/PRP/B-NP/O put/VBP/B-VP/O water/NN/B-NP/O in/IN/B-PP/B-PNP the/DT/B-NP/I-PNP vase/NN/I-NP/I-PNP

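Once you have a string like the output above, you only need a regular expression to extract every sequence of tokens whose chunk tags match the sequence [B-NP, B-VP, B-NP] (allowing for I-NP and I-VP tags that continue multi-word phrases such as "the vase").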
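A minimal sketch of that regex step in Python 3, using only the standard library. It assumes pattern's default four-field token format (word/POS/chunk/PNP) shown in the output above, and that words contain no spaces or unescaped slashes. The names TOKEN, phrase, words and svo_triples are hypothetical helpers, not part of pattern.

```python
import re

# One token in pattern's default output: word/POS/CHUNK/PNP, e.g. "water/NN/B-NP/O".
TOKEN = r"[^\s/]+/[^\s/]+/{}/[^\s/]+"


def phrase(kind):
    """Regex for one chunk of the given kind: a B- token followed by any I- tokens."""
    begin = TOKEN.format("B-" + kind)
    inside = TOKEN.format("I-" + kind)
    return r"{}(?: {})*".format(begin, inside)


# NP, VP, NP in strict adjacency; the lookarounds stop a match from starting
# or ending in the middle of a token.
SVO = re.compile(
    r"(?<!\S)({np}) ({vp}) ({np})(?!\S)".format(np=phrase("NP"), vp=phrase("VP"))
)


def words(chunk):
    """Strip the tags from a matched chunk, keeping only the words."""
    return " ".join(token.split("/")[0] for token in chunk.split())


def svo_triples(tagged):
    """Return (subject, verb, object) word strings for every NP-VP-NP match."""
    return [tuple(words(m.group(i)) for i in (1, 2, 3)) for m in SVO.finditer(tagged)]


tagged = ("I/PRP/B-NP/O put/VBP/B-VP/O water/NN/B-NP/O "
          "in/IN/B-PP/B-PNP the/DT/B-NP/I-PNP vase/NN/I-NP/I-PNP")
print(svo_triples(tagged))  # [('I', 'put', 'water')]
```

Note that re.finditer returns non-overlapping matches, so in a chain such as NP VP NP VP NP only the first triple is reported; catching the second would need a lookahead-based pattern or a scan over chunks instead of characters.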

NP stands for "noun phrase" and VP stands for "verb phrase". In English, a noun phrase, a verb phrase, and a second noun phrase in strict adjacency usually form a subject-verb-object sequence (in the example above: I, put, water), so this should give you what you're looking for.
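Applied to many sentences, the same idea can approximate the list asked for in the question (things you put in a vase): keep the sentences whose prepositional phrase mentions "vase" and count the noun phrases that directly follow "put". The sketch below assumes pattern's default four-field token format; its function names and the three tagged sentences are hypothetical, hand-written in that format rather than produced by the parser, and it uses the last word of a noun phrase as a rough stand-in for the head noun.

```python
from collections import Counter


def chunks(tagged):
    """Group pattern tokens (word/POS/CHUNK/PNP) into [kind, words, in_pnp] chunks."""
    result = []
    for token in tagged.split():
        word, _pos, chunk, pnp = token.split("/")
        kind = chunk[2:] if chunk != "O" else "O"
        # A B- tag or an untagged (O) token starts a new chunk; I- continues the last one.
        if chunk.startswith("B-") or chunk == "O" or not result:
            result.append([kind, [word], pnp != "O"])
        else:
            result[-1][1].append(word)
    return result


def objects_for(sentences, verb_forms, location):
    """Count object heads of `verb_forms` in sentences whose PNP mentions `location`."""
    counts = Counter()
    for tagged in sentences:
        cs = chunks(tagged)
        pnp_words = {w.lower() for _kind, ws, in_pnp in cs if in_pnp for w in ws}
        if location not in pnp_words:
            continue
        # Slide over adjacent chunk triples looking for NP VP NP.
        for (k1, _, _), (k2, verb, _), (k3, obj, obj_in_pnp) in zip(cs, cs[1:], cs[2:]):
            if ((k1, k2, k3) == ("NP", "VP", "NP") and not obj_in_pnp
                    and verb[-1].lower() in verb_forms):
                counts[obj[-1].lower()] += 1  # last word of the NP as a rough head
    return counts


# Hypothetical sentences hand-written in pattern's output format (not real parser output).
SENTENCES = [
    "I/PRP/B-NP/O put/VBP/B-VP/O water/NN/B-NP/O in/IN/B-PP/B-PNP the/DT/B-NP/I-PNP vase/NN/I-NP/I-PNP",
    "She/PRP/B-NP/O put/VBD/B-VP/O fresh/JJ/B-NP/O flowers/NNS/I-NP/O in/IN/B-PP/B-PNP the/DT/B-NP/I-PNP vase/NN/I-NP/I-PNP",
    "He/PRP/B-NP/O put/VBD/B-VP/O the/DT/B-NP/O keys/NNS/I-NP/O on/IN/B-PP/B-PNP the/DT/B-NP/I-PNP table/NN/I-NP/I-PNP",
]
print(objects_for(SENTENCES, {"put", "puts", "putting"}, "vase"))
# Counter({'water': 1, 'flowers': 1})
```

Because the default tags carry no lemmas, the verb forms are listed by hand and plural objects are counted separately from their singular forms (flowers versus flower).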

pattern's parser will also be able to handle some non-strict adjacencies (e.g. intervening adverbs and adjectives between the three phrases in the subject-verb-object sequence).

However, pattern is not terribly sophisticated: it will give you moderate precision and recall, but not high numbers. If you need high-quality parsing, try the Stanford parser from Python or spaCy, whose dependency parses label subjects and objects directly.

Hope this helps!
