String together a set of tokens into a sequence

I have the following problem: given a set of tokens, string them (or a subset of them) together into a sequence by adding stop words and inflecting the tokens where needed. I know that I can generate a potentially unlimited amount of pre-training data for this problem. For example, given the set of tokens {cat, jump, mouse}, possible outputs might be: a. the cat jumped on a mouse, b. the cat and the mouse jumped, c. cats jump, and so on.
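One way to get that training data is to take ordinary sentences and strip them down to their content words, so each sentence becomes an (input set, target sentence) pair. The following is a minimal sketch of that idea, assuming a tiny hand-picked stop-word list (`STOP_WORDS`) and a hypothetical helper `make_training_pair`. It does no lemmatization, so "jumped" stays "jumped" rather than becoming "jump".

```python
import random

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a
# fuller list (e.g. from NLTK or spaCy) and lemmatize tokens ("jumped" -> "jump").
STOP_WORDS = {"the", "a", "an", "and", "on", "of", "in", "to", "with"}


def make_training_pair(sentence, rng=random):
    """Turn one sentence into an (input tokens, target sentence) pair.

    The input is the sentence's non-stop-word tokens in shuffled order, so a
    model has to learn to restore the ordering and fill in the stop words.
    """
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    content = [w for w in words if w and w not in STOP_WORDS]
    rng.shuffle(content)
    return content, sentence


corpus = [
    "The cat jumped on a mouse.",
    "The cat and the mouse jumped.",
]

for tokens, target in (make_training_pair(s) for s in corpus):
    print(tokens, "->", target)
```

Shuffling the inputs matters: if the tokens kept their original order, a model could learn to copy the order instead of planning the sentence.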

I am not sure whether this is a well-studied problem, or which directions or model architectures I should explore. Thanks in advance.



The great NodeBox Linguistics project and its successor, Pattern, seem to be unsupported now, but if you can get them running, you can try the following approach, based on the RDF triple structure: subject, predicate, object.

It won't cover every permutation and won't always be grammatically correct, but it's a good start.

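from pattern.en import conjugate, PAST

# Example word lists; in practice, load them from any NLP noun/verb list
subjects = ['cat', 'dog']
objects = ['mouse', 'ball']
verbs = ['chase', 'catch']

for subject in subjects:
    for obj in objects:  # 'obj' avoids shadowing the built-in 'object'
        for v in verbs:
            predicate = conjugate(v, tense=PAST)  # e.g. 'chase' -> 'chased'
            print(f'The {subject} {predicate} the {obj}')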

The subjects and objects lists can be loaded from any NLP noun list, and the verbs list from any verb list.
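If Pattern won't install, the same subject-predicate-object template can be sketched with the standard library alone. This version assumes a hypothetical `naive_past` function in place of a real conjugator: a few hard-coded irregular forms plus the regular "-ed" rule, which will get many verbs wrong.

```python
from itertools import product

# Hypothetical word lists; swap in any noun/verb lists you like.
subjects = ["cat", "dog"]
objects = ["mouse", "ball"]
verbs = ["chase", "catch", "jump"]

# Hypothetical stand-in for a real conjugator such as Pattern's: a handful
# of irregular forms plus the regular "-ed" rule.
IRREGULAR_PAST = {"catch": "caught", "eat": "ate", "see": "saw", "run": "ran"}


def naive_past(verb):
    """Return a rough past-tense form of a verb."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"  # chase -> chased
    return verb + "ed"  # jump -> jumped


def triples(subjects, verbs, objects):
    """Yield one past-tense sentence per (subject, verb, object) combination."""
    for subj, verb, obj in product(subjects, verbs, objects):
        yield f"The {subj} {naive_past(verb)} the {obj}"


for sentence in triples(subjects, verbs, objects):
    print(sentence)
```

`itertools.product` replaces the three nested loops; it produces every combination (here 2 × 3 × 2 = 12 sentences), including awkward ones such as "The cat jumped the mouse", which is exactly the "not 100% grammatical" caveat above.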

You can go on to add present and future tenses, each with its own sentence template.
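A minimal sketch of one template per tense, assuming regular verbs and a third-person singular subject. `TEMPLATES`, `conjugate_naive` and `realize` are hypothetical names for illustration; a real conjugator (Pattern's `conjugate`, if it runs for you) would replace `conjugate_naive`.

```python
# Hypothetical sentence templates, one per tense; {s}, {v}, {o} are the slots
# for subject, conjugated verb and object. Only the future template differs in
# shape, because it needs the auxiliary "will".
TEMPLATES = {
    "past": "The {s} {v} the {o}",
    "present": "The {s} {v} the {o}",
    "future": "The {s} will {v} the {o}",
}


def conjugate_naive(verb, tense):
    """Very rough regular-verb inflection for a third-person singular subject."""
    if tense == "past":
        return verb + "d" if verb.endswith("e") else verb + "ed"
    if tense == "present":
        return verb + "es" if verb.endswith(("s", "sh", "ch", "x")) else verb + "s"
    return verb  # future: bare verb after "will"


def realize(subject, verb, obj, tense):
    """Fill the template for the given tense."""
    template = TEMPLATES[tense]
    return template.format(s=subject, v=conjugate_naive(verb, tense), o=obj)


for tense in TEMPLATES:
    print(realize("cat", "chase", "mouse", tense))
```

Keeping the templates in a dictionary means adding another construction (a negation, a plural subject, a prepositional phrase) is a matter of adding an entry and, if needed, a matching inflection rule.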
