String together a set of tokens into a sequence

Question

Deepak Saini

May 14, 2022, 06:02

My problem scenario: given a set of tokens, string all of them, or a subset of them, together into a sequence, using stop words as glue. I understand that I can have potentially infinite pre-training data for this problem. For example, given the set of tokens {cat, jump, mouse}, possible outputs might be: (a) the cat jumped on a mouse, (b) the cat and the mouse jumped, (c) cats jump, and so on.

I am not sure whether this is a well-studied problem, or which directions and model architectures I should explore. Thanks in advance.

Topic sequence-to-sequence machine-learning

Category Data Science

mork · Accepted Answer · April 6, 2022, 10:18

The great NodeBox Linguistics project and its successor, pattern, seem to be unsupported now, but if you can get them to run, you can try the following, based on the RDF triple: subject, predicate, object.

It won't cover every permutation, and won't be 100% grammatically correct - but it's a good start.

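from pattern import en

# Example inputs (placeholders); swap in any noun and verb lists.
subjects = ['cat', 'mouse']
objects = ['cat', 'mouse']
verbs = ['jump', 'chase']

for subject in subjects:
  for obj in objects:  # 'obj' avoids shadowing the built-in 'object'
    for v in verbs:
      # NodeBox Linguistics-style call; the installed version may expose it differently.
      predicate = en.verb.past(v)
      print(f'The {subject} {predicate} the {obj}')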

The subjects and objects lists can be imported from any NLP noun list. The same goes for the verbs list.

You can go on and add present and future tenses, each with an appropriate "sentence template".
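
As a rough sketch of the per-tense template idea that runs without pattern installed, something like the following could work. Note the assumptions: VERB_FORMS is a tiny hand-written table standing in for a real conjugation library, and TEMPLATES and generate are hypothetical names made up for this illustration.

```python
from itertools import permutations

# Assumption: a hand-written conjugation table replacing a real library.
VERB_FORMS = {
    "jump": {"base": "jump", "past": "jumped", "present": "jumps"},
    "chase": {"base": "chase", "past": "chased", "present": "chases"},
}

# One sentence template per tense: (verb form to use, template string).
TEMPLATES = {
    "past": ("past", "The {subject} {verb} the {obj}"),
    "present": ("present", "The {subject} {verb} the {obj}"),
    "future": ("base", "The {subject} will {verb} the {obj}"),
}


def generate(nouns, verbs):
    """Return one sentence per (subject, object, verb, tense) combination."""
    sentences = []
    # permutations(nouns, 2) yields ordered pairs of two different nouns.
    for subject, obj in permutations(nouns, 2):
        for verb in verbs:
            for form, template in TEMPLATES.values():
                sentences.append(
                    template.format(
                        subject=subject, verb=VERB_FORMS[verb][form], obj=obj
                    )
                )
    return sentences


if __name__ == "__main__":
    for sentence in generate(["cat", "mouse"], ["jump", "chase"]):
        print(sentence)
```

Like the loop above, this only fills fixed slots, so the output will not always be grammatical or meaningful; adding a tense means adding one entry to TEMPLATES and the matching verb forms.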
