Choosing a right algorithm for template-based text generation

Question

Vitaly Gorbachev

May 2, 2022, 19:06

I am working on a text generation project: the task is basically to present statistical data in a readable way.

The way I decided to go about this is template-based: each data type has a template that defines how a sentence should be formed and which synonyms can be used.
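A minimal Python sketch of one such template, assuming each data type maps to a format string plus a table of synonym slots. `TEMPLATE`, `SYNONYMS` and `render` are hypothetical names chosen for illustration, not part of any library.

```python
import random

# Hypothetical template for one data type: {metric}, {change} and {period}
# come from the data; {verb} is filled with a randomly chosen synonym.
TEMPLATE = "{metric} {verb} by {change}% in {period}."
SYNONYMS = {"verb": ["rose", "increased", "grew"]}


def render(template: str, data: dict, synonyms: dict,
           rng: random.Random) -> str:
    """Fill the data slots and pick one synonym for each synonym slot."""
    values = dict(data)
    for slot, options in synonyms.items():
        values[slot] = rng.choice(options)
    return template.format(**values)


rng = random.Random(42)
data = {"metric": "Revenue", "change": 12, "period": "Q1"}
print(render(TEMPLATE, data, SYNONYMS, rng))
```

Passing an explicit `random.Random` keeps runs reproducible for a fixed seed. A real template would also need rules such as choosing "fell" instead of "rose" when the change is negative.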

I'm torn about whether some kind of ML technique can bolster this template-based approach. The text should be unique, so I need an algorithm that optimises for uniqueness.

Now, there are API solutions that can give me a uniqueness score at the end of the text (and even in the middle of it), so my first instinct was to try reinforcement learning with sparse rewards. The templates can be represented as a tree, which the algorithm traverses, receiving rewards both at the end and along the way. The input is its current set of options for where to go next, and the output is its decision on which branch to take.
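A minimal sketch of the tree representation, in Python. Note that this is not reinforcement learning: it is a greedy walk that, at each branch, picks the child whose partial text scores highest. The `novelty` function is a hypothetical local stand-in for the uniqueness API (the share of word bigrams not seen in earlier outputs); `Node`, `greedy_generate` and the example tree are illustrative names only. An RL agent would replace the greedy `max` with a learned policy over the same choices.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One sentence fragment; children are the possible continuations."""
    text: str
    children: list[Node] = field(default_factory=list)


def bigrams(words: list[str]) -> set[tuple[str, str]]:
    return set(zip(words, words[1:]))


def novelty(text: str, seen: set[tuple[str, str]]) -> float:
    """Hypothetical stand-in for a uniqueness API: the share of the text's
    word bigrams not already in `seen` (1.0 = all new, 0.0 = all seen)."""
    grams = bigrams(text.split())
    if not grams:
        return 1.0
    return len(grams - seen) / len(grams)


def greedy_generate(root: Node, seen: set[tuple[str, str]]) -> str:
    """Walk root -> leaf, choosing at each branch the child whose partial
    text scores highest (the 'reward along the way')."""
    parts = [root.text]
    node = root
    while node.children:
        node = max(node.children,
                   key=lambda c: novelty(" ".join(parts + [c.text]), seen))
        parts.append(node.text)
    return " ".join(parts)


tree = Node("Revenue", [
    Node("rose", [Node("by 12% in Q1."), Node("by 12 percent in Q1.")]),
    Node("grew", [Node("by 12% in Q1.")]),
])

seen: set[tuple[str, str]] = set()
for _ in range(3):
    text = greedy_generate(tree, seen)
    print(text)
    seen |= bigrams(text.split())  # remember what has already been produced
```

Greedy choice is myopic: a branch that looks stale early may still lead to the most novel completion. That is the credit-assignment problem an RL agent with sparse rewards would have to learn around.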

The problem with this approach is that once it has succeeded in generating a unique text, it can't generate the same one again (I mean it can, but the score would be 0), which might make it difficult for the model to learn. Also, many articles on the web indicate that RL is really, really hard to tune properly.

I am currently at the pre-research stage, so any feedback on how I should approach this task is appreciated. Maybe there is no need for ML at all?
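If exact repeats are the main concern, a non-ML baseline may be enough: enumerate every path through the template and hand them out without replacement, so the same text is never produced twice. A sketch, assuming the template reduces to a fixed sequence of slots with alternative phrasings (`SLOTS` and `unique_texts` are hypothetical names).

```python
from __future__ import annotations

import itertools
import random
from typing import Iterator

# Hypothetical template: each position lists interchangeable phrasings.
SLOTS = [
    ["Revenue", "Sales revenue"],
    ["rose", "increased", "grew"],
    ["by 12% in Q1.", "by 12 percent in the first quarter."],
]


def unique_texts(slots: list[list[str]], seed: int = 0) -> Iterator[str]:
    """Yield every combination of phrasings exactly once, in shuffled order."""
    combos = list(itertools.product(*slots))
    random.Random(seed).shuffle(combos)
    for combo in combos:
        yield " ".join(combo)


texts = list(unique_texts(SLOTS))
print(len(texts))  # 2 * 3 * 2 = 12 distinct sentences
```

This only rules out exact duplicates. An external uniqueness checker may still treat two texts that differ by a single synonym as near-duplicates, and once every combination has been used, the template needs more variety.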

What do you think? My instinct tells me that such problems should already have established solutions and that I'm just searching in the wrong way.

Thanks!

Topics: text-generation, decision-trees, reinforcement-learning, machine-learning

Category: Data Science

Sandeep Bhutani · Accepted Answer · November 28, 2019, 17:35

You could give this one a try: generation-caption-for-new-image. It has two parts: the first labels the image, and the second generates text for those labeled images.
Your case is similar; the difference is that in your case there are templates, whereas in this case there are images.

Please comment if you need help understanding the code (the code is not mine, but I have gone through it in detail) or if you have difficulty mapping this answer to your problem.
