Generate new sentences based on keywords

For example, a domain-specific neural network for fashion, given the keywords light, dress, orange, cotton, could output: This gorgeous orange summer dress is great for wearing on sunny camping days. Its cotton fabric makes it very comfortable to wear.

Can someone please suggest the simplest way to achieve this?


As already stated, this is an NLG (Natural Language Generation) task. Language generators are usually based on RNNs, sometimes combined with CNNs. Transformer models have reached the state of the art (SOTA), but they are computationally very expensive, and training one from scratch could be impractical.


How to implement the model

The model takes a set of keywords as input, represented with some kind of word vectors: word2vec and GloVe are the classical choices, and probably what you need. (BERT and other contextual embeddings do not make sense in this case, since the keywords come without an actual sentence context.) These word vectors could be processed with CNN and/or RNN layers; the final part of your model could be an LSTM layer followed by a softmax over the vocabulary that "writes" the sentence sequentially, one word at a time.
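To make the "one word at a time" idea concrete, here is a minimal sketch of a greedy decoding loop in plain Python. It is not a trained model: `toy_scores` is a hypothetical stand-in for the LSTM, and `VOCAB` is a made-up toy vocabulary. Only the loop structure (score every word, apply softmax, pick a word, feed the partial sentence back in) reflects what a real generator would do.

```python
import math

# Toy vocabulary (assumption); "<eos>" marks the end of the sentence.
VOCAB = ["<eos>", "this", "orange", "cotton", "dress", "is", "light"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def toy_scores(keywords, generated):
    """Hypothetical stand-in for the trained LSTM: one raw score per vocabulary word."""
    scores = []
    for word in VOCAB:
        if word == "<eos>":
            # Favor ending the sentence once every keyword has been written.
            done = all(k in generated for k in keywords)
            scores.append(5.0 if done else -5.0)
        elif word in generated:
            scores.append(-5.0)  # discourage repeating a word
        elif word in keywords:
            scores.append(2.0)   # favor keywords not written yet
        else:
            scores.append(0.0)
    return scores

def greedy_decode(keywords, max_len=10):
    """Write the sentence one word at a time, always taking the most probable word."""
    generated = []
    for _ in range(max_len):
        probs = softmax(toy_scores(keywords, generated))
        best = max(range(len(VOCAB)), key=lambda i: probs[i])
        if VOCAB[best] == "<eos>":
            break
        generated.append(VOCAB[best])
    return generated

print(" ".join(greedy_decode(["orange", "dress"])))
```

In a real model the scores would come from the LSTM's hidden state, and you would likely sample from the softmax or use beam search instead of always taking the top word.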


How to build a dataset

You can probably build your own dataset by taking complete sentences from somewhere, extracting their keywords with standard NLP techniques (PoS tagging and NER, for example), and training a model to reverse the process, i.e. to go from keywords to complete sentences. I don't know this domain well enough to point to the right source; web scraping suitable fashion websites is probably the way to go.
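The pairing step could look roughly like the sketch below. Note that it swaps in a crude stop-word filter for the PoS tagging and NER mentioned above, just to stay dependency-free; the stop-word list, the function names and the two sample sentences are all assumptions for illustration.

```python
import re

# Tiny stop-word list: a crude stand-in for real PoS tagging / NER (assumption).
STOP_WORDS = {
    "a", "an", "the", "is", "are", "it", "its", "this", "that", "for",
    "on", "in", "of", "to", "and", "or", "with", "very", "makes",
}

def extract_keywords(sentence, max_keywords=4):
    """Return up to max_keywords content words, in order of first appearance."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    keywords = []
    for token in tokens:
        if token not in STOP_WORDS and token not in keywords:
            keywords.append(token)
    return keywords[:max_keywords]

def build_pairs(sentences):
    """Turn raw sentences into (keywords, target sentence) training pairs."""
    return [(extract_keywords(s), s) for s in sentences]

# Made-up sample sentences; in practice these would come from scraped text.
corpus = [
    "This orange dress is great for sunny days.",
    "Its cotton fabric makes it very comfortable.",
]
pairs = build_pairs(corpus)
for keywords, sentence in pairs:
    print(keywords, "->", sentence)
```

Each pair gives the model its input (the keywords) and its target (the original sentence). With a proper tagger you would instead keep, say, only nouns and adjectives.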


If you want to go for the heavy stuff

A fancier next step you can take is to turn your model into a GAN, with a sentence generator trying to fool a discriminator. This could give a noticeable performance boost, but it's not necessary at the beginning; I wouldn't focus on it until a base model is working.


It's not an easy project, but certainly something cool to put on your CV. Good luck!


This falls under NLG. You can use template-based text generation techniques, in which you define the structure of the output text and fill in the blanks based on the keywords. This technique is used in report generation; one example is the company Narrative Science.
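A minimal sketch of the template idea, assuming a hand-written slot lexicon and a single sentence template (both made up here for illustration):

```python
# Hypothetical lexicon mapping each template slot to the keywords that can fill it.
SLOT_LEXICON = {
    "color": {"orange", "red", "blue"},
    "fabric": {"cotton", "silk", "linen"},
    "weight": {"light", "heavy"},
    "item": {"dress", "shirt", "skirt"},
}

# Fixed output structure with blanks to fill in (assumption).
TEMPLATE = "This {weight} {color} {item} is made of {fabric}."

def assign_slots(keywords):
    """Match each keyword to the slot whose lexicon contains it."""
    slots = {}
    for keyword in keywords:
        for slot, words in SLOT_LEXICON.items():
            if keyword in words:
                slots[slot] = keyword
    return slots

def generate(keywords):
    """Fill the template, or fail loudly if a slot has no matching keyword."""
    slots = assign_slots(keywords)
    missing = set(SLOT_LEXICON) - set(slots)
    if missing:
        raise ValueError(f"no keyword for slots: {sorted(missing)}")
    return TEMPLATE.format(**slots)

print(generate(["light", "dress", "orange", "cotton"]))
```

This needs no training data, but the output is only as varied as the templates you write, so you would typically keep several templates per product type and pick one at random.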

Another approach is to use OpenAI GPT. One example is the article "Generate Text using OpenAIGPT2 in Python". You may have to tweak the code to fit your requirements.

Paraphrasing is another possible technique. One example implementation is https://github.com/vsuthichai/paraphraser
