Word-level text generation with word embeddings – outputting a word vector instead of a probability distribution

I am currently researching the topic of text generation for my university project. I decided (of course) to go with an RNN that takes a sequence of tokens as input, with the target of predicting the next token given the sequence. I have been reading through a number of tutorials, and there is one thing I am wondering about. The sources I have read, regardless of how they encode the X sequences (one-hot or word embeddings), encode the y target tokens …
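The alternative the question hints at can be sketched as follows: instead of a softmax over the vocabulary, the model emits a dense embedding vector, and decoding becomes a nearest-neighbour lookup in embedding space. This is a minimal sketch with a hypothetical toy embedding table; real systems would use trained Word2vec/GloVe vectors and a cosine or MSE loss.

```python
import math

# Hypothetical toy embedding table: 4 vocabulary words, 3-dim vectors
# (illustrative values, not trained embeddings).
embeddings = {
    "the": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "sat": [0.0, 0.0, 1.0],
    "mat": [0.7, 0.7, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_word(predicted_vec):
    """Decode a predicted embedding back to a word: the model outputs a
    dense vector rather than a probability distribution, and we pick the
    vocabulary word whose embedding is closest."""
    return max(embeddings, key=lambda w: cosine(predicted_vec, embeddings[w]))

print(nearest_word([0.1, 0.9, 0.05]))  # → cat
```

One trade-off to note: with a vector target the loss no longer distinguishes between plausible alternative next words, which is one reason most tutorials keep the softmax formulation.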
Category: Data Science

Choosing the right algorithm for template-based text generation

I am doing a text generation project -- the task is basically to represent statistical data in a readable way. The way I decided to go about this is template-based: each data type has a template for how a sentence should be formed and what synonyms can be used. I'm torn about whether some kind of ML technique can bolster this template-based approach. The text should be unique -- so I need an algorithm that optimises for uniqueness. Now, there are …
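The template-plus-synonyms approach described above can be sketched in a few lines. The template text and synonym sets here are illustrative placeholders; sampling slot fillers at realisation time is what introduces surface variety.

```python
import random

# Illustrative template per data type; slot names in braces are filled below.
TEMPLATES = {
    "sales": "{rise_phrase} of {value}% in sales was {observe_verb} this quarter.",
}

# Synonym sets sampled per realisation to keep output varied.
SYNONYMS = {
    "rise_phrase": ["A rise", "An increase", "Growth"],
    "observe_verb": ["observed", "recorded", "seen"],
}

def realise(data_type, rng=random, **values):
    """Fill the template for one data type, choosing a random synonym
    for each slot so repeated reports do not read identically."""
    slots = {k: rng.choice(v) for k, v in SYNONYMS.items()}
    return TEMPLATES[data_type].format(**slots, **values)

print(realise("sales", value=10))
```

A simple way to "optimise for uniqueness" on top of this is to track previously emitted sentences and resample when a duplicate appears.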
Category: Data Science

Guide to natural language prompt programming for few-shot learning of pretrained language models

I'm currently working on a project with the goal of producing AI content in the space of content generation, like blog writing, Instagram caption generation, etc. I found the in-context few-shot learning capabilities of GPT-3 quite useful, but I'm unable to generate creative content consistently: it becomes boring and repetitive after a few iterations. I came across the concept of knowledge probing of language models and have come to the understanding that writing better prompts can actually …
Category: Data Science

Using LSTM for text generation keeps generating same word

I am working on a simple text generation problem using a portion of the Shakespeare dataset, for which I decided to use an LSTM. I primarily used this tutorial for reference. However, as I ran the code below, I noticed that the text generation section didn't work as expected: regardless of the input string (seed), the model always predicts the exact same word as having the highest probability. For example, when the input is just the one-word seed "i", the trained model …
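A common cause of the symptom described (beyond an undertrained model) is greedy decoding: always taking the argmax collapses generation onto the most frequent word. Sampling from the temperature-scaled distribution, sketched below in plain Python, keeps the output diverse.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from raw logits. Argmax decoding always returns
    the same top word; dividing logits by a temperature and sampling
    from the resulting softmax distribution varies the output."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# With argmax you would always get index 2 here; sampling varies:
logits = [1.0, 2.0, 3.0]
print(sample_with_temperature(logits, temperature=0.8))
```

Low temperatures approach argmax behaviour; temperatures near 1.0 follow the model's distribution more faithfully.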
Category: Data Science

Create an RNN on text sources with different lengths

I want to create an RNN to generate new text based on many examples of existing texts of a certain format in the training data. The texts in the training data consist of 3 segments, like so: Example text 1: [Segment 0, ~20 characters] [Segment 1, ~200 characters] [Segment 2, ~400 characters]. It is worth mentioning that the segments are all alphanumerical but of varying structure: Segment 1 contains more numbers and Segment 2 has more …
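Whatever model is chosen, variable-length segments usually need to be brought to fixed per-segment lengths before batching. A minimal sketch, using per-segment budgets borrowed from the sizes mentioned in the question (these numbers are illustrative):

```python
def pad_or_truncate(seq, length, pad_token="<PAD>"):
    """Bring a token sequence to a fixed length so segments of
    different sizes can be stacked into one batch for an RNN."""
    return seq[:length] + [pad_token] * max(0, length - len(seq))

# Hypothetical per-segment budgets (~20 / ~200 / ~400 characters):
segment_lengths = [20, 200, 400]
segments = [list("ABC123"), list("x" * 250), list("y" * 100)]

padded = [pad_or_truncate(s, n) for s, n in zip(segments, segment_lengths)]
print([len(p) for p in padded])  # → [20, 200, 400]
```

Most frameworks then let you mask the pad positions so they do not contribute to the loss.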
Category: Data Science

Is it possible to use Word2vec for text paraphrasing?

After reading several papers, I am not sure whether it is possible to somehow generate text with the same meaning (paraphrase it) using only Word2vec. I found other approaches that use sequences of sentence pairs and train neural nets to find the most similar, but this is hard to maintain, and it will be hard to generate relevant content this way. I would like to give raw text to a Word2vec-powered algorithm that returns paraphrased text.
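The closest thing Word2vec alone can offer is word-by-word substitution with nearest neighbours in embedding space. The sketch below uses tiny made-up vectors in place of a trained model; note that this preserves neither grammar nor sentence-level meaning, which is why it is at best a crude "paraphraser".

```python
import math

# Toy stand-in for trained Word2vec vectors (illustrative values only).
vectors = {
    "big":   [0.9, 0.1],
    "large": [0.85, 0.15],
    "cat":   [0.1, 0.9],
    "dog":   [0.2, 0.8],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def most_similar(word):
    """Nearest other word by cosine similarity, the core operation of a
    word-substitution 'paraphraser'."""
    return max((w for w in vectors if w != word),
               key=lambda w: cosine(vectors[word], vectors[w]))

def naive_paraphrase(sentence):
    """Replace each known word with its nearest neighbour."""
    return " ".join(most_similar(w) if w in vectors else w
                    for w in sentence.split())

print(naive_paraphrase("a big cat"))  # → a large dog
```

With a real model, gensim's KeyedVectors.most_similar plays the role of most_similar here; the sentence-pair approaches in the papers exist precisely because substitution cannot guarantee meaning preservation.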
Category: Data Science

Automatic data summarization with text

I would like to automate periodic report writing based on data. Given one or more data tables, the machine should output text like "Stock A rose by 10% this year and hit a 5-year high on 2019-12-01" or "we made a large profit in sector B". I can find the subject of automatic text summarization, but it seems to be about reading text and shortening it to its key sentences, not exactly data summarization. Could someone recommend a book/paper/video/MOOC on text generation based on data?
Category: Data Science

NLP - Paraphrase extraction in Python

I am trying to develop an NLP model which takes something like "you have high levels of cholesterol" (this will be a tag) as input and has to output something like "you have high levels of cholesterol; you need to have a low-salt diet that emphasizes fruits, vegetables and whole grains; limit the amount of animal fats and use good fats in moderation" (this will be the suggestion; it is an example suggestion from a doctor). So, now when I was researching …
Category: Data Science

How to generate syntactically correct text for CRNN-CTC text model?

Disregarding the image creation and labeling details, is there a way to generate syntactically correct text examples? As I currently understand the CTC model, it takes into consideration the likelihood of a given letter preceding or following another in a given sequence. For example: "Colorless green ideas sleep furiously". The sentence doesn't make sense; however, it has proper syntax: each word has a few vowels, verbs are where they should be, ... I want the word generator …
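One standard way to get text that is syntactically well-formed (even when semantically nonsensical, as in the Chomsky example quoted) is to expand a context-free grammar. A minimal pure-Python sketch with a toy grammar (libraries like nltk offer the same idea via nltk.CFG, but none is assumed here):

```python
import random

# Toy context-free grammar (illustrative): every expansion of S is
# syntactically well-formed, regardless of meaning.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Adj", "N"]],
    "VP":  [["V", "Adv"]],
    "Adj": [["colorless"], ["green"]],
    "N":   [["ideas"], ["clouds"]],
    "V":   [["sleep"], ["sing"]],
    "Adv": [["furiously"], ["quietly"]],
}

def generate(symbol="S", rng=random):
    """Recursively expand a symbol; anything not in GRAMMAR is terminal."""
    if symbol not in GRAMMAR:
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    out = []
    for sym in production:
        out.extend(generate(sym, rng))
    return out

print(" ".join(generate()))  # e.g. "green ideas sleep furiously"
```

For CRNN-CTC training data, each generated sentence would then be rendered to an image with its string as the label.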
Category: Data Science

Distractor Generation for Multiple Choice Questions

I'm currently working on generating distractors for multiple choice questions. The training set consists of a question, an answer and 3 distractors, and I need to predict 3 distractors for the test set. I have gone through many research papers on this, but the problem in my case is unique: the questions and answers are for a comprehension (usually a long passage of a text story), but the comprehension they are based on is not given, nor is any supporting text given for …
Category: Data Science

Predicting the next job title

I have a dataset with 30M rows, each like [current_jobtitles, nextjobtitles]: [['junior software programmer', 'senior software programmer'], ['senior software programmer', 'lead software programmer'], ['sales associate', 'regional sales associate']]. I want to build a deep learning model to predict the next job title when a current title is given. Are there any ways I could achieve this using some deep learning model? If yes, what kind of model? Can we use any of the text generation models for this scenario …
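Before reaching for a deep model, a frequency baseline over title transitions is worth having for comparison. This sketch uses the example pairs from the question (plus one duplicate for illustration); a seq2seq or classification model would generalise to unseen titles, which this lookup cannot.

```python
from collections import Counter, defaultdict

# Example transition pairs from the question (duplicated once to
# illustrate frequency counting).
pairs = [
    ["junior software programmer", "senior software programmer"],
    ["senior software programmer", "lead software programmer"],
    ["sales associate", "regional sales associate"],
    ["junior software programmer", "senior software programmer"],
]

# Count how often each next-title follows each current title.
transitions = defaultdict(Counter)
for current, nxt in pairs:
    transitions[current][nxt] += 1

def predict_next(title):
    """Return the most frequently observed successor title, or None
    for titles never seen in training."""
    counts = transitions.get(title)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("junior software programmer"))
# → senior software programmer
```

At 30M rows this baseline is also a sanity check: if a learned model cannot beat it, the model is not using the text of the titles effectively.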
Category: Data Science

Is it possible to make a text generator with sklearn?

So recently I made a TensorFlow model using RNNs (recurrent neural networks), and I was wondering if it is possible with sklearn too, through the use of SVMs or naive Bayes. I searched many articles on Google but didn't find a feasible solution. So is it even possible to build a text generator with sklearn? If so, what's the base code to make that model?
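sklearn has no generative API, but next-word prediction can be framed as classification (predict the next word from the previous one), which any sklearn classifier could fit. The same idea in its simplest form is a bigram Markov chain, sketched here without sklearn so the mechanics are visible:

```python
import random
from collections import defaultdict

# Tiny corpus; in practice this would be your training text.
corpus = "to be or not to be that is the question".split()

# Bigram model: map each word to the list of words observed after it.
model = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    model[prev].append(nxt)

def generate(seed, length=5, rng=random):
    """Walk the bigram chain from a seed word, sampling an observed
    successor at each step; stop if a word has no successors."""
    words = [seed]
    for _ in range(length):
        followers = model.get(words[-1])
        if not followers:
            break
        words.append(rng.choice(followers))
    return " ".join(words)

print(generate("to"))  # e.g. "to be or not to be"
```

An SVM or naive Bayes trained on (previous words → next word) pairs plays the same role as this lookup table, just with learned generalisation; RNNs remain better suited because they condition on longer history.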
Category: Data Science

How effective is text generation?

I have implemented some basic models, like composing a poem using a dataset of poems, but the results were not that good in general. I want to make a model that could write an essay for me. One strong motivation for making this essay-writing model is that it would help me escape my tedious and useless college assignments. But before I proceed and hope to make such a human-like model that could write an essay for me, I …
Category: Data Science

Transform NL Text to DSL using NN/ML approach

Essentially, I have a corpus of many system requirements given in natural language. An example requirement can look like this: "When the Gear shifter is put into Drive, the Car should start moving forward!" My task was to develop a DSL that models these NL requirements. I have developed certain keywords in my DSL, and the requirement above translated into my DSL would look like this: The Component: Gear shifter if: put into Drive then: Car should …
Category: Data Science

Generation of medical institution names: training corpora?

My question is quite similar to this one: Generation of institution names. I need to be able to produce 'fake' names of medical institutions, specifically to create data for unit tests. Unfortunately, simple tools like Faker do not work well for this task, so I am interested in a more sophisticated solution, possibly involving some NER model(s). My question here is where can I get text corpora for training the model? The texts must contain (human-)recognizable names of medical institutions, …
Category: Data Science

English to "basic English" translation

I'd like to build something (ideally in Python) that can translate an English sentence into "basic" English. Are there any free/open-source tools/frameworks that can help? If not, what kind of steps can help solve this problem (e.g., building on existing work like WordNet or pre-trained word embeddings)? By "basic" I mean things like being concise (avoiding unnecessary verbiage) and using well-known words (without compromising too much on meaning). I might even consider "broken" English, where verbs, for example, are lemmatised …
Category: Data Science

LSTM Text Generation with PyTorch

I am currently trying quote generation (character level) with LSTMs using PyTorch. I am facing some issues understanding exactly how the hidden state is implemented in PyTorch. Some details: I have a list of quotes from a character in a TV series. I am converting those to a sequence of integers, with each character corresponding to a certain integer, using a dictionary char2idx. I also have the inverse of this, idx2char, where the mapping is reversed. After that, …
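The char2idx / idx2char setup described in the question can be sketched as follows (the quote strings are placeholders); the round-trip check at the end is a useful sanity test before feeding indices to the LSTM.

```python
# Build character-level vocabularies from a list of quotes (placeholder data).
quotes = ["winter is coming", "you know nothing"]
chars = sorted(set("".join(quotes)))          # deterministic vocabulary order

char2idx = {c: i for i, c in enumerate(chars)}
idx2char = {i: c for c, i in char2idx.items()}  # inverse mapping

def encode(text):
    """Map a string to its sequence of integer indices."""
    return [char2idx[c] for c in text]

def decode(indices):
    """Map a sequence of indices back to the original string."""
    return "".join(idx2char[i] for i in indices)

sample = "winter"
assert decode(encode(sample)) == sample       # round-trip sanity check
print(encode(sample))
```

Sorting the character set matters: without it the vocabulary order (and hence every saved model's index mapping) can differ between runs.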
Category: Data Science

PyTorch: understanding the purpose of each argument in the forward function of nn.TransformerDecoder

According to https://pytorch.org/docs/stable/generated/torch.nn.TransformerDecoder.html, the forward function of nn.TransformerDecoder takes the following arguments: tgt – the sequence to the decoder (required); memory – the sequence from the last layer of the encoder (required); tgt_mask – the mask for the tgt sequence (optional); memory_mask – the mask for the memory sequence (optional); tgt_key_padding_mask – the mask for the tgt keys per batch (optional); memory_key_padding_mask – the mask for the memory keys per batch (optional). Unfortunately, PyTorch's official documentation on the function isn't …
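Of these, tgt_mask is the one most decoders need: it is an additive causal mask (0.0 where attention is allowed, -inf above the diagonal) so position i cannot attend to future positions j > i. PyTorch builds it with nn.Transformer.generate_square_subsequent_mask; the plain-Python equivalent below shows the exact values that function produces, without assuming torch is installed.

```python
import math

def causal_mask(size):
    """Plain-Python equivalent of the additive tgt_mask expected by
    nn.TransformerDecoder.forward: 0.0 on and below the diagonal
    (attention allowed), -inf above it (future positions blocked)."""
    return [[0.0 if j <= i else -math.inf for j in range(size)]
            for i in range(size)]

for row in causal_mask(4):
    print(row)
```

The two key_padding_mask arguments serve a different purpose: they are per-batch boolean masks marking pad positions, not positions in time.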
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.