When smoothing an n-gram model in NLP, why don't we consider start- and end-of-sentence tokens?

When learning Add-1 smoothing, I found that we add 1 to the count of each word in our vocabulary, but do not count the start-of-sentence and end-of-sentence tokens as two additional words in the vocabulary. Let me give an example to explain: assume we have a corpus of three sentences: "John read Moby Dick", "Mary read a different book", and "She read a book by Cher". After training our bigram model on this corpus of three sentences, we need to evaluate the probability of …
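A minimal sketch of what add-1 smoothing looks like on the question's toy corpus, assuming the padding and vocabulary convention from Jurafsky & Martin (the end token "</s>" counts toward V because it can be predicted, the start token "<s>" does not because it only ever appears as context; that convention and the helper names below are assumptions, not a definitive answer):

from collections import Counter

# Toy corpus from the question, padded with sentence-boundary tokens.
sentences = [
    "John read Moby Dick",
    "Mary read a different book",
    "She read a book by Cher",
]
padded = [["<s>"] + s.split() + ["</s>"] for s in sentences]

unigrams = Counter(w for sent in padded for w in sent)
bigrams = Counter((sent[i], sent[i + 1]) for sent in padded for i in range(len(sent) - 1))

# Assumed convention: "</s>" is in the vocabulary, "<s>" is not.
vocab = set(unigrams) - {"<s>"}
V = len(vocab)

def add1_bigram_prob(prev, word):
    # Add-1 (Laplace) smoothed P(word | prev).
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

print(add1_bigram_prob("<s>", "John"))   # (1 + 1) / (3 + 12)
print(add1_bigram_prob("read", "a"))     # (2 + 1) / (3 + 12)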
Category: Data Science

How to keep only the top k-frequent ngrams in a text field with pandas?

How do I keep only the top k-frequent ngrams in a text field with pandas? For example, I have a text column. For every row in it, I only want to keep those substrings that belong to the top k-frequent ngrams in the list of ngrams built from the same column across all rows. How should I implement this on a pandas dataframe?
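One way to sketch this in pandas (the toy dataframe, the column name "text", and whitespace tokenisation are all assumptions): count the n-grams over all rows first, then filter each row against the global top-k set.

import pandas as pd
from collections import Counter

df = pd.DataFrame({"text": ["the cat sat on the mat",
                            "the cat ate the fish",
                            "a dog sat on the mat"]})

def ngrams(tokens, n=2):
    return list(zip(*(tokens[i:] for i in range(n))))

k, n = 3, 2

# Count every n-gram across all rows, then keep the k most frequent.
counts = Counter(ng for text in df["text"] for ng in ngrams(text.split(), n))
top_k = {ng for ng, _ in counts.most_common(k)}

# For each row, keep only the n-grams that made the global top-k list.
df["top_ngrams"] = df["text"].apply(
    lambda text: [" ".join(ng) for ng in ngrams(text.split(), n) if ng in top_k]
)
print(df)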
Topic: ngrams
Category: Data Science

Application of bag-of-ngrams in feature engineering of texts

I've got a few questions about the application of bag-of-ngrams in feature engineering of texts: How can we (if at all) perform word2vec on bag-of-ngrams? As the feature space of bag-of-ngrams increases exponentially with N, which techniques (if any) are commonly used together with bag-of-ngrams to improve computational and storage efficiency? Or, in general, is bag-of-ngrams used alongside other feature engineering techniques when transforming a text field into text features?
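Regarding the feature-space blow-up, one technique commonly paired with bag-of-ngrams is the hashing trick, which caps the number of features regardless of how many distinct n-grams the corpus contains. A sketch with scikit-learn (the corpus and the chosen sizes are illustrative only):

from sklearn.feature_extraction.text import HashingVectorizer

docs = ["the quick brown fox", "the lazy dog", "quick brown dogs are lazy"]

vectorizer = HashingVectorizer(ngram_range=(1, 2),   # unigrams + bigrams
                               n_features=2**10,     # fixed-size feature space
                               alternate_sign=False)
X = vectorizer.transform(docs)
print(X.shape)   # (3, 1024), stored as a sparse matrix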
Category: Data Science

FastText Model Explained

I was reading the FastText paper and I have a few questions about the model used for classification. Since I am not from an NLP background, I am unfamiliar with some of the jargon. In the figure, what exactly are the $x_i$? I am not sure what "$N$ ngram features" means. If my document has $L$ words in total, then how can I represent the entire document using $N$ variables ($x_1, \ldots, x_N$)? What exactly is $N$? $$-\frac{1}{N}\sum_{n=1}^{N} y_n\log(f(BAx_n))$$ If $y_n$ is the label, …
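For what it is worth, here is how I read the model, as a toy numpy sketch (the sizes and random matrices are made up, and this is my interpretation of the paper rather than its reference implementation): $x_n$ is the normalised bag-of-ngram vector of document $n$, $A$ embeds the n-gram features, $B$ is the linear classifier, $f$ is the softmax, and $N$ in the loss is the number of documents.

import numpy as np

V, H, K = 1000, 10, 3                # ngram vocabulary, hidden dim, classes (assumed sizes)
rng = np.random.default_rng(0)
A = rng.normal(size=(H, V))          # n-gram feature embedding matrix
B = rng.normal(size=(K, H))          # linear classifier

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x_n = np.zeros(V)
x_n[[3, 17, 256]] = 1.0 / 3          # a document made of 3 n-gram features, averaged
y_n = 1                              # gold label index

p = softmax(B @ (A @ x_n))           # f(B A x_n)
loss_n = -np.log(p[y_n])             # one term of the loss; the paper averages this over N documents
print(loss_n)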
Category: Data Science

Understanding Kneser-Ney Formula for implementation

I am trying to implement this formula in Python $$ P_{KN}(w_i \mid w_{i-n+1}^{i-1}) = \frac{\max(c_{KN}(w_{i-n+1}^{i}) - d,\, 0)}{c_{KN}(w_{i-n+1}^{i-1})} + \lambda(w_{i-n+1}^{i-1})\, P_{KN}(w_i \mid w_{i-n+2}^{i-1}) $$ where $$ c_{KN}(\cdot) = \begin{cases} \text{count}(\cdot) & \text{for the highest order,} \\ \text{continuationcount}(\cdot) & \text{otherwise.} \end{cases} $$ Following this link here I was able to understand how to implement the first half of the equation, namely $$ \frac{\max(c_{KN}(w_{i-n+1}^{i}) - d,\, 0)}{c_{KN}(w_{i-n+1}^{i-1})}, $$ but the second half, specifically the $\lambda(w_{i-n+1}^{i-1})$ term …
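A minimal bigram-level sketch of the pieces the question is stuck on, under the usual interpolated Kneser-Ney definitions (the toy counts, the discount value and the bigram-only restriction are assumptions, not the asker's setup): $\lambda$ is the normalised discount mass of a context, and the continuation count of a word is the number of distinct contexts it follows.

from collections import Counter

d = 0.75                                        # discount; a common default value

# Assumed structure: bigram counts as {(w1, w2): count}; toy numbers only.
bigram_counts = Counter({("read", "a"): 2, ("read", "Moby"): 1,
                         ("a", "book"): 2, ("a", "different"): 1})

def lambda_weight(context):
    # lambda(context) = d / c(context) * |{w : c(context, w) > 0}|
    total = sum(c for (w1, _), c in bigram_counts.items() if w1 == context)
    distinct = sum(1 for (w1, _), c in bigram_counts.items() if w1 == context and c > 0)
    return d * distinct / total if total else 0.0

def continuation_count(word):
    # Lower-order c_KN(word): number of distinct words that precede `word`.
    return sum(1 for (_, w2), c in bigram_counts.items() if w2 == word and c > 0)

print(lambda_weight("read"))        # 0.75 * 2 / 3 = 0.5
print(continuation_count("a"))      # only "read" precedes "a" here -> 1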
Category: Data Science

N-Gram Smoothing

I am wondering if there is a good example out there that compares n-gram models under various smoothing techniques. I found this notebook that applies Laplace (add-one) smoothing, but that is about it. Any suggestions are greatly appreciated.
Topic: ngrams
Category: Data Science

Size of the feature matrix after applying 6 1D kernels to one-hot encoded vectors

Suppose we are building the following model: a neural network over one-hot encoded vectors of characters. For a given dataset it is not feasible to read the whole text, so we take a fixed number of characters, say 1014. Then we apply 1D convolution + pooling 6 times, with kernel widths 7, 7, 3, 3, 3, 3, and we apply 1024 filters at each of these layers. Since we apply the same process six times, we will get …
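To make the sizes concrete, here is a small calculation of how the temporal length of the feature maps shrinks through the six layers, assuming (as in the Zhang et al. character-level CNN this setup resembles) valid convolutions with stride 1 and non-overlapping max-pooling of size 3 after layers 1, 2 and 6; those pooling positions are an assumption, not stated in the question.

length = 1014                       # number of characters fed to the network
kernel_widths = [7, 7, 3, 3, 3, 3]
pool_after = {0, 1, 5}              # assumed: pooling of size 3 after layers 1, 2 and 6

for i, k in enumerate(kernel_widths):
    length = length - k + 1         # valid 1D convolution, stride 1
    if i in pool_after:
        length //= 3                # non-overlapping max pooling of size 3
    print(f"after layer {i + 1}: {length} frames x 1024 filters")

# With these assumptions the final layer ends up with 34 frames x 1024 filters.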
Category: Data Science

Classifying short strings of text with additional context

I have a list of short strings, each identifying a city. Misspellings are very common. The example below shows some of these short strings, along with the correct city they're supposed to match.

string        city
amsterdam     amsterdam
asmterddam    amsterdam
amstterdm     amsterdam
new york      new york
new yrok      new york
nwe york      new york
neew york     new york
nw york       new york

I would like to train a classifier that takes the input string and then predicts the most likely city …
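A common baseline for this kind of fuzzy matching is character n-grams fed to a linear classifier; here is a sketch with scikit-learn (the tiny training set just mirrors the example above, so real data would need many more rows per city):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

strings = ["amsterdam", "asmterddam", "amstterdm",
           "new york", "new yrok", "nwe york", "neew york", "nw york"]
cities = ["amsterdam"] * 3 + ["new york"] * 5

# Character 2-4 grams are fairly robust to the kinds of typos shown above.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(strings, cities)
print(model.predict(["amstredam", "nev york"]))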
Category: Data Science

How do I get ngrams for all combinations of words in a sentence?

Let's say I have a sentence "I need multiple ngrams". If I create bigrams using the TF-IDF vectorizer, it will create bigrams only from consecutive words, i.e. I will get "I need", "need multiple", "multiple ngrams". How can I get "I multiple", "I ngrams", "need ngrams"?
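What the question describes are essentially skip-grams: all order-preserving pairs of words, not just adjacent ones. A minimal sketch with itertools, independent of any vectorizer (nltk.util.skipgrams offers something similar with a bounded skip distance):

from itertools import combinations

sentence = "I need multiple ngrams"
tokens = sentence.split()

# All order-preserving pairs of words, adjacent or not.
all_bigrams = [" ".join(pair) for pair in combinations(tokens, 2)]
print(all_bigrams)
# ['I need', 'I multiple', 'I ngrams', 'need multiple', 'need ngrams', 'multiple ngrams']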
Category: Data Science

Usage of KL divergence to improve BOW model

For a university project, I chose to do sentiment analysis on a Google Play store reviews dataset. I obtained decent results classifying the data using the bag-of-words (BOW) model and an ADALINE classifier. I would like to improve my model by incorporating bigrams relevant to the topic (Negative or Positive) into my feature set. I found this paper which uses KL divergence to measure the relevance of unigrams/bigrams relative to a topic. The only problem is that I …
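I do not know the exact formulation used in that paper, but a typical pointwise KL relevance score looks like the sketch below, where an n-gram scores highly if it is much more probable inside the topic than in the corpus overall (the score definition, smoothing constant and toy counts are all assumptions):

import math
from collections import Counter

def kl_relevance(topic_counts, corpus_counts, eps=1e-12):
    # Score each n-gram by P(t | topic) * log(P(t | topic) / P(t)).
    total_topic = sum(topic_counts.values())
    total_corpus = sum(corpus_counts.values())
    scores = {}
    for t, c in topic_counts.items():
        p_topic = c / total_topic
        p_all = corpus_counts[t] / total_corpus
        scores[t] = p_topic * math.log((p_topic + eps) / (p_all + eps))
    return scores

# Toy usage: bigram counts in negative reviews vs. the whole review corpus.
neg = Counter({"not good": 30, "waste of": 12, "great app": 2})
allc = Counter({"not good": 35, "waste of": 13, "great app": 40})
print(sorted(kl_relevance(neg, allc).items(), key=lambda kv: -kv[1]))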
Category: Data Science

The best way/tools to use communication protocol messages in hex form as input features for machine/deep learning (n-grams?)

I am trying to categorize server software versions based on server responses to various slightly different hex messages. To extract the ML input features from these hex messages, I plan to use the n-gram method. Can you please advise on some other methods that can be used to derive ML input features from hex messages? Of course, I can do it manually, but an automated solution probably exists. Which tools/libraries are better suited for applying the n-gram method to communication …
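One way to automate the extraction, sketched below: treat each hex message as a string and let CountVectorizer build character n-grams over it (the example messages and the n-gram range are made up; with two hex characters per byte, 2-4 character n-grams roughly correspond to 1-2 byte n-grams):

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical server responses as hex strings.
messages = ["16030100a5", "16030100b2", "150301000200"]

vectorizer = CountVectorizer(analyzer="char", ngram_range=(2, 4), lowercase=False)
X = vectorizer.fit_transform(messages)
print(X.shape, vectorizer.get_feature_names_out()[:5])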
Category: Data Science

N-grams for RNNs

Given a word $w_{n}$, a statistical model such as a Markov chain over n-grams predicts the subsequent word $w_{n+1}$. The prediction is by no means random. How is this translated into a neural model? I have tried tokenizing and sequencing my sentences; below is how they are prepared to be passed to the model:

train_x = np.zeros([len(sequences), max_seq_len], dtype=np.int32)
for i, sequence in enumerate(sequences[:-1]):    # using all words except last
    for t, word in enumerate(sequence.split()):
        train_x[i, t] = word2idx(word)           # storing in word …
Topic: lstm ngrams rnn nlp
Category: Data Science

NLP: find the best preposition for connecting parts of a sentence

My task is to connect 2-3 parts of a sentence into one whole using a preposition: the first part is some kind of action, e.g. "take pictures"; the second part is an object that can consist of a single noun or a noun with adjectives and additions dependent on it, e.g. "juicy cherry pie", "squirrel"; the third part is a place, e.g. "room", "London". To solve this task I've already tried some options, such as generation using GPT-2 (or other …
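One simple count-based baseline, purely as an illustration and not something the asker mentioned: score each candidate preposition with n-gram counts and keep the best-scoring connection (the counts dictionary and candidate list below are hypothetical).

# Hypothetical trigram counts harvested from some large corpus.
trigram_counts = {
    ("pictures", "of", "squirrel"): 42,
    ("pictures", "with", "squirrel"): 3,
    ("pictures", "in", "squirrel"): 0,
}
candidates = ["of", "with", "in", "at", "for"]

def best_preposition(left, right, counts=trigram_counts):
    # Pick the preposition whose trigram (left, prep, right) is most frequent.
    return max(candidates, key=lambda prep: counts.get((left, prep, right), 0))

print(best_preposition("pictures", "squirrel"))   # -> "of" with these toy counts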
Topic: ngrams nlp
Category: Data Science

ngram and RNN prediction rate wrt word index

I tried to plot the rate of correct predictions (for the top-1 shortlist) in relation to the word's position in the sentence. I was expecting to see a plateau sooner in the n-gram setup, since it needs less context. However, one thing I wasn't expecting was that the prediction rate drops. In my understanding, since we already have a context of 3 words, the plateau should converge asymptotically to its highest value. But both the recurrent network and the n-gram …
Category: Data Science

What is the training phase in an N-gram model?

Following is my understanding of the N-gram model used in a text prediction setting: given a sentence, say "I love my", and a bigram model (i.e. one word of context), with say 4 possible candidates (country, family, wife, school), I can estimate the conditional probability of each candidate and take the one with the highest probability as the next word. Question: I understand the probability part of the model, but to even get to the probability, …
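To make the training phase concrete (this is a sketch of the usual maximum-likelihood recipe, not the asker's code): training an n-gram model essentially means collecting the counts that the conditional probabilities are later read from.

from collections import Counter

# "Training" a bigram model is just counting bigrams and their contexts.
corpus = "i love my family . i love my country . we love my family .".split()

bigram_counts = Counter(zip(corpus, corpus[1:]))
context_counts = Counter(corpus)

def p_next(context, word):
    # Maximum-likelihood P(word | context) read off the stored counts.
    return bigram_counts[(context, word)] / context_counts[context]

candidates = ["country", "family", "wife", "school"]
print(max(candidates, key=lambda w: p_next("my", w)))   # "family" on this toy corpus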
Topic: ngrams nlp
Category: Data Science

Shouldn't ROUGE-1 precision be equal to BLEU with w=(1, 0, 0, 0) when brevity penalty is 1?

I am trying to evaluate an NLP model using BLEU and ROUGE. However, I am a bit confused about the difference between those scores. While I am aware that ROUGE is aimed at recall whilst BLEU measures precision, all ROUGE implementations I have come across also output precision and the F-score. The original ROUGE paper only briefly mentions precision and the F-score, so I am a bit unsure about what meaning they have for ROUGE. Is ROUGE mainly about recall …
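A tiny worked example of the premise in the title (my sketch of the common definitions; actual implementations differ in tokenisation and clipping details, which is one place the scores can diverge): with a single reference and brevity penalty 1, both boil down to clipped unigram matches divided by the candidate length.

from collections import Counter

candidate = "the cat sat on the mat".split()
reference = "the cat is on the mat".split()

cand_counts, ref_counts = Counter(candidate), Counter(reference)

# Clipped unigram matches: each candidate word counts at most as often
# as it appears in the reference.
matches = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())

rouge1_precision = matches / len(candidate)   # matches over candidate length
bleu1 = matches / len(candidate)              # modified 1-gram precision, BP = 1
print(rouge1_precision, bleu1)                # 0.8333... for both here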
Category: Data Science

Self Organising Map with variable length ordered sets of N-grams

I want to preface my question by saying that the situation I describe might not be applicable to Kohonen self-organising maps (SOMs) due to a lack of understanding on my part, so I do apologise if that is the case. If so, I would greatly appreciate any suggestions on alternative methods to compare the similarities for my given input data. I am trying to create a self-organising map for the similarity comparison between the n-gram ordered set …
Category: Data Science

For an n-gram model with n>2, do we need more context at the end of each sentence?

Jurafsky's book says we need to add context to the left and right of a sentence. Does this mean, for example, that if we have a corpus of three sentences: "John read Moby Dick", "Mary read a different book", and "She read a book by Cher", then after training our trigram model on this corpus of three sentences, we need to evaluate the probability of a sentence "John read a book", i.e. to find $P(John\; read\; a\; book)$ as below, $P(John\; read\; a\; …
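As an illustration of the padding convention in question (a sketch; the single end token follows one common reading of Jurafsky & Martin, while some toolkits, e.g. nltk's pad_both_ends, pad both sides with n-1 symbols):

def pad_for_ngrams(tokens, n, left="<s>", right="</s>"):
    # n-1 start symbols so the first word has a full left context; one end symbol.
    return [left] * (n - 1) + tokens + [right]

print(pad_for_ngrams("John read a book".split(), n=3))
# ['<s>', '<s>', 'John', 'read', 'a', 'book', '</s>']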
Category: Data Science
