How to predict the sentiment of entities from a tweet?

I have a JSON file (tweets.json) that contains tweets (sentences) along with the author's name. Objective 1: extract the most frequent entities from the tweets. Objective 2: determine each author's sentiment/polarity towards each of those entities. Sample input: assume we have only 3 tweets. Tweet1 by Author1: "Pink Pearl Apples are tasty but Empire Apples are not." Tweet2 by Author2: "Empire Apples are very tasty." Tweet3 by Author3: "Pink Pearl Apples are not tasty." Sample …
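A rough sketch of one possible pipeline, assuming spaCy with the en_core_web_sm model installed; the tweets are hard-coded from the sample, noun chunks stand in for entities, and the one-word lexicon plus negation check is a toy heuristic, not a real sentiment model:

```python
# Sketch: spaCy noun chunks as "entities", plus a toy negation-aware
# polarity heuristic (a real sentiment model should replace this).
from collections import Counter, defaultdict
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

tweets = [  # toy stand-in for tweets.json
    ("Author1", "Pink Pearl Apples are tasty but Empire Apples are not."),
    ("Author2", "Empire Apples are very tasty."),
    ("Author3", "Pink Pearl Apples are not tasty."),
]
POSITIVE = {"tasty"}  # illustrative one-word lexicon

entity_counts = Counter()
polarity = defaultdict(int)

for author, text in tweets:
    doc = nlp(text)
    for chunk in doc.noun_chunks:  # rough stand-in for entities
        entity_counts[chunk.text] += 1
        # look at the few tokens following the entity mention
        window = [t.lower_ for t in doc[chunk.end:chunk.end + 4]]
        if "not" in window:  # toy negation handling
            polarity[(author, chunk.text)] -= 1
        elif any(w in POSITIVE for w in window):
            polarity[(author, chunk.text)] += 1

print(entity_counts.most_common(2))
for key, score in polarity.items():
    print(key, "positive" if score > 0 else "negative")
```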
Category: Data Science

When using padding in sequence models, is Keras validation accuracy valid/reliable?

I have a group of non-zero sequences of different lengths, and I am using a Keras LSTM to model them. I use the Keras Tokenizer to tokenize (token indices start from 1). To make the sequences the same length, I use padding. An example of padding: # [0,0,0,0,0,10,3] # [0,0,0,0,10,3,4] # [0,0,0,10,3,4,5] # [10,3,4,5,6,9,8] To evaluate whether the model can generalize, I use a validation set with a 70/30 split. At the end of each epoch …
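For reference, a minimal sketch of that setup (names and sizes illustrative); the point most relevant to validation accuracy is that mask_zero=True makes downstream layers skip the 0-padded timesteps:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import layers, models

texts = ["a b", "a b c", "a b c d"]      # toy sequences
tokenizer = Tokenizer()                   # token indices start at 1
tokenizer.fit_on_texts(texts)
seqs = tokenizer.texts_to_sequences(texts)
padded = pad_sequences(seqs, maxlen=7)    # pre-padding with 0 by default

model = models.Sequential([
    # mask_zero=True tells downstream layers to ignore the 0 padding steps
    layers.Embedding(input_dim=len(tokenizer.word_index) + 1,
                     output_dim=16, mask_zero=True),
    layers.LSTM(32),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(padded, labels, validation_split=0.3) would give the 70/30 split
```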
Category: Data Science

Optimal input setup for character-level text classification RNN

I want to classify 500-character text samples as to whether they look like natural language, using a character-level RNN. I'm unsure of the best way to feed the input to the RNN. Here are two approaches I've thought of: provide all 500 characters (one per time step) to the RNN and predict a binary class, $\{0,1\}$; or provide shorter overlapping segments (e.g. 10 characters) and predict the next (e.g. the 11th) character, converting this to classification by taking the …
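A minimal sketch of the first option, assuming integer-encoded characters with 0 reserved for padding (all names and sizes illustrative):

```python
import torch
import torch.nn as nn

class CharClassifier(nn.Module):
    """Reads 500 integer-encoded characters, outputs P(natural language)."""
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):               # x: (batch, 500) long tensor
        emb = self.embed(x)             # (batch, 500, embed_dim)
        _, h = self.rnn(emb)            # h: (1, batch, hidden_dim)
        return torch.sigmoid(self.head(h[-1]))  # (batch, 1)

model = CharClassifier(vocab_size=128)
probs = model(torch.randint(1, 128, (4, 500)))  # 4 dummy samples
```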
Category: Data Science

Clarification on "predict the next character given the previous 100 characters"

I am studying Justin Johnson's lecture on RNNs (lecture recording: https://www.youtube.com/watch?v=dUzLD91Sj-o&list=PL5-TkQAfAZFbzxjBHtzdVCWE0Zbhomg7r&index=12&t=3177s). One of the examples is character-level language modeling: predicting the next character given the previous characters. At 33:03 in the video linked above, Justin discusses training an RNN that processes the works of William Shakespeare and tries to predict the next character given the previous 100 characters. What does "given the previous 100 characters" mean? The lecture slides (https://web.eecs.umich.edu/~justincj/slides/eecs498/498_FA2019_lecture12.pdf) contain the following figures: It …
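One common reading is that training pairs are fixed-length windows: the input is characters t..t+99 and the target is the same window shifted by one, so every position predicts the character after it. A toy sketch (the corpus string is a stand-in for the Shakespeare text):

```python
# Build (input, target) pairs: each 100-character window is paired with the
# same window shifted by one, so position t predicts character t+1.
corpus = "To be, or not to be, that is the question. " * 200  # toy stand-in
stoi = {c: i for i, c in enumerate(sorted(set(corpus)))}

SEQ_LEN = 100
inputs, targets = [], []
for i in range(0, len(corpus) - SEQ_LEN - 1, SEQ_LEN):
    chunk = corpus[i : i + SEQ_LEN + 1]            # 101 characters
    inputs.append([stoi[c] for c in chunk[:-1]])   # characters 0..99
    targets.append([stoi[c] for c in chunk[1:]])   # characters 1..100
```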
Category: Data Science

Transfer learning between Language Model and classification

Following this fast.ai lecture, I am trying to understand the mechanism of transfer learning in NLP from a general language model (LM) to a classification problem. What exactly is taken from the language-model training? Is it just the word embeddings? Or is it also the weights of the LSTM cell? The architecture of the neural net should be quite different: in an LM you output a prediction after every sequence step, whereas in a classification problem you would …
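For intuition, here is a sketch in PyTorch (module names are illustrative, not fast.ai's actual API) in which both the embedding and the LSTM weights are transferred and only the output head is replaced:

```python
import torch.nn as nn

class LanguageModel(nn.Module):
    def __init__(self, vocab, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.Linear(dim, vocab)   # predicts next token per step

class Classifier(nn.Module):
    def __init__(self, vocab, n_classes, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, n_classes)  # reads the final hidden state

lm = LanguageModel(vocab=10000)
clf = Classifier(vocab=10000, n_classes=2)
# transfer everything except the LM's per-step decoder
clf.embed.load_state_dict(lm.embed.state_dict())
clf.lstm.load_state_dict(lm.lstm.state_dict())
```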
Category: Data Science

In n-gram model smoothing in NLP, why don't we count start- and end-of-sentence tokens?

When learning add-1 smoothing, I found that we add 1 for each word in the vocabulary, but do not count the start-of-sentence and end-of-sentence markers as two extra words in that vocabulary. Let me give an example to explain. Example: assume we have a corpus of three sentences: "John read Moby Dick", "Mary read a different book", and "She read a book by Cher". After training our bi-gram model on this corpus of three sentences, we need to evaluate the probability of …
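For concreteness, a sketch using the corpus from the question; whether `<s>` and `</s>` count as vocabulary items only changes the V added to the denominator:

```python
from collections import Counter

sentences = [["John", "read", "Moby", "Dick"],
             ["Mary", "read", "a", "different", "book"],
             ["She", "read", "a", "book", "by", "Cher"]]

padded = [["<s>"] + s + ["</s>"] for s in sentences]
unigrams = Counter(w for s in padded for w in s)
bigrams = Counter((a, b) for s in padded for a, b in zip(s, s[1:]))

V_without = len({w for s in sentences for w in s})  # 11 word types
V_with = V_without + 2                               # + <s> and </s>

def p_add1(w2, w1, V):
    """Add-1 smoothed bigram probability P(w2 | w1)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

print(p_add1("read", "John", V_without))  # boundary tokens not in V
print(p_add1("read", "John", V_with))     # boundary tokens counted in V
```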
Category: Data Science

Importance of random initialisation vs. number of hidden units

A question crossed my mind not so long ago: I am running language-model experiments with an RNN (always with the same network topology: 50 hidden units, and 10M "direct connections" that emulate n-gram models) on different fractions of a 9M-word corpus (10, 25, 50, 75, 100%). I noticed that while perplexity generally decreases as the training data becomes more abundant, sometimes it does not. Latest example: 143, 118, 109, 106, 112. My first thought was network initialization, so I …
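One way to test the initialization hypothesis is to repeat each corpus fraction over several seeds and look at the spread; a sketch where train_and_eval is a placeholder for the real training run:

```python
import random
import statistics

def train_and_eval(fraction, seed):
    """Placeholder: train the RNN LM on `fraction` of the corpus with the
    given seed and return validation perplexity."""
    random.seed(seed)
    return 100 + 50 * random.random()  # stand-in value

for fraction in (0.10, 0.25, 0.50, 0.75, 1.00):
    ppls = [train_and_eval(fraction, seed) for seed in range(5)]
    print(f"{fraction:.0%}: mean={statistics.mean(ppls):.1f} "
          f"sd={statistics.stdev(ppls):.1f}")
```

If the standard deviation at a given fraction is comparable to the jump from 106 to 112, initialization noise alone could explain the non-monotonic curve.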
Category: Data Science

How to improve my imbalanced-data NLP model?

I want to predict a probability for each patient's health and get the 10 most ill patients in a hospital. I have each patient's condition notes, medical notes, diagnosis notes, and lab notes for each day. Current approach: vectorize all the notes using spaCy's scispacy model and sum the vectors grouped by patient ID and day (200 columns); normalize these to unit vectors (200 columns); apply a moving-average function to the vectors grouped …
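A sketch of that pipeline with pandas (column names are assumptions; en_core_sci_md is a scispacy model with 200-dimensional word vectors and is assumed installed):

```python
import numpy as np
import pandas as pd
import spacy

nlp = spacy.load("en_core_sci_md")  # assumed scispacy model, 200-dim vectors

# columns assumed: patient_id, day, note
notes = pd.DataFrame({
    "patient_id": [1, 1, 2],
    "day": [1, 2, 1],
    "note": ["chest pain", "pain subsiding", "routine checkup"],
})

# 1) vectorize each note, then sum vectors per (patient, day)
notes["vec"] = notes["note"].apply(lambda t: nlp(t).vector)
daily = notes.groupby(["patient_id", "day"])["vec"].apply(
    lambda vs: np.sum(np.stack(list(vs)), axis=0))

# 2) scale each daily vector to unit length
daily = daily.apply(lambda v: v / (np.linalg.norm(v) or 1.0))

# 3) per-patient moving average over a 3-day window (expand to columns first)
mat = pd.DataFrame(np.stack(daily.to_numpy()), index=daily.index)
smoothed = mat.groupby(level="patient_id").rolling(3, min_periods=1).mean()
```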
Category: Data Science

Question about computing language-modeling loss with multiple GPUs

When training BERT, GPT, or another language model, we use the mean of the cross entropy as the loss function (ignoring label smoothing). Here $|B|$ denotes the batch size and $\mathrm{len}_i$ the target length of the $i$-th sequence. $$L = \frac{\sum_{i=1}^{|B|}\sum_{j=1}^{\mathrm{len}_i}\mathrm{ce}(y_{ij},\hat{y}_{ij})}{\sum_{i=1}^{|B|}\mathrm{len}_i} \tag{1}$$ With multiple GPUs, the common forward process is: split the data across the GPUs; compute the loss on each GPU; reduce the losses (most of the time we simply take the mean of the per-GPU losses). Now if we combine those above together, …
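The subtlety can be shown with toy numbers: the mean of per-GPU mean losses equals Eq. (1) only when every GPU holds the same number of target tokens; reducing the loss sums and token counts separately recovers Eq. (1):

```python
# Toy example: per-token cross-entropy losses on two GPUs.
gpu0 = [1.0, 1.0]                # 2 target tokens
gpu1 = [4.0, 4.0, 4.0, 4.0]      # 4 target tokens

# Eq. (1): one global token-weighted mean
global_mean = (sum(gpu0) + sum(gpu1)) / (len(gpu0) + len(gpu1))   # 3.0

# Common multi-GPU reduction: mean of per-GPU means
naive = (sum(gpu0) / len(gpu0) + sum(gpu1) / len(gpu1)) / 2       # 2.5

# Fix: reduce the sums and the token counts separately, then divide
loss_sum = sum(gpu0) + sum(gpu1)
token_count = len(gpu0) + len(gpu1)
fixed = loss_sum / token_count                                     # 3.0

print(global_mean, naive, fixed)
```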
Category: Data Science

State-of-the-art Python packages that can evaluate language similarity

I am trying to evaluate the likelihood of generating a specific sentence out of a large set of sentences. To do this, I start with a simple approach: training a custom n-gram language model and calculating perplexity values for a list of sentences. I found that the package KenLM (https://www.aclweb.org/anthology/W11-2123/) is often used for this task. However, it is fairly old (published in 2011). On the other hand, I noticed that the two most famous state-of-the-art NLP packages, …
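For comparison with KenLM, per-sentence perplexity under a pretrained causal LM takes a few lines with Hugging Face transformers; a sketch using GPT-2 (the model choice is illustrative):

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # passing labels makes the model return the mean token cross-entropy
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

print(perplexity("the cat sat on the mat"))
print(perplexity("mat the on sat cat the"))  # should score higher (worse)
```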
Category: Data Science

What are the differences between BNF and JSGF in NLP?

I wonder what the differences are between BNF (Backus-Naur Form) and JSGF (Java Speech Grammar Format). The former is a notation for context-free grammars taught in CS224, but I learned that the latter is also used. Could anyone tell me which one is better and what their differences are?
Category: Data Science

A multi-label text classification problem

I'm looking to solve a multi-label text classification problem, but I don't really know how to formulate it correctly so I can look it up. Here is my problem: say I have the document "I want to learn NLP. I can do that by reading NLP books or watching tutorials on the internet. That would help me find a job in NLP." I want to classify the sentences into 3 labels (for example): objective, method, and result. The …
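As a baseline formulation, each sentence can be treated as one example with a single class label; a scikit-learn sketch using the document from the question (labels are the ones proposed there):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "I want to learn NLP.",
    "I can do that by reading NLP books or watching tutorials on the internet.",
    "That would help me find a job in NLP.",
]
labels = ["objective", "method", "result"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(sentences, labels)
print(clf.predict(["I will study word embeddings to reach my goal."]))
```

If a sentence may carry several labels at once, the same pipeline can be wrapped in OneVsRestClassifier with binarized label sets, which is the usual multi-label formulation.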
Category: Data Science

Can Domain Adaptation improve the performance of Sentiment Analysis?

Does domain adaptation have any effect on sentiment-analysis results? I am going to train a BERT language model on texts from the health domain, then apply opinion mining to find which texts carry positive or negative sentiment. I have run this on pre-trained BERT and obtained some results; my question is whether domain adaptation will help increase the performance of my model.
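Domain adaptation here usually means continuing BERT's masked-language-model pretraining on the health texts before fine-tuning for sentiment. A sketch with Hugging Face transformers (the corpus and hyperparameters are placeholders):

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

texts = ["health-domain sentence one.", "health-domain sentence two."]
encodings = [tokenizer(t, truncation=True, max_length=128) for t in texts]

# the collator masks 15% of tokens on the fly for the MLM objective
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-health", num_train_epochs=1),
    train_dataset=encodings,
    data_collator=collator,
)
trainer.train()
# afterwards, load "bert-health" into AutoModelForSequenceClassification
# and fine-tune on the labelled sentiment data as before
```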
Category: Data Science

A simple attention-based text prediction model from scratch using PyTorch

I first asked this question on Code Review SE, but a user recommended posting it here instead. I have created a simple self-attention-based text prediction model using PyTorch. (The attention formula used to build the attention layer was given as an image in the original post.) I want to validate whether the whole code is implemented correctly, particularly my custom implementation of the attention layer. Full code:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import random

random.seed(0)
torch.manual_seed(0)

# Sample text …
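For comparison, a minimal scaled dot-product self-attention layer (a generic sketch, not the poster's exact code):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Computes softmax(QK^T / sqrt(d)) V over a (batch, seq, dim) input."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, seq, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        weights = F.softmax(scores, dim=-1)      # (batch, seq, seq)
        return weights @ v                       # (batch, seq, dim)

attn = SelfAttention(dim=16)
out = attn(torch.randn(2, 5, 16))
```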
Category: Data Science

Understanding Kneser-Ney Formula for implementation

I am trying to implement this formula in Python $$ P_{KN}(w_i \mid w_{i-n+1}^{i-1}) = \frac{\max\left(c_{KN}(w_{i-n+1}^{i}) - d,\, 0\right)}{c_{KN}(w_{i-n+1}^{i-1})} + \lambda(w_{i-n+1}^{i-1})\, P_{KN}(w_i \mid w_{i-n+2}^{i-1}) $$ where $$ c_{KN}(\cdot) = \begin{cases} \text{count}(\cdot) & \text{for the highest order} \\ \text{continuationcount}(\cdot) & \text{otherwise.} \end{cases} $$ Following this link here, I was able to understand how to implement the first half of the equation, namely $$\frac{\max\left(c_{KN}(w_{i-n+1}^{i}) - d,\, 0\right)}{c_{KN}(w_{i-n+1}^{i-1})},$$ but the second half, specifically the $\lambda(w_{i-n+1}^{i-1})$ term, …
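For a bigram model, the $\lambda$ term is the normalized discount mass: $\lambda(\text{context}) = \frac{d}{c(\text{context})} \cdot |\{w : c(\text{context}, w) > 0\}|$. A sketch (the count dictionary is illustrative; d is the discount):

```python
def kn_lambda(context, bigram_counts, d=0.75):
    """lambda(context) = (d / total count of context) * number of distinct
    words that follow the context (standard bigram Kneser-Ney)."""
    followers = {w2 for (w1, w2) in bigram_counts if w1 == context}
    total = sum(c for (w1, _), c in bigram_counts.items() if w1 == context)
    return (d / total) * len(followers)

bigram_counts = {("the", "cat"): 2, ("the", "dog"): 1, ("a", "cat"): 1}
print(kn_lambda("the", bigram_counts))  # (0.75 / 3) * 2 = 0.5
```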
Category: Data Science

Transformer model comparison for binary sentiment classification

I am comparing XLNet and BERT on binary sentiment classification tasks over two independent datasets: a Twitter dataset, where sentences are short, and the IMDB review dataset, where sentences are long. On the Twitter dataset, BERT matches and slightly outperforms XLNet, but XLNet outperforms BERT on the IMDB dataset. I understand that XLNet captures longer dependencies due to the Transformer-XL architecture and so outperforms BERT; but what additional reasons may exist for one to outperform the other for …
Category: Data Science

What is the difference between model hyperparameters and model parameters?

I have noticed that the terms model hyperparameter and model parameter are used interchangeably on the web without prior clarification. I think this is incorrect and needs explanation. Consider a machine learning model, an SVM/NN/NB-based classifier or image recognizer, anything that first springs to mind. What are the hyperparameters and parameters of the model? Please give examples.
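The distinction is easy to see in scikit-learn: hyperparameters are the constructor arguments chosen before training, while parameters are the fitted attributes (trailing underscore) learned from the data:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)

# hyperparameters: chosen before training, not learned from the data
clf = SVC(C=1.0, kernel="rbf", gamma="scale")
print(clf.get_params()["C"])        # 1.0

clf.fit(X, y)
# parameters: learned from the data during fit
print(clf.support_vectors_.shape)   # the support vectors
print(clf.dual_coef_.shape)         # their learned coefficients
```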
Category: Data Science

Sequence-to-Sequence Transformer for Neural machine translation

I am using the tutorial in the Keras documentation here. I am new to deep learning. On a different dataset, the Menyo-20k dataset (about 10,071 total pairs: 7,051 training pairs, 1,510 validation pairs, 1,510 test pairs), the highest validation and test accuracy I have obtained is approximately 0.26. I tried the following: the SGD, Adam, and RMSprop optimizers; different learning rates; dropout rates of 0.4 and 0.1; different embedding dimensions and feed-forward network …
Category: Data Science

How do we pass data to an RNN?

Let's say we have $A_1, A_2, \ldots, A_m$ different articles in the corpus, each with words $W_1, W_2, \ldots, W_w$. We are training a language model on them. Do we follow Scheme 1: take the first batch of data as the first $S$ (the number of time steps) words $(S_1, S_2, \ldots, S_s)$ from each article (for simplicity, assume batch size $= m$); set the initial hidden state $H_0 = [0,0,\ldots,0]$; calculate the loss and gradient …
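Scheme 1 as described is standard truncated backpropagation through time: the hidden state is carried across consecutive chunks of the same articles but detached so gradients stop at chunk boundaries. A PyTorch sketch (all sizes illustrative):

```python
import torch
import torch.nn as nn

m, S, vocab, dim = 4, 35, 1000, 64            # illustrative sizes
data = torch.randint(0, vocab, (m, 10 * S))   # m articles as token ids

embed = nn.Embedding(vocab, dim)
rnn = nn.LSTM(dim, dim, batch_first=True)
head = nn.Linear(dim, vocab)
opt = torch.optim.Adam(list(embed.parameters()) + list(rnn.parameters())
                       + list(head.parameters()))
loss_fn = nn.CrossEntropyLoss()

hidden = None                                  # H0 = zeros on the first chunk
for t in range(0, data.size(1) - S, S):
    x, y = data[:, t:t + S], data[:, t + 1:t + S + 1]
    out, hidden = rnn(embed(x), hidden)
    loss = loss_fn(head(out).reshape(-1, vocab), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    # detach so gradients do not flow across chunk boundaries,
    # but the hidden state itself carries over (Scheme 1)
    hidden = tuple(h.detach() for h in hidden)
```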
Category: Data Science

Why not rule-based semantic role labelling?

I have recently developed an interest in automatic semantic role labelling. Most introductory texts (e.g. Jurafsky and Martin, 2008) present approaches based on supervised machine learning, often using FrameNet (Baker et al., 1998) and PropBank (Kingsbury & Palmer, 2002). Intuitively, however, I would imagine that the same problem could be tackled with a grammar-based parser. Why is this not the case? Or rather, why would the supervised solutions be preferred? Thanks in advance. References: Jurafsky, D., & Martin, J. H. …
Category: Data Science
