Where to start in natural language processing for a language

My native language is a regional language that few people speak. I have some assignments in a machine learning course, and I was thinking about doing some natural language processing on my native language, but I don't know where to start, since there is almost no research on this language (no corpus, no research papers, ...) and I'm new to machine learning. I want to start doing everything from the ground up and I want to do …
Category: Data Science

Is it possible to feed BERT to a seq2seq encoder/decoder NMT model (for a low-resource language)?

I'm working on an NMT model in which the input and target sentences are from the same language (but the grammar differs). I'm planning to pre-train and use BERT, since I'm working with a small dataset and a low-resource language. Is it possible to feed BERT to the seq2seq encoder/decoder?
Category: Data Science

Attention network without hidden state?

I was wondering how useful the encoder's hidden state is for an attention network. When I looked into the structure of an attention model, this is what I found a model generally looks like:
x: Input.
h: Encoder's hidden state, which feeds forward to the next encoder's hidden state.
s: Decoder's hidden state, which has a weighted sum of all the encoder's hidden states as input and feeds forward to the next decoder's hidden state.
y: Output.
With a process …
Category: Data Science
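
As a rough illustration of the "weighted sum of all the encoder's hidden states" mentioned in the excerpt above, here is a minimal pure-Python sketch for a single decoder step. The dot-product scoring function, the toy vectors, and the names `softmax` and `attention_context` are assumptions made for brevity; the question does not say which scoring function its model uses.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """Weighted sum of encoder hidden states for one decoder step.

    Dot-product scoring is an assumption; additive scoring is also common.
    """
    scores = [
        sum(s_d * h_d for s_d, h_d in zip(decoder_state, h))
        for h in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Toy values: three encoder hidden states (h) and one decoder state (s).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 1.0]
weights, context = attention_context(decoder_state, encoder_states)
```

The context vector is what the decoder would consume alongside its own state; without the encoder hidden states there would be nothing to take the weighted sum over.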

How to implement Early stopping in Neural Machine Translation with Attention or Transformers?

I am trying to add early stopping to my model, where I am performing machine translation using Seq2Seq with attention. I am mostly used to writing my own models in steps, something like this:

    for activation in activations:
        for layer1 in layers1:
            for optimizer in optimizers:
                # define model
                model_vanilla_lstm = Sequential()
                model_vanilla_lstm.add(LSTM(layer1, activation=activation, input_shape=(n_step, n_features)))
                model_vanilla_lstm.add(Dense(1))
                # compile model
                model_vanilla_lstm.compile(optimizer=optimizer, loss='mse')
                # Early Stopping
                earlyStop = EarlyStopping(monitor="val_loss", mode='min', patience=5)
                # fit model
                history = model_vanilla_lstm.fit(X, y, epochs=epoch, validation_data=(X_test, dataset_test['Close']), verbose=1, callbacks=[earlyStop])
                # Summary of the model …
Category: Data Science
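
When a model is trained with a hand-written loop rather than `fit` with a callback, the same patience logic can be reproduced manually. The following is a minimal sketch, not the asker's code: `EarlyStopper`, `train`, and the list of validation losses are hypothetical stand-ins, and a real loop would run a training epoch and an evaluation pass where the comment indicates.

```python
class EarlyStopper:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def train(val_losses, patience=5):
    """Hypothetical loop: `val_losses` stands in for per-epoch validation."""
    stopper = EarlyStopper(patience=patience)
    for epoch, val_loss in enumerate(val_losses, start=1):
        # A real loop would train for one epoch here, then evaluate.
        if stopper.should_stop(val_loss):
            return epoch, stopper.best
    return len(val_losses), stopper.best


losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.75, 0.73, 0.74, 0.9, 0.95]
stopped_at, best = train(losses, patience=5)
```

In practice one would also save the model weights whenever `best` improves, so the best checkpoint can be restored after stopping.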

For an LSTM-based seq2seq model, is reversing the input still necessary or advised when using attention?

The original seq2seq paper reversed the input sequence and cited multiple reasons for doing so. See: Why does LSTM performs better when the source target is reversed? (Seq2seq) But when using attention, is there still any benefit to doing this? I imagine since the decoder has access to the encoder hidden states at each time step, it can learn what to attend to and the input can be fed in the original order.
Category: Data Science

WMT: What are the differences between the WMT14, WMT15 and WMT16 datasets?

Each year, the Workshop on Statistical Machine Translation (WMT) holds a conference that focuses on new tasks, papers, and findings in the field of machine translation. Let's say we are talking about the parallel dataset Newscommentary. There is a Newscommentary in WMT14, WMT15, WMT16 and so on. How much does the dataset differ from one conference to the next? Is this documented somewhere?
Category: Data Science

Self Attention vs LSTM with Attention for NMT

I am trying to compare A: the Transformer-based architecture for Neural Machine Translation (NMT) from the Attention is All You Need paper, with B: an architecture based on bidirectional LSTMs in the encoder coupled with a unidirectional LSTM in the decoder, which attends to all the hidden states of the encoder, creates a weighted combination, and uses this along with the decoder (unidirectional) LSTM output to produce the final output word. My question is what might be the advantages of Architecture A …
Category: Data Science

PyTorch build_vocab_from_iterator giving vocabulary with very few words

I am trying to build a translation model in PyTorch. Following this post on PyTorch, I downloaded the multi30k dataset and spaCy models for English and German.

    python -m spacy download en
    python -m spacy download de

    import torchtext
    import torch
    from torchtext.data.utils import get_tokenizer
    from collections import Counter
    from torchtext.vocab import Vocab, build_vocab_from_iterator
    from torchtext.utils import download_from_url, extract_archive
    import io

    url_base = 'https://raw.githubusercontent.com/multi30k/dataset/master/data/task1/raw/'
    train_urls = ('train.de.gz', 'train.en.gz')
    val_urls = ('val.de.gz', 'val.en.gz')
    test_urls = ('test_2016_flickr.de.gz', 'test_2016_flickr.en.gz')

    train_filepaths = [extract_archive(download_from_url(url_base + …
Category: Data Science

Why does Bahdanau Attention Have to be Causal?

I am using the Bahdanau attention layer in TensorFlow for time series prediction, which is conceptually similar to NLP applications. This is what a minimal example for a single layer looks like:

    import tensorflow as tf

    dim = 7
    Tq = 5   # Number of future time steps to predict
    Tv = 13  # Number of historic lag timesteps to consider
    batch_size = 2**4
    query = tf.random.uniform(shape=(batch_size, Tq, dim))
    value = tf.random.uniform(shape=(batch_size, Tv, dim))
    key = tf.random.uniform(shape=value.shape)
    layer = tf.keras.layers.AdditiveAttention(use_scale=True, causal=True)
    output, score = layer(inputs=[query, value, key], return_attention_scores=True)

The score obtained in the last line seems to be …
Category: Data Science
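
To make the effect of a causal mask concrete, here is a small pure-Python sketch using the same `Tq` and `Tv` as the excerpt. It assumes the usual convention that query position i may not attend to value positions j > i; `causal_mask` and `masked_softmax` are illustrative helpers, not TensorFlow functions, and uniform raw scores are used so that only the mask shapes the result.

```python
import math


def causal_mask(tq, tv):
    """mask[i][j] is True when query step i may attend to value step j."""
    return [[j <= i for j in range(tv)] for i in range(tq)]


def masked_softmax(scores, allowed):
    """Softmax over the allowed positions; masked positions get weight 0."""
    kept = [s for s, ok in zip(scores, allowed) if ok]
    m = max(kept)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


Tq, Tv = 5, 13  # same sizes as in the excerpt
mask = causal_mask(Tq, Tv)
uniform_scores = [0.0] * Tv  # identical raw scores for every position
weights = [masked_softmax(uniform_scores, row) for row in mask]
```

Under this convention, with `Tq` smaller than `Tv`, value positions at index `Tq` and beyond never receive any weight, which is worth keeping in mind when the query and value sequences represent different time ranges.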

Questions of understanding - Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation

I'm currently analysing the paper Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation (Post, Vilar 2018): https://arxiv.org/abs/1804.06609 I have trouble understanding how the data is processed. For example, the paper talks about beams, banks and hypotheses, and I have no idea what these terms mean. How would you describe these terms, and are there any tutorial sources you would recommend for understanding dynamic beam allocation?
Category: Data Science
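
As a rough orientation for the terms in the question above: a hypothesis is a partial output sequence with a score, and the beam is the fixed-size set of best hypotheses kept at each decoding step. The sketch below shows plain beam search over a made-up next-token table (`TOY_MODEL` is hypothetical, not from the paper). The `bank_index` helper reflects a simplified reading of the paper, in which hypotheses are grouped into banks by how many constraints they have already met; it is not the paper's full allocation algorithm.

```python
import math

# Hypothetical toy model: next-token probabilities given the last token.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"cat": 0.3, "dog": 0.6, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def beam_search(beam_size, max_len=5):
    """Keep the `beam_size` best hypotheses (partial outputs) per step."""
    beam = [(0.0, ["<s>"])]  # each hypothesis: (log-probability, tokens)
    for _ in range(max_len):
        candidates = []
        for score, tokens in beam:
            if tokens[-1] == "</s>":
                candidates.append((score, tokens))  # finished, carry over
                continue
            for tok, p in TOY_MODEL[tokens[-1]].items():
                candidates.append((score + math.log(p), tokens + [tok]))
        # Sort by score only and keep the top `beam_size` candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return beam


def bank_index(tokens, constraints):
    """Simplified bank: number of constraint words already in the hypothesis."""
    return sum(1 for word in constraints if word in tokens)


beam = beam_search(beam_size=2)
best_tokens = beam[0][1]
bank = bank_index(best_tokens, ["cat", "dog"])
```

With `beam_size=1` this reduces to greedy decoding; larger beams trade computation for a wider search.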

Multi-Head attention mechanism in transformer and need of feed forward neural network

After reading the paper, Attention is all you need, I have two questions: 1. What is the need for a multi-head attention mechanism? The paper says that: "Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions." My understanding is that it helps in anaphora resolution. For example: "The animal didn't cross the street because it was too ..... (tired/wide)". Here "it" can refer to the animal or the street, depending on the last word. …
Category: Data Science

Passing Dependency/Constituency trees to a Neural Machine Translator

I am working on a project on Neural Machine Translation in the English-Irish domain. I am not an expert and have researched entirely on my own for a technology exhibition so apologies if my question is simple. I am trying to parse all of my English corpus to constituency trees. Of course, the format of a sentence when using the Stanford Parser is something like: (ROOT (S (NP (VBG cohabiting) (NNS partners)) (VP (MD can) (VP (VB make) (NP (NP …
Category: Data Science
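
For readers who want to work with bracketed parser output of the kind shown above, here is a minimal sketch that parses such a string into nested lists, recovers the words, and flattens the tree back into a token sequence. The example sentence and the names `parse_tree`, `leaves`, and `linearize` are made up for illustration (the tree in the excerpt is truncated), and feeding linearized trees to a sequence model is only one possible approach, stated here as an assumption rather than a recommendation from the question.

```python
def parse_tree(text):
    """Parse a bracketed tree like '(NP (DT the) (NN cat))' into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token  # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # skip the closing ")"
        return node

    return read()


def leaves(node):
    """Collect the words of the sentence, left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:  # node[0] is the constituent label
        words.extend(leaves(child))
    return words


def linearize(node):
    """Flatten the tree into tokens such as '(NP', 'the', ')'."""
    if isinstance(node, str):
        return [node]
    out = ["(" + node[0]]
    for child in node[1:]:
        out.extend(linearize(child))
    out.append(")")
    return out


# Hypothetical example sentence, not taken from the question's corpus.
tree = parse_tree("(ROOT (S (NP (DT the) (NN cat)) (VP (VBZ sleeps))))")
words = leaves(tree)
sequence = linearize(tree)
```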

Training NMT models for noisy social media roman text

I am trying to train an NMT model where the source side is Roman-script text of Asian languages from social media, and the target side is English. Note that since Roman script is not native to Asia, the romanizations people use when typing on the Internet are very personal and hence a bit noisy, but easily intelligible to native speakers. The following is an example of writing a Hindi sentence in different ways: Vaise bhi mere paas jo bhi hai …
Category: Data Science

Paraphrasing a sentence and changing the tone of it

I am trying to make a model that is capable of translating a sentence into a new and better form. I would like the model to change the tone and also give it some character. I am using this in my web app UI, simply allowing users to see a new description each time they refresh the page. For example, "You are logged out" -> "Looks like you have logged out". Something of that sort. Any ideas on how to approach this?
Category: Data Science

Algorithm to parse PSD into html/XML?

I have been working on a project in which we are trying to convert a PSD (Adobe Photoshop) file to HTML for web applications as well as a layout XML for Android. We worked our way to generating basic skeletal HTML/XML but hit a wall with complex scenarios such as identifying separate divs and components. Our initial approach was to standardize the PSD and get metadata about each component from the PSD, but due to its limitations we could only add …
Category: Data Science


Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.