I have a group of non-zero sequences with different lengths and I am using a Keras LSTM to model these sequences. I use the Keras Tokenizer to tokenize (tokens start from 1). In order to make the sequences the same length, I use padding. An example of padding:

    # [0,0,0,0,0,10,3]
    # [0,0,0,0,10,3,4]
    # [0,0,0,10,3,4,5]
    # [10,3,4,5,6,9,8]

In order to evaluate whether the model is able to generalize, I use a validation set with a 70/30 split. At the end of each epoch …
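In case it helps, a minimal sketch of the tokenization and pre-padding described above (the texts and variable names are placeholders, not my actual data):

    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    texts = ["the cat sat", "the cat sat down", "cats sit"]   # placeholder corpus

    tokenizer = Tokenizer()              # token indices start from 1, so 0 stays free for padding
    tokenizer.fit_on_texts(texts)
    sequences = tokenizer.texts_to_sequences(texts)

    # pre-padding with 0 so all sequences share the length of the longest one
    padded = pad_sequences(sequences, padding='pre', value=0)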
I have a DNA sequence dataset in which each sequence is mapped to a certain class, e.g.

    TCAGCCGAGAGCTCATCGATCGTACGT 2
    ATGCAGTGCATCGATCGATCGTAGAAC 3

where the number after the sequence specifies the type of protein the sequence belongs to. So my question is: can I use k-mers and one-hot encoding to classify these sequences with a biLSTM, or is this not a feasible approach? I would appreciate your feedback and suggestions on this task, as I am new to Deep Learning. Thank you.
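To clarify what I mean, here is a minimal sketch of the k-mer plus one-hot preprocessing I have in mind (k = 3 and the helper names are just placeholders):

    import numpy as np

    def kmers(seq, k=3):
        """Split a DNA string into overlapping k-mers."""
        return [seq[i:i + k] for i in range(len(seq) - k + 1)]

    # build a vocabulary of k-mers over the dataset (placeholder sequences)
    sequences = ["TCAGCCGAGAGCTCATCGATCGTACGT", "ATGCAGTGCATCGATCGATCGTAGAAC"]
    vocab = sorted({km for s in sequences for km in kmers(s)})
    index = {km: i for i, km in enumerate(vocab)}

    def one_hot(seq, k=3):
        """One-hot encode each k-mer of a sequence: shape (n_kmers, vocab_size)."""
        mat = np.zeros((len(seq) - k + 1, len(vocab)), dtype=np.float32)
        for pos, km in enumerate(kmers(seq, k)):
            mat[pos, index[km]] = 1.0
        return mat

    x = one_hot(sequences[0])   # a batch of these would go into a Bidirectional(LSTM(...))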
I see two different ways of applying attention in seq2seq: (a) the context vector (the weighted sum of encoder hidden states) is fed into the output softmax, as shown in the diagram below. The diagram is from here. (b) the context vector is fed into the decoder input, as shown in the diagram below. The diagram is from here. What are the pros and cons of the two approaches? Is there any paper comparing the two?
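To make the distinction concrete, here is a rough single-step sketch of the two wirings as I understand them (dummy eager tensors, not code from either source):

    import tensorflow as tf
    from tensorflow.keras.layers import Dense, LSTMCell

    vocab_size, units, batch = 1000, 256, 2

    # dummy single-step tensors standing in for real values
    context = tf.random.normal((batch, units))         # attention context c_t
    decoder_state = tf.random.normal((batch, units))   # decoder hidden state h_t
    cell_state = tf.random.normal((batch, units))
    embedded_token = tf.random.normal((batch, units))  # embedded previous target token x_t

    # (a) context combined with the decoder state just before the output softmax
    probs_a = Dense(vocab_size, activation='softmax')(
        tf.concat([decoder_state, context], axis=-1))

    # (b) context concatenated with the embedded input token and fed into the decoder cell
    cell = LSTMCell(units)
    step_out, _ = cell(tf.concat([embedded_token, context], axis=-1),
                       states=[decoder_state, cell_state])
    probs_b = Dense(vocab_size, activation='softmax')(step_out)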
I have one single, very long time series. I want to train an LSTM to distinguish between two behaviours (A or B) at every timestep (sequence-to-sequence). Because the time series is very long, I plan to extract shorter, partially-overlapping subsequences and use each of them as one training input for the LSTM. In my train/validation/test split, do I have to use older subsequences for training and newer for validation and test? Or can I treat them as if they were …
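For context, this is roughly how I extract the overlapping subsequences and what a purely chronological split would look like (window and stride values are arbitrary):

    import numpy as np

    series = np.random.rand(100_000)            # stand-in for the long time series
    labels = np.random.randint(0, 2, 100_000)   # behaviour A/B at every timestep

    window, stride = 200, 50                    # partially overlapping subsequences
    starts = range(0, len(series) - window + 1, stride)
    X = np.stack([series[s:s + window] for s in starts])[..., None]  # (n, window, 1)
    Y = np.stack([labels[s:s + window] for s in starts])             # (n, window)

    # chronological split: older windows for training, newer ones for val/test
    n = len(X)
    train, val = int(0.7 * n), int(0.85 * n)
    X_train, Y_train = X[:train], Y[:train]
    X_val, Y_val = X[train:val], Y[train:val]
    X_test, Y_test = X[val:], Y[val:]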
I'm trying to build an encoder-decoder network in Keras to generate a sentence of a particular style. As my problem is unsupervised, i.e. I don't have ground truths for the generated sentences, I use a classifier to help during training. I pass the decoder's output into the classifier to tell me what style the decoded sentence is. The decoder outputs a softmax distribution, which I was intending to feed straight into the classifier, but I realised that it has …
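One idea I'm considering instead of taking a hard argmax (just a sketch with placeholder shapes; I don't know if it is sound) is to multiply the softmax distribution with the classifier's embedding matrix so that gradients can still flow:

    import tensorflow as tf

    vocab_size, emb_dim, seq_len, batch = 5000, 128, 20, 4

    # stand-in for the decoder output: a softmax distribution over the vocab per timestep
    decoder_probs = tf.nn.softmax(tf.random.normal((batch, seq_len, vocab_size)), axis=-1)

    # embedding matrix of the classifier (would come from the trained classifier)
    emb_matrix = tf.Variable(tf.random.normal((vocab_size, emb_dim)))

    # "soft" embeddings: the expected embedding under the decoder's distribution,
    # instead of embedding a hard argmax token (which would block the gradient)
    soft_embeddings = tf.einsum('btv,ve->bte', decoder_probs, emb_matrix)  # (batch, seq_len, emb_dim)
    # these soft embeddings could then be fed into the classifier's layers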
I am a machine learning newbie and I am working on a project where I'm given a sequence of integers, all of which are in the range 0 to 70. My goal is to predict the next integer in the sequence given the previous 5 integers in the same sequence. There isn't much more information about the sequence of integers itself (for example, how the sequence was obtained, etc.). The following are the things I tried. The first thing that …
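For reference, the kind of setup I have in mind treats this as a 71-class classification problem over a window of the 5 previous integers (a sketch; the data and hyper-parameters here are placeholders):

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Embedding, LSTM, Dense

    seq = np.random.randint(0, 71, size=10_000)   # stand-in for the integer sequence

    # build (previous 5 integers) -> (next integer) training pairs
    X = np.stack([seq[i:i + 5] for i in range(len(seq) - 5)])
    y = seq[5:]

    model = Sequential([
        Embedding(input_dim=71, output_dim=32, input_length=5),
        LSTM(64),
        Dense(71, activation='softmax'),   # one class per possible integer 0..70
    ])
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    model.fit(X, y, epochs=5, batch_size=64)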
I'm working on an NMT model in which the input and target sentences are from the same language (but the grammar differs). I'm planning to pre-train and use BERT, since I'm working with a small dataset and a low-resource language. So is it possible to feed BERT into the seq2seq encoder/decoder?
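Concretely, I was wondering whether something along the lines of Hugging Face's EncoderDecoderModel is the way to do this (a sketch assuming the transformers library and a generic multilingual BERT checkpoint, not my actual low-resource model):

    from transformers import BertTokenizer, EncoderDecoderModel

    # tie a pre-trained BERT encoder to a BERT-initialised decoder (cross-attention is added)
    model = EncoderDecoderModel.from_encoder_decoder_pretrained(
        "bert-base-multilingual-cased", "bert-base-multilingual-cased")
    tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

    model.config.decoder_start_token_id = tokenizer.cls_token_id
    model.config.pad_token_id = tokenizer.pad_token_id

    src = tokenizer("source sentence goes here", return_tensors="pt")
    tgt = tokenizer("target sentence goes here", return_tensors="pt")

    # fine-tune with the usual seq2seq cross-entropy loss
    outputs = model(input_ids=src.input_ids,
                    attention_mask=src.attention_mask,
                    labels=tgt.input_ids)
    loss = outputs.loss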
I have this problem scenario: given a set of tokens, string them (or a subset of them) together using stop words into a sequence. I am clear that I can have potentially infinite pre-training data for this problem. For example, given the set of tokens {cat, jump, mouse}, possible outputs might be: a. the cat jumped on a mouse, b. the cat and the mouse jumped, c. cats jump, and so on... I am not sure if …
Based on this blog entry, I have written a sequence to sequence deep learning model in Keras:

    model = Sequential()
    model.add(LSTM(hidden_nodes, input_shape=(n_timesteps, n_features)))
    model.add(RepeatVector(n_timesteps))
    model.add(LSTM(hidden_nodes, return_sequences=True))
    model.add(TimeDistributed(Dense(n_features, activation='softmax')))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X_train, Y_train, epochs=30, batch_size=32)

It works reasonably well, but I intend to improve it by applying an attention mechanism. The aforementioned blog post includes a variation of the architecture with attention by relying on custom attention code, but it doesn't work with my present TensorFlow/Keras versions, and anyway, to my …
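In case it clarifies what I'm after, here is a sketch of how I imagine wiring the stock tf.keras.layers.Attention into the same encoder-decoder shape with the functional API (this is my attempt, not the blog's code, and I'm not sure the query/value choice is right):

    from tensorflow.keras.layers import (Input, LSTM, RepeatVector, Attention,
                                         Concatenate, TimeDistributed, Dense)
    from tensorflow.keras.models import Model

    n_timesteps, n_features, hidden_nodes = 10, 20, 64   # placeholders

    inputs = Input(shape=(n_timesteps, n_features))
    # keep the full encoder sequence so the decoder can attend over it
    encoder_seq, state_h, state_c = LSTM(hidden_nodes, return_sequences=True,
                                         return_state=True)(inputs)
    decoder_in = RepeatVector(n_timesteps)(state_h)
    decoder_seq = LSTM(hidden_nodes, return_sequences=True)(
        decoder_in, initial_state=[state_h, state_c])

    # dot-product attention: decoder states as queries, encoder states as values/keys
    context = Attention()([decoder_seq, encoder_seq])
    merged = Concatenate()([decoder_seq, context])
    outputs = TimeDistributed(Dense(n_features, activation='softmax'))(merged)

    model = Model(inputs, outputs)
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])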
I have several sequences of univariate real-valued time-series data. The sequences are of different lengths, and right now I cannot batch them and feed them to a network. What is the correct procedure to pad these sequences? Is it even possible in this case, since I can't use any number as a special symbol?

UPDATE 1: I'm working with arbitrary univariate time-series data (not related to one specific domain, unbounded range). To give an example of one such series, consider …
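One option I'm considering (a sketch, with an arbitrary sentinel value) is padding with a constant and masking it out, though I'm unsure whether that is sound when any real number can occur in the data:

    import numpy as np
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Masking, LSTM, Dense

    # stand-in for variable-length univariate series
    series = [np.random.randn(n) for n in (50, 80, 120)]

    PAD = 0.0   # sentinel; risky because 0.0 can also be a genuine observation
    X = pad_sequences(series, padding='post', dtype='float32', value=PAD)
    X = X[..., None]   # (batch, max_len, 1)

    model = Sequential([
        Masking(mask_value=PAD, input_shape=(X.shape[1], 1)),  # padded steps are skipped
        LSTM(32),
        Dense(1),
    ])
    model.compile(loss='mse', optimizer='adam')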
I'm currently working on an extractive summary model based on Facebook's BART model. Consistent absolute-length output would be highly desirable. The problem is that the input length may vary wildly. That is to say, when creating the training data, the instructions look like this: take the input text (a news article) and start (recursively) deleting examples, excess details, unnecessary background information, quotes, etc. Once your summary has fewer than 90 words, stop deleting. Fix up the text format to match …
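At inference time, the only knob I know of is generate()'s min/max length arguments (a sketch using the stock facebook/bart-large-cnn checkpoint rather than my fine-tuned model), but these count tokens rather than words:

    from transformers import BartForConditionalGeneration, BartTokenizer

    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

    article = "..."   # the news article text

    inputs = tokenizer(article, truncation=True, max_length=1024, return_tensors="pt")
    # these bounds are in tokens, not words, so ~90 words needs some slack
    summary_ids = model.generate(inputs.input_ids,
                                 min_length=80, max_length=120,
                                 num_beams=4, early_stopping=True)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))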
I'm implementing a sequence-to-sequence model with an RNN-VAE architecture, and I use an attention mechanism. I have a problem in the decoder part. I'm struggling with this error:

    IndexError: list index out of range

when I run this code:

    decoder_inputs = Input(shape=(len_target,))
    decoder_emb = Embedding(input_dim=vocab_out_size, output_dim=embedding_dim)
    decoder_lstm = LSTM(units=units, return_sequences=True, return_state=True)
    decoder_lstm_out, _, _ = decoder_lstm(decoder_emb(decoder_inputs), initial_state=encoder_states)

    print("enc_outputs", encoder_outputs.shape)        # ==> (?, 256)
    print("decoder_lstm_out", decoder_lstm_out.shape)  # ==> (?, 12, 256)
    print("zzzzzz", z.shape)                           # ==> (?, 256)

    attn_layer = AttentionLayer(name='attention_layer')
    attn_out, attn_states = attn_layer([z, z], decoder_lstm_out)

The error is …
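For reference, here is a tiny standalone example of the input shapes that the built-in tf.keras.layers.Attention accepts without error (both arguments 3-D); I'm wondering whether passing the 2-D z twice is what trips up the custom AttentionLayer:

    import tensorflow as tf
    from tensorflow.keras.layers import Attention

    batch, enc_len, dec_len, units = 2, 15, 12, 256

    encoder_seq = tf.random.normal((batch, enc_len, units))   # 3-D: (batch, timesteps, units)
    decoder_seq = tf.random.normal((batch, dec_len, units))   # 3-D as well

    # query = decoder sequence, value/key = encoder sequence
    context = Attention()([decoder_seq, encoder_seq])
    print(context.shape)   # (2, 12, 256)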
I have a problem statement where I need to find all the tasks that the server has to perform given a complex task. For example, in a 3D modeling scenario, if the model is queried with a complex task such as "rotate", then the response should be something like:

    Select the object
    Rotate the object

Can we make this model learn from data that is manually prepared, and then tune the model such that it can predict more complex tasks?
I was reading the paper neural_approach_conversational_ai.pdf, and in the section Seq2Seq for Text Generation there is a formula that I feel is a bit wrong [1]: https://i.stack.imgur.com/sX0it.png Can someone help me confirm this formula?
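For reference, the factorization I would have expected in that section (which may or may not be exactly what the screenshot shows) is the standard conditional likelihood of the target sequence given the source:

    P(Y \mid X) = \prod_{t=1}^{T} P(y_t \mid y_1, \ldots, y_{t-1}, X),
    \qquad
    \log P(Y \mid X) = \sum_{t=1}^{T} \log P(y_t \mid y_{<t}, X)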
I am attempting to use a Seq2Seq model to make forecasts of factory production data using an Encoder-Decoder model augmented with Attention. I have become a little stuck, as the output of the model seems to be constant and has the same sequence length as the input, whereas I would like to be able to specify that, say, I want to forecast 3 (or any number of) months into the future. Here are 2 diagrams of …
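To make the question concrete, here is a stripped-down sketch (without the attention part) of the only way I know to fix the decoder length to an arbitrary horizon such as 3 months, namely RepeatVector(n_future); the numbers are placeholders:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

    n_past, n_future, n_features = 24, 3, 5   # e.g. 24 months in, 3 months out

    model = Sequential([
        LSTM(64, input_shape=(n_past, n_features)),   # encoder summarises the input window
        RepeatVector(n_future),                       # decoder length = forecast horizon
        LSTM(64, return_sequences=True),
        TimeDistributed(Dense(1)),                    # one forecast value per future step
    ])
    model.compile(loss='mse', optimizer='adam')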
(I am working in a Jupyter notebook with Python version 3.6.12, running TensorFlow 2.4.0.) I have a dataset that consists of 5 input features and 3 output features (that need to be predicted). My features are string values of integers and look as follows:

Input (training) features:

            A      B      C      D      E
    57      00101  01000  01001  01000  00110
    203     00111  01001  01000  01000  00110
    559     00010  01001  01001  01000  00110
    247     00101  01001  01001  01000  00110
    1111    00111  01001  …
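For completeness, the two representations I'm considering for the string features are bit vectors and plain integers, roughly like this (a sketch over two of the rows above; I'm unsure which representation suits the network better):

    import numpy as np
    import pandas as pd

    # two rows shaped like the data above
    df = pd.DataFrame({'A': ['00101', '00111'],
                       'B': ['01000', '01001'],
                       'C': ['01001', '01000'],
                       'D': ['01000', '01000'],
                       'E': ['00110', '00110']})

    # option 1: each 5-character string becomes a vector of 5 bits
    bits = np.stack([np.array([[int(ch) for ch in s] for s in df[c]]) for c in df.columns],
                    axis=1)   # shape (n_rows, 5 features, 5 bits)

    # option 2: each string becomes a single integer (base 2)
    ints = df.applymap(lambda s: int(s, 2)).to_numpy()   # shape (n_rows, 5)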
Assume a simple LSTM followed by an attention layer, or a full transformer architecture. The attention weights are learnt during training and get multiplied with the keys, queries and values. Please correct me if my understanding above, or the question below, is wrong. The question is: when do the weights of the attention layer get changed, and when not? Do attention layer weights change for each input in the sequence? (I assume not, but please confirm.) Do attention layer weights get frozen during prediction (inference)? Or these …
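To pin down what I mean by "attention weights", here is a tiny sketch of the first setup (an LSTM followed by an attention layer, using MultiHeadAttention as a stand-in); my question is about the layer's trainable parameters (the learned projection matrices), as opposed to the per-input attention scores:

    from tensorflow.keras.layers import Input, LSTM, MultiHeadAttention
    from tensorflow.keras.models import Model

    inputs = Input(shape=(10, 16))
    seq = LSTM(32, return_sequences=True)(inputs)
    attn = MultiHeadAttention(num_heads=2, key_dim=16)
    out = attn(query=seq, value=seq, key=seq)   # self-attention over the LSTM outputs
    model = Model(inputs, out)

    # the layer's learned parameters (projection matrices); these are what training updates
    print([w.shape for w in attn.trainable_weights])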
I am working on a problem in Named Entity Recognition. Given a text, my model detects the named entities and extracts that information for the end-user. Now the requirement is that the end-user needs a confidence score along with each extracted entity. For example, the given text is: "XYZ Bank India Limited is a good place to invest your money". Our model detects XYZ Bank as an Org, but India as a Location (which is wrong - the whole …
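For the score itself, the only idea I have so far (a sketch with made-up probabilities, and I'm not sure it is the right approach) is to aggregate the per-token softmax probabilities over the predicted span:

    import numpy as np

    # per-token softmax probabilities of the predicted tag, e.g. for
    # ["XYZ", "Bank", "India", "Limited"] tagged as a single ORG span
    token_probs = np.array([0.97, 0.93, 0.61, 0.88])   # made-up values

    span_confidence_mean = token_probs.mean()   # simple average over the span
    span_confidence_min = token_probs.min()     # pessimistic: weakest token in the span

    print(round(float(span_confidence_mean), 3), round(float(span_confidence_min), 3))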
I am trying to implement early stopping in my model, where I am performing Machine Translation using Seq2Seq with attention. I am mostly used to writing my own models in steps, something like this:

    for activation in activations:
        for layer1 in layers1:
            for optimizer in optimizers:
                # define model
                model_vanilla_lstm = Sequential()
                model_vanilla_lstm.add(LSTM(layer1, activation=activation, input_shape=(n_step, n_features)))
                model_vanilla_lstm.add(Dense(1))
                # compile model
                model_vanilla_lstm.compile(optimizer=optimizer, loss='mse')
                # Early Stopping
                earlyStop = EarlyStopping(monitor="val_loss", mode='min', patience=5)
                # fit model
                history = model_vanilla_lstm.fit(X, y, epochs=epoch, validation_data=(X_test, dataset_test['Close']), verbose=1, callbacks=[earlyStop])
                # Summary of the model …
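Since the Seq2Seq-with-attention model is trained with a custom loop rather than model.fit, the part I'm unsure about is whether I should simply re-implement the callback logic by hand, along these lines (a sketch; train_step and eval_step are placeholders standing in for my own training and validation passes):

    import numpy as np

    rng = np.random.default_rng(0)

    def train_step():
        """Placeholder for one epoch of my actual teacher-forced training."""
        return rng.random()

    def eval_step():
        """Placeholder for computing loss on the validation set."""
        return rng.random()

    patience, best_val, wait = 5, np.inf, 0
    for epoch in range(100):
        train_loss = train_step()
        val_loss = eval_step()
        if val_loss < best_val:           # improvement: reset the patience counter
            best_val, wait = val_loss, 0  # (and save the best weights here)
        else:
            wait += 1
            if wait >= patience:          # stop after `patience` epochs without improvement
                print(f"early stopping at epoch {epoch}")
                break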
Given a standard transformer architecture with an encoder and a decoder, what happens when the input to the encoder is shorter than the expected output from the decoder? The decoder expects to receive key and value tensors from the encoder, whose size depends on the number of input tokens. I could solve this problem during training by padding inputs and outputs to the same size. But what about inference, when I don't know the size of the output? Should I make …
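To make the inference part concrete, the decoding loop I have in mind is the usual greedy autoregressive one, where the output length is bounded only by an end-of-sequence token or a hard maximum (decode_step below is a placeholder standing in for a real transformer forward pass):

    import numpy as np

    BOS, EOS, VOCAB, MAX_LEN = 1, 2, 100, 50
    rng = np.random.default_rng(0)

    def decode_step(src_ids, tgt_ids):
        """Placeholder for a real forward pass: logits for the next token given the
        source and the target tokens generated so far."""
        return rng.normal(size=VOCAB)

    src_ids = [5, 17, 9]            # short encoder input; its length only fixes the
                                    # size of the encoder key/value tensors
    tgt_ids = [BOS]
    while len(tgt_ids) < MAX_LEN:   # decoder length is independent of the input length
        next_id = int(np.argmax(decode_step(src_ids, tgt_ids)))
        tgt_ids.append(next_id)
        if next_id == EOS:          # stop when the model emits end-of-sequence
            break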