Using GRU with FeedForward layers in Python

I'm trying to reproduce the code from this paper for a multi-label problem (11 classes). The model uses:

1- an Embedding layer
2- a GRU
3- two feed-forward layers with the ReLU activation function
4- a sigmoid output unit.

I've tried to run the code, but it raises the following error:

ValueError: Error when checking target: expected dense_5 to have 3 dimensions, but got array with shape (6838, 11)

Edit: the error is fixed. I changed return_sequences to False and removed Flatten() to fix it.

My code is below. I'm not sure whether the two feed-forward layers are correct: the paper states FF1: 1024 units and FF2: 512 units, with a mini-batch size of 32. How can I state that in the code?

from keras.models import Model
from keras.layers import Input, Embedding, Dropout, Bidirectional, GRU, Dense, Activation
from keras.optimizers import Adam

target_input = Input(shape=(max_length,))

# frozen embedding layer initialized from the pre-trained embedding_matrix
target_embedding = Embedding(input_dim=vocabulary_size, output_dim=embedding_dims,
                             input_length=max_length, weights=[embedding_matrix],
                             trainable=False)(target_input)

# target_embedding = Dropout(0.3)(target_embedding)

# two stacked bidirectional GRU layers; only the first returns the full sequence
target_gru1 = Bidirectional(GRU(units=200, return_sequences=True, dropout=0.3, recurrent_dropout=0.3))(target_embedding)
target_gru = Bidirectional(GRU(units=200, return_sequences=False, dropout=0.3, recurrent_dropout=0.3))(target_gru1)



# target_gru=Dropout(0.3)(target_gru)

# two feedforward layers with ReLU activation
# target_output1 = Activation("relu")(target_gru)
# target_output2 = Activation("relu")(target_output1)

FF1 = Dense(1024)(target_gru)              # FF1: 1024 units
target_output1 = Activation("relu")(FF1)
FF2 = Dense(512)(target_output1)           # FF2: 512 units


# sigmoid output unit for the 11-label multi-label target
target_output = Dense(units=11, activation="sigmoid")(FF2)
target_model = Model(target_input, target_output)

## configuring the model for training:
opt = Adam(lr=0.0001)  # alternative tried: lr=0.001, decay=0.5
target_model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["categorical_accuracy"])
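
The mini-batch size of 32 is not part of the model definition; it is passed to fit() at training time. A minimal sketch, assuming X_train is your padded sequence matrix and y_train the multi-hot label matrix (both are placeholder names, not from the paper's code):

# batch_size is set at training time, not inside the model
# X_train: padded integer sequences of shape (num_samples, max_length)  -- placeholder name
# y_train: multi-hot labels of shape (num_samples, 11)                  -- placeholder name
history = target_model.fit(X_train, y_train,
                           batch_size=32,        # mini-batch size of 32, as stated in the paper
                           epochs=10,            # arbitrary choice, tune as needed
                           validation_split=0.1)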

and here are the layers:

Topic: gru, lstm, keras, deep-learning, python

Category: Data Science


The error is caused by return_sequences = True. You set it to True only if you feed the output of a recurrent layer into another recurrent layer. With return_sequences = True the layer returns the hidden state of the recurrent cells at every timestep (a 3-D tensor), which can only be fed to Dense() layers through a Flatten().

Moreover, I suggest you delete the Dropout() layer. Don't put it right after an Embedding(): the embedding carries information (i.e. the learned representation of words/chars) that is not safe to distort with dropout. This is just a suggestion, of course.
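
If you still want dropout-style regularization, one common alternative (just a sketch of a possible placement, not something taken from the paper) is to rely on the dropout/recurrent_dropout arguments already passed to GRU() and, if needed, apply Dropout() to the GRU output vector rather than to the embeddings:

# sketch: regularization inside/after the recurrent layer instead of after Embedding()
target_gru = Bidirectional(GRU(units=200, return_sequences=False,
                               dropout=0.3, recurrent_dropout=0.3))(target_gru1)
target_gru = Dropout(0.3)(target_gru)  # optional dropout on the GRU summary vector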


EDIT:

You need return_sequences = True only when a recurrent layer feeds another recurrent layer. If the following layer is Dense(), you can set return_sequences = False and drop the Flatten() as well.
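
To make the shapes concrete, here is a minimal sketch with toy dimensions (placeholder numbers, not the paper's):

from keras.layers import Input, GRU, Flatten, Dense
from keras.models import Model

inp = Input(shape=(50, 300))                  # (timesteps, features)

# return_sequences=True: output is (batch, 50, 200) -> needs Flatten() before Dense()
seq = GRU(200, return_sequences=True)(inp)
flat = Flatten()(seq)                         # (batch, 50 * 200)

# return_sequences=False: output is (batch, 200) -> can feed Dense() directly
vec = GRU(200, return_sequences=False)(inp)

out = Dense(11, activation="sigmoid")(vec)    # (batch, 11), matching the target shape
model = Model(inp, out)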
