LSTM layer (keras) is causing all layers after it to constantly predict the same thing no matter the input
I have a model for OCR which, after 2-3 epochs, gives the same output for every input. When I ran predictions and inspected the output of each layer, I found that every layer from the first LSTM layer onward outputs the same values no matter the input. Here is the model (or at least the parts related to the problem):
Processing = layers.Reshape((12,9472))(encoder)
Processing = layers.Dense(128, activation='relu')(Processing)
lstm = layers.Bidirectional(layers.LSTM(256, return_sequences = True))(Processing)
lstm = layers.Bidirectional(layers.LSTM(128, return_sequences = True))(lstm)
lstm = layers.Bidirectional(layers.LSTM(64, return_sequences = True))(lstm)
outputs = layers.Dense(358,activation=tf.keras.layers.LeakyReLU(alpha=0.1))(lstm)
outputs = layers.Dense(358, activation=tf.keras.layers.LeakyReLU(alpha=0.1))(outputs)
outputs = layers.Dense(l, activation='softmax',name='output')(outputs)
output = CTCLayer()(labels,outputs)
Everything from the first LSTM layer onward (that layer included) outputs the same values. The CTC layer is not part of this check, because it is removed for predictions.
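A minimal sketch of this kind of per-layer check, assuming the activations of each layer have already been collected into a dict for a batch of different inputs. The helper `find_constant_layers`, the tolerance, and the layer names are hypothetical, and the arrays are synthetic stand-ins rather than real model outputs:

```python
import numpy as np


def find_constant_layers(activations, tol=1e-6):
    """Return names of layers whose output is (nearly) identical for every sample.

    activations: dict mapping layer name -> array of shape (batch, ...),
    produced by running a batch of *different* inputs through the model.
    """
    constant = []
    for name, values in activations.items():
        values = np.asarray(values, dtype=np.float64)
        # Standard deviation of each output unit across the batch axis.
        # If even the largest one is ~0, the layer ignores its input.
        spread = values.std(axis=0).max()
        if spread < tol:
            constant.append(name)
    return constant


# Synthetic stand-in data (hypothetical layer names and shapes).
rng = np.random.default_rng(0)
demo = {
    # Different values for each of the 4 samples.
    "dense": rng.normal(size=(4, 12, 128)),
    # One sample repeated 4 times, i.e. identical output for every input.
    "bidirectional": np.tile(rng.normal(size=(1, 12, 512)), (4, 1, 1)),
}
print(find_constant_layers(demo))
```

With a Keras functional model, the activations could be gathered by building a second `keras.Model` that shares the original inputs and exposes the intermediate layers' `.output` tensors as its outputs, then calling `predict` on a batch of distinct images.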
Before picking the model apart, I thought it might be a dying ReLU problem, so I replaced all of the ReLU activations with leaky ReLU. Is there something wrong with my implementation? If not, what might be causing everything after the LSTM layer to die, and how would I fix the underlying issue?
Another weird thing is that even after the model starts outputting the same thing, the loss keeps decreasing for some time (from 20.5 to 16.2), so it's still learning. I'm pretty sure it has nothing to do with the learning rate, as I experimented with extremely small values (1e-10). That just made it take a lot longer to reach the point where all the outputs become the same, which from my observation happens at a loss between 22 and 16.
FYI: the CTCLayer is the one from the code example on the Keras website:
class CTCLayer(layers.Layer):
    def __init__(self, name=None):
        super().__init__(name=name)
        self.loss_fn = keras.backend.ctc_batch_cost

    def call(self, y_true, y_pred):
        # Compute the training-time CTC loss and register it with add_loss().
        batch_len = tf.cast(tf.shape(y_true)[0], dtype="int64")
        input_length = tf.cast(tf.shape(y_pred)[1], dtype="int64")
        label_length = tf.cast(tf.shape(y_true)[1], dtype="int64")
        input_length = input_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        label_length = label_length * tf.ones(shape=(batch_len, 1), dtype="int64")
        loss = self.loss_fn(y_true, y_pred, input_length, label_length)
        self.add_loss(loss)
        # At prediction time, just pass the predictions through.
        return y_pred
Topic ocr lstm keras tensorflow neural-network
Category Data Science