Hope you're all doing well!
I am working on automatic speech recognition in Python with the LibriSpeech dataset.
After preprocessing the audio data and extracting MFCC features, I append everything to a list, which gives an array of shape (14174,). Each sample has a different length but the same number of features, for example:
(615, 13)
(301, 13)
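To make that shape concrete, here is a small sketch with random data standing in for the real MFCC features. The names `samples` and `features` are made up for illustration, and NumPy is assumed as the container. Collecting arrays of different lengths gives a 1-D object array with one entry per utterance, not a 3-D tensor:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two MFCC matrices, each (time_steps, 13)
samples = [rng.standard_normal((615, 13)), rng.standard_normal((301, 13))]

# Build the object array explicitly, one element per utterance
features = np.empty(len(samples), dtype=object)
for i, sample in enumerate(samples):
    features[i] = sample

print(features.shape)     # (2,)
print(features[0].shape)  # (615, 13)
print(features[1].shape)  # (301, 13)
```

With the full dataset the outer shape would be (14174,), which matches what is reported above: the time and feature axes live inside each element, so the outer array has only one dimension.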
Now when I feed the data into my network, with an Input layer defined as:
input_data = Input(name='the_input', shape=(None, input_dim)) # with input_dim = 13 MFCC features
I get the following error:
ValueError: Error when checking input: expected the_input to have 3 dimensions, but got array with shape (14174, 1)
I tried reshaping the data in several ways, but I am still stuck.
This is the model:
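For context, reshaping alone cannot turn a 1-D array of variable-length matrices into a 3-D tensor, because the time axis differs per sample. One common approach, sketched here under the assumption that zero-padding to the longest utterance is acceptable, is to copy each sample into a fixed-size batch array. The helper `pad_batch` is hypothetical and not part of Keras:

```python
import numpy as np

def pad_batch(samples, pad_value=0.0):
    """Pad a list of (time_steps, n_features) arrays to a common length.

    Returns the padded batch of shape (n_samples, max_len, n_features)
    and the original length of each sample.
    """
    max_len = max(s.shape[0] for s in samples)
    n_features = samples[0].shape[1]
    batch = np.full((len(samples), max_len, n_features), pad_value,
                    dtype=np.float32)
    lengths = np.zeros(len(samples), dtype=np.int64)
    for i, s in enumerate(samples):
        batch[i, :s.shape[0], :] = s  # real frames first, padding after
        lengths[i] = s.shape[0]
    return batch, lengths

# Dummy data with the two example shapes from the question
samples = [np.ones((615, 13)), np.ones((301, 13))]
batch, lengths = pad_batch(samples)
print(batch.shape)  # (2, 615, 13)
print(lengths)      # [615 301]
```

The padded array has the three dimensions (batch, time, features) that an input declared with shape=(None, input_dim) expects. Keras also provides a pad_sequences utility that does similar padding. The original lengths are kept because a CTC-style loss usually needs them, though whether that applies here depends on the training setup.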
from keras.models import Model
from keras.layers import (Input, GRU, Bidirectional, BatchNormalization,
                          Activation, TimeDistributed, Dense)

def final_model(input_dim, units, output_dim=29):
    """Build a bidirectional recurrent network for speech."""
    # Main acoustic input: (time_steps, input_dim), variable number of time steps
    input_data = Input(name='the_input', shape=(None, input_dim))
    # =============== 1st Layer =============== #
    # Add bidirectional recurrent layer
    bidirectional_rnn = Bidirectional(GRU(units, activation=None, return_sequences=True, implementation=2, name='bidir_rnn'))(input_data)
    # Add batch normalization (the layer name must be a string)
    batch_normalization = BatchNormalization(name='batch_normalization_bidirectional_rnn')(bidirectional_rnn)
    # Add activation function
    activation = Activation('relu')(batch_normalization)
    # Add dropout
    #drop = Dropout(rate = 0.1)(activation)
    # =============== 2nd Layer =============== #
    # Add bidirectional recurrent layer (renamed so it does not reuse the first layer's name)
    bidirectional_rnn = Bidirectional(GRU(units, activation=None, return_sequences=True, implementation=2, name='bidir_rnn_2'))(activation)
    # Add batch normalization (the layer name must be a string)
    batch_normalization = BatchNormalization(name='bn_bidir_rnn_2')(bidirectional_rnn)
    # Add activation function
    activation = Activation('relu')(batch_normalization)
    # Add dropout
    #drop = Dropout(rate = 0.1)(activation)
    # =============== 3rd Layer =============== #
    # Add a TimeDistributed(Dense(output_dim)) layer
    time_dense = TimeDistributed(Dense(output_dim))(activation)
    # Add softmax activation layer
    y_pred = Activation('softmax', name='softmax')(time_dense)
    # Specify the model
    model = Model(inputs=input_data, outputs=y_pred)
    # Output length equals input length (no striding or pooling over time)
    model.output_length = lambda x: x
    return model
