TensorFlow Speech Emotion Recognition Model gives same prediction for all inputs
Dataset used: RAVDESS (I've only used the audio-only files)
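For context, my preprocessing is roughly along these lines (a simplified sketch — assume librosa features averaged over time; the exact librosa calls and the filename helper below are illustrative, and the real feature set ends up as 259 values per clip):

import numpy as np
import librosa

def extract_features(path, n_mfcc=40):
    # load one audio-only RAVDESS clip, compute MFCCs,
    # then average over time so every clip gives a fixed-length vector
    signal, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return np.mean(mfcc, axis=1)

def label_from_filename(filename):
    # the emotion code is the 3rd field of a RAVDESS filename,
    # e.g. 03-01-05-01-02-01-12.wav -> '05'
    return filename.split('-')[2]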
Here's a sample of the data after processing:
And the code for the label encoding:
import numpy as np
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

# encode string labels as ints, then one-hot encode them
lb = LabelEncoder()
y_train = np_utils.to_categorical(lb.fit_transform(y_train))
y_test = np_utils.to_categorical(lb.transform(y_test))

# not sure if this is needed, but Conv1D expects a channel axis: (samples, 259, 1)
x_train = np.expand_dims(x_train, axis=2)
x_test = np.expand_dims(x_test, axis=2)
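As a quick sanity check, the shapes going into the model look like this (the first dimension depends on the train/test split, so treat N as a placeholder):

# features are 259-long vectors with a channel axis, labels are one-hot over 10 classes
print(x_train.shape, y_train.shape)   # (N, 259, 1) and (N, 10)
print(x_test.shape, y_test.shape)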
Model:
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Dropout, LSTM, Flatten, Dense
import keras

model = Sequential()
model.add(Conv1D(16, 5, strides=2, padding='same', input_shape=(259, 1)))
model.add(Conv1D(16, 5, padding='same', activation='relu'))
model.add(Dropout(0.1))
model.add(MaxPooling1D(pool_size=6))
model.add(LSTM(1))
model.add(Flatten())
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.summary()

opt = keras.optimizers.RMSprop(lr=0.00001, decay=1e-6)
model.compile(metrics=['accuracy'], optimizer=opt, loss='categorical_crossentropy')
history = model.fit(x_train, y_train, batch_size=1, epochs=15, validation_data=(x_train, y_train))
When I call model.predict() on the test data, the model gives the exact same answer for every input. I have 10 classes, so the accuracy basically always plateaus around 0.1399 or somewhere similar.
Here's a sample of what my model.predict() returns:
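This is roughly how I'm checking it — every row comes out (nearly) identical, so argmax picks the same class for the whole test set:

# inspect the raw probabilities and the predicted class per test sample
preds = model.predict(x_test)
print(preds[:5])                      # rows are all (nearly) identical
print(np.argmax(preds, axis=1))       # same class index for every input
# compare against the class balance of the training labels
print(np.unique(np.argmax(y_train, axis=1), return_counts=True))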
I've played around with the model quite a bit now, but I just can't get these results to change.
Any ideas what I could do?
Tags: audio-recognition, keras, tensorflow, neural-network, python