Problem with using F1 score with a multi-class and imbalanced dataset (LSTM, Keras)

I'm trying to use the F1 score because my dataset is imbalanced. I already tried the code below, but the problem is that val_f1_score is always equal to 1, and I don't know whether I did it correctly. My X_train data has a shape of (50000, 30, 10) and Y_train has a shape of (50000,). I have 3 classes: 0, 1 and 2. This is my code so far:

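import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential
from matplotlib import pyplot

maximum_epochs = 40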
early_stop_epochs= 60
learning_rate_epochs = 30
maximum_time = 8*60*60 

model = Sequential()
model.add(LSTM(32,activation='tanh', input_shape=(X_train.shape[1],X_train.shape[2]), return_sequences=True))
model.add(LSTM(16,activation='tanh', return_sequences=False))
model.add(Dense(3, activation='softmax'))

def recall(y_true, y_pred):
    y_true = K.ones_like(y_true)  # NOTE: overwrites every label with 1, so the real labels are ignored
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    all_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    
    recall = true_positives / (all_positives + K.epsilon())
    return recall

def precision(y_true, y_pred):
    y_true = K.ones_like(y_true)  # NOTE: with every label set to 1, true_positives equals predicted_positives, so precision is ~1
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_score(y_true, y_pred):
    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)
    return 2*((p*r)/(p+r+K.epsilon()))

model.compile(loss='sparse_categorical_crossentropy',optimizer='adam', metrics=['accuracy', f1_score, precision, recall])

callbacks_list = [                     
                  tf.keras.callbacks.ReduceLROnPlateau(monitor='val_f1_score', factor=0.9, 
                                    patience=learning_rate_epochs, 
                                    verbose=0, mode='max', min_lr=0.0000001),
                  tf.keras.callbacks.ModelCheckpoint(filepath=fn, save_weights_only=True,
                                  monitor='val_f1_score',mode='max', save_best_only=True)]

history = model.fit(x=X_train, y= Y_train,
                  validation_data=(X_val, Y_val),
                  batch_size=500,
                  epochs=maximum_epochs,
                  shuffle=True, verbose=2,
                  callbacks=callbacks_list)
pyplot.plot(history.history['f1_score'], label='train')
pyplot.plot(history.history['val_f1_score'], label='val')
pyplot.legend()
pyplot.show()

This is the log of the first epochs:

Epoch 1/40
85/85 - 29s - loss: 0.7125 - accuracy: 0.8806 - f1_score: 0.9736 - precision: 1.0000 - recall: 0.9515 - val_loss: 0.5389 - val_accuracy: 0.8862 - val_f1_score: 1.0000 - val_precision: 1.0000 - val_recall: 1.0000

Epoch 2/40
85/85 - 8s - loss: 0.5590 - accuracy: 0.8900 - f1_score: 0.9903 - precision: 1.0000 - recall: 0.9808 - val_loss: 0.4930 - val_accuracy: 0.8862 - val_f1_score: 1.0000 - val_precision: 1.0000 - val_recall: 1.0000
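The log shows precision pinned at 1.0000 while recall moves. The sketch below re-implements the question's `precision` and `recall` with NumPy so the effect of the `K.ones_like(y_true)` line can be checked on a toy batch. It assumes `np.round`, `np.clip` and `np.ones_like` behave like the Keras backend ops here and uses `1e-7` as a stand-in for `K.epsilon()`. Because every label becomes 1, `y_true * y_pred` is just `y_pred`, so precision is always about 1 and recall is only the fraction of samples whose top softmax probability exceeds 0.5, whatever the true class is.

```python
import numpy as np

EPS = 1e-7  # stand-in for K.epsilon()

def question_recall(y_true, y_pred):
    y_true = np.ones_like(y_true)  # mirrors K.ones_like: the real labels are discarded
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    all_positives = np.sum(np.round(np.clip(y_true, 0, 1)))  # = number of samples
    return true_positives / (all_positives + EPS)

def question_precision(y_true, y_pred):
    y_true = np.ones_like(y_true)
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    predicted_positives = np.sum(np.round(np.clip(y_pred, 0, 1)))  # same sum as above
    return true_positives / (predicted_positives + EPS)

# Toy batch: sparse labels of shape (n, 1), softmax outputs of shape (n, 3).
labels = np.array([[0], [2], [1], [0]])
probs = np.array([
    [0.90, 0.05, 0.05],  # correct
    [0.70, 0.20, 0.10],  # wrong (true class 2), still counted as a "true positive"
    [0.40, 0.35, 0.25],  # no probability above 0.5
    [0.60, 0.30, 0.10],  # correct
])
print(question_precision(labels, probs))  # about 1.0
print(question_recall(labels, probs))     # about 0.75
```

Changing `labels` in the toy batch leaves both values unchanged, since the labels never enter the computation.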

UPDATE: Thanks to @Erwan's answer, I changed the compilation as follows:

import tensorflow_addons as tfa
from sklearn.preprocessing import OneHotEncoder

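encoder = OneHotEncoder(sparse_output=False)  # this parameter was named `sparse` before scikit-learn 1.2
Y_train = encoder.fit_transform(Y_train.reshape(-1,1))
Y_val = encoder.transform(Y_val.reshape(-1,1))  # reuse the encoder fitted on the training labels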

model.compile(loss='categorical_crossentropy',optimizer='adam', metrics=[tfa.metrics.F1Score(average='macro',num_classes=3)])

callbacks_list = [
                  tf.keras.callbacks.ReduceLROnPlateau(monitor='val_f1_score', factor=0.9,
                                    patience=learning_rate_epochs,
                                    verbose=0, mode='max', min_lr=0.0000001),
                  tf.keras.callbacks.ModelCheckpoint(filepath=fn, save_weights_only=True,
                                  monitor='val_f1_score', mode='max', save_best_only=True)]
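A minimal alternative to `OneHotEncoder`, assuming the labels are already the integers 0, 1 and 2, is to index an identity matrix with the label vector. The helper `one_hot` below is hypothetical (not part of Keras or scikit-learn); `tf.keras.utils.to_categorical(y, num_classes=3)` does the same job. Because each class's column is fixed by the label value itself, the training and validation encodings cannot end up with different column layouts.

```python
import numpy as np

NUM_CLASSES = 3

def one_hot(labels, num_classes=NUM_CLASSES):
    # Assumes integer labels in [0, num_classes); row i of the identity
    # matrix is the one-hot vector for class i.
    labels = np.asarray(labels, dtype=int).reshape(-1)
    return np.eye(num_classes, dtype=np.float32)[labels]

y = np.array([0, 2, 1, 0])
print(one_hot(y))
```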

Here is the epoch log (I think it's going well: f1_score is increasing and loss is decreasing):

Epoch 1/15
85/85 - 27s - loss: 0.8422 - f1_score: 0.3337 - val_loss: 0.5830 - val_f1_score: 0.3145

Epoch 2/15
85/85 - 7s - loss: 0.6539 - f1_score: 0.3221 - val_loss: 0.5218 - val_f1_score: 0.3145



The problem is simple: recall, precision and F1-score in their basic form work only for binary classification. If you work through an example by hand, you will see that the definitions you're using for precision and recall can only work with classes 0 and 1; they go wrong with class 2 (which is expected).
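To see the definitions go wrong with class 2, here is a NumPy sketch of the question's arithmetic with the `K.ones_like` line removed (`binary_style_precision_recall` is a hypothetical helper, and `K.epsilon()` is omitted because no denominator is zero in this example). With sparse labels of shape (n, 1), the product `y_true * y_pred` broadcasts across the three softmax columns, so a class-2 sample counts as a true positive whenever any column is large enough, regardless of which class that column belongs to, and class-0 samples never count as positives at all.

```python
import numpy as np

def binary_style_precision_recall(y_true, y_pred):
    # Question's formulas without K.ones_like (and without K.epsilon()).
    # y_true: (n, 1) integer labels; y_pred: (n, 3) softmax outputs.
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    predicted_positives = np.sum(np.round(np.clip(y_pred, 0, 1)))
    all_positives = np.sum(np.round(np.clip(y_true, 0, 1)))  # counts labels 1 and 2
    return true_positives / predicted_positives, true_positives / all_positives

labels = np.array([[2], [0]])
probs = np.array([
    [0.10, 0.80, 0.10],  # true class 2, predicted class 1: counted as a true positive
    [0.90, 0.05, 0.05],  # true class 0, predicted class 0: counted as a false positive
])
precision, recall = binary_style_precision_recall(labels, probs)
print(precision, recall)  # 0.5 1.0
```

The wrong prediction raises the score while the correct class-0 prediction lowers it, so the numbers say nothing about per-class performance.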

When working with more than two classes, you must use either micro F1-score (which, for single-label multi-class data, is the same as accuracy) or macro F1-score, which is the standard option with imbalanced data. Macro F1-score is the unweighted average of the F1-scores of all 3 classes, where the F1-score for one class is obtained by treating that class as positive and all the other classes as negative.
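Here is a small NumPy sketch of the one-vs-rest computation described above (the helper names are hypothetical, not a library API). It uses the identity F1 = 2TP / (2TP + FP + FN), which equals 2PR / (P + R). On an imbalanced toy set where a constant majority-class predictor reaches 80% accuracy, micro F1 equals that accuracy, while macro F1 drops to about 0.30 because the two minority classes score 0.

```python
import numpy as np

def per_class_counts(y_true, y_pred, c):
    # One-vs-rest: class c is the positive class, all other classes are negative.
    tp = np.sum((y_pred == c) & (y_true == c))
    fp = np.sum((y_pred == c) & (y_true != c))
    fn = np.sum((y_pred != c) & (y_true == c))
    return tp, fp, fn

def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def macro_f1(y_true, y_pred, num_classes=3):
    # Unweighted mean of the per-class F1-scores: every class counts equally.
    return np.mean([f1_from_counts(*per_class_counts(y_true, y_pred, c))
                    for c in range(num_classes)])

def micro_f1(y_true, y_pred, num_classes=3):
    # Pool TP, FP and FN over all classes first, then compute a single F1.
    counts = np.array([per_class_counts(y_true, y_pred, c) for c in range(num_classes)])
    tp, fp, fn = counts.sum(axis=0)
    return f1_from_counts(tp, fp, fn)

y_true = np.array([0] * 8 + [1, 2])  # imbalanced: 8 / 1 / 1
y_pred = np.zeros_like(y_true)       # always predicts the majority class 0

print(np.mean(y_true == y_pred))  # accuracy: 0.8
print(micro_f1(y_true, y_pred))   # 0.8, same as accuracy
print(macro_f1(y_true, y_pred))   # about 0.296
```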
