Self-attention mechanism did not improve the LSTM classification model
I am doing an 8-class classification using time series data.
Adding a self-attention mechanism appears to have no effect on the model, so I suspect there is a problem with my implementation. However, I don't know how the keras_self_attention
module is meant to be used or how its parameters should be set.
The question is how to utilize the keras_self_attention
module for such a classifier.
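For reference, the basic pattern I could find is the one in the keras_self_attention README; here is a minimal sketch of my reading of it (the embedding sizes and output units are README-style placeholders, not my data):

from tensorflow import keras
from keras_self_attention import SeqSelfAttention

# Sketch adapted from the package README (pip install keras-self-attention):
# the attention layer consumes the full sequence that the LSTM returns.
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=10000, output_dim=300, mask_zero=True))
model.add(keras.layers.Bidirectional(keras.layers.LSTM(units=128, return_sequences=True)))
model.add(SeqSelfAttention(attention_activation='sigmoid'))
model.add(keras.layers.Dense(units=5))
model.compile(optimizer='adam', loss='categorical_crossentropy')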
The first confusion matrix comes from a model with two bidirectional LSTM layers:
import tensorflow as tf
from tensorflow.keras.layers import Masking, Bidirectional, LSTM, Flatten, Dense

# X_train, dropout, and num_classes are defined earlier in my script.
lstm_unit = 256

model = tf.keras.models.Sequential()
model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Bidirectional(LSTM(lstm_unit, dropout=dropout, return_sequences=True)))
model.add(Bidirectional(LSTM(lstm_unit, dropout=dropout, return_sequences=True)))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))
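Both variants are compiled and trained the same way, roughly as below (the optimizer, loss, epochs, and batch size are placeholders; my exact training setup is not what seems to matter here):

# Placeholder training setup, assuming integer class labels in y_train.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=50, batch_size=64, validation_split=0.1)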
The second confusion matrix comes from the same two LSTM layers plus a self-attention layer:
from keras_self_attention import SeqSelfAttention

lstm_unit = 256

model = tf.keras.models.Sequential()
model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Bidirectional(LSTM(lstm_unit, dropout=dropout, return_sequences=True)))
model.add(Bidirectional(LSTM(lstm_unit, dropout=dropout, return_sequences=True)))
model.add(SeqSelfAttention(attention_type=SeqSelfAttention.ATTENTION_TYPE_MUL,
                           attention_activation='sigmoid'))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))
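One detail I notice: SeqSelfAttention returns a sequence with the same shape as its input, so after Flatten the Dense layer sees an input of the same size either way. A variant I am considering (a sketch, assuming SeqWeightedAttention from the same package behaves as documented and collapses the time axis into a weighted sum) is:

from keras_self_attention import SeqWeightedAttention

# Sketch: let attention pool the sequence instead of Flatten;
# SeqWeightedAttention reduces (batch, time, features) to (batch, features).
model = tf.keras.models.Sequential()
model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Bidirectional(LSTM(lstm_unit, dropout=dropout, return_sequences=True)))
model.add(Bidirectional(LSTM(lstm_unit, dropout=dropout, return_sequences=True)))
model.add(SeqWeightedAttention())
model.add(Dense(num_classes, activation='softmax'))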
I have further tried other variants, such as
1. MultiHead (from the keras_multi_head package; a fuller sketch follows this list)
model.add(MultiHead(Bidirectional(LSTM(units=32)), layer_num=10, name='Multi-LSTMs'))
2. Residual connection
from tensorflow.keras.layers import Input, Add
from tensorflow.keras.models import Model

inputs = Input(shape=(X_train.shape[1], X_train.shape[2]))
x = Masking(mask_value=0.0)(inputs)
x2 = SeqSelfAttention(attention_type=SeqSelfAttention.ATTENTION_TYPE_MUL,
                      attention_activation='sigmoid')(x)
x = Add()([x, x2])  # residual connection: input plus attention output
x = Bidirectional(LSTM(lstm_unit, dropout=dropout, return_sequences=True))(x)
x = Flatten()(x)
output = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=inputs, outputs=output)
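Here is the fuller sketch of the MultiHead attempt from item 1 (the surrounding layers are mine; MultiHead comes from the separate keras_multi_head package and, as I understand it, runs several copies of the wrapped layer in parallel):

from keras_multi_head import MultiHead

# Sketch: ten parallel Bi-LSTM heads; their stacked outputs are
# flattened before the classifier, mirroring that package's README.
model = tf.keras.models.Sequential()
model.add(Masking(mask_value=0.0, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(MultiHead(Bidirectional(LSTM(units=32)), layer_num=10, name='Multi-LSTMs'))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))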
But the results are more or less the same, with not much effect on MAR, MAP, or ACC.
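For reference, a sketch of how these metrics can be computed (assuming scikit-learn, reading MAR/MAP as macro-averaged recall/precision, with X_test and y_test as hypothetical hold-out arrays):

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Sketch: predicted class = argmax over the softmax outputs.
y_pred = np.argmax(model.predict(X_test), axis=-1)
acc = accuracy_score(y_test, y_pred)                          # ACC
map_score = precision_score(y_test, y_pred, average='macro')  # MAP
mar_score = recall_score(y_test, y_pred, average='macro')     # MAR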
I have two Titan Xp GPUs, so computational power is not much of a problem for me. Is there a way to make the model more accurate?
Topic: deep-learning, machine-learning
Category: Data Science