HuggingFace Transformers is giving loss: nan - accuracy: 0.0000e+00
I am a HuggingFace newbie and I am fine-tuning a BERT model (distilbert-base-cased) using the Transformers library, but the training loss is not going down. Instead I am getting loss: nan - accuracy: 0.0000e+00.
My code largely follows the boilerplate in the [HuggingFace course][1]:-
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)
opt = Adam(learning_rate=lr_scheduler)
model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])
model.fit(
    encoded_train.data,
    np.array(y_train),
    validation_data=(encoded_val.data, np.array(y_val)),
    batch_size=8,
    epochs=3
)
Where my loss function is:-
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
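As a rough illustration of what this loss computes, the sketch below reimplements sparse categorical cross-entropy from logits for a single example in plain NumPy. It is not the Keras implementation; the function name `sparse_ce_from_logits` and the example logits are assumptions made for illustration. The label is used as an index into the class dimension, so it is expected to lie between 0 and the number of classes minus 1.

```python
import numpy as np

def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one example, given raw logits and an integer class id."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax: subtract the max before exponentiating.
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    # The label selects the log-probability of the true class.
    return -log_probs[label]

# Hypothetical logits for a 3-class problem (valid labels are 0, 1, 2).
uniform_loss = sparse_ce_from_logits([0.0, 0.0, 0.0], label=1)
confident_loss = sparse_ce_from_logits([0.0, 10.0, 0.0], label=1)
print(uniform_loss, confident_loss)
```

With uniform logits the loss equals log(3), and it shrinks as the logit for the true class grows.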
The learning rate is calculated like so:-
lr_scheduler = PolynomialDecay(
    initial_learning_rate=5e-5,
    end_learning_rate=0.,
    decay_steps=num_train_steps
)
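For reference, with its default `power=1.0` this schedule is a linear decay from the initial to the end learning rate. The plain-Python sketch below mirrors the documented formula rather than calling Keras; the function name `polynomial_decay` is hypothetical, and `decay_steps=390` is an illustrative value assuming 1040 training rows, a batch size of 8 and 3 epochs.

```python
def polynomial_decay(step, initial_lr=5e-5, end_lr=0.0, decay_steps=390, power=1.0):
    """Learning rate at a given step for a non-cycling polynomial decay."""
    # Steps beyond decay_steps are clamped, so the rate stays at end_lr.
    step = min(step, decay_steps)
    fraction_left = 1.0 - step / decay_steps
    return (initial_lr - end_lr) * fraction_left ** power + end_lr

lr_start = polynomial_decay(0)    # initial learning rate
lr_mid = polynomial_decay(195)    # halfway through training
lr_end = polynomial_decay(390)    # end of training
print(lr_start, lr_mid, lr_end)
```

Under these assumptions the rate starts at 5e-5, is half that value midway through, and reaches 0 at the final step.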
The number of training steps is computed thus:-
batch_size = 8
num_epochs = 3
num_train_steps = (len(encoded_train['input_ids']) // batch_size) * num_epochs
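As a worked example of this arithmetic, assuming the 1040 training rows implied by the tensor shape (1040, 512) in the data shown in this post (the variable names `num_examples` and `steps_per_epoch` are introduced here for illustration):

```python
num_examples = 1040  # assumed from the (1040, 512) shape of input_ids
batch_size = 8
num_epochs = 3

# Integer division drops any final partial batch from the count.
steps_per_epoch = num_examples // batch_size
num_train_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, num_train_steps)  # 130 390
```

The 130 steps per epoch agree with the `130/130` progress counter in the training output.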
So far, this is all very much like the boilerplate code.
My data looks like this:-
{'input_ids': tf.Tensor: shape=(1040, 512), dtype=int32, numpy=
array([[ 101, 155, 1942, ..., 0, 0, 0],
[ 101, 27900, 7641, ..., 0, 0, 0],
[ 101, 155, 1942, ..., 0, 0, 0],
...,
[ 101, 109, 7414, ..., 0, 0, 0],
[ 101, 2809, 1141, ..., 0, 0, 0],
[ 101, 1448, 1111, ..., 0, 0, 0]], dtype=int32), 'attention_mask': tf.Tensor: shape=(1040, 512), dtype=int32, numpy=
array([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]], dtype=int32)}
And the labels look like this:-
10 2
147 1
342 1
999 3
811 3
Name: sentiment, dtype: int64
I have studied the forums and made the most obvious checks:-
Here I check if there are any NaN in the data:-
print("Any NaN in y_train?", np.isnan(np.array(y_train)).any())
print("Any NaN in y_val?", np.isnan(np.array(y_val)).any())
Which gives:-
Any NaN in y_train? False
Any NaN in y_val? False
I also tried:-
print("Any NaN in encoded_train['input_ids']?", np.isnan(np.array(encoded_train['input_ids'])).any())
print("Any NaN 'encoded_train['attention_mask']'?", np.isnan(np.array(encoded_train['attention_mask'])).any())
but only got:-
Any NaN in encoded_train['input_ids']? False
Any NaN 'encoded_train['attention_mask']'? False
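A related check that may be worth adding: integer arrays cannot hold NaN, so `np.isnan` is always False for the int32 inputs and int64 labels shown in this post. A range check on the labels can be more informative, since sparse categorical cross-entropy expects class ids from 0 to num_labels - 1. The sketch below is a hedged example that assumes the labels are meant as class ids and uses only the five sample values printed in this post, not the full dataset.

```python
import numpy as np

num_labels = 3
labels = np.array([2, 1, 1, 3, 3])  # sample values from the label series in this post

# Integer arrays cannot contain NaN, so this is always False for int data.
has_nan = bool(np.isnan(labels).any())

# Class ids are expected to satisfy 0 <= label < num_labels.
in_range = bool((labels.min() >= 0) and (labels.max() < num_labels))
out_of_range = sorted({int(v) for v in labels if v < 0 or v >= num_labels})

print("Any NaN?", has_nan)
print("All labels in [0, num_labels)?", in_range)
print("Out-of-range label values:", out_of_range)
```

For the sample values, the label 3 falls outside the range 0 to 2. If the full label set is 1, 2, 3, shifting it to 0, 1, 2 would be one thing to try.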
I am struggling to know where to go next with this code.
The full training output looks like this. The loss and accuracy at each epoch show that the model is not training at all:-
Some layers from the model checkpoint at distilbert-base-cased were not used when initializing TFDistilBertForSequenceClassification: ['vocab_layer_norm', 'vocab_projector', 'activation_13', 'vocab_transform']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['pre_classifier', 'classifier', 'dropout_59']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/3
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
130/130 [==============================] - ETA: 0s - loss: nan - accuracy: 0.0019WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
130/130 [==============================] - 63s 452ms/step - loss: nan - accuracy: 0.0019 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/3
130/130 [==============================] - 57s 438ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/3
130/130 [==============================] - 57s 441ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
<tensorflow.python.keras.callbacks.History at 0x7f304f714fd0>
I would be happy to post more details if anyone is able to tell me what it would be useful to see.