HuggingFace Transformers is giving loss: nan - accuracy: 0.0000e+00

Question

JasonExcel

May 14, 2022, 00:02

I am new to HuggingFace and am fine-tuning a DistilBERT model (distilbert-base-cased) with the Transformers library, but the training loss is not going down. Instead I am getting loss: nan - accuracy: 0.0000e+00.

My code largely follows the boilerplate from the [HuggingFace course][1]:-

model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

opt = Adam(learning_rate=lr_scheduler)

model.compile(optimizer=opt, loss=loss, metrics=['accuracy'])

model.fit(
    encoded_train.data,
    np.array(y_train),
    validation_data=(encoded_val.data, np.array(y_val)),
    batch_size=8,
    epochs=3
)

Where my loss function is:-

loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

The learning rate is calculated like so:-

lr_scheduler = PolynomialDecay(
    initial_learning_rate=5e-5,
    end_learning_rate=0.,
    decay_steps=num_train_steps
)
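
For reference, with its default power of 1.0 and without cycling, PolynomialDecay is a plain linear ramp from initial_learning_rate down to end_learning_rate over decay_steps. A minimal pure-Python sketch of the documented formula follows; the function name polynomial_decay and the step count of 390 are illustrative assumptions, not part of the original code:

```python
def polynomial_decay(step, initial_lr=5e-5, end_lr=0.0, decay_steps=390, power=1.0):
    """Mirror the formula documented for Keras PolynomialDecay (cycle=False)."""
    # Steps beyond decay_steps are clamped, so the rate stays at end_lr afterwards.
    step = min(step, decay_steps)
    return (initial_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


# Learning rate at the start, the midpoint, and the end of training.
for step in (0, 195, 390):
    print(step, polynomial_decay(step))
```

Under these settings the rate stays between 5e-5 and 0 at every step.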

The number of training steps is computed thus:-

batch_size = 8
num_epochs = 3

num_train_steps = (len(encoded_train['input_ids']) // batch_size) * num_epochs
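
As a worked example, plugging in the 1040 training examples visible in the tensor shape printed further down gives 130 steps per epoch and 390 steps overall, which agrees with the 130/130 progress counter in the training log. The names num_examples and steps_per_epoch are illustrative:

```python
num_examples = 1040  # first dimension of encoded_train['input_ids']
batch_size = 8
num_epochs = 3

# Integer division drops any partial final batch.
steps_per_epoch = num_examples // batch_size
num_train_steps = steps_per_epoch * num_epochs

print(steps_per_epoch, num_train_steps)  # 130 390
```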

So far, this is all very much like the boilerplate code.

My data looks like this:-

{'input_ids': <tf.Tensor: shape=(1040, 512), dtype=int32, numpy=
array([[  101,   155,  1942, ...,     0,     0,     0],
       [  101, 27900,  7641, ...,     0,     0,     0],
       [  101,   155,  1942, ...,     0,     0,     0],
       ...,
       [  101,   109,  7414, ...,     0,     0,     0],
       [  101,  2809,  1141, ...,     0,     0,     0],
       [  101,  1448,  1111, ...,     0,     0,     0]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(1040, 512), dtype=int32, numpy=
array([[1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0],
       ...,
       [1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0],
       [1, 1, 1, ..., 0, 0, 0]], dtype=int32)>}

And the labels (y_train) look like this:-

10     2
147    1
342    1
999    3
811    3
Name: sentiment, dtype: int64
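
Because the model is created with num_labels=3 and the loss is SparseCategoricalCrossentropy, the range of these label values matters: that loss expects integer class ids from 0 to num_labels - 1. Below is a minimal sketch of such a range check, using only the five label values shown above as stand-in data. The full y_train is not shown, and the shift by one rests on the assumption that the labels are 1-based:

```python
num_labels = 3
sample_labels = [2, 1, 1, 3, 3]  # the five sentiment values printed above

# Class ids outside [0, num_labels) are invalid for a sparse categorical loss.
out_of_range = [y for y in sample_labels if not 0 <= y < num_labels]
print("Out-of-range labels:", out_of_range)

# Assumption: labels are 1-based (1..3), so subtracting 1 maps them to 0..2.
shifted = [y - 1 for y in sample_labels]
print("Shifted labels:", shifted)
```

If the labels do turn out to be 1-based, the equivalent change in the fit call would be to pass np.array(y_train) - 1 and np.array(y_val) - 1.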

I have studied the forums and made the most obvious checks:-

Here I check if there are any NaN in the data:-

print("Any NaN in y_train? ", np.isnan(np.array(y_train)).any())

print("Any NaN in y_val? ", np.isnan(np.array(y_val)).any())

Which gives:-

Any NaN in y_train?  False
Any NaN in y_val?  False

I also tried:-

print("Any NaN in encoded_train['input_ids']? ", np.isnan(np.array(encoded_train['input_ids'])).any())
print("Any NaN 'encoded_train['attention_mask']'? ", np.isnan(np.array(encoded_train['attention_mask'])).any())

but only got:-

Any NaN in encoded_train['input_ids']?  False
Any NaN 'encoded_train['attention_mask']'?  False

I am struggling to know where to go next with this code.

The full output looks like this. You can see the accuracy and loss for each epoch, and the model is clearly not training at all:-

Some layers from the model checkpoint at distilbert-base-cased were not used when initializing TFDistilBertForSequenceClassification: ['vocab_layer_norm', 'vocab_projector', 'activation_13', 'vocab_transform']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some layers of TFDistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['pre_classifier', 'classifier', 'dropout_59']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Epoch 1/3
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
130/130 [==============================] - ETA: 0s - loss: nan - accuracy: 0.0019WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
130/130 [==============================] - 63s 452ms/step - loss: nan - accuracy: 0.0019 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/3
130/130 [==============================] - 57s 438ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/3
130/130 [==============================] - 57s 441ms/step - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
<tensorflow.python.keras.callbacks.History at 0x7f304f714fd0>

I would be happy to post more details if anyone is able to tell me what it would be useful to see.

Topic loss huggingface bert nlp

Category Data Science

SrJ · Accepted Answer · August 7, 2021, 07:41

This is related to the warning you are getting: "The parameters output_attentions, output_hidden_states and use_cache cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: config=XConfig.from_pretrained('name', output_attentions=True))."

You might try the following code.

from transformers import BertConfig, BertModel
# Download model and configuration from huggingface.co and cache.
model = BertModel.from_pretrained('bert-base-uncased')
# Model was saved using `save_pretrained('./test/saved_model/')` (for example purposes, not runnable).
model = BertModel.from_pretrained('./test/saved_model/')
# Update configuration during loading.
model = BertModel.from_pretrained('bert-base-uncased', output_attentions=True)
assert model.config.output_attentions == True
