Fine-tune XLM-RoBERTa with TF-Keras for text classification
I am trying to fine-tune a pre-trained XLM-RoBERTa model with TensorFlow-Keras. I am using an English dataset for text classification. I tokenized the sentences with the xlm-roberta-base tokenizer, and I am loading the roberta-base model via TFRobertaForSequenceClassification. Please find the code below.
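For context, the tokenizer and model are loaded roughly as follows (the num_labels value and the tf.data pipeline that builds train_tf_dataset and eval_tf_dataset are assumptions, since that part of my setup is not shown here):

import tensorflow as tf
from transformers import AutoTokenizer, TFRobertaForSequenceClassification

# Checkpoints as described above: XLM-R tokenizer, RoBERTa classification model
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = TFRobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)  # num_labels is illustrative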
# Compile with plain SGD and sparse categorical cross-entropy on logits, then train
optimizer = tf.keras.optimizers.SGD(learning_rate=5e-2)
model.compile(optimizer=optimizer, loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
model.fit(train_tf_dataset, validation_data=eval_tf_dataset, epochs=1, verbose=1)
I am getting the error below while training the model. Can anyone help me solve it?
InvalidArgumentError: indices[2,268] = 124030 is not in [0, 50265) [[node tf_roberta_for_sequence_classification_1/roberta/embeddings/Gather (defined at /usr/local/lib/python3.7/dist-packages/transformers/models/roberta/modeling_tf_roberta.py:149) ]] [Op:__inference_train_function_82886]
Errors may have originated from an input operation. Input Source operations connected to node tf_roberta_for_sequence_classification_1/roberta/embeddings/Gather: In[0] tf_roberta_for_sequence_classification_1/roberta/embeddings/Gather/resource: In[1] IteratorGetNext (defined at /usr/local/lib/python3.7/dist-packages/keras/engine/training.py:866)
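I notice the error says a token index of 124030 falls outside [0, 50265). A quick check of the two vocabulary sizes (a minimal sketch, using the same checkpoint names as above) shows they do not match:

from transformers import AutoTokenizer, TFRobertaForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = TFRobertaForSequenceClassification.from_pretrained("roberta-base")

# xlm-roberta-base uses a ~250k-entry SentencePiece vocabulary, while
# roberta-base's embedding table only has 50265 rows, so any token id
# above 50264 triggers the Gather error shown above.
print(tokenizer.vocab_size)       # 250002
print(model.config.vocab_size)    # 50265

If this mismatch is the cause, I assume I need either the matching xlm-roberta-base model (TFXLMRobertaForSequenceClassification) or the roberta-base tokenizer, but I am not sure which is the right fix.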
Topic huggingface transformer transfer-learning keras nlp
Category Data Science