Fine Tuning BERT for text summarization

I was trying to follow this notebook to fine-tune BERT for the text summarization task. Everything was fine until I came to this instruction in the Evaluation section, used to evaluate my model: model = EncoderDecoderModel.from_pretrained("checkpoint-500") An error appears: OSError: checkpoint-500 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and …
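The error usually means from_pretrained() cannot find the folder relative to the current working directory and therefore falls back to treating the string as a Hub model id. A minimal sketch, assuming the Trainer wrote its checkpoints next to the notebook (adjust the path to the actual output_dir):

```python
import os
from transformers import EncoderDecoderModel

# "./checkpoint-500" is an assumed relative path; point it at wherever the
# Trainer actually wrote the checkpoint, e.g. "<output_dir>/checkpoint-500".
ckpt_dir = "./checkpoint-500"
assert os.path.isdir(ckpt_dir), f"{ckpt_dir} not found from {os.getcwd()}"

model = EncoderDecoderModel.from_pretrained(ckpt_dir)
```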
Category: Data Science

Large jumps in loss in simple transformer model?

As an exercise, I created a very simple transformer model that just sees the same simple batch of dummy data repeatedly and (one would assume) should quickly learn to fit it perfectly. And indeed, training reaches a loss of zero quickly. However, I noticed that the loss does not stay at zero, or even close to it: there are occasional large jumps in the loss. The script below counts every time that the loss jumps by 10 or more between …
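For reference, a self-contained sketch of the kind of check described above: a tiny TransformerEncoder repeatedly fitting one fixed dummy batch while counting loss jumps of 10 or more. It is not the asker's script, just an illustration of the counting logic.

```python
import torch
import torch.nn as nn

# Fixed dummy batch: same data every step, so the model should overfit it.
torch.manual_seed(0)
x = torch.randn(8, 16, 32)                 # (batch, seq_len, d_model)
y = torch.randint(0, 10, (8, 16))          # fixed dummy targets

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(32, 10)
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

prev_loss, jumps = None, 0
for step in range(2000):
    opt.zero_grad()
    logits = head(encoder(x))              # (batch, seq_len, 10)
    loss = loss_fn(logits.reshape(-1, 10), y.reshape(-1))
    loss.backward()
    opt.step()
    # count every step where the loss jumps by 10 or more versus the previous step
    if prev_loss is not None and loss.item() - prev_loss >= 10:
        jumps += 1
    prev_loss = loss.item()

print(f"loss jumps of 10+ observed: {jumps}")
```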
Category: Data Science

Could Attention_mask in T5 be a float in [0,1]?

I was inspecting the T5 model from Hugging Face: https://huggingface.co/docs/transformers/model_doc/t5. attention_mask is documented as: attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional) — Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]: 1 for tokens that are not masked, 0 for tokens that are masked. I was wondering whether something "softer" could be used, not only selecting the non-padding tokens but also selecting "how much" attention should be paid to every token. This question is …
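For context, in most Hugging Face models the 0/1 mask is converted into an additive bias on the attention scores before the softmax, roughly as sketched below (the exact constant differs across library versions and dtypes), so intermediate values such as 0.5 do not behave like "half attention":

```python
import torch

# Rough sketch of the usual mask-to-bias conversion; not T5's exact code.
mask = torch.tensor([[1.0, 1.0, 0.5, 0.0]])   # a hypothetical "soft" mask
dtype = torch.float32

bias = (1.0 - mask) * torch.finfo(dtype).min  # 1 -> 0, 0 -> very large negative
print(bias)
# A value of 0.5 still becomes an enormous negative bias, so after the softmax
# that token is effectively masked out rather than attended to "half as much".
```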
Category: Data Science

HuggingFace Transformers is giving loss: nan - accuracy: 0.0000e+00

I am a HuggingFace newbie and I am fine-tuning a BERT model (distilbert-base-cased) using the Transformers library, but the training loss is not going down; instead I am getting loss: nan - accuracy: 0.0000e+00. My code is largely per the boilerplate in the [HuggingFace course][1]: model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3) opt = Adam(learning_rate=lr_scheduler) model.compile(optimizer=opt, loss=loss, metrics=['accuracy']) model.fit( encoded_train.data, np.array(y_train), validation_data=(encoded_val.data, np.array(y_val)), batch_size=8, epochs=3 ) where my loss function is loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True) and the learning rate is calculated like so: lr_scheduler …
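With SparseCategoricalCrossentropy and num_labels=3, one frequent cause of nan is labels outside {0, 1, 2} (for example classes coded 1–3 or -1), another is a schedule that drives the learning rate to zero or below. A small sanity check, assuming y_train from the snippet above:

```python
import numpy as np

# Hypothetical sanity check for the setup above (y_train comes from the asker's data).
y_train = np.array(y_train)
print("label values:", np.unique(y_train))
# SparseCategoricalCrossentropy with num_labels=3 expects integer labels in {0, 1, 2};
# values such as -1 or 3 typically produce nan loss and zero accuracy.
assert set(np.unique(y_train)) <= {0, 1, 2}, "labels out of range for num_labels=3"
```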
Category: Data Science

Unable to debug where torch Adam optimiser is going wrong

I was implementing a training loop in VS Code. I have created an Adam optimizer using the XLM-RoBERTa model as follows: xlm_r_model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels = NUM_LABELS, output_attentions=False, output_hidden_states=False ) xlm_r_model.to(device) optimizer = torch.optim.Adam(xlm_r_model.parameters(), lr=LR) Then at the following line: optimizer.step() VS Code simply terminates the execution, without any error stack trace. So I debugged to find out exactly where this happens. I reached this line, which makes the F.adam(...) call: Weirdly, on GitHub, torch.optim.adam does not have this line. It seems that …
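Since VS Code exits without a stack trace, it helps to rule the editor out first. A minimal reproduction sketch to run from a plain terminal (NUM_LABELS and LR are replaced by placeholder values):

```python
# Run as "python repro.py" from a plain terminal, outside the VS Code debugger,
# to see whether optimizer.step() itself crashes.
import torch
from transformers import XLMRobertaForSequenceClassification

model = XLMRobertaForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=2)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

batch = {
    "input_ids": torch.randint(0, 250002, (2, 16)),        # dummy ids within vocab
    "attention_mask": torch.ones(2, 16, dtype=torch.long),
    "labels": torch.tensor([0, 1]),
}
loss = model(**batch).loss
loss.backward()
optimizer.step()   # if this also dies here, the problem is torch, not VS Code
print("step completed, loss =", loss.item())
```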
Category: Data Science

Conversational model returns empty string after a while

I've been experimenting with Hugging Face models and I've set up a chatbot with DialoGPT. It works pretty well, but after a while it stops answering and just returns empty strings. Before this it starts to give shorter and shorter answers. Any idea what can cause such behavior? I'm using the medium-sized model with a max_length of 2000 and added repetition_penalty=1.3, but other than that I didn't change any other parameters. I also add the previous message back …
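One likely culprit is the ever-growing chat history: once the concatenated history approaches max_length, generate() has no room left to produce new tokens, which tends to show up as progressively shorter and finally empty replies. A sketch (not the asker's exact setup) that caps the history:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

MAX_HISTORY_TOKENS = 512          # assumed budget, kept well under max_length
chat_history_ids = None

def reply(user_text):
    global chat_history_ids
    new_ids = tokenizer.encode(user_text + tokenizer.eos_token, return_tensors="pt")
    bot_input = new_ids if chat_history_ids is None else torch.cat([chat_history_ids, new_ids], dim=-1)
    bot_input = bot_input[:, -MAX_HISTORY_TOKENS:]          # drop the oldest turns
    chat_history_ids = model.generate(
        bot_input,
        max_length=bot_input.shape[-1] + 200,               # budget relative to the input
        pad_token_id=tokenizer.eos_token_id,
        repetition_penalty=1.3,
    )
    # decode only the newly generated part
    return tokenizer.decode(chat_history_ids[:, bot_input.shape[-1]:][0], skip_special_tokens=True)
```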
Category: Data Science

How to train a Task Specific Knowledge Distillation model using Hugging face model

I was referring to this code: https://github.com/philschmid/knowledge-distillation-transformers-pytorch-sagemaker/blob/master/knowledge-distillation.ipynb from @philschmid. I could follow most of the code, but had a few doubts. Please help me to clarify them. In the code below: class DistillationTrainer(Trainer): def __init__(self, *args, teacher_model=None, **kwargs): super().__init__(*args, **kwargs) self.teacher = teacher_model # place teacher on same device as student self._move_model_to_device(self.teacher,self.model.device) self.teacher.eval() When I pass in a fine-tuned teacher model, is it never fine-tuned further in the process of task-specific distillation training, as the line self.teacher.eval() in the code suggests? Only …
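On the doubt itself: self.teacher.eval() puts the teacher in inference mode, and since its parameters are never handed to the optimizer it stays frozen; only the student receives gradient updates. Below is a hedged sketch of the compute_loss that typically accompanies such a trainer (temperature and alpha values are assumptions, not necessarily the notebook's):

```python
import torch
import torch.nn.functional as F
from transformers import Trainer

class DistillationTrainer(Trainer):
    def __init__(self, *args, teacher_model=None, temperature=2.0, alpha=0.5, **kwargs):
        super().__init__(*args, **kwargs)
        self.teacher = teacher_model
        self._move_model_to_device(self.teacher, self.model.device)
        self.teacher.eval()                       # teacher is frozen: eval mode, no optimizer
        self.temperature = temperature
        self.alpha = alpha

    def compute_loss(self, model, inputs, return_outputs=False):
        outputs_student = model(**inputs)
        student_loss = outputs_student.loss       # hard-label cross entropy for the student
        with torch.no_grad():                     # no gradients ever flow to the teacher
            outputs_teacher = self.teacher(**inputs)
        # soften both distributions and match them with KL divergence
        kd_loss = F.kl_div(
            F.log_softmax(outputs_student.logits / self.temperature, dim=-1),
            F.softmax(outputs_teacher.logits / self.temperature, dim=-1),
            reduction="batchmean",
        ) * (self.temperature ** 2)
        loss = self.alpha * student_loss + (1.0 - self.alpha) * kd_loss
        return (loss, outputs_student) if return_outputs else loss
```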
Category: Data Science

How to save hugging face fine tuned model using pytorch and distributed training

I am fine-tuning a masked language model from XLM-RoBERTa large on Google machine specs. When I copy the model using gsutil and subprocess from the container to a GCP bucket, it gives me an error. Versions: torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 transformers==4.17.0 I am using a pre-trained Hugging Face model. I launch it as a train.py file, which I copy inside the Docker image and use Vertex AI (GCP) to launch it using ContainerSpec: machineSpec = MachineSpec(machine_type="a2-highgpu-4g",accelerator_count=4,accelerator_type="NVIDIA_TESLA_A100") python -m torch.distributed.launch --nproc_per_node 4 train.py --bf16 I am …
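With torch.distributed.launch and 4 GPUs, every process executes the save/copy code, and concurrent gsutil calls on the same files commonly fail. A hedged sketch that saves and uploads only from the main process; the bucket URI, output_dir, and the trainer/tokenizer names are placeholders for the script's own objects:

```python
import os
import subprocess

# LOCAL_RANK is set by the launcher (otherwise parse the --local_rank argument).
local_rank = int(os.environ.get("LOCAL_RANK", 0))
output_dir = "/tmp/xlmr-mlm-finetuned"            # placeholder local path
BUCKET_URI = "gs://your-bucket/xlmr-mlm-finetuned"  # placeholder bucket path

if local_rank == 0:
    # only rank 0 writes the checkpoint and copies it to GCS
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
    subprocess.run(["gsutil", "-m", "cp", "-r", output_dir, BUCKET_URI], check=True)
```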
Category: Data Science

Should weight distribution change more when fine-tuning transformers-based classifier?

I'm using a pre-trained DistilBERT model from Huggingface with a custom classification head, which is almost the same as in the reference implementation: class PretrainedTransformer(nn.Module): def __init__(self, target_classes): super().__init__() base_model_output_shape = 768 self.base_model = DistilBertModel.from_pretrained("distilbert-base-uncased") self.classifier = nn.Sequential( nn.Linear(base_model_output_shape, out_features=base_model_output_shape), nn.ReLU(), nn.Dropout(0.2), nn.Linear(base_model_output_shape, out_features=target_classes), ) for layer in self.classifier: if isinstance(layer, nn.Linear): layer.weight.data.normal_(mean=0.0, std=0.02) if layer.bias is not None: layer.bias.data.zero_() def forward(self, input_, y=None): X, length, attention_mask = input_ base_output = self.base_model(X, attention_mask=attention_mask)[0] base_model_last_layer = base_output[:, 0] cls = self.classifier(base_model_last_layer) return cls During …
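One way to make "how much should the weights move" concrete is to snapshot the encoder parameters before fine-tuning and measure how far each tensor drifts afterwards. A sketch, assuming model is the PretrainedTransformer defined above:

```python
import torch

# Snapshot the encoder weights before training (model is the PretrainedTransformer above).
before = {n: p.detach().clone() for n, p in model.base_model.named_parameters()}

# ... run the fine-tuning loop here ...

# Compare each parameter tensor against its pre-training snapshot.
for name, p in model.base_model.named_parameters():
    delta = (p.detach() - before[name]).abs().mean().item()
    scale = before[name].abs().mean().item() + 1e-12
    print(f"{name}: mean |dW| = {delta:.2e} ({delta / scale:.1%} of mean |W|)")
```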
Category: Data Science

Hugging face Model Output 'last_hidden_state'

I am using the Huggingface BERT model; the model gives a Seq2SeqModelOutput as output. The output contains the past hidden states and the last hidden state. These are my questions: What is the use of the hidden states? How do I pass my hidden states to my output layer? What I actually want is the output tokens; how do I get the prediction tokens from the model?
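Assuming the checkpoint is an encoder-decoder/seq2seq model (facebook/bart-base below is only a stand-in), the usual way to get output tokens is not to feed last_hidden_state to a layer by hand but to call generate(), which runs the decoder and LM head over the hidden states and returns token ids:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical sketch with a seq2seq checkpoint; swap in whichever model was trained.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")

# generate() applies the LM head on top of the decoder hidden states and
# returns token ids, which can then be decoded back into text.
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```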
Category: Data Science

How to use label smoothing for single label classification in hugging face models

I am training a binary classification model using the XLM-RoBERTa large model. I am using training data with hard labels, either 1 or 0. Is it advisable to perform label smoothing on this training procedure for hard labels? If so, what would be the right way to do it? Here is my code: tokenizer = tr.XLMRobertaTokenizer.from_pretrained("/home/scp/AIML/tokenizer_xlm2") train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=512, return_tensors="pt") val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=512, return_tensors="pt") test_encodings = tokenizer(test_texts, truncation=True, padding=True, max_length=512, return_tensors="pt") class SEDataset(torch.utils.data.Dataset): def …
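If the training runs through the Trainer API, TrainingArguments already exposes label_smoothing_factor, which softens the hard 0/1 targets inside the loss. A sketch, where 0.1 is an assumed value and model/train_dataset/val_dataset come from the existing setup:

```python
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    label_smoothing_factor=0.1,   # with 2 classes, hard label 1 -> ~0.95 and 0 -> ~0.05
)

trainer = Trainer(
    model=model,                  # the XLM-RoBERTa classification model
    args=training_args,
    train_dataset=train_dataset,  # SEDataset instances from the asker's code
    eval_dataset=val_dataset,
)
```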
Category: Data Science

Overfitting in Huggingface's TFBertForSequenceClassification

I'm using Huggingface's TFBertForSequenceClassification for multilabel tweet classification. During training the model achieves good accuracy, but the validation accuracy is poor. I've tried to reduce the overfitting with some dropout but the performance is still poor. The model is as follows: # Get and configure the BERT model config = BertConfig.from_pretrained("bert-base-uncased", hidden_dropout_prob=0.5, num_labels=13) bert_model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", config=config) optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=0.00015, clipnorm=0.01) loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True) metric = tf.keras.metrics.CategoricalAccuracy('accuracy') bert_model.compile(optimizer=optimizer, loss=loss, metrics=[metric]) bert_model.summary() The summary is as follows: When I fit …
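Beyond dropout, early stopping on validation loss is a cheap guard against overfitting with this kind of setup. A Keras sketch, with train_dataset and val_dataset as placeholders for the actual data:

```python
import tensorflow as tf

# Stop when validation loss stops improving and restore the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=2,
    restore_best_weights=True,
)

bert_model.fit(
    train_dataset,                 # placeholder for the tokenized training data
    validation_data=val_dataset,   # placeholder for the validation split
    epochs=10,
    callbacks=[early_stop],
)
```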
Category: Data Science

How to use is_split_into_words with Huggingface NER pipeline

I am using Huggingface transformers for NER, following this excellent guide: https://huggingface.co/blog/how-to-train. My incoming text has already been split into words. When tokenizing during training/fine-tuning I can use tokenizer(text, is_split_into_words=True) to tokenize the incoming text. However, I can't figure out how to do the same in a pipeline for predictions. For example, the following works (but requires the incoming text to be a string): s1 = "Here is a sentence" p1 = pipeline("ner",model=model,tokenizer=tokenizer) p1(s1) But the following raises this error: Exception: …
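The pipeline expects raw strings, so one workaround is to bypass it for pre-split input: tokenize with is_split_into_words=True, run the model directly, and map predictions back to words via word_ids(). A sketch, assuming a fast tokenizer and the fine-tuned PyTorch model/tokenizer from the guide:

```python
import torch

words = ["Here", "is", "a", "sentence"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**enc).logits                 # (1, seq_len, num_labels)
pred_ids = logits.argmax(dim=-1)[0].tolist()

word_ids = enc.word_ids(batch_index=0)
labels, prev = [], None
for tok_idx, w_idx in enumerate(word_ids):
    if w_idx is not None and w_idx != prev:      # take the first sub-token of each word
        labels.append(model.config.id2label[pred_ids[tok_idx]])
    prev = w_idx

print(list(zip(words, labels)))
```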
Category: Data Science

How to do NER predictions with Huggingface BERT transformer

I am trying to do a prediction on a test data set without any labels for an NER problem. Here is some background. I am doing named entity recognition using TensorFlow and Keras, with Huggingface transformers. I have two datasets: a train dataset and a test dataset. The training set has labels; the test set does not. Below you will see what a tokenized sentence looks like, what its labels look like, and what it looks like after encoding …
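Without labels, prediction is just encode, take the logits, argmax, and map the ids back through id2label. A hedged TF sketch, assuming a fast tokenizer and the token-classification model from the training setup:

```python
import numpy as np

# Hypothetical example sentence; model/tokenizer come from the training code.
sentences = [["John", "lives", "in", "Berlin"]]
enc = tokenizer(sentences, is_split_into_words=True, padding=True,
                truncation=True, return_tensors="tf")

logits = model(dict(enc)).logits                 # (batch, seq_len, num_labels)
pred_ids = np.argmax(logits, axis=-1)

for b, sent in enumerate(sentences):
    word_ids = enc.word_ids(batch_index=b)
    seen = set()
    for tok_idx, w_idx in enumerate(word_ids):
        if w_idx is not None and w_idx not in seen:   # first sub-token per word
            seen.add(w_idx)
            print(sent[w_idx], model.config.id2label[int(pred_ids[b][tok_idx])])
```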
Category: Data Science

Transformer similarity fine-tuned way too often predicts pairs as similar

I fine-tuned a transformer for classification to compute similarity between names. This is a toy example of the training data (name0, name1, label): ("Test", "Test", y), ("Test", "Hi", n). I fine-tuned the transformer using the label, feeding it pairs of names, as its tokenizer allows feeding two pieces of text. I found a really weird behavior: at prediction time, there exist pairs that have a very high chance of being predicted as similar just because they have repeated …
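When debugging this kind of behavior, it can help to inspect exactly what the classifier receives for a pair and what probability it assigns. A sketch, assuming model and tokenizer are the fine-tuned similarity classifier and its tokenizer:

```python
import torch

# Encode two names as a text pair and look at the actual input the model sees.
enc = tokenizer("Test", "Test", return_tensors="pt", truncation=True)
print(tokenizer.decode(enc["input_ids"][0]))     # shows how the two names are joined

with torch.no_grad():
    probs = torch.softmax(model(**enc).logits, dim=-1)
print(probs)                                     # probability of "similar" vs "not similar"
```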
Category: Data Science

How to prepare texts to BERT/RoBERTa models?

I have an artificial corpus I've built (not a real language) where each document is composed of multiple sentences which, again, aren't really natural-language sentences. I want to train a language model on this corpus (to use it later for downstream tasks like classification or clustering with sentence BERT). How should I tokenize the documents? Do I need to tokenize the input like this: <s>sentence1</s><s>sentence2</s> or <s>the whole document</s>? How should I train? Do I need to train an MLM …
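To see what the two layouts actually produce, one can encode both variants and decode them back. The sketch below uses the standard roberta-base tokenizer purely for illustration; an artificial corpus would normally get its own tokenizer trained from scratch:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")

# Option 1: sentences passed as a pair, each wrapped in its own special tokens
opt1 = tok("sentence1", "sentence2")
# Option 2: the whole document encoded as one sequence
opt2 = tok("sentence1 sentence2")

print(tok.decode(opt1["input_ids"]))   # roughly <s>sentence1</s></s>sentence2</s>
print(tok.decode(opt2["input_ids"]))   # roughly <s>sentence1 sentence2</s>
```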
Category: Data Science

Finetune XLM-RoBERTa on TF-keras for text classification

I am trying to fine-tune pre-trained XLM-RoBERTa on TensorFlow-Keras, using a dataset in English for text classification. I have used the xlm-roberta-base tokenizer to tokenize the sentences and the roberta-base model via TFRobertaForSequenceClassification. Please find the code below. optimizer=tf.keras.optimizers.SGD(learning_rate=5e-2) model.compile(optimizer = optimizer, loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics = [tf.keras.metrics.SparseCategoricalAccuracy()]) model.fit(train_tf_dataset, validation_data=eval_tf_dataset, epochs=1, verbose=1) I am getting the error below while training the model. Can anyone help me solve it? InvalidArgumentError: indices[2,268] = 124030 is not in [0, 50265) [[node tf_roberta_for_sequence_classification_1/roberta/embeddings/Gather …
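The indices-out-of-range error points to a vocabulary mismatch: the xlm-roberta-base tokenizer produces ids up to roughly 250k, while roberta-base's embedding matrix only has 50,265 rows. Loading the tokenizer and model from the same checkpoint avoids it; a sketch (num_labels=2 is a placeholder for the actual number of classes):

```python
from transformers import AutoTokenizer, TFXLMRobertaForSequenceClassification

# Tokenizer and model from the same checkpoint, so token ids stay inside the
# embedding matrix.
checkpoint = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFXLMRobertaForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
```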
Category: Data Science

Adding a new token to a transformer model without breaking tokenization of subwords

I'm running an experiment investigating the internal structure of large pre-trained models (BERT and RoBERTa, to be specific). Part of this experiment involves fine-tuning the models on a made-up new word in a specific sentential context and observing its predictions for that novel word in other contexts post-tuning. Because I am just trying to teach it a new word, we freeze the embeddings for the other words during fine-tuning so that only the weights for the new word are updated. …
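For reference, the standard recipe for adding a word as an atomic token is tokenizer.add_tokens plus model.resize_token_embeddings, which appends one new embedding row and leaves the existing subword entries (and their ids) untouched. A sketch with bert-base-uncased and a placeholder novel word:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# "blicket" is a placeholder for the made-up word used in the experiment.
num_added = tokenizer.add_tokens(["blicket"])
model.resize_token_embeddings(len(tokenizer))    # grows the matrix by one row

print(num_added, tokenizer.tokenize("a blicket appeared"))  # the new word stays whole
```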
Category: Data Science

How do I get word embeddings for out-of-vocabulary words using a transformer model?

When I tried to get word embeddings for a sentence using Bio_ClinicalBERT, for a sentence of 8 words I got 11 token IDs (plus start and end) because "embeddings" is an out-of-vocabulary word/token that gets split into em, bed, ding, s. I would like to know if there are any aggregation strategies available that make sense, apart from taking the mean of these vectors. from transformers import AutoTokenizer, AutoModel # download and load model tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT") model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT") …
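Besides the mean, element-wise max pooling or simply taking the first sub-token vector are common aggregation choices. A sketch that groups sub-token vectors by word via word_ids() (the example sentence is made up, and a fast tokenizer is assumed):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

sentence = "word embeddings are useful"          # hypothetical example sentence
enc = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    hidden = model(**enc).last_hidden_state[0]   # (seq_len, hidden_size)

word_ids = enc.word_ids(batch_index=0)
vectors = {}
for w_idx in set(i for i in word_ids if i is not None):
    rows = [t for t, wi in enumerate(word_ids) if wi == w_idx]
    sub = hidden[rows]                           # all sub-token vectors for this word
    vectors[w_idx] = {
        "mean": sub.mean(dim=0),                 # average of the sub-token vectors
        "max": sub.max(dim=0).values,            # element-wise max pooling
        "first": sub[0],                         # first sub-token only
    }
```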
Category: Data Science
