Fine-tuning BERT for text summarization

I was trying to follow this notebook to fine-tune BERT for the text summarization task. Everything was fine until I reached this instruction in the Evaluation section to evaluate my model: model = EncoderDecoderModel.from_pretrained("checkpoint-500"). An error appears: OSError: checkpoint-500 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and …
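The usual cause here is that from_pretrained treats a bare string as a model id on the Hugging Face Hub. A minimal sketch of the fix, assuming the Trainer wrote checkpoint-500 into the current working directory (the path is otherwise hypothetical):

from transformers import EncoderDecoderModel

# An explicit relative or absolute path tells from_pretrained to load
# from disk instead of querying huggingface.co.
model = EncoderDecoderModel.from_pretrained("./checkpoint-500")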
Category: Data Science

Is it possible to add new vocabulary to BERT's tokenizer when fine-tuning?

I want to fine-tune BERT by training it on a domain-specific dataset of my own. The domain is specialised and includes many terms that probably weren't in the corpus BERT was originally trained on. I know I have to use BERT's tokenizer, as the model was originally trained on its embeddings. To my understanding, words unknown to the tokenizer are replaced with the [UNK] token. What if some of these words are common in my dataset? Does it make sense …
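Whatever the truncated part goes on to ask, Hugging Face tokenizers do expose a standard mechanism for adding domain terms; a minimal sketch (the new terms are hypothetical examples):

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# add_tokens skips anything already in the vocabulary and returns
# the number of tokens actually added.
num_added = tokenizer.add_tokens(["myocarditis", "troponin"])

# The embedding matrix must grow to match the enlarged vocabulary;
# the new rows are randomly initialised and learned during fine-tuning.
model.resize_token_embeddings(len(tokenizer))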
Category: Data Science

Why not use linear regression for fine-tuning the last layer of a neural network?

In transfer learning, often only the last layer of the network is retrained using gradient descent. However, the last layer of a common neural network performs only a linear transformation, so why do we use gradient descent rather than linear (or logistic) regression to fine-tune the last layer?
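Since the frozen backbone makes the features fixed, the head can indeed be fit as a convex problem. A minimal scikit-learn sketch, with random arrays standing in for penultimate-layer activations:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 64))   # stand-in for backbone(X)
labels = rng.integers(0, 2, size=200)

# Fitting the "last layer" directly as logistic regression,
# no gradient-descent loop required.
clf = LogisticRegression(max_iter=1000).fit(features, labels)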
Category: Data Science

Is it possible to "fine-tune" a pre-trained logistic regression model?

Fine-tuning is a concept commonly used in deep learning: we may take a pre-trained model and then fine-tune it for our specific task. Does that apply to simple models, such as logistic regression? For example, let's say I have a dataset with attribute variables of an animal and I want to classify whether or not it is a mammal. The labels on that dataset are only "mammal"/"not mammal". I then train a logistic regression model for this …
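One common reading of "fine-tuning" a simple model is just continuing optimisation from the already-learned weights on new data. A minimal sketch with scikit-learn's SGDClassifier, which is logistic regression when loss="log_loss" (scikit-learn >= 1.1; the arrays are hypothetical stand-ins):

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_pre, y_pre = rng.normal(size=(500, 10)), rng.integers(0, 2, size=500)
X_new, y_new = rng.normal(size=(50, 10)), rng.integers(0, 2, size=50)

clf = SGDClassifier(loss="log_loss", random_state=0)
clf.partial_fit(X_pre, y_pre, classes=[0, 1])  # "pretraining"
clf.partial_fit(X_new, y_new)  # "fine-tuning" continues from the learned weights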
Category: Data Science

ValueError: Mixed precision training with AMP or APEX (`--fp16` or `--bf16`) and half precision evaluation (`--fp16_full_eval` or `--bf16_full_eval`) can only be used on CUDA devices

I'm fine-tuning the wav2vec-xlsr model. I've created a virtual env for it and installed CUDA 11.0 and tensorflow-gpu==2.5.0, but I get the following error: ValueError: Mixed precision training with AMP or APEX (--fp16 or --bf16) and half precision evaluation (--fp16_full_eval or --bf16_full_eval) can only be used on CUDA devices. I want to fine-tune the model on a GPU. Any help?
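This particular check lives on the PyTorch side of the Hugging Face Trainer, so installing tensorflow-gpu does not satisfy it; torch itself must be a CUDA build that can see the GPU. A minimal diagnostic sketch (the output_dir is hypothetical):

import torch
from transformers import TrainingArguments

# If this prints False, install a CUDA-enabled torch matching the driver.
print(torch.cuda.is_available())

# Request fp16 only when a CUDA device is actually visible.
args = TrainingArguments(output_dir="out", fp16=torch.cuda.is_available())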
Category: Data Science

Pretrained vs. finetuned model

I have a question about terminology. When dealing with Hugging Face transformer models, I often read about "using pretrained models for classification" vs. "fine-tuning a pretrained model for classification." I fail to understand the exact difference between the two. As I understand it, pretrained models by themselves cannot be used for classification, regression, or any relevant task without attaching at least one more dense layer and an output layer, and then training the model. In this case, we would …
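For what it's worth, the Hugging Face API makes the distinction concrete. In the sketch below (assuming a BERT checkpoint), from_pretrained attaches a fresh, randomly initialised classification head; freezing the base corresponds to "using the pretrained model" as a feature extractor, while leaving it trainable is "fine-tuning":

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Feature extraction: train only the new head.
for param in model.bert.parameters():
    param.requires_grad = False

# Fine-tuning would instead leave requires_grad=True everywhere,
# so the pretrained weights also update on the downstream task.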
Category: Data Science

Tuning a classifier for high precision, with no regard for recall

I understand this falls under the decision-making aspect rather than the probabilistic one, but for the purposes of some work I am doing, I need the classifier to have very high precision, as I can't afford a false positive. I do not care about false negatives and, consequently, do not care about recall. Since it is currently a binary classifier, some might say to play with the decision probability threshold from its current 0.5 value, but I will eventually need …
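Since the threshold is the natural starting point, a minimal sketch of choosing the lowest threshold that meets a precision target on held-out scores (random arrays stand in for real predictions):

import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, size=1000), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
target = 0.99
# precision has one more entry than thresholds; drop the final sentinel.
candidates = thresholds[precision[:-1] >= target]
# The lowest qualifying threshold keeps as much recall as possible.
threshold = candidates.min() if candidates.size else 1.0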
Category: Data Science

Incompatible shapes (None, 1) and (None, 5) with Keras VGGFace Finetuning

Categories to learn and predict:

df.race.unique()
array(['0', '1', '3', '2', '4'], dtype=object)

Data:

train_generator = image_gen.flow_from_dataframe(
    df_train,
    x_col="img_name",
    y_col="race",
    directory=str(data_folder),
    class_mode="sparse",
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    shuffle=True,
)
val_generator = image_gen.flow_from_dataframe(
    df_val,
    x_col="img_name",
    y_col="race",
    directory=str(data_folder),
    class_mode="sparse",
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    batch_size=BATCH_SIZE,
    shuffle=False,
)

Model load and fit:

vggface_model = load_model("resnet50face.h5")
base_model = tf.keras.Model([vggface_model.input], vggface_model.get_layer("flatten_1").output)
base_model.trainable = False
last_layer = base_model.get_layer('avg_pool').output
hidden_layer = Flatten(name='flatten')(last_layer)
out_layer = Dense(5, activation='softmax', name='gender_classifier')(hidden_layer)
custom_base_model = tf.keras.Model(base_model.input, out_layer)
custom_base_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss="categorical_crossentropy",
    metrics=['accuracy'])
history = custom_base_model.fit(
    x=train_generator,
    validation_data=val_generator,
    steps_per_epoch=20,
    epochs=40)

Error …
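The mismatch comes from pairing class_mode="sparse" (integer class ids) with categorical_crossentropy, which expects one-hot targets of shape (None, 5). A sketch of the fix, reusing the names from the snippet above:

custom_base_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss="sparse_categorical_crossentropy",  # matches integer labels
    metrics=["accuracy"])

# Alternatively, keep categorical_crossentropy and switch both generators
# to class_mode="categorical" so they emit one-hot targets.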
Category: Data Science

Tuning a model by my own metric

My project uses a custom metric to evaluate the performance of a regression model; it is not one of the standard machine-learning metrics (MSE, MAE, ...). How can I tune a model based on my metric?
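If the model is a scikit-learn estimator, a custom metric can be wrapped with make_scorer and handed to any tuner. A minimal sketch, with hypothetical stand-ins for the project's metric, model, and data:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV

def my_metric(y_true, y_pred):
    # Placeholder for the project's own error measure.
    return np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + 1.0))

scorer = make_scorer(my_metric, greater_is_better=False)  # lower is better

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid={"max_depth": [3, 5, None]},
                      scoring=scorer, cv=3).fit(X, y)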
Category: Data Science

Are most deep learning models online learning models?

I'm a beginner in online learning. From my perspective, an online learning model is a model that can update its parameters as data flows in (I've seen an article pointing out that an incremental model is independent of time, while online learning emphasises data flowing in as a time series). Here I regard them as one thing. And in my view, most deep learning models can be fine-tuned, as we fine-tune a pre-trained BERT model. Does that mean that a deep learning model that can be fine-tuned is equivalent to …
Category: Data Science

Fine-tuned transformer similarity model far too often predicts pairs as similar

I fine-tuned a transformer for classification to compute similarity between names. This is a toy example of the training data:

name0   name1   label
Test    Test    y
Test    Hi      n

I fine-tuned the transformer using the label, feeding it pairs of names, since its tokenizer allows two pieces of text as input. I found some really weird behaviour: at prediction time, there exist pairs that have a very high chance of being predicted as similar just because they have repeated …
Category: Data Science

How to improve a CNN without changing the architecture?

I'm currently using an autoencoder CNN built upon the VGG-16 architecture that was designed by someone else. I want to replicate their results using their dataset first, but I'm finding that:

- Validation losses diverge from training losses fairly early on (I get to around 10 epochs and it already looks like it's overfitting).
- At its best, the validation losses aren't even close to being as low as the training losses.
- In general, the accuracy is still worse than reported in …
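With the architecture fixed, the usual levers are the training setup: augmentation, early stopping, and a smaller learning rate. A hedged Keras sketch (names are hypothetical; the fit call is commented out because the data objects aren't shown, and the autoencoder's targets are its inputs):

import tensorflow as tf

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15, horizontal_flip=True,
    width_shift_range=0.1, height_shift_range=0.1)

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
# model.fit(datagen.flow(x_train, x_train), validation_data=(x_val, x_val),
#           epochs=100, callbacks=[early_stop])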
Category: Data Science

Validation accuracy remains the same

I have used my own custom-built model and also fine-tuned two other models, ResNet50 and VGG16, but val_acc remains the same for all of them.

import tensorflow as tf

model_1 = Sequential()
model_1.add(Conv2D(32, kernel_size=(3,3), padding='same', activation='relu', input_shape=(224,224,3)))
model_1.add(MaxPooling2D(2,2))
model_1.add(Dropout(0.3))
model_1.add(Conv2D(64, kernel_size=(3,3), padding='same', activation='relu'))
model_1.add(Flatten())
model_1.add(Dropout(0.3))
model_1.add(BatchNormalization())
model_1.add(Dense(1, activation='sigmoid'))
model_1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

history_1 = model_1.fit(train_gen, epochs=3, batch_size=32, validation_data=(val_gen))

Results:

CPU Frequency: 2199995000 Hz
Epoch 1/3
13/13 [==============================] - 81s 6s/step - loss: 2.3583 - accuracy: 0.3001 - val_loss: 0.3717 - val_accuracy: 0.1900 …
Category: Data Science

Fine tune the RetinaNet model in PyTorch

I would like to fine-tune the pre-trained RetinaNet model available in torchvision in order to create my own object detector. I'm trying to replicate what is done for Faster R-CNN at this link: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html#finetuning-from-a-pretrained-model What I have done is the following:

model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
num_classes = 2
# get the number of input features and anchor boxes for the classifier
in_features = model.head.classification_head.conv[0].in_channels
num_anchors = model.head.classification_head.num_anchors
# replace the pre-trained head with a new one
model.head = RetinaNetHead(in_features, num_anchors, …
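Rather than rebuilding the whole RetinaNetHead, one workable sketch replaces only the classification head, whose constructor takes num_classes directly (assuming a reasonably recent torchvision; the regression head is class-agnostic and keeps its pretrained weights):

import torchvision
from torchvision.models.detection.retinanet import RetinaNetClassificationHead

model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)
num_classes = 2

in_channels = model.backbone.out_channels
num_anchors = model.head.classification_head.num_anchors

# Swap in a fresh classification head sized for the new classes.
model.head.classification_head = RetinaNetClassificationHead(
    in_channels, num_anchors, num_classes)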
Category: Data Science

How to freeze certain layers in models obtained from keras.applications

I am currently trying to use transfer learning on ResNet152, obtained from Keras Applications:

tf.keras.applications.ResNet152(
    weights="imagenet",
    input_shape=(400, 250, 3)
)

I know that to freeze all the layers I need to set the trainable attribute to False, but right now I need to freeze only certain layers. More specifically, I need to unfreeze the last three layers of this model and freeze the rest. How do I do that?
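A minimal sketch (include_top=False is added here because the classification top only accepts the default 224x224 input shape):

import tensorflow as tf

base = tf.keras.applications.ResNet152(
    weights="imagenet", include_top=False, input_shape=(400, 250, 3))

# Freeze everything except the last three layers.
for layer in base.layers[:-3]:
    layer.trainable = False
for layer in base.layers[-3:]:
    layer.trainable = True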
Category: Data Science

How to fine-tune GPT-J with small dataset

Firstly, thank you so much for looking at this post. I could really use some help. I have followed this guide as closely as possible: https://github.com/kingoflolz/mesh-transformer-jax I'm trying to fine-tune GPT-J with a small dataset of ~500 lines:

You are important to me. <|endoftext|>
I love spending time with you. <|endoftext|>
You make me smile. <|endoftext|>
I feel so lucky to be your friend. <|endoftext|>
You can always talk to me, even if it’s about something that makes you nervous or …
Category: Data Science

How can I build my voice speech-to-text model?

I found instructions for building this kind of custom model on Azure: "Prepare data for Custom Speech". However, I would like to fine-tune or apply transfer learning on Google Colaboratory or in Docker. In that case, what machine learning framework do you recommend using? If you know of any GitHub repos or articles for this challenge, could you share them with me?
Category: Data Science
