Pretrain RoBERTa model with new data using PyTorch library

I've pretrained the RoBERTa model with new data using the 'simpletransformers' library:

    from simpletransformers.classification import ClassificationModel

    OUTPUT_DIR = 'roberta_output/'
    model = ClassificationModel('roberta', 'roberta-base', use_cuda=False, num_labels=22,
                                args={'overwrite_output_dir': True, 'output_dir': OUTPUT_DIR})
    model.train_model(train_df)
    result, model_outputs, wrong_predictions = model.eval_model(test_df)  # model evaluation on test data

where 'train_df' is a pandas DataFrame consisting of many samples (rows) with two columns: the first column is the text data (input); the second column is the category, i.e. the label (output). I need to create the same model and pretrain …
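For comparison, a minimal sketch of the same workflow in plain PyTorch with the Hugging Face transformers library might look like the following. The column names 'text' and 'labels', the batch size, and the epoch count are illustrative assumptions, not taken from the question:

    import torch
    from torch.utils.data import Dataset
    from transformers import (RobertaTokenizer, RobertaForSequenceClassification,
                              Trainer, TrainingArguments)

    class DFDataset(Dataset):
        """Wraps a two-column pandas DataFrame ('text', 'labels') for the Trainer."""
        def __init__(self, df, tokenizer):
            self.enc = tokenizer(list(df['text']), truncation=True,
                                 padding=True, return_tensors='pt')
            self.labels = torch.tensor(list(df['labels']))

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, i):
            item = {k: v[i] for k, v in self.enc.items()}
            item['labels'] = self.labels[i]
            return item

    tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
    model = RobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=22)

    args = TrainingArguments(output_dir='roberta_output/', num_train_epochs=1,
                             per_device_train_batch_size=8)
    trainer = Trainer(model=model, args=args,
                      train_dataset=DFDataset(train_df, tokenizer),
                      eval_dataset=DFDataset(test_df, tokenizer))
    trainer.train()
    print(trainer.evaluate())  # eval loss on test_df, analogous to eval_model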
Category: Data Science

Is it possible to "fine-tune" a pre-trained logistic regression model?

Fine-tuning is a concept commonly used in deep learning: we may have a pre-trained model and then fine-tune it for our specific task. Does that apply to simple models, such as logistic regression? For example, let's say I have a dataset with attribute variables of an animal and I want to classify whether or not it is a mammal. The labels on that dataset are only "mammal"/"not mammal". I then train a logistic regression model for this …
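For what it's worth, scikit-learn supports a limited form of this via incremental fitting. A minimal sketch, using SGDClassifier with logistic loss as the logistic regression (the data here is synthetic):

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    X_old, y_old = rng.normal(size=(200, 5)), rng.integers(0, 2, 200)  # original data
    X_new, y_new = rng.normal(size=(50, 5)), rng.integers(0, 2, 50)    # new data

    # loss='log_loss' makes this a logistic regression trained by SGD
    # (the loss is named 'log' on older scikit-learn versions)
    clf = SGDClassifier(loss='log_loss', random_state=0)
    clf.fit(X_old, y_old)          # "pre-training" on the original dataset
    clf.partial_fit(X_new, y_new)  # continue from the learned coefficients
    print(clf.predict(X_new[:5]))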
Category: Data Science

Pretrained vs. finetuned model

I have a question about terminology. When dealing with Hugging Face transformer models, I often read about "using pretrained models for classification" vs. "fine-tuning a pretrained model for classification." I fail to understand what the exact difference between these two is. As I understand it, pretrained models by themselves cannot be used for classification, regression, or any relevant task without attaching at least one more dense layer and one output layer and then training the model. In this case, we would …
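For concreteness, this is roughly what "attaching a head" looks like in the transformers library; the checkpoint name and label count are just examples:

    from transformers import AutoModelForSequenceClassification

    # Loads the pretrained encoder weights and attaches a fresh, randomly
    # initialized classification head; the library itself warns that this
    # head still needs to be trained (i.e. fine-tuned) before use.
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=3
    )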
Category: Data Science

Fine-tuning pre-trained Word2Vec model with Gensim 4.0

With Gensim < 4.0, we can retrain a word2vec model using the following code:

    model = Word2Vec.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
    model.train(my_corpus, total_examples=len(my_corpus), epochs=model.epochs)

However, what I understand is that Gensim 4.0 no longer supports Word2Vec.load_word2vec_format. Instead, I can only load the KeyedVectors. How can I fine-tune a pre-trained word2vec model (such as the model trained on GoogleNews) with my domain-specific corpus using Gensim 4.0?
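One workable pattern in Gensim 4 is sketched below: load the pre-trained KeyedVectors, build a fresh trainable Word2Vec over the domain corpus, and seed its weights from the shared vocabulary. This is a sketch under those assumptions, not an official recipe:

    from gensim.models import Word2Vec, KeyedVectors

    # Gensim 4 loads the pre-trained file as read-only KeyedVectors
    kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

    # Fresh trainable model built over the domain-specific corpus
    model = Word2Vec(vector_size=kv.vector_size, min_count=1)
    model.build_vocab(my_corpus)

    # Seed the new model with pre-trained vectors for words both vocabularies share
    for word, idx in model.wv.key_to_index.items():
        if word in kv:
            model.wv.vectors[idx] = kv[word]

    model.train(my_corpus, total_examples=model.corpus_count, epochs=model.epochs)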
Category: Data Science

Deploying multiple pre-trained model (tar.gz files) on Sagemaker in a single endpoint

We have followed the following steps:

1. Trained 5 TensorFlow models on a local machine using 5 different training sets.
2. Saved them in .h5 format.
3. Converted them into tar.gz (Model1.tar.gz, ..., Model5.tar.gz) and uploaded them to the S3 bucket.
4. Successfully deployed a single model to an endpoint using the following code:

    from sagemaker.tensorflow import TensorFlowModel

    sagemaker_model = TensorFlowModel(model_data=tarS3Path + 'model{}.tar.gz'.format(1),
                                      role=role, framework_version='1.13',
                                      sagemaker_session=sagemaker_session)
    predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
    predictor.predict(data.values[:, 0:])

The output was: {'predictions': [[153.55], [79.8196], [45.2843]]}. Now the problem …
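One candidate approach for the multi-model part is a SageMaker multi-model endpoint. A sketch using sagemaker.multidatamodel, reusing the objects from the code above (whether a given TensorFlow serving container version supports this needs checking, and the endpoint name is illustrative):

    from sagemaker.multidatamodel import MultiDataModel

    # tarS3Path, sagemaker_model and sagemaker_session are the same objects as above
    mdm = MultiDataModel(
        name="tf-multi-model",            # illustrative name
        model_data_prefix=tarS3Path,      # S3 prefix holding Model1.tar.gz ... Model5.tar.gz
        model=sagemaker_model,            # supplies the serving container configuration
        sagemaker_session=sagemaker_session,
    )
    predictor = mdm.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")

    # Each request names the archive it should be routed to
    predictor.predict(data.values[:, 0:], target_model="Model1.tar.gz")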
Category: Data Science

How to improve recall by retraining a model on its feedback

I am creating a supervised model using sensitive and scarce data. For the sake of discussion, I've simplified the problem statement by assuming that I'm creating a model for identifying dogs. Let's say I am creating a model to identify dogs in pictures. I trained it with a few positive and negative examples. I could not gather a lot of data because it is scarce, so the model's accuracy is not good (say, F-score = 0.64). I deployed this model in …
Category: Data Science

Logic behind pre-trained weights and transfer learning

I am not sure about the logic behind how pre-trained weights actually make sense and translate to a new problem. To be more specific: in an object detection network, for example, how would a model's weights that were trained, let's say, on the COCO dataset with 80 categories translate to my new problem that has only 2 categories (classes)? How does this make sense? What kind of meaningful features could even be transferred from the previously pre-trained model to …
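As a concrete illustration of what gets reused versus replaced, here is a sketch with torchvision's COCO-pretrained Faster R-CNN: the backbone and region-proposal weights are kept, and only the final class-specific predictor is swapped for a 2-class one (one object class plus background):

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    # Backbone and RPN weights come from COCO pre-training (80 categories)
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

    # Swap only the box predictor head: num_classes = 1 object class + background
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

The generic features (edges, textures, object parts) live in the reused layers; only the last, class-specific mapping is learned from scratch.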
Category: Data Science

Semantic segmentation with greyscale images

I'm trying to reproduce a research project with greyscale images instead of colour images. I have found that there are networks, like VGG16, pre-trained on ImageNet. But that dataset has colour images, and I can't use it because I'm going to use greyscale images. Is there any network pre-trained on greyscale images? Failing that, I could train a network on a greyscale image dataset myself, but I can't find any.
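A common workaround, sketched below with illustrative shapes, is to keep the colour-pretrained weights and adapt the input instead, e.g. by replicating the single grey channel three times:

    import numpy as np
    import tensorflow as tf

    grey_batch = np.random.rand(4, 224, 224, 1).astype("float32")   # stand-in greyscale data
    rgb_batch = tf.image.grayscale_to_rgb(tf.constant(grey_batch))  # replicate channel to 3

    # A standard ImageNet-pretrained backbone now accepts the converted batch
    vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=False)
    features = vgg(rgb_batch)

An alternative in the same spirit is to average the first convolution's RGB filters into a single-channel kernel.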
Category: Data Science

Baseline model and transfer learning

I've tried to find guidance on using transfer learning when building baseline models for ML projects (a CNN in my case), but found no clues on good practice in the matter. My reasoning says that a baseline model should not be pretrained, since that complicates it without any known reason to do so (at that point it is not yet proven we need it). But it would not be the first time my reasoning turns out to be wrong in data science. …
Category: Data Science

Test data is not a good representation of train data

I have predefined train and test sets. On generating some statistics, like value_counts, and checking the unique values, I feel that there is a 'lot' of difference between the distributions of the variables. What should be done about this? Suppose I want to delete a column from the train_set for a reason such as near-zero variance; should I repeat the same for the test_set (even if there is no such problem in the test_set's frequency tables)? I ran the following …
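One way to turn "a lot of difference" into a number is a per-column two-sample test. A sketch with scipy, assuming train_set and test_set are the two DataFrames from the question (the significance threshold is arbitrary):

    from scipy.stats import ks_2samp

    # Flag numeric columns whose train/test distributions differ markedly
    for col in train_set.select_dtypes("number").columns:
        stat, p = ks_2samp(train_set[col].dropna(), test_set[col].dropna())
        if p < 0.01:
            print(f"{col}: KS={stat:.3f}, p={p:.4f} (distributions differ)")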
Category: Data Science

How long does it take to fine-tune XLNet?

XLNet takes much more time than BERT during pre-training; in return, it outperforms BERT on over 20 NLP tasks. How long does XLNet take to fine-tune (let's assume it is running on Google Colab, on a text summarization task with around 4,000 examples)?
Category: Data Science

Using mathematical derivatives of input data to augment training input data

I'm thinking about how to design a basic feedforward neural network that would be able to predict future datapoints given past datapoints. I'm very new to neural network design, so I'm wondering whether there is a best practice for extracting as much information from the input data as possible. Would it make sense to provide the neural network with mathematically computed derivatives of the input data, or are feedforward networks capable of generating derivatives within …
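For reference, derivative features of this kind are cheap to compute with finite differences; a sketch on a synthetic series:

    import numpy as np

    t = np.linspace(0, 10, 200)
    x = np.sin(t)  # synthetic stand-in for the real past datapoints

    dx = np.gradient(x, t)    # first derivative via finite differences
    d2x = np.gradient(dx, t)  # second derivative

    # Stack the raw signal and its derivatives as input features for the network
    features = np.column_stack([x, dx, d2x])
    print(features.shape)  # (200, 3)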
Category: Data Science

One single-batch training on Huggingface Bert model "ruins" the model

For some reason, I need to do further (2nd-stage) pre-training on a Hugging Face BERT model, and I find my training outcome is very bad. After debugging for hours, I surprisingly find that even training on one single batch after loading the base model will cause the model to predict very bad choices when I ask it to unmask some test sentences. I have boiled my code down to the minimal reproducible version here:

    import torch
    from transformers import AdamW, BertTokenizer
    from transformers import …
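For orientation, a minimal second-stage MLM step looks roughly like the sketch below; the checkpoint and learning rate are illustrative, and a real setup would mask tokens with DataCollatorForLanguageModeling rather than labelling every position:

    import torch
    from torch.optim import AdamW
    from transformers import BertTokenizer, BertForMaskedLM

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM.from_pretrained("bert-base-uncased")
    model.train()

    # A small learning rate matters: one aggressive step can already distort the LM head
    optimizer = AdamW(model.parameters(), lr=2e-5)

    batch = tokenizer(["The capital of France is Paris."], return_tensors="pt")
    labels = batch["input_ids"].clone()  # toy labels; real MLM masks ~15% of tokens

    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()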
Category: Data Science

Working on an image classification project (microscopic images), have some doubts

Currently, I am working on an image classification project. The data set contains very high resolution images taken with an electron microscope, so I have few and limited instances. I have done EDA and built a deep CNN for the task. The results are not very satisfying, and even tweaking the model did not help; I got similar results in cross-validation as well. I also performed data augmentation, but I do not possess enough knowledge of it. Can anyone …
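Since the question mentions data augmentation, a standard starting point for microscopy images, where orientation usually carries no meaning, is a sketch like this (parameters are illustrative):

    import tensorflow as tf

    # On-the-fly geometric augmentation: each epoch sees new variants
    # of the few available high-resolution instances
    augment = tf.keras.Sequential([
        tf.keras.layers.RandomFlip("horizontal_and_vertical"),
        tf.keras.layers.RandomRotation(0.25),
        tf.keras.layers.RandomZoom(0.1),
    ])

    images = tf.random.uniform((8, 256, 256, 3))  # stand-in batch
    augmented = augment(images, training=True)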
Category: Data Science

What is the common practice for NLP or text mining for non-English?

A lot of natural language processing tools are pre-trained on corpora in English. What if one needs to analyze, say, Dutch text? The blogs I find online mostly suggest translating the text into English as pre-processing. Is this the common practice? If not, what is? Also, does how similar a language is to English have an impact on model performance? For other widely spoken languages (e.g. French, Spanish), do people construct corpora in their own language and train …
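For reference, many toolkits ship language-specific models, so no translation step is needed for them; a sketch with spaCy's Dutch pipeline (assuming nl_core_news_sm has been downloaded):

    import spacy  # model installed via: python -m spacy download nl_core_news_sm

    # A pipeline trained directly on Dutch text
    nlp = spacy.load("nl_core_news_sm")
    doc = nlp("Amsterdam is de hoofdstad van Nederland.")
    print([(token.text, token.pos_) for token in doc])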
Category: Data Science

Where to get models with weights instead of only weights? What's the purpose of .h5 files?

I have downloaded .h5 files from qubvel/resnet and qubvel/efficientnet. I was trying to use some of the models as a backbone for my model, but I'm getting the following error: ValueError: No model found in the config file. As explained here, this is because the .h5 file contains only weights, not a model. So those .h5 files are only weights. What's the purpose of having only weights without the architecture? I was trying to run the following code:

    resnet18_path_to_file = "models/resnet18.h5"
    resnet18 = tf.keras.models.load_model(resnet18_path_to_file)
    …
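The usual pattern for weights-only files is sketched below: rebuild the architecture in code first (for the qubvel repositories, with their own model builders), then call load_weights instead of load_model. The Keras application and file path here are hypothetical stand-ins:

    import tensorflow as tf

    # Weights-only .h5: the architecture must exist in code before loading
    model = tf.keras.applications.ResNet50(
        include_top=False, input_shape=(224, 224, 3), weights=None
    )
    model.load_weights("models/resnet50_notop.h5")  # hypothetical matching weights file

Shipping weights without the architecture keeps the file small and avoids serializing custom code; the network definition is expected to come from the library that produced it.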
Category: Data Science

Would there be any reason to pretrain BERT on specific texts?

The official BERT English model is trained on Wikipedia and BookCorpus (source). Now, for example, let's say I want to use BERT for movie tag recommendation. Is there any reason for me to pretrain a new BERT model from scratch on a movie-related dataset? Could my model become more accurate if I trained it on movie-related texts rather than general texts? Is there an example of such usage? To be clear, the question is about the importance of context (not …
Category: Data Science

How to access GPT-3, BERT or alike?

I am interested in accessing the NLP models mentioned in scientific papers, to replicate some results and experiment. But I only see waiting lists (https://openai.com/blog/openai-api/) and licenses granted in large commercial deals (https://www.theverge.com/2020/9/22/21451283/microsoft-openai-gpt-3-exclusive-license-ai-language-research). How can a researcher not affiliated with a university or a (large) tech company obtain access in order to replicate the experiments of scientific papers? What alternatives would you suggest for leveraging pre-trained models?
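For BERT-class models, at least, there is no waiting list; public checkpoints can be pulled directly from the Hugging Face hub, as in this sketch:

    from transformers import pipeline

    # Downloads the public checkpoint on first use; no license or waitlist required
    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    print(unmasker("Paris is the capital of [MASK]."))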
Category: Data Science
