pretraining

Pretrain a RoBERTa model with new data using the PyTorch library

CapJS

May 23, 2022 11:26

I've pretrained the RoBERTa model with new data using the 'simpletransformers' library: from simpletransformers.classification import ClassificationModel OUTPUT_DIR = 'roberta_output/' model = ClassificationModel('roberta', 'roberta-base', use_cuda=False, num_labels=22, args={'overwrite_output_dir':True, 'output_dir':OUTPUT_DIR}) model.train_model(train_df) result, model_outputs, wrong_predictions = model.eval_model(test_df) # model evaluation on test data where 'train_df' is a pandas dataframe that consists of many samples (=rows) with two columns: the 1st column is the text data (input); the 2nd column is the category label (output). I need to create the same model and pretrain …

Topic: pretraining pytorch nlp python

Category: Data Science

Is it possible to "fine-tune" a pre-trained logistic regression model?

eduardokapp

May 17, 2022 16:57

Fine-tuning is a concept commonly used in deep learning. We may have a pre-trained model and then fine-tune it to our specific task. Does that apply to simple models, such as logistic regression? For example, let's say I have a dataset with attribute variables of an animal and I want to classify whether or not it is a mammal. The labels on that dataset are only "mammal"/"not mammal". I then train a logistic regression model for this …

Topic: pretraining finetuning logistic-regression scikit-learn

Category: Data Science
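
For the logistic regression question above, one way to picture "fine-tuning" is a warm start: keep the learned weights and continue gradient descent on new data that shares the same features. The sketch below is a minimal pure-Python illustration under that assumption. The function names, the toy features (`has_fur`, `lays_eggs`), and the data are all hypothetical and are not taken from the question.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, weights=None, bias=0.0, lr=0.5, epochs=200):
    """Batch gradient descent for logistic regression.

    Passing `weights` and `bias` continues training from earlier
    parameters (a warm start) instead of starting from zeros.
    """
    n_features = len(X[0])
    w = list(weights) if weights is not None else [0.0] * n_features
    b = bias
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step along the mean gradient
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(X, w, b):
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Hypothetical "pretraining" data: [has_fur, lays_eggs] -> mammal (1) or not (0)
X_old = [[1, 0], [1, 0], [0, 1], [0, 1]]
y_old = [1, 1, 0, 0]
w, b = train_logreg(X_old, y_old)

# Hypothetical new data with the same feature layout; continue from w, b
X_new = [[1, 1], [1, 0], [0, 1], [0, 0]]
y_new = [1, 1, 0, 0]
w_ft, b_ft = train_logreg(X_new, y_new, weights=w, bias=b)
```

Because the logistic loss is convex, a warm start mainly changes how quickly training converges on the new data rather than which solution is reachable. Whether that counts as fine-tuning in the deep learning sense is a matter of terminology.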

Pretrained vs. finetuned model

lazarea

May 17, 2022 07:15

I have a doubt regarding terminology. When dealing with huggingface transformer models, I often read about "using pretrained models for classification" vs. "fine-tuning a pretrained model for classification." I fail to understand what the exact difference between these two is. As I understand, pretrained models by themselves cannot be used for classification, regression, or any relevant task, without attaching at least one more dense layer and one more output layer, and then training the model. In this case, we would …

Topic: pretraining transformer finetuning transfer-learning

Category: Data Science

Fine-tuning pre-trained Word2Vec model with Gensim 4.0

NST

April 7, 2022 10:04

With Gensim < 4.0, we can retrain a word2vec model using the following code: model = Word2Vec.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True) model.train(my_corpus, total_examples=len(my_corpus), epochs=model.epochs) However, what I understand is that Gensim 4.0 no longer supports Word2Vec.load_word2vec_format. Instead, I can only load the KeyedVectors. How can I fine-tune a pre-trained word2vec model (such as the model trained on GoogleNews) with my domain-specific corpus using Gensim 4.0?

Topic: pretraining transfer-learning gensim word2vec

Category: Data Science

Deploying multiple pre-trained models (tar.gz files) on SageMaker in a single endpoint

Subh2608

April 2, 2022 13:13

We have followed these steps: Trained 5 TensorFlow models on a local machine using 5 different training sets. Saved those in .h5 format. Converted those into tar.gz (Model1.tar.gz,...Model5.tar.gz) and uploaded them to the S3 bucket. Successfully deployed a single model in an endpoint using the following code: from sagemaker.tensorflow import TensorFlowModel sagemaker_model = TensorFlowModel(model_data = tarS3Path + 'model{}.tar.gz'.format(1), role = role, framework_version='1.13', sagemaker_session = sagemaker_session) predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge') predictor.predict(data.values[:,0:]) The output was: {'predictions': [[153.55], [79.8196], [45.2843]]} Now the problem …

Topic: pretraining sagemaker machine-learning-model tensorflow aws

Category: Data Science

How to improve recall by retraining a model on its feedback

learnlifelong

March 13, 2022 17:54

I am creating a supervised model using sensitive and scarce data. For the sake of discussion, I've simplified the problem statement by assuming that I'm creating a model for identifying dogs. Let's say I am creating a model to identify dogs in pictures. I trained it with a few positive and negative examples. I could not gather a lot of data because it is scarce. Therefore, the model accuracy is not good (say f-score = 0.64). I deployed this model in …

Topic: pretraining machine-learning-model reinforcement-learning accuracy machine-learning

Category: Data Science

Logic behind pre-trained weights and transfer learning

Sahand

February 24, 2022 20:07

I am not sure about the logic behind how pre-trained weights actually make sense and translate to a new problem. To be more specific: for example, in an object detection network, how would a model's weights that were trained, let's say, on the COCO dataset with 80 categories translate to my new problem that only has 2 categories (classes)? How does this make sense? What kind of meaningful features could even be transferred from the previously pre-trained model to …

Topic: pretraining object-detection transfer-learning neural-network classification

Category: Data Science

Semantic segmentation with greyscale images

VansFannel

February 24, 2022 04:05

I'm trying to reproduce a piece of research with greyscale images instead of colour images. I have found that there are networks, like VGG16, pre-trained on ImageNet. But that dataset has colour images, and I can't use it because I'm going to use greyscale images. Is there any network pre-trained on greyscale images? Failing that, I can also train a network with a greyscale image dataset, but I can't find any.

Topic: pretraining vgg16 cnn dataset

Category: Data Science

Baseline model and transfer learning

industArk

February 4, 2022 12:55

I've tried to find any guidance on using transfer learning when building baseline models for ML projects (CNN in my case) but found no clues on good practices in the matter. My logic says that no baseline model should be pretrained first as it is complicating it without any known reason to do it (as yet it is not proven we need it). But it is not the first time my logic may be wrong in the case of DS. …

Topic: pretraining transfer-learning machine-learning-model data-augmentation neural-network

Category: Data Science

Can I leave natural outliers in a dataset in training?

Zexxxx

December 31, 2021 00:30

Can I leave natural outliers in a dataset unedited (outliers that have not appeared just because of mistyping or mistakes in the data)? Or should I also remove them or change them?

Topic: pretraining outlier statistics

Category: Data Science

Test data is not a good representation of train data

letdatado

November 4, 2021 19:01

I have predefined train and test sets. On generating some statistics like value_counts and checking the unique values, I feel that there is a 'lot' of difference between the distributions of the variables. What should be done about this? Suppose I want to delete a column from the train_set for any reason like near-zero variance, should I repeat the same for the test_set (even if there is no such problem in the test_set's frequency tables)? I ran the following …

Topic: pretraining training preprocessing data-cleaning machine-learning

Category: Data Science
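
On the column-dropping part of the question above, a common convention is to make the decision using the training set only and then apply the identical transformation to the test set, so that both end up with the same feature layout. The sketch below illustrates that under simplifying assumptions: rows are plain dictionaries rather than a DataFrame, and the helper names, the threshold, and the data are hypothetical.

```python
from statistics import pvariance

def low_variance_columns(rows, threshold=1e-8):
    """Names of columns whose population variance in `rows` is at or below threshold."""
    columns = rows[0].keys()
    return [c for c in columns if pvariance([r[c] for r in rows]) <= threshold]

def drop_columns(rows, to_drop):
    """Copy of `rows` without the columns listed in `to_drop`."""
    return [{k: v for k, v in r.items() if k not in to_drop} for r in rows]

# Hypothetical data: column "b" is constant in train but varies in test
train = [{"a": 1.0, "b": 5.0}, {"a": 2.0, "b": 5.0}, {"a": 3.0, "b": 5.0}]
test = [{"a": 4.0, "b": 5.0}, {"a": 0.5, "b": 7.0}]

# The decision is made from the training set only...
to_drop = low_variance_columns(train)

# ...and the same columns are removed from both sets
train_clean = drop_columns(train, to_drop)
test_clean = drop_columns(test, to_drop)
```

Here `b` is dropped from the test set even though it varies there, because a model fitted on `train_clean` never sees that column.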

How long does it take to fine-tune XLNet?

Tony Jesuthasan

October 8, 2021 03:25

XLNet takes a lot more time than BERT during pre-training. This results in XLNet performing better than BERT in over 20 NLP tasks. How long does XLNet take for fine-tuning (let's assume this is running on Google Colab)? (Let's assume a text summarization task with around 4000 examples)

Topic: pretraining bert finetuning nlp

Category: Data Science

Using mathematical derivatives of input data to augment training input data

mhdnt

September 23, 2021 13:04

I'm thinking of how to design a basic feedforward neural network that would be able to predict future datapoints given past datapoints. I'm very new to neural network design so I'm wondering if there's some sort of a best practice as far as getting as much data out of the input data as possible. Would it make sense to provide the neural network with mathematically computed derivatives of the input data or are feedforward networks capable of generating derivatives within …

Topic: pretraining machine-learning-model training

Category: Data Science
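
To make the idea in the question above concrete, the sketch below builds training samples from a 1-D series where each input holds a window of past values plus their first differences (a discrete stand-in for the derivative), and the target is the next value. It is only an illustration: the function name, the window size, and the series are hypothetical.

```python
def add_difference_features(series, window=3):
    """Build (features, target) pairs from a 1-D series.

    Each feature vector holds `window` past values followed by their
    first differences; the target is the value that comes next.
    """
    samples = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        # First differences between consecutive past values
        diffs = [past[i + 1] - past[i] for i in range(window - 1)]
        samples.append((past + diffs, series[t]))
    return samples

# Hypothetical series
series = [1.0, 2.0, 4.0, 7.0, 11.0]
samples = add_difference_features(series, window=3)
```

A first difference is a linear combination of the raw inputs, so a network with a linear first layer can in principle represent it without help. Supplying it explicitly may still make training easier, which is something to check empirically.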

One single-batch training on Huggingface Bert model "ruins" the model

Wei Zhong

September 10, 2021 17:06

For some reason, I need to do further (2nd-stage) pre-training on a Huggingface Bert model, and I find my training outcome is very bad. After debugging for hours, surprisingly, I find that even training on one single batch after loading the base model causes the model to predict a very bad choice when I ask it to unmask some test sentences. I boiled my code down to the minimal reproducible version here: import torch from transformers import AdamW, BertTokenizer from transformers import …

Topic: pretraining transformer deep-learning

Category: Data Science

Working on an image classification project (microscopic images), have some doubts

Aditi

June 1, 2021 14:46

Currently, I am working on an image classification project. The data set contains very high resolution images taken via an electron microscope. Hence, I have few and limited instances. I have done EDA and made up a deep CNN to go about it. The results are not very satisfying. Even tweaking the model did not work. I got similar results in cross-validation as well. I also performed data augmentation, but I do not possess enough knowledge of it, can anyone …

Topic: pretraining cnn data-augmentation image-classification deep-learning

Category: Data Science

What are the differences between self-supervised/semi-supervised in NLP?

Inhyeok Yoo

May 27, 2021 08:20

GPT-1 mentions both semi-supervised learning and unsupervised pre-training, but they seem the same to me. Moreover, "Semi-supervised Sequence Learning" by Dai and Le also looks more like self-supervised learning. So what are the key differences between them?

Topic: pretraining semi-supervised-learning nlp

Category: Data Science

What is the common practice for NLP or text mining for non-English?

Paw in Data

April 14, 2021 18:32

A lot of natural language processing tools are pre-trained on English corpora. What if one needs to analyze, say, Dutch text? The blogs I find online mostly suggest translating the text into English as pre-processing. Is this the common practice? If not, then what? Also, does how similar a language is to English have an impact on the model performance? For some other widely spoken languages (e.g., French, Spanish), do people construct corpora in their own language and train …

Topic: pretraining bert text-mining nlp

Category: Data Science

Where to get models with weights instead of only weights? What's the purpose of .h5 files?

karlosos

February 15, 2021 10:38

I have downloaded .h5 files from qubvel/resnet and qubvel/efficientnet. I was trying to use some models as a backbone for my model but I'm getting the following error: ValueError: No model found in the config file. As explained here, this is because the .h5 file contains only weights, not a model. So those .h5 files are only weights. What's the purpose of having only weights without architecture? I was trying to run the following code: resnet18_path_to_file = "models/resnet18.h5" resnet18 = tf.keras.models.load_model(resnet18_path_to_file) …

Topic: pretraining keras tensorflow

Category: Data Science

Would there be any reason to pretrain BERT on specific texts?

Moradnejad

February 10, 2021 13:40

So the official BERT English model is trained on Wikipedia and BookCorpus (source). Now, for example, let's say I want to use BERT for movie tag recommendation. Is there any reason for me to pretrain a new BERT model from scratch on a movie-related dataset? Can my model become more accurate since I trained it on movie-related texts rather than general texts? Is there an example of such usage? To be clear, the question is on the importance of context (not …

Topic: pretraining bert transfer-learning language-model

Category: Data Science

How to access GPT-3, BERT or the like?

user305883

January 22, 2021 10:10

I am interested in accessing NLP models mentioned in scientific papers, to replicate some results and experiment. But I only see waiting lists (https://openai.com/blog/openai-api/) and licenses granted in large commercial deals (https://www.theverge.com/2020/9/22/21451283/microsoft-openai-gpt-3-exclusive-license-ai-language-research). How can a researcher not affiliated with a university or (large) tech company obtain access so as to replicate experiments from scientific papers? Which alternatives would you suggest to make use of pre-trained models?

Topic: pretraining openai-gpt nlp

Category: Data Science
