I've pretrained the RoBERTa model with new data using the 'simpletransformers' library: from simpletransformers.classification import ClassificationModel OUTPUT_DIR = 'roberta_output/' model = ClassificationModel('roberta', 'roberta-base', use_cuda=False, num_labels=22, args={'overwrite_output_dir': True, 'output_dir': OUTPUT_DIR}) model.train_model(train_df) result, model_outputs, wrong_predictions = model.eval_model(test_df) # model evaluation on test data where 'train_df' is a pandas dataframe that consists of many samples (=rows) with two columns: the 1st column is text data (the input); the 2nd column is a category (=label), the output. I need to create the same model and pretrain …
Fine-tuning is a concept commonly used in deep learning. We may have a pre-trained model and then fine-tune it to our specific task. Does that apply to simple models, such as logistic regression? For example, let's say I have a dataset with attribute variables of an animal and I want to classify whether or not it is a mammal. The labels on that dataset are only "mammal"/"not mammal". I then train a logistic regression model for this …
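One possible reading of "fine-tuning" for logistic regression is a warm start: train on the first dataset, then continue gradient descent on a second dataset starting from the learned weights instead of from zeros. The sketch below illustrates that idea in plain Python. The two features (has_fur, lays_eggs), the toy data, and the helper names are all hypothetical and chosen only for illustration; this is not a claim about how any particular library implements it.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_loss(weights, bias, X, y):
    # Mean binary cross-entropy of the model on (X, y).
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)

def train(X, y, weights=None, bias=0.0, lr=0.1, epochs=200):
    # Batch gradient descent. Passing existing weights continues
    # training from them (warm start) instead of starting from zeros.
    n_features = len(X[0])
    w = list(weights) if weights is not None else [0.0] * n_features
    b = bias
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Hypothetical features: [has_fur, lays_eggs]; label 1 = mammal.
X_old = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
y_old = [1, 1, 0, 0, 1, 0]

# Hypothetical second dataset with a shifted labelling
# (here a furless, non-egg-laying animal is a mammal).
X_new = [[1, 0], [0, 1], [0, 0], [1, 1]]
y_new = [1, 0, 1, 1]

w_old, b_old = train(X_old, y_old)
loss_before = log_loss(w_old, b_old, X_new, y_new)

# "Fine-tune": a few more epochs on the new data, starting from w_old.
w_ft, b_ft = train(X_new, y_new, weights=w_old, bias=b_old, epochs=50)
loss_after = log_loss(w_ft, b_ft, X_new, y_new)
```

Because the logistic loss is convex, a warm start mainly changes how quickly training converges and, when training is stopped early, how close the result stays to the original model.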
I have a doubt regarding terminology. When dealing with huggingface transformer models, I often read about "using pretrained models for classification" vs. "fine-tuning a pretrained model for classification." I fail to understand what the exact difference between these two is. As I understand, pretrained models by themselves cannot be used for classification, regression, or any relevant task, without attaching at least one more dense layer and one more output layer, and then training the model. In this case, we would …
With Gensim < 4.0, we can retrain a word2vec model using the following code: model = Word2Vec.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True) model.train(my_corpus, total_examples=len(my_corpus), epochs=model.epochs) However, as I understand it, Gensim 4.0 no longer supports Word2Vec.load_word2vec_format. Instead, I can only load the KeyedVectors. How can I fine-tune a pre-trained word2vec model (such as the model trained on GoogleNews) with my domain-specific corpus using Gensim 4.0?
We have followed these steps: Trained 5 TensorFlow models on a local machine using 5 different training sets. Saved those in .h5 format. Converted those into tar.gz (Model1.tar.gz,...Model5.tar.gz) and uploaded them to the S3 bucket. Successfully deployed a single model in an endpoint using the following code: from sagemaker.tensorflow import TensorFlowModel sagemaker_model = TensorFlowModel(model_data = tarS3Path + 'model{}.tar.gz'.format(1), role = role, framework_version='1.13', sagemaker_session = sagemaker_session) predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge') predictor.predict(data.values[:,0:]) The output was: {'predictions': [[153.55], [79.8196], [45.2843]]} Now the problem …
I am creating a supervised model using sensitive and scarce data. For the sake of discussion, I've simplified the problem statement by assuming that I'm creating a model for identifying dogs. Let's say I am creating a model to identify dogs in pictures. I trained it with a few positive and negative examples. I could not gather a lot of data because it is scarce. Therefore, the model accuracy is not good (say f-score = 0.64). I deployed this model in …
I am not sure about the logic behind how pre-trained weights actually make sense and translate into a new problem. To be more specific: for example, in an object detection network, how would a model's weights that were trained, let's say, on the COCO dataset with 80 categories translate into my new problem that only has 2 categories (classes)? How does this make sense? What kind of meaningful features could even be transferred from the previously pre-trained model to …
I'm trying to reproduce a piece of research with greyscale images instead of colour images. I have found that there are networks, like VGG16, pre-trained on ImageNet. But that dataset has colour images, and I can't use it because I'm going to use greyscale images. Is there any network pre-trained on greyscale images? Failing that, I could also train a network on a greyscale image dataset, but I can't find any.
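A common workaround for the situation described above is to repeat the single greyscale channel three times, so that the image has the H x W x 3 shape a network pretrained on RGB data expects. The sketch below shows only that reshaping step in plain Python with nested lists. The function name and the tiny 2 x 2 image are hypothetical, and whether this works well for a given task is something to verify experimentally.

```python
def grey_to_rgb(image):
    # Turn an H x W greyscale image into H x W x 3 by copying
    # each pixel value into the R, G and B channels.
    return [[[pixel, pixel, pixel] for pixel in row] for row in image]

# Hypothetical 2 x 2 greyscale image with 8-bit intensities.
grey = [[0, 128],
        [255, 64]]

rgb = grey_to_rgb(grey)
```

In practice the same replication is usually done on arrays or tensors rather than lists, but the idea is identical.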
I've tried to find any guidance on using transfer learning when building baseline models for ML projects (CNN in my case) but found no clues on good practices in the matter. My logic says that no baseline model should be pretrained first as it is complicating it without any known reason to do it (as yet it is not proven we need it). But it is not the first time my logic may be wrong in the case of DS. …
Can I leave natural outliers in a dataset unedited (outliers that did not appear merely because of mistyping or mistakes in the data)? Or should I also remove them or change them?
I have predefined train and test sets. On generating some statistics like value_counts and checking the unique values, I feel that there is a 'lot' of difference between the distributions of the variables. What should be done about this? Suppose I want to delete a column from the train_set for any reason, like near-zero variance: should I repeat the same for the test_set (even if there is no such problem in the test_set's frequency tables)? I ran the following …
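On the column-dropping part of the question above: a model trained on a given set of columns needs the same columns at prediction time, so the usual pattern is to decide which columns to drop from the training set only and then apply that same list to the test set. The sketch below shows the pattern with plain dictionaries standing in for data frames. The column names, the 0.95 threshold, and the helper functions are hypothetical.

```python
from collections import Counter

def near_constant_columns(table, threshold=0.95):
    # Flag columns where the most frequent value accounts for at
    # least `threshold` of the rows (a simple near-zero-variance check).
    flagged = []
    for name, values in table.items():
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) >= threshold:
            flagged.append(name)
    return flagged

def drop_columns(table, names):
    # Return a copy of the table without the named columns.
    return {k: v for k, v in table.items() if k not in names}

# Hypothetical data: 'colour' is constant in train but varies in test.
train_set = {"colour": ["red"] * 20, "size": [i % 4 for i in range(20)]}
test_set = {"colour": ["red", "blue", "red", "green"], "size": [0, 1, 2, 3]}

# Decide on the training set only, then apply the same list to both.
to_drop = near_constant_columns(train_set)
train_clean = drop_columns(train_set, to_drop)
test_clean = drop_columns(test_set, to_drop)
```

Here 'colour' is removed from both sets even though it varies in the test set, because the decision is driven by the training data alone.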
XLNet takes a lot more time than BERT during pre-training. This results in XLNet performing better than BERT in over 20 NLP tasks. How long does XLNet take for fine-tuning (let's assume this is running on Google Colab)? (Let's assume a text summarization task with around 4000 examples)
I'm thinking of how to design a basic feedforward neural network that would be able to predict future datapoints given past datapoints. I'm very new to neural network design so I'm wondering if there's some sort of a best practice as far as getting as much data out of the input data as possible. Would it make sense to provide the neural network with mathematically computed derivatives of the input data or are feedforward networks capable of generating derivatives within …
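For context on the question above: a first difference (a discrete derivative) is a linear combination of neighbouring inputs, so a feedforward network can in principle represent it internally, but supplying it explicitly is a cheap experiment. The sketch below builds training samples from a sliding window of past values plus their first differences. The function names, the window size, and the toy series are hypothetical, and whether the extra features help is an empirical question.

```python
def first_differences(values):
    # Discrete derivative: difference between consecutive values.
    return [b - a for a, b in zip(values, values[1:])]

def make_samples(series, window):
    # For each position, the features are the last `window` values
    # followed by their first differences; the target is the next value.
    samples = []
    for i in range(len(series) - window):
        past = series[i:i + window]
        features = past + first_differences(past)
        samples.append((features, series[i + window]))
    return samples

# Hypothetical toy series (the squares 1, 4, 9, ...).
series = [1, 4, 9, 16, 25]
samples = make_samples(series, window=3)
```

Each sample here has five features (three raw values and two differences), which would be the width of the network's input layer.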
For some reason, I need to do further (2nd-stage) pre-training on a Huggingface BERT model, and I find my training outcome is very bad. After debugging for hours, surprisingly, I find that even training on one single batch after loading the base model will cause the model to make very bad predictions when I ask it to unmask some test sentences. I boiled down my code to the minimal reproducible version here: import torch from transformers import AdamW, BertTokenizer from transformers import …
Currently, I am working on an image classification project. The data set contains very high resolution images taken via an electron microscope. Hence, I have few and limited instances. I have done EDA and made up a deep CNN to go about it. The results are not very satisfying. Even tweaking the model did not work. I got similar results in cross-validation as well. I also performed data augmentation, but I do not possess enough knowledge of it, can anyone …
GPT-1 mentions both semi-supervised learning and unsupervised pre-training, but they seem the same to me. Moreover, "Semi-supervised Sequence Learning" by Dai and Le also seems more like self-supervised learning. So what are the key differences between them?
A lot of natural language processing tools are pre-trained on English corpora. What if one needs to analyze, say, Dutch text? The blogs I find online mostly suggest translating the text into English as pre-processing. Is this the common practice? If not, then what? Also, does how similar a language is to English have an impact on the model performance? For other widely spoken languages (e.g. French, Spanish), do people construct corpora in their own language and train …
I have downloaded .h5 files from qubvel/resnet and qubvel/efficientnet. I was trying to use some models as a backbone for my model but I'm getting the following error: ValueError: No model found in the config file. As explained here, this is because the .h5 file contains only weights, not a model. So those .h5 files are only weights. What's the purpose of having only weights without architecture? I was trying to run the following code: resnet18_path_to_file = "models/resnet18.h5" resnet18 = tf.keras.models.load_model(resnet18_path_to_file) …
So the official BERT English model is trained on Wikipedia and BooksCorpus (source). Now, for example, let's say I want to use BERT for movie tag recommendation. Is there any reason for me to pretrain a new BERT model from scratch on a movie-related dataset? Can my model become more accurate if I train it on movie-related texts rather than general texts? Is there an example of such usage? To be clear, the question is on the importance of context (not …
I am interested in accessing NLP models mentioned in scientific papers, to replicate some results and experiment. But I only see waiting lists https://openai.com/blog/openai-api/ and licenses granted in large commercial deals https://www.theverge.com/2020/9/22/21451283/microsoft-openai-gpt-3-exclusive-license-ai-language-research . How can a researcher not affiliated with a university or (large) tech company obtain access so as to replicate experiments from scientific papers? Which alternatives would you suggest to leverage pre-trained data sets?