How to save a Hugging Face fine-tuned model using PyTorch and distributed training

I am fine-tuning a masked language model from XLM-RoBERTa large on the Google machine spec below. When I copy the model from the container to a GCP bucket using gsutil and subprocess, it gives me an error. Versions: torch==1.11.0+cu113, torchvision==0.12.0+cu113, torchaudio==0.11.0+cu113, transformers==4.17.0. I am using a pre-trained Hugging Face model. I launch it as a train.py file, which I copy inside the Docker image, and use Vertex AI (GCP) to launch it with a ContainerSpec: machineSpec = MachineSpec(machine_type="a2-highgpu-4g", accelerator_count=4, accelerator_type="NVIDIA_TESLA_A100") python -m torch.distributed.launch --nproc_per_node 4 train.py --bf16 I am …
Category: Data Science

How to set up and run Conda on Google Colab

I am interested in using Google Colab for data modeling. How do I install conda, create an environment, and run Python in a notebook? I did some searching and found some helpful hints, but ran into several issues. So far I can only get a partially functional environment: I get stuck when running another cell in the same environment, since switching cells seems to reset the environment back to default.
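The "environment resets between cells" behaviour has a mundane cause: every `!` shell line in Colab runs in its own subshell, so `conda activate` (like any environment change) dies with that shell. A sketch of the effect using plain `subprocess`, where the `FOO` variable is just a stand-in for the conda environment:

```python
import subprocess

# Each `!` line in a Colab cell spawns a fresh shell, so environment
# changes such as `conda activate myenv` do not survive to the next line.
subprocess.run("export FOO=bar", shell=True)
check = subprocess.run('printf %s "$FOO"', shell=True,
                       capture_output=True, text=True)
# check.stdout is empty: FOO vanished with its subshell.

# Workaround: chain activation and the command inside ONE shell invocation,
# e.g. `!source activate myenv && python script.py` in Colab.
chained = subprocess.run('export FOO=bar && printf %s "$FOO"', shell=True,
                         capture_output=True, text=True)
# chained.stdout == "bar": the variable was visible to the chained command.
```

A commonly cited shortcut for the installation half of the problem is the `condacolab` helper package, though the one-shell-per-line rule above still applies to activation.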
Category: Data Science

Joining tables from different locations in BigQuery

I have been trying to join two tables from different datasets that are in different locations but in the same project. However, I keep getting the error: dataset not found in US location. The datasets' locations are US and us-east1. Here is what I am doing: select a.*, b.* from `project.dataset1.table1` a join `project.dataset2.table2` b on a.common_col = b.common_col Please help me out with this.
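The error is expected: BigQuery cannot run a single query over datasets in different locations, and the multi-region `US` does not "contain" `us-east1` for query purposes. The datasets must be brought into one location first (for example by copying one dataset, or extracting to GCS and reloading into the other location). A tiny helper capturing the precondition, as a sketch (the function name is illustrative):

```python
def same_query_location(loc_a: str, loc_b: str) -> bool:
    """BigQuery joins only work across datasets in a single location.
    Location names are case-insensitive, but a multi-region such as
    'US' is still a *different* location from the region 'us-east1'."""
    return loc_a.casefold() == loc_b.casefold()
```

For the asker's case (`US` vs `us-east1`) the check fails, so one dataset needs to be copied into the other's location before the join can run.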
Topic: google-cloud
Category: Data Science

Lower performance with the same script on Google Cloud vs a laptop

So I want to test a lot of hyperparameters for an XGBoost classification model and also run cross-validation for all of them. To do this I use a grid search. To speed up the process I want to use as many CPU cores as possible, so I set the n_jobs parameter to the number of available CPU cores in the system. This all works perfectly fine; see the code below. xgb_model = XGBClassifier(use_label_encoder=False, eval_metric='auc') njobs = os.cpu_count() gsearch = GridSearchCV(estimator=xgb_model, param_grid=param_tuning, …
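One thing worth checking in this setup: `os.cpu_count()` reports logical CPUs, and on GCP each vCPU is a single hyperthread, so `n_jobs=os.cpu_count()` on a small VM buys far less compute than the same number on a laptop with that many physical cores. XGBoost is also internally multithreaded, so outer grid-search workers times inner estimator threads can oversubscribe the machine. A hedged heuristic for splitting the budget (an assumption of mine, not an official API):

```python
import os

def outer_jobs(total_cpus: int, inner_threads: int = 1) -> int:
    """Keep (grid-search workers) x (per-estimator threads) <= logical CPUs
    to avoid oversubscription when the estimator is itself multithreaded."""
    return max(1, total_cpus // max(1, inner_threads))

logical = os.cpu_count() or 1
# e.g. GridSearchCV(..., n_jobs=outer_jobs(logical, inner_threads=2))
#      paired with XGBClassifier(..., n_jobs=2)
```
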
Category: Data Science

How many images can be trained in Google Colab?

I am using the ResNet50 pretrained model to train on my images with TensorFlow. I have 70k images and upgraded to Google Colab Pro, but I am still facing a memory error. So how many images can I train on in Google Colab? And how much RAM is needed for 70k images? This is how I label and load images from the drive: labels = [] imagePaths_generater = paths.list_images(Config.DATASET_PATH) imagePaths = [] for item in imagePaths_generater: imagePaths.append(item) for imagePath in imagePaths: label …
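With 70k images the usual culprit is decoding every image into RAM up front: the `imagePaths` loop only collects paths, but eagerly loading all pixel data afterwards is what exhausts memory. Streaming batches lazily keeps memory roughly constant regardless of dataset size. A framework-free sketch of the idea (`iter_batches` is illustrative; with TensorFlow the same pattern is a `tf.data.Dataset` built from the path list, mapped through a decode function, then batched):

```python
def iter_batches(paths, batch_size):
    """Yield fixed-size batches of items lazily, so only one batch of
    decoded images needs to live in memory at a time."""
    batch = []
    for p in paths:
        batch.append(p)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch
```
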
Category: Data Science

Recommendation Systems User Profile Streaming Data on GCP

I have a recommendation system that recommends articles to different users, and I am planning to serve the recommendations in an offline fashion: I already have a table in BigQuery that holds the recommendations, and an API call returns the recommendations for each page on the website. Now I want another table called user_profile which stores the user_id|shown|clicked| information about articles shown to users. This should happen in real time. I looked into https://cloud.google.com/bigquery/streaming-data-into-bigquery but it has limitations. …
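Whatever transport ends up carrying the events (BigQuery streaming inserts, or Pub/Sub feeding BigQuery to sidestep the streaming-insert limits), each impression/click needs to be shaped into one row-like record first. A sketch of such an event builder; the field names mirror the `user_id|shown|clicked` columns from the question, and the timestamp field is an assumption:

```python
from datetime import datetime, timezone

def build_event(user_id, article_id, shown, clicked):
    """Shape one user_profile event as a JSON-serializable dict, suitable
    for e.g. a streaming insert row or a Pub/Sub message payload."""
    return {
        "user_id": user_id,
        "article_id": article_id,
        "shown": bool(shown),
        "clicked": bool(clicked),
        "event_ts": datetime.now(timezone.utc).isoformat(),  # assumed column
    }
```
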
Category: Data Science

Google Dataflow

I'm trying to build a Google Dataflow pipeline by following one of the posts on Medium: https://levelup.gitconnected.com/scaling-scikit-learn-with-apache-beam-251eb6fcf75b However, it seems I'm missing the project argument, and it throws the following error. I'd appreciate your help guiding me through it. Error: ERROR:apache_beam.runners.direct.executor:Giving up after 4 attempts. WARNING:apache_beam.runners.direct.executor:A task failed with exception: Missing executing project information. Please use the --project command line option to specify it. Code: import apache_beam as beam import argparse from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.options.pipeline_options import …
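The error message itself names the fix: pass `--project` so the runner knows which GCP project to execute against. The standard Beam pattern is to parse your own flags with argparse and forward everything unrecognized (including `--project`) into `PipelineOptions`. A sketch of that argument split, where `--input` and the project id are placeholders:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--input", default="gs://bucket/data.csv")  # your own flag
known, pipeline_args = parser.parse_known_args([
    "--input", "gs://bucket/data.csv",
    "--project", "my-gcp-project",   # the flag the error is asking for
    "--runner", "DirectRunner",
])
# pipeline_args keeps the unrecognized flags, ready to hand to Beam:
#   options = PipelineOptions(pipeline_args)
#   with beam.Pipeline(options=options) as p: ...
```
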
Category: Data Science

How do I change the location of Google BigQuery

So, I'm trying to use Google BigQuery for the first time for a project of mine, and I'm a bit confused. The documentation isn't helping much, and judging by Google's blog post it looks like all the Google employees are gone thanks to the current epidemic. I've got several .csv files containing tables of unlabeled numerical data uploaded to Google Cloud Storage, on their Australian servers. What I want to do is use BigQuery to perform k-means clustering …
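A BigQuery dataset's location is fixed when the dataset is created and cannot be changed afterwards, so the usual answer is to create a new dataset in the desired location and load the data there; loading from GCS also generally requires the bucket and dataset to be co-located. A sketch, with the client calls left as comments (the dataset id and region are assumptions):

```python
# Assumed flow with google-cloud-bigquery (not executed here):
#   client = bigquery.Client()
#   dataset = bigquery.Dataset("my-project.au_dataset")  # hypothetical id
#   dataset.location = "australia-southeast1"            # set ONCE, at creation
#   client.create_dataset(dataset)
# then load the CSVs from the Australian bucket into that dataset.

def needs_new_dataset(current_loc: str, wanted_loc: str) -> bool:
    """True when the data has to move: an existing dataset's location
    is immutable, so a mismatch means creating a fresh dataset."""
    return current_loc.casefold() != wanted_loc.casefold()
```
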
Category: Data Science

Local RTX 2080 is 3x faster than V100 on GCP?

I have a gaming rig with an i9 CPU, 32 GB of RAM and an RTX 2080, and I have a GCP VM with 4 vCPUs, 52 GB of RAM and a V100. I train the same dataset using the same toolchain on both machines, and these are my ETAs: GCP VM: 16 days. Gaming rig: 5 days. How can a single $600 GPU outperform a $10k GPU? What's going on here? And what should I even expect?
Category: Data Science

Which GPU instance to opt for in Google Cloud?

I am looking to rent a GPU instance on Google Cloud for casual deep-learning model training and wondered about the differences between the available Nvidia Tesla versions: the Nvidia Tesla T4, P4, V100, P100 and K80. Here is the GPU pricing page. If anyone has used them, is currently using them, or knows the best GPU to rent, I'd appreciate it if you could share your experience or knowledge.
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.