How to save a Hugging Face fine-tuned model using PyTorch and distributed training

I am fine-tuning a masked language model from XLM-RoBERTa large on the Google machine spec below. When I copy the model from the container to a GCP bucket using gsutil and subprocess, it gives me an error. Versions: torch==1.11.0+cu113, torchvision==0.12.0+cu113, torchaudio==0.11.0+cu113, transformers==4.17.0. I am using a pre-trained Hugging Face model. I launch it as a train.py file, which I copy inside the Docker image, and use Vertex AI (GCP) to launch it with a ContainerSpec: machineSpec = MachineSpec(machine_type="a2-highgpu-4g", accelerator_count=4, accelerator_type="NVIDIA_TESLA_A100") python -m torch.distributed.launch --nproc_per_node 4 train.py --bf16 I am …
Category: Data Science

How to set up and run Conda on Google Colab

I am interested in using Google Colab for data modeling. How do I install conda, create an environment, and run Python in a notebook? I did some searching and found some helpful hints, but ran into several issues. So far I can only get a partially functional environment: I get stuck when running another cell in the same environment, since switching cells seems to reset the environment back to default.
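The "environment resets between cells" behaviour has a mundane cause: every `!` shell line in Colab runs in its own subshell, so `conda activate` (like any environment change) dies with that shell. A sketch of the effect using plain `subprocess`, where the `FOO` variable is just a stand-in for the conda environment:

```python
import subprocess

# Each `!` line in a Colab cell spawns a fresh shell, so environment
# changes such as `conda activate myenv` do not survive to the next line.
subprocess.run("export FOO=bar", shell=True)
check = subprocess.run('printf %s "$FOO"', shell=True,
                       capture_output=True, text=True)
# check.stdout is empty: FOO vanished with its subshell.

# Workaround: chain activation and the command inside ONE shell invocation,
# e.g. `!source activate myenv && python script.py` in Colab.
chained = subprocess.run('export FOO=bar && printf %s "$FOO"', shell=True,
                         capture_output=True, text=True)
# chained.stdout == "bar": the variable was visible to the chained command.
```

A commonly cited shortcut for the installation half of the problem is the `condacolab` helper package, though the one-shell-per-line rule above still applies to activation.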
Category: Data Science

Joining tables from different locations in BigQuery

I have been trying to join two tables from different datasets that are in different locations but in the same project. However, I keep getting the error: dataset not found in US location. The datasets' locations are US and us-east1. Here is what I am doing: select a.*, b.* from `project.dataset1.table1` a join `project.dataset2.table2` b on a.common_col = b.common_col Please help me out with this.
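The error is expected: BigQuery cannot run a single query over datasets in different locations, and the multi-region `US` does not "contain" `us-east1` for query purposes. The datasets must be brought into one location first (for example by copying one dataset, or extracting to GCS and reloading into the other location). A tiny helper capturing the precondition, as a sketch (the function name is illustrative):

```python
def same_query_location(loc_a: str, loc_b: str) -> bool:
    """BigQuery joins only work across datasets in a single location.
    Location names are case-insensitive, but a multi-region such as
    'US' is still a *different* location from the region 'us-east1'."""
    return loc_a.casefold() == loc_b.casefold()
```

For the asker's case (`US` vs `us-east1`) the check fails, so one dataset needs to be copied into the other's location before the join can run.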
Topic: google-cloud
Category: Data Science

Lower performance with the same script on Google Cloud vs a laptop

So I want to test a lot of hyperparameters for an XGBoost classification model and also run cross-validation for all of them. To do this I use a grid search. To speed up the process I want to use as many CPU cores as possible, so I set the n_jobs parameter to the number of available CPU cores in the system. This all works perfectly fine; see the code below. xgb_model = XGBClassifier(use_label_encoder=False, eval_metric='auc') njobs = os.cpu_count() gsearch = GridSearchCV(estimator=xgb_model, param_grid=param_tuning, …
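One thing worth checking in this setup: `os.cpu_count()` reports logical CPUs, and on GCP each vCPU is a single hyperthread, so `n_jobs=os.cpu_count()` on a small VM buys far less compute than the same number on a laptop with that many physical cores. XGBoost is also internally multithreaded, so outer grid-search workers times inner estimator threads can oversubscribe the machine. A hedged heuristic for splitting the budget (an assumption of mine, not an official API):

```python
import os

def outer_jobs(total_cpus: int, inner_threads: int = 1) -> int:
    """Keep (grid-search workers) x (per-estimator threads) <= logical CPUs
    to avoid oversubscription when the estimator is itself multithreaded."""
    return max(1, total_cpus // max(1, inner_threads))

logical = os.cpu_count() or 1
# e.g. GridSearchCV(..., n_jobs=outer_jobs(logical, inner_threads=2))
#      paired with XGBClassifier(..., n_jobs=2)
```
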
Category: Data Science

How many images can be trained in Google Colab?

I am using the ResNet50 pretrained model to train on my images with TensorFlow. I have 70k images and upgraded to Google Colab Pro, but I am still facing a memory error. So how many images can I train on in Google Colab? And how much RAM is needed for 70k images? This is how I label and load images from the drive: labels = [] imagePaths_generater = paths.list_images(Config.DATASET_PATH) imagePaths = [] for item in imagePaths_generater: imagePaths.append(item) for imagePath in imagePaths: label …
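With 70k images the usual culprit is decoding every image into RAM up front: the `imagePaths` loop only collects paths, but eagerly loading all pixel data afterwards is what exhausts memory. Streaming batches lazily keeps memory roughly constant regardless of dataset size. A framework-free sketch of the idea (`iter_batches` is illustrative; with TensorFlow the same pattern is a `tf.data.Dataset` built from the path list, mapped through a decode function, then batched):

```python
def iter_batches(paths, batch_size):
    """Yield fixed-size batches of items lazily, so only one batch of
    decoded images needs to live in memory at a time."""
    batch = []
    for p in paths:
        batch.append(p)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch
```
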
Category: Data Science

Recommendation Systems User Profile Streaming Data on GCP

I have a recommendation system that recommends articles to different users, and I am planning to serve the recommendations in an offline fashion: I already have a table in BigQuery that holds the recommendations, and an API call returns the recommendations for each page on the website. Now I want another table called user_profile which stores the user_id|shown|clicked| information about articles shown to users. This should happen in real time. I looked into https://cloud.google.com/bigquery/streaming-data-into-bigquery but it has limitations. …
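Whatever transport ends up carrying the events (BigQuery streaming inserts, or Pub/Sub feeding BigQuery to sidestep the streaming-insert limits), each impression/click needs to be shaped into one row-like record first. A sketch of such an event builder; the field names mirror the `user_id|shown|clicked` columns from the question, and the timestamp field is an assumption:

```python
from datetime import datetime, timezone

def build_event(user_id, article_id, shown, clicked):
    """Shape one user_profile event as a JSON-serializable dict, suitable
    for e.g. a streaming insert row or a Pub/Sub message payload."""
    return {
        "user_id": user_id,
        "article_id": article_id,
        "shown": bool(shown),
        "clicked": bool(clicked),
        "event_ts": datetime.now(timezone.utc).isoformat(),  # assumed column
    }
```
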
Category: Data Science

Google Dataflow

I'm trying to build a Google Dataflow pipeline by following one of the posts on Medium: https://levelup.gitconnected.com/scaling-scikit-learn-with-apache-beam-251eb6fcf75b However, it seems I'm missing the project argument, and it throws the following error. I'd appreciate your help guiding me through it. Error: ERROR:apache_beam.runners.direct.executor:Giving up after 4 attempts. WARNING:apache_beam.runners.direct.executor:A task failed with exception: Missing executing project information. Please use the --project command line option to specify it. Code: import apache_beam as beam import argparse from apache_beam.options.pipeline_options import PipelineOptions from apache_beam.options.pipeline_options import …
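The error message itself names the fix: pass `--project` so the runner knows which GCP project to execute against. The standard Beam pattern is to parse your own flags with argparse and forward everything unrecognized (including `--project`) into `PipelineOptions`. A sketch of that argument split, where `--input` and the project id are placeholders:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--input", default="gs://bucket/data.csv")  # your own flag
known, pipeline_args = parser.parse_known_args([
    "--input", "gs://bucket/data.csv",
    "--project", "my-gcp-project",   # the flag the error is asking for
    "--runner", "DirectRunner",
])
# pipeline_args keeps the unrecognized flags, ready to hand to Beam:
#   options = PipelineOptions(pipeline_args)
#   with beam.Pipeline(options=options) as p: ...
```
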
Category: Data Science

How do I change the location of Google BigQuery

So, I'm trying to use Google BigQuery for the first time for a project of mine, and I'm a bit confused. The documentation isn't helping much, and judging by Google's blog post it looks like all the Google employees are gone thanks to the current epidemic. I've got several .csv files containing tables of unlabeled numerical data uploaded to Google Cloud Storage, on their Australian servers. What I want to do is use BigQuery to perform k-means clustering …
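A BigQuery dataset's location is fixed when the dataset is created and cannot be changed afterwards, so the usual answer is to create a new dataset in the desired location and load the data there; loading from GCS also generally requires the bucket and dataset to be co-located. A sketch, with the client calls left as comments (the dataset id and region are assumptions):

```python
# Assumed flow with google-cloud-bigquery (not executed here):
#   client = bigquery.Client()
#   dataset = bigquery.Dataset("my-project.au_dataset")  # hypothetical id
#   dataset.location = "australia-southeast1"            # set ONCE, at creation
#   client.create_dataset(dataset)
# then load the CSVs from the Australian bucket into that dataset.

def needs_new_dataset(current_loc: str, wanted_loc: str) -> bool:
    """True when the data has to move: an existing dataset's location
    is immutable, so a mismatch means creating a fresh dataset."""
    return current_loc.casefold() != wanted_loc.casefold()
```
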
Category: Data Science

Local RTX 2080 is 3x faster than V100 on GCP?

I have a gaming rig with an i9 CPU, 32 GB of RAM and an RTX 2080, and I have a GCP VM with 4 vCPUs, 52 GB of RAM and a V100. I train the same dataset using the same toolchain on both machines, and these are my ETAs: GCP VM: 16 days. Gaming rig: 5 days. How can a single $600 GPU outperform a $10k GPU? What's going on here? And what should I even expect?
Category: Data Science

Which GPU instance to opt for in Google Cloud?

I am looking to rent a GPU instance on Google Cloud for casual deep-learning model training and wondered about the differences between the available Nvidia Tesla versions: the Nvidia Tesla T4, P4, V100, P100 and K80. Here is the GPU pricing page. If anyone has used them, is currently using them, or knows the best GPU to rent, I'd appreciate it if you could share your experience or knowledge.
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.