Lower performance with the same script on Google Cloud vs laptop

I want to test a lot of hyperparameters for an XGBoost classification model, with cross-validation for each combination. To do this I use a grid search. To speed up the process I want to use as many CPU cores as possible, so I set the `n_jobs` parameter to the number of available CPU cores on the system. This all works fine, see the code below.

    import os

    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    # param_tuning is the hyperparameter grid, defined elsewhere
    xgb_model = XGBClassifier(use_label_encoder=False, eval_metric='auc')
    njobs = os.cpu_count()
    gsearch = GridSearchCV(estimator=xgb_model,
                           param_grid=param_tuning,
                           scoring='roc_auc',
                           cv=3,
                           n_jobs=njobs,
                           verbose=10)

When I run this on my laptop, it takes around 1 second per core to complete a training cycle at a clock speed of around 2 GHz. When I run the exact same script on a Google Cloud N2 VM with 8 cores, it takes around 1 minute per core to do the same, even though those cores are faster than 2 GHz. That's 60 times slower. Does anyone have an idea why it is so much slower on Google Cloud?

I already tried upgrading the hard drive to an SSD, but that seems to have no impact.


After some experimentation I found out that reducing the number of jobs actually increases the speed. I'm now running 8 machines with a single core each, and each of them runs faster than the 8-core machine did.

Either I did something wrong in my code, or this library is not built for multi-core processing.
