Specifying the number of threads when using xgboost.train

When I call xgboost.train(), all available threads are used. I would like to use a specific number of threads, but the function accepts neither nthread nor n_jobs as an argument. How can I control the number of threads being used?


// Edit

It seems I have found a solution. With XGBClassifier or XGBRegressor, the nthread (or n_jobs) parameter is passed directly to the constructor, as in xgb.XGBRegressor(nthread=n). For xgboost.train(), as indicated in the xgboost documentation (page 46), I instead added an entry parameters["nthread"] = number_of_threads to the parameters dictionary I pass to the function. After testing with different values, the number of threads reported in htop matched number_of_threads each time. Can anyone confirm that this is the right method?


As of XGBoost 1.5.2, libxgboost.so uses OpenMP (OMP) for parallelization.

Unfortunately, the XGBoost API offers no option to set the number of threads used for prediction, and the nthread argument of DMatrix has no effect.

Set the environment variable OMP_THREAD_LIMIT to the maximum number of threads OpenMP may use. For example, export OMP_THREAD_LIMIT=1 produces warnings such as the following:

OMP: Warning #96: Cannot form a team with 32 threads, using 1 instead.

In Python, add these lines before any other imports:

import os
os.environ["OMP_THREAD_LIMIT"] = "1"
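If the limit should apply to one job only rather than to the whole session, a possible variation is to set the variable just for a child process. The sketch below assumes the OpenMP runtime picks up OMP_THREAD_LIMIT from the environment the process starts with, which is consistent with the advice to set it before other imports. The helper name run_with_thread_limit is hypothetical, and the child here only echoes the variable back instead of running XGBoost.

```python
import os
import subprocess
import sys


def run_with_thread_limit(code, limit):
    """Run a Python snippet in a child process with OMP_THREAD_LIMIT set."""
    # Copy the parent's environment so only the child sees the limit.
    env = dict(os.environ)
    env["OMP_THREAD_LIMIT"] = str(limit)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Stand-in child script: it just reports the limit it was started with.
seen = run_with_thread_limit(
    "import os; print(os.environ['OMP_THREAD_LIMIT'])", limit=1
)
print(seen)
```

In practice the snippet passed as code would be the script that imports xgboost and runs the prediction.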

With the xgb.train function, you can set nthread in params:

xgb.train({'nthread': 3}, dtrain)

As far as I can tell, the xgboost documentation does not clearly explain how to set the nthread parameter. Some global parameters are set through xgb.set_config, but others, such as nthread, are not.

However, when I walked through their test scripts, I found that nthread is set through the params argument of xgb.train.
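To make the params approach concrete, here is a minimal sketch rather than a definitive recipe. The helper names make_train_params and train_with_threads are hypothetical, the objective is borrowed from the XGBRegressor example elsewhere in this thread, and X and y are assumed to be NumPy arrays of features and targets.

```python
def make_train_params(num_threads, objective="reg:squarederror"):
    """Build the params dict for xgb.train with an explicit thread count."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return {"objective": objective, "nthread": num_threads}


def train_with_threads(X, y, num_threads, num_boost_round=10):
    """Train a booster on (X, y) using at most num_threads threads."""
    # Imported here so make_train_params works without xgboost installed.
    import xgboost as xgb

    dtrain = xgb.DMatrix(X, label=y)
    # nthread travels inside the params dict, not as a keyword argument.
    return xgb.train(
        make_train_params(num_threads), dtrain, num_boost_round=num_boost_round
    )
```

Calling train_with_threads(X, y, num_threads=3) should then keep training to three threads, which can be checked in htop as described in the question.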

You can set the number of threads with the nthread parameter of XGBClassifier or XGBRegressor:

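import time
# Note: load_boston was removed in scikit-learn 1.2; on newer versions,
# substitute another regression dataset.
from sklearn.datasets import load_boston
import xgboost as xgb

# X and y were not defined in the original snippet; they are assumed to
# come from the imported load_boston dataset.
X, y = load_boston(return_X_y=True)

num_threads = [1, 2, 3, 4, 5, 6, 8, 16, 32, 64]
for n in num_threads:
    start = time.time()
    # nthread sets how many threads this model may use.
    model = xgb.XGBRegressor(objective='reg:squarederror', nthread=n)
    model.fit(X, y)
    elapsed = time.time() - start
    # Print the thread count and the elapsed time in seconds.
    print(n, round(elapsed, 3))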

Running this code prints the thread count followed by the elapsed time in seconds; in this run, adding threads did not speed up training:

    1 0.059
    2 0.071
    3 0.063
    4 0.094
    5 0.075
    6 0.078
    8 0.09
    16 0.099
    32 0.157
    64 0.235
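Timings this short from a single run are likely to be noisy, so one way to make the comparison steadier is to repeat each measurement and keep the best time. The sketch below is an assumption about how that could be done, not part of the original benchmark. The helper name best_of is hypothetical, and the workload is a stand-in that would be replaced by something like lambda: model.fit(X, y).

```python
import time


def best_of(fn, repeats=5):
    """Return the shortest wall-clock time in seconds over several calls to fn."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload; swap in the model fit to time XGBoost itself.
elapsed = best_of(lambda: sum(i * i for i in range(100_000)), repeats=3)
print(round(elapsed, 4))
```

Taking the minimum rather than the mean reduces the influence of one-off slowdowns from other processes on the machine.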

