RandomizedSearchCV not displaying the scoring metric

I want to run a hyperparameter search over a few parameters of an XGBClassifier for a binary classification problem, but whenever I run it the score (roc_auc) is not displayed. I read in another question that this can be caused by an error during model training, but I am not sure which one applies in this case.

My training data X_train is a NumPy array of shape (X, 19),

and my y_train is a numpy.ndarray of shape (X,), which looks like this:

Then I create my parameter grid and model like this:

from sklearn.model_selection import RandomizedSearchCV, GridSearchCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier

# A parameter grid for XGBoost
params = {
        'min_child_weight': [1, 5, 10],
        'gamma': [0.5, 1, 1.5, 2, 5],
        }
xgb = XGBClassifier(use_label_encoder=False, eval_metric='logloss')

folds = 3
param_comb = 5

skf = StratifiedKFold(n_splits=folds, shuffle = True, random_state = 1001)

random_search = RandomizedSearchCV(xgb, 
                                   param_distributions=params,
                                   n_iter=param_comb, 
                                   scoring='roc_auc',
                                   n_jobs=4, 
                                   cv=skf.split(X_train, y_train), 
                                   verbose=3, 
                                   random_state=1001)

random_search.fit(X_train, y_train)

Whenever I run the code above, I get the following output, which does not contain the score:

[CV 3/3] END ..................gamma=0.5, min_child_weight=5; total time= 3.6min
[CV 1/3] END ..................gamma=0.5, min_child_weight=1; total time= 3.7min
[CV 3/3] END .................gamma=0.5, min_child_weight=10; total time= 3.5min
[CV 1/3] END ....................gamma=2, min_child_weight=5; total time= 3.6min
[CV 2/3] END ..................gamma=0.5, min_child_weight=1; total time= 3.5min
[CV 2/3] END .................gamma=0.5, min_child_weight=10; total time= 3.4min
[CV 2/3] END ..................gamma=1.5, min_child_weight=5; total time= 2.5min
[CV 2/3] END ..................gamma=0.5, min_child_weight=5; total time= 3.5min
[CV 2/3] END ....................gamma=2, min_child_weight=5; total time= 3.4min
[CV 3/3] END ..................gamma=0.5, min_child_weight=1; total time= 3.6min
[CV 1/3] END ..................gamma=1.5, min_child_weight=5; total time= 2.5min
[CV 1/3] END ..................gamma=0.5, min_child_weight=5; total time= 3.6min
[CV 3/3] END ....................gamma=2, min_child_weight=5; total time= 3.5min
[CV 1/3] END .................gamma=0.5, min_child_weight=10; total time= 3.4min
[CV 3/3] END ..................gamma=1.5, min_child_weight=5; total time= 2.5min



According to the RandomizedSearchCV and GridSearchCV documentation, I think you should set the return_train_score parameter to True, so that the cv_results_ attribute also includes training scores (the test scores are stored there either way).

Nevertheless, keep in mind their warning:

Computing training scores is used to get insights on how different parameter settings impact the overfitting/underfitting trade-off. However computing the scores on the training set can be computationally expensive and is not strictly required to select the parameters that yield the best generalization performance.
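A minimal sketch of the suggestion above that runs without XGBoost. It swaps in scikit-learn's DecisionTreeClassifier with a hypothetical max_depth/min_samples_leaf grid, and synthetic make_classification data, as stand-ins for the question's XGBClassifier and X_train/y_train. It also passes skf itself as cv, which scikit-learn accepts just like skf.split(...). The per-candidate and per-fold ROC AUC values are read straight from cv_results_, independently of what verbose prints.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for X_train (19 features) and y_train.
X_train, y_train = make_classification(n_samples=300, n_features=19, random_state=0)

# Hypothetical grid for the stand-in model (gamma/min_child_weight are XGBoost-only).
params = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]}

skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=1001)

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_distributions=params,
    n_iter=5,
    scoring="roc_auc",
    cv=skf,                   # the splitter object; skf.split(...) also works
    return_train_score=True,  # adds mean_train_score / splitN_train_score keys
    random_state=1001,
)
search.fit(X_train, y_train)

results = search.cv_results_
n_splits = skf.get_n_splits()
for i, candidate in enumerate(results["params"]):
    # Per-fold test ROC AUC for this candidate.
    fold_scores = [results[f"split{k}_test_score"][i] for k in range(n_splits)]
    print(
        candidate,
        "train=%.3f" % results["mean_train_score"][i],
        "test=%.3f" % results["mean_test_score"][i],
        "folds=", np.round(fold_scores, 3),
    )
```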

If you are only interested in the score of the best_estimator_, you can always read the best_score_ attribute after fitting; it holds the mean cross-validated score of the best candidate.
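A minimal sketch of reading best_score_, again using a stand-in model (LogisticRegression with a hypothetical C grid) on synthetic data rather than the question's XGBClassifier. best_score_ is read without parentheses, and for a single scoring metric it equals the mean_test_score entry at best_index_.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic binary data standing in for the question's X_train / y_train.
X_train, y_train = make_classification(n_samples=200, n_features=19, random_state=0)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": [0.01, 0.1, 1.0, 10.0]},  # hypothetical grid
    n_iter=3,
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

# best_score_ is an attribute, not a method: the mean cross-validated
# ROC AUC of the best candidate.
print("best params:", search.best_params_)
print("best CV roc_auc: %.3f" % search.best_score_)
```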

Hope this helps! Cheers.
