Calibration Curve Error

I want to calibrate the probability outputs of a model using isotonic regression. After calibration, calling the calibration_curve function from the sklearn.calibration module raised this error: ValueError: 'list' argument must have no negative elements. However, when I checked the results from the isotonic regression, there were no negative values and every value was in the [0, 1] range. The targets look fine too.

from sklearn.calibration import calibration_curve
from sklearn.isotonic import IsotonicRegression

i_reg = IsotonicRegression().fit(X_train, y_train)
res = i_reg.predict(X_test)
prob_true, prob_pred = calibration_curve(y_test, res, n_bins=200, strategy='quantile')

Error log:

ValueError                                Traceback (most recent call last)
<ipython-input-36-5a2bd10e64b2> in <module>
----> 1 prob_true_model_test , prob_pred_cal_test = calibration_curve(y_test, res, n_bins=200, strategy=quantile)

/environment/lib/python3.6/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

/environment/lib/python3.6/site-packages/sklearn/calibration.py in calibration_curve(y_true, y_prob, normalize, n_bins, strategy)
    590     binids = np.digitize(y_prob, bins) - 1
    591 
--> 592     bin_sums = np.bincount(binids, weights=y_prob, minlength=len(bins))
    593     bin_true = np.bincount(binids, weights=y_true, minlength=len(bins))
    594     bin_total = np.bincount(binids, minlength=len(bins))

<__array_function__ internals> in bincount(*args, **kwargs)

ValueError: 'list' argument must have no negative elements

Topic probability-calibration scikit-learn

Category Data Science


It's hard to say without your data. I'd post this as a comment, but it's a bit long and better formatted as an answer.

The part of the source code that's relevant is quite short, so you can go through step by step to see what's wrong:

bins = np.linspace(0., 1. + 1e-8, n_bins + 1)
binids = np.digitize(y_prob, bins) - 1
bin_sums = np.bincount(binids, weights=y_prob, minlength=len(bins))

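digitize can return 0 if an element lies to the left of all the bins, which makes the corresponding entry of binids equal to -1, and np.bincount rejects negative values. That shouldn't happen here, but maybe a floating-point precision error has crept in? Check what values you get for binids above.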
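As a rough way to do that check, the sketch below repeats the uniform-strategy binning quoted above on a few made-up probabilities and reports which entries end up with a negative bin id. The helper name inspect_binids and the sample values are hypothetical, not part of scikit-learn; in practice you would pass your own res array.

```python
import numpy as np

def inspect_binids(y_prob, n_bins=5):
    """Repeat the binning lines quoted above and flag negative bin ids.

    Hypothetical helper for debugging; it is not part of scikit-learn.
    """
    y_prob = np.asarray(y_prob, dtype=float)
    bins = np.linspace(0., 1. + 1e-8, n_bins + 1)
    binids = np.digitize(y_prob, bins) - 1
    # Positions whose value fell to the left of the first bin edge.
    negative = np.flatnonzero(binids < 0)
    return binids, negative

# Made-up values: the last one is a hair below zero.
y_prob = np.array([0.0, 0.3, 1.0, -1e-12])
binids, negative = inspect_binids(y_prob)
print("bin ids:", binids)
print("positions with a negative bin id:", negative)
print("offending values:", y_prob[negative])

# np.bincount refuses negative ids, which is the error in the question.
try:
    np.bincount(binids)
    bincount_failed = False
except ValueError as exc:
    bincount_failed = True
    print("bincount raised:", exc)
```

If the list of positions is empty for your data, the negative ids are not coming from this step, and the inputs themselves are worth a closer look.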

See also this StackOverflow question, where NaNs were the culprit.
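A quick way to rule NaNs in or out is to count them in the calibrated output. One possible source, offered here as a guess rather than a diagnosis of your data: IsotonicRegression defaults to out_of_bounds='nan', so test inputs outside the range seen during fit are predicted as NaN. The sketch below uses made-up scores and labels to show this, along with the out_of_bounds='clip' alternative.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Made-up 1-D scores and labels, purely for illustration.
scores_train = np.array([0.2, 0.4, 0.6, 0.8])
y_train = np.array([0, 0, 1, 1])
# 0.1 and 0.9 lie outside the range of scores seen during fit.
scores_test = np.array([0.1, 0.5, 0.9])

# Default behaviour: out-of-range inputs are predicted as NaN.
default_pred = IsotonicRegression().fit(scores_train, y_train).predict(scores_test)
n_nan = int(np.isnan(default_pred).sum())
print("NaNs with default settings:", n_nan)

# Clipping maps out-of-range inputs to the nearest training endpoint instead.
clip_reg = IsotonicRegression(out_of_bounds="clip").fit(scores_train, y_train)
clipped_pred = clip_reg.predict(scores_test)
print("NaNs with out_of_bounds='clip':", int(np.isnan(clipped_pred).sum()))
```

Running np.isnan(res).sum() on your own predictions would show whether this applies to your case. Note that a simple range check such as (res >= 0).all() evaluates to False when NaNs are present, while res.min() returns NaN, so the result depends on how you checked the range.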
