Is there any way to artificially create a probability calibration for data coming from another model?

I have predictions from a survival model. The model gives me very low probabilities, and I am not sure they reflect the real probability of the phenomenon. For example, I calculate $P\left( T\leq t+d \middle| T>t \right)$ and the probabilities are very low (with $d=180$). To summarize, I need these probabilities to average some other number (say $0.2$). Is it possible to create an artificial calibration with only this number (the desired average) as the …
Category: Data Science
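
Assuming a monotone adjustment that preserves the ranking of the survival model's predictions is acceptable, one sketch of such an "artificial calibration" is to add a single constant on the log-odds scale and solve for the constant that makes the average prediction equal the desired number. The function name and the synthetic inputs below are illustrative, not from the question:

    import numpy as np
    from scipy.optimize import brentq

    def shift_to_target_mean(p, target_mean=0.2):
        """Shift predictions by one constant on the log-odds scale so their
        average matches target_mean (hypothetical helper, not from the question)."""
        p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)
        logit = np.log(p / (1 - p))
        # find the offset c such that mean(sigmoid(logit + c)) == target_mean
        def gap(c):
            return np.mean(1 / (1 + np.exp(-(logit + c)))) - target_mean
        c = brentq(gap, -30, 30)
        return 1 / (1 + np.exp(-(logit + c)))

    # example: very low synthetic survival probabilities rescaled to average 0.2
    raw = np.random.beta(1, 50, size=1000)
    adjusted = shift_to_target_mean(raw, 0.2)
    print(raw.mean(), adjusted.mean())

Because the shift is monotone, the ordering of the original predictions is preserved; only their overall level changes.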

Best metric to evaluate model probabilities

I'm trying to create an ML model for a binary classification problem with a balanced dataset, and I care mostly about the probabilities. Searching the web, I only find advice to use AUC or log-loss scores; there is no advice to use the Brier score as an evaluation metric. Can I use the Brier score as an evaluation metric, or are there pitfalls with it? As I understand it, if I use log loss as the evaluation metric, the "winning" model will …
Category: Data Science
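
For what it's worth, the Brier score is a proper scoring rule just like log loss, and both are available directly in scikit-learn; a toy comparison on made-up numbers:

    import numpy as np
    from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

    y_true = np.array([0, 1, 1, 0, 1, 0])
    y_prob = np.array([0.1, 0.8, 0.6, 0.3, 0.9, 0.2])

    print("Brier score:", brier_score_loss(y_true, y_prob))  # mean squared error of the probabilities
    print("Log loss:   ", log_loss(y_true, y_prob))          # negative log-likelihood
    print("AUC:        ", roc_auc_score(y_true, y_prob))     # ranking quality only, ignores calibration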

Model recalibration on different dataset

I have a large dataset of approximately 150k rows with 1,500 positive labels on which I can train my model for binary classification. I also have another, smaller dataset comprising 80k rows and 100 positive labels. The problem is that I can't train a model on the small dataset because it results in poor quality, while the model trained on the large dataset can provide more stable outcomes for the second case due to …
Category: Data Science
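
One common pattern for this situation, sketched below with synthetic stand-ins for the two datasets, is to freeze the model fitted on the large dataset and refit only the score-to-probability mapping on the small one via CalibratedClassifierCV(..., cv="prefit"):

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # synthetic stand-ins for the two datasets described above
    X_large, y_large = make_classification(n_samples=150_000, weights=[0.99], random_state=0)
    X_small, y_small = make_classification(n_samples=80_000, weights=[0.9988], random_state=1)

    # train on the large dataset and keep that decision function fixed
    base = LogisticRegression(max_iter=1000).fit(X_large, y_large)

    # refit only the calibration map on the small dataset
    recalibrated = CalibratedClassifierCV(base, method="sigmoid", cv="prefit")
    recalibrated.fit(X_small, y_small)

    print(recalibrated.predict_proba(X_small)[:5, 1])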

Determining threshold in an area with very few samples of positive label

I have a binary classification task where I want to either keep or discard samples. I have about a million samples, and about 1% should be kept. I want to discard as many as possible, but discarding the wrong sample carries a heavy penalty. I have concluded I want to optimize something like the following: n_discards - n_false_discards * penalty, where I expect penalty to be around 5000. Now, it's easy enough to look at my validation data (around 100k …
Category: Data Science
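
Since the objective is a simple count-based score, one way to pick the threshold is a brute-force scan over candidate thresholds on the validation set. A minimal sketch: the 1% keep rate and the penalty of 5000 come from the question, everything else (helper name, toy scores) is illustrative:

    import numpy as np

    def best_threshold(y_true, p_keep, penalty=5000):
        """Scan thresholds on the predicted keep-probability and maximise
        n_discards - penalty * n_false_discards (hypothetical helper)."""
        best_t, best_score = None, -np.inf
        for t in np.linspace(0, 1, 1001):
            discard = p_keep < t
            n_discards = discard.sum()
            n_false_discards = (discard & (y_true == 1)).sum()  # keep-class samples wrongly discarded
            score = n_discards - penalty * n_false_discards
            if score > best_score:
                best_t, best_score = t, score
        return best_t, best_score

    # toy validation data: roughly 1% positives ("keep"), as in the question
    rng = np.random.default_rng(0)
    y_val = (rng.random(100_000) < 0.01).astype(int)
    p_val = np.clip(0.05 + 0.9 * y_val * rng.random(100_000), 0, 1)
    print(best_threshold(y_val, p_val))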

How to interpret calibration curves for prediction models?

I am working on a binary classification using a random forest with 977 records (77:23 is the class ratio). After building the model and getting an AUC of 81, I decided to build a calibration curve and calculate the Brier score. My graph looks like the one below (without a calibration model being used). I don't know why my AUC drops here (when I use the code below). Later, when I build the calibration model, I see the output below. You can see that my …
Category: Data Science
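
For reference, a minimal, self-contained version of the workflow described (synthetic data with roughly the same 77:23 class ratio; the model settings and bin count are illustrative, not the question's exact code):

    import matplotlib.pyplot as plt
    from sklearn.calibration import calibration_curve
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import brier_score_loss
    from sklearn.model_selection import train_test_split

    # synthetic data roughly matching the 977 records and 77:23 ratio
    X, y = make_classification(n_samples=977, weights=[0.77], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    proba = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
    print("Brier score:", brier_score_loss(y_te, proba))

    plt.plot(mean_pred, frac_pos, "s-", label="random forest")
    plt.plot([0, 1], [0, 1], "--", label="perfectly calibrated")
    plt.xlabel("mean predicted probability")
    plt.ylabel("observed fraction of positives")
    plt.legend()
    plt.show()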

Predictions using a calibrated classifier

I find myself asking a lot of calibration-related questions recently, but I cannot find adequate material on the topic! I am training a binary classifier to predict default. This probability will be used in such a way that the customers predicted to be class '1' form the target set for us; we simply provide summary stats on this target set. Since we sometimes move the threshold for when the class is 1, e.g. lower or higher than 0.5, we …
Category: Data Science

Should I use "sample_weights" on a calibrator if I already used them while training the model (imbalanced dataset)?

I was wondering what the right way to proceed is when you are dealing with an imbalanced dataset and you want to use a calibrator. When I work with a single model and imbalanced datasets, I usually pass "sample_weights" to the model, but I don't know whether "sample_weights" should be passed to the calibrator as well.
Category: Data Science
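
Mechanically, CalibratedClassifierCV.fit does accept sample_weight and forwards it to the underlying estimator and calibration step where supported; whether that is statistically the right choice is exactly what the question asks. A sketch on synthetic data:

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.utils.class_weight import compute_sample_weight

    X, y = make_classification(n_samples=5_000, weights=[0.95], random_state=0)
    w = compute_sample_weight("balanced", y)  # the same weights you would give the model

    calibrated = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                        method="sigmoid", cv=5)
    # sample_weight is forwarded to the base estimator and, where supported,
    # to the calibration step as well
    calibrated.fit(X, y, sample_weight=w)
    print(calibrated.predict_proba(X[:5])[:, 1])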

How can I improve calibration curves?

I am training a binary xgboost classifier with an imbalance of 85% class 0 and 14% class 1. This was achieved after I took a random sample, going from around 11M rows to 1M. When I calibrate, I get the following: it seems that using isotonic or sigmoid doesn't really improve the calibration much. Any idea how I can improve it? sig_clf = CalibratedClassifierCV(model, method="sigmoid", cv="prefit") iso_clf = CalibratedClassifierCV(model, method="isotonic", cv="prefit") sig_clf.fit(x_valid, y_valid) iso_clf.fit(x_valid, y_valid) prob_pos_sigmoid = sig_clf.predict_proba(x_test)[:, 1] …
Category: Data Science

Implementing Smoothed Isotonic Regression

In the paper here, the authors suggest a new way of calibrating classifiers, called Smoothed Isotonic Regression (Algorithm 1). As I follow the algorithm along, I noticed a problem in lines 19-20: after IBL is first created, by the time we get to line 19 it becomes an empty list [I assume IL means IBL], which makes line 20 throw an exception. My questions on this: Is there really a problem in the algorithm, or am I missing something? If there really is …
Category: Data Science

Account for imbalanced data in a Neural Network using prior distribution

I have a dataset with 4 classes; say their distribution in the training set is $P_{prior}(C1) = 60\%$, $P_{prior}(C2) = 25\%$, $P_{prior}(C3) = 10\%$, $P_{prior}(C4) = 5\%$. After training a Neural Network (on a balanced dataset, i.e. after undersampling), I get the output for a new sample as $P(C1) = 50\%$, $P(C2) = 10\%$, $P(C3) = 10\%$, $P(C4) = 30\%$. Usually we would just assign the sample to class 1, since it has the greatest output. But, if we …
Category: Data Science
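
A common way to fold the original class frequencies back into the output of a network trained on balanced (undersampled) data is to re-weight the predicted probabilities by the priors and renormalise; a sketch using the numbers from the question (the uniform balanced prior cancels in the normalisation):

    import numpy as np

    # priors from the original (imbalanced) training distribution
    priors = np.array([0.60, 0.25, 0.10, 0.05])

    # output of the network trained on the balanced (undersampled) data
    p_balanced = np.array([0.50, 0.10, 0.10, 0.30])

    # Bayes' rule with the true priors swapped back in, then renormalised
    p_adjusted = p_balanced * priors
    p_adjusted /= p_adjusted.sum()

    print(p_adjusted)           # C1 becomes even more likely, C4 much less
    print(p_adjusted.argmax())  # -> 0, i.e. the sample is still assigned to C1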

Calibration of a few binary classifiers is not perfect - why?

I am working on a binary classifier using LightGBM. I am trying to see the results of the classifiers when changing the costs of false positives and false negatives, while still working on the same training and validation datasets. As I want probabilities as the result of my modelling, I use isotonic regression as the final part of the pipeline. Applying exactly the same methodology and code, but changing only those variables of the customized objective function, I can see that …
Category: Data Science

Why does my calibration curve for Platt scaling and isotonic have fewer points than my uncalibrated model?

I train a model using grid search, then I use the best parameters from this to define my chosen model. model = XGBClassifier() pipeline = make_pipeline(model) kfolds = StratifiedKFold(3) clf = GridSearchCV(pipeline, parameters, cv=kfolds.split(x_train, y_train), scoring='roc_auc', return_train_score=True) clf.fit(x, y) model = clf.best_estimator_ Using this model from the grid search, I then calibrate it and plot uncalibrated vs calibrated: y_test_uncalibrated = model.predict_proba(x_test)[:, 1] fraction_of_positives, mean_predicted_value = calibration_curve(y_test, y_test_uncalibrated, n_bins=10) plt.plot(mean_predicted_value, fraction_of_positives, 's-', label='Uncalibrated') clf_isotonic = CalibratedClassifierCV(model, cv='prefit', method='isotonic') clf_isotonic.fit(x_train, y_train) y_test_iso = clf_isotonic.predict_proba(x_test)[:, 1] fraction_of_positives, mean_predicted_value = …
Category: Data Science

Calibration curve motivation

I struggle to understand the mathematical motivation for the binary classification calibration curve. Why do we assume that the predicted probabilities should be consistent with the proportion of 1's in the probability bin, i.e. (# of 1's in the bin)/(total # of samples in the bin)? It's obvious for a Decision Tree, where the (# of 1's in the bin)/(total # of samples in the bin) ratio is explicitly the model output, but how is this related to the other …
Category: Data Science
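
For context, the property a calibration curve checks is usually stated as $P\left( Y = 1 \mid \hat{p}(X) = p \right) = p$ for every $p$. Since we cannot condition on an exact predicted value with finite data, the curve replaces the left-hand side with the empirical estimate $\frac{\#\{\text{1's in bin}\}}{\#\{\text{samples in bin}\}}$ computed over a bin of predicted probabilities around $p$, which is why that bin proportion is compared against the mean predicted probability in the same bin.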

Calibration Curve Error

I want to calibrate the probability outputs of a model. I'm using Isotonic Regression. After calibration, when I called the calibration_curve function of sklearn's calibration module, I got this error: ValueError: 'list' argument must have no negative elements. However, when I checked the results I obtain from Isotonic Regression, there are no negative values and all values are in the [0, 1] range. There are no problems with the targets either. from sklearn.isotonic import IsotonicRegression i_reg = IsotonicRegression().fit(X_train, y_train) res = i_reg.predict(X_test) prob_true, prob_pred = …
Category: Data Science
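
For comparison, a minimal sketch of how this pattern is usually wired up, assuming the input to IsotonicRegression is the model's uncalibrated probability (a 1-D array in [0, 1]) rather than the raw feature matrix; the toy scores below are fabricated:

    import numpy as np
    from sklearn.calibration import calibration_curve
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(0)
    y_val = rng.integers(0, 2, 2000)
    p_uncal = np.clip(0.3 * y_val + 0.35 * rng.random(2000), 0, 1)  # toy uncalibrated scores

    # isotonic regression maps uncalibrated scores -> calibrated probabilities
    iso = IsotonicRegression(out_of_bounds="clip").fit(p_uncal, y_val)
    p_cal = iso.predict(p_uncal)

    # calibration_curve expects the true labels first, then probabilities in [0, 1]
    prob_true, prob_pred = calibration_curve(y_val, p_cal, n_bins=10)
    print(prob_true, prob_pred)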

Calibrating probability thresholds for multiclass classification

I have built a network for the classification of three classes. The network consists of a CNN followed by two fully-connected layers. The CNN consists of convolutional layers, followed by batch normalization, a ReLU activation, max pooling and dropout. The three classes are imbalanced (as can be seen in the confusion matrix below). I have optimized the parameters of the network to maximize AUC. I'm calculating the AUC using macro- and micro-averaging. As can be seen in the ROC …
Category: Data Science

How to output buckets of probabilities?

I am dealing with an unbalanced binary classification problem. The problem is so unbalanced (2:98) and hard to predict that I am interested in the probability of the positive outcome instead of trying to predict the actual binary output. Depending on the model used, this requires either calibrating the model's probabilities (tree-based models) or transforming scores into probabilities using some spline (NN). But in the end, for all practical purposes, I use buckets of probabilities. With 2% being …
Category: Data Science
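
As an illustration of the bucketing step, one way to assign calibrated probabilities to a fixed set of buckets; the bucket edges below are hypothetical, chosen to be finer around the 2% base rate, and the scores are synthetic:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    proba = pd.Series(rng.beta(1, 49, size=10_000))  # toy probabilities skewed toward 0, as in a 2:98 problem

    # hypothetical bucket edges, finer near the 2% base rate
    edges = [0.0, 0.01, 0.02, 0.05, 0.10, 0.25, 0.50, 1.00]
    buckets = pd.cut(proba, bins=edges, include_lowest=True)

    print(buckets.value_counts().sort_index())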

How can I tell if my model is overfitting from the distribution of predicted probabilities?

All, I am training a light gradient boosting model and have used all of the necessary parameters to help with overfitting. I plot the distribution of the predicted probabilities from the model (i.e. the probability of having cancer), after calibrating with a calibrated classifier, as a histogram or KDE. As you can see below, the probabilities for my class 1 are concentrated at the upper and lower ends. I have tried playing around with the bandwidth to smooth this a little, and it doesn't smooth the …
Category: Data Science
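
One simple diagnostic, independent of the KDE bandwidth, is to overlay the predicted-probability histograms on the training set and on held-out data: a much more extreme distribution on the training set than on validation is one symptom of overfitting. A sketch with synthetic stand-ins for the two sets of predictions:

    import matplotlib.pyplot as plt
    import numpy as np

    # synthetic stand-ins for calibrated predicted probabilities of class 1
    p_train = np.random.beta(0.5, 2.0, 5000)  # training-set predictions
    p_valid = np.random.beta(0.8, 2.0, 2000)  # validation-set predictions

    bins = np.linspace(0, 1, 41)
    plt.hist(p_train, bins=bins, density=True, alpha=0.5, label="train")
    plt.hist(p_valid, bins=bins, density=True, alpha=0.5, label="validation")
    plt.xlabel("predicted probability of class 1")
    plt.ylabel("density")
    plt.legend()
    plt.show()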

XGBoost: how to adjust the probabilities of a binary classifier to match training data?

Training and testing data have around 1% positives, but the model predicts only around 0.1% as positives. The model is an xgboost classifier. I’ve tried calibration but it didn’t improve much. I also don’t want to pick thresholds since the final goal is to output probabilities. What I want is for the model to have a number of classified positives similar to the number of positives in the actual data.
Category: Data Science
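
One standard way to pull the average prediction back toward the observed base rate without picking a threshold is a prior-shift correction: multiply every prediction's odds by a single constant factor. The helper below is a sketch (the function name and synthetic scores are mine, not from the question), and the adjusted mean will only approximately equal the target rate:

    import numpy as np

    def prior_shift(p, model_rate, target_rate):
        """Scale each prediction's odds by a constant so the implied base rate
        moves from model_rate to target_rate (standard prior correction)."""
        p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
        r = (target_rate / (1 - target_rate)) / (model_rate / (1 - model_rate))
        odds = p / (1 - p) * r
        return odds / (1 + odds)

    # e.g. the model's average prediction is ~0.1% but the data has ~1% positives
    p_raw = np.random.beta(1, 999, size=100_000)
    p_adj = prior_shift(p_raw, model_rate=p_raw.mean(), target_rate=0.01)
    print(p_raw.mean(), p_adj.mean())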

XGBoost calibration KDE plots (isotonic) not smooth

I am training my xgboost model on an imbalanced binary classification problem. It is important to me to have well-calibrated probabilities, so I have chosen to optimize the Brier score. I then plot the KDE and reliability curve of my models, where I try isotonic and Platt scaling. E.g., my grid search is: gscv = GridSearchCV(pipeline, param_grid=params['xgboost'], scoring='neg_brier_score', cv=kfolds.split(x_train, y_train), return_train_score=True) The KDE plot on the left corresponds to the uncalibrated probabilities; to me this looks good, since the RED CURVE …
Category: Data Science

Calibrating classifier probabilities for unbalanced data when class ratios are unknown

I've built a binary classification convolutional neural network, trained on simulated data with equal numbers of simulations for each class. I've obtained good results for a validation set with equal classes and am using beta regression for calibrating the output probabilities [1]. The classifier will now be applied to an empirical dataset, where the classes are likely very unbalanced. If I knew the true class proportions in the empirical dataset, my approach would be to fit the calibration regression to …
Category: Data Science
