How do I modify a Logistic Regression to target a specific point on the ROC curve?

From a conceptual standpoint I understand the trade-off involved with the ROC curve. You can increase the rate of true positive predictions, but you will be taking on more false positives, and vice versa. I am wondering how one would target a specific point on the curve for a Logistic Regression model. Would you just raise the probability threshold for what would constitute a 0 or a 1 in the regression? (Like shifting at what probability predictions start to get …
Category: Data Science
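
The thresholding idea raised in the question above can be sketched in a few lines. This is a minimal illustration, not the asker's code: the function names and the toy labels and scores are made up for the example, and it picks the operating point with the highest true positive rate whose false positive rate stays under a chosen cap. With scikit-learn, `sklearn.metrics.roc_curve` returns the same kind of (fpr, tpr, thresholds) triples.

```python
def roc_points(y_true, scores):
    """Return (threshold, fpr, tpr) for every distinct score, highest threshold first."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score is at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def operating_point_for_max_fpr(y_true, scores, max_fpr):
    """Pick the ROC point with the highest TPR whose FPR does not exceed max_fpr."""
    feasible = [p for p in roc_points(y_true, scores) if p[1] <= max_fpr]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p[2])


# Toy data, invented for illustration: 4 negatives and 4 positives.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

threshold, fpr, tpr = operating_point_for_max_fpr(y_true, scores, max_fpr=0.25)
print(threshold, fpr, tpr)  # 0.6 0.25 0.75
```

The chosen threshold would then replace the default 0.5 cut-off when turning predicted probabilities into class labels.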

Identify optimal thresholds for one-vs-one/one-vs-rest ROC-curve for multiclass classification

Say I have a multiclass classification problem with N classes. I have trained a classifier on a training set, and I use a validation set and a one-vs-rest approach to get N ROC curves. Each ROC curve is created from different thresholds at which we classify a sample as $C_i$ or not $C_i$. We can then choose our optimal FPR/TPR trade-off and get the threshold (t); e.g. with t=0.6 we classify a sample as $C_i$ if model_score>=0.6 else …
Category: Data Science
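
A rough sketch of per-class threshold selection for the one-vs-rest setup described above. The selection criterion here is Youden's J statistic (TPR minus FPR), which is only one common choice and an assumption of this example; the function names and toy scores are hypothetical.

```python
def youden_threshold(y_binary, scores):
    """Return the threshold that maximises TPR - FPR for one binary problem."""
    pos = sum(y_binary)
    neg = len(y_binary) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_binary, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_binary, scores) if s >= t and y == 0)
        j = tp / pos - fp / neg
        if j > best_j:  # on ties, keep the higher threshold
            best_t, best_j = t, j
    return best_t


def one_vs_rest_thresholds(y_true, score_rows, classes):
    """Compute one threshold per class, treating that class as positive and the rest as negative."""
    thresholds = {}
    for k, c in enumerate(classes):
        y_bin = [1 if y == c else 0 for y in y_true]
        column = [row[k] for row in score_rows]  # model scores for class c
        thresholds[c] = youden_threshold(y_bin, column)
    return thresholds


# Toy validation set, invented for illustration; each row holds scores for classes a, b, c.
y_val = ["a", "b", "c", "a", "b", "c"]
val_scores = [
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
    [0.4, 0.5, 0.1],
    [0.1, 0.3, 0.6],
]

print(one_vs_rest_thresholds(y_val, val_scores, ["a", "b", "c"]))
# {'a': 0.5, 'b': 0.5, 'c': 0.6}
```

Note that independent per-class thresholds can flag a sample for several classes, or for none, so a tie-breaking rule is still needed at prediction time.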

How to draw each ROC curve of an SVM model with cross validation

I would like to make a graph like the following in Python, that is, one curve for each fold. I have the following code, where I use an SVM model to classify some data: kf = KFold(n_splits=10) a, fma, fmi = [], [], [] for train, eval in kf.split(x_train): x_train_i, x_eval_i, y_train_i, y_eval_i = x_train[train], x_train[eval], y_train[train], y_train[eval] c = svm.SVC(kernel='rbf', gamma='scale', C=40).fit( x_train_i, y_train_i ) p = c.predict(x_eval_i) acc = c.score(x_eval_i, y_eval_i) f1ma = f1_score(y_eval_i, p, average='macro') f1mi = …
Category: Data Science

Logistic Regression optimal threshold is a negative value

I run the code below: import pandas as pd import numpy as np from sklearn.model_selection import train_test_split from sklearn import linear_model import matplotlib.pyplot as plt from sklearn.linear_model import LogisticRegression from numpy import sqrt from numpy import argmax from sklearn.metrics import roc_curve from sklearn.preprocessing import StandardScaler def standardize(variable): return (variable - np.mean(variable)) / np.std(variable) def normalize(x): return (x-x.min()/(x.max()- x.min())) data.columns = np.arange(len(data.columns)) trainX, testX, trainy, testy=train_test_split(X,y,test_size=0.5,random_state=2, stratify=y) # fit a model model = LogisticRegression(solver='lbfgs') model.fit(trainX, trainy) #yhat = model.predict_proba(testX) yhat = …
Category: Data Science

ROC-AUC Imbalanced Data Score Interpretation

I have a binary response variable (label) in a dataset with around 50,000 observations. The training set is somewhat imbalanced, with label=1 making up about 33% of the observations and label=0 making up about 67% of the observations. Right now with XGBoost I'm getting a ROC-AUC score of around 0.67. The response variable is binary, so the baseline is 50% in terms of chance, but at the same time the data is imbalanced, so if the model just guessed label=0 …
Category: Data Science

How to plot one graph of ROC curves for 4 separate ML models located in different Python notebooks

If we have 4 different notebooks with results for different ML models and we have to plot one ROC curve graph that shows the ROC of all 4 models, how can we do this? This is my code in every notebook to plot the ROC: import sklearn.metrics as metrics # calculate the fpr and tpr for all thresholds of the classification fpr, tpr, threshold = metrics.roc_curve(y_true1, y_pred1) roc_auc = metrics.auc(fpr, tpr) # method I: plt import matplotlib.pyplot as plt plt.title('Receiver Operating …
Category: Data Science
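
One possible approach to the multi-notebook situation above, sketched under assumptions: each notebook writes its `fpr` and `tpr` arrays to a small JSON file, and a final notebook loads all files and draws them on one set of axes. The helper names and file names are hypothetical; the plotting function assumes matplotlib is installed and imports it only when called.

```python
import json
from pathlib import Path


def save_curve(path, name, fpr, tpr):
    """Run in each model notebook: store one ROC curve as JSON."""
    payload = {
        "name": name,
        "fpr": [float(v) for v in fpr],  # also converts NumPy floats to plain floats
        "tpr": [float(v) for v in tpr],
    }
    Path(path).write_text(json.dumps(payload))


def load_curves(paths):
    """Run in the combining notebook: read all saved curves back."""
    return [json.loads(Path(p).read_text()) for p in paths]


def trapezoid_auc(fpr, tpr):
    """Area under the curve by the trapezoidal rule (fpr must be non-decreasing)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(fpr, fpr[1:], tpr, tpr[1:])
    )


def plot_curves(curves):
    """Draw every loaded curve on a single set of axes and return the figure."""
    import matplotlib.pyplot as plt  # imported lazily so saving/loading works without it

    fig, ax = plt.subplots()
    for c in curves:
        area = trapezoid_auc(c["fpr"], c["tpr"])
        ax.plot(c["fpr"], c["tpr"], label=f'{c["name"]} (AUC = {area:.2f})')
    ax.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
    ax.set_xlabel("False Positive Rate")
    ax.set_ylabel("True Positive Rate")
    ax.set_title("Receiver Operating Characteristic")
    ax.legend(loc="lower right")
    return fig
```

In each model notebook one would call something like `save_curve("model_1.json", "Model 1", fpr, tpr)` after `roc_curve`, and in the final notebook `plot_curves(load_curves(["model_1.json", "model_2.json", "model_3.json", "model_4.json"]))`.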

Interpreting ROC curves across k-fold cross-validation

I have used a MARS model (multivariate adaptive regression splines) and k-fold cross-validation to evaluate it, obtaining the following graph: How should this result be interpreted? I understand that in fold 6 the model obtains a better AUC, but why? What is the interpretation of this? Thanks to all.
Category: Data Science

How to observe dependencies using ROC curves?

I am dealing with a database in which I am recording the scores that each race has to reach to obtain a credit. I have made the following graphs, where you can see the relative benefit of each race with respect to a threshold on their scores, and the ROC curve of each race. My question is: by means of these graphs, can you find dependencies between the races? Thank you and best regards.
Category: Data Science

My data can be approximated with a Normal mixture. How can I find the reasons and explain this behaviour?

I use DeLong's method to compare two ROC AUCs. The result is a Z-score. Both ROC AUCs are obtained from LDA (linear discriminant analysis) from the sklearn package. The first one uses the eigen solver inside LDA and the second one uses the svd solver. The dotted line is my data. The red line is N(0, 1). Note: there is a minor jump at the point Z = 0. Z = 0 means that the classifiers did their job equally well. Z > 0 (Z …
Category: Data Science

What does it mean when ROC curves intersect at a point?

I am working with a data set and I have obtained the following ROC curve: As you can see, the curves for Black and Asian ethnicity cross at one point (green and purple lines). Does this have any significance? Could any conclusion be drawn from this? Note that I am dealing with the following datasets in order to observe whether fairness affects profits: -transrisk_performance_by_race_ssa -transrisk_cdf_by_race_ssa.csv -totals.csv
Category: Data Science

How to ensemble different ranking models?

I have trained two different models, each of which gives a score to each data point. The scores of the two models are not necessarily comparable. The score is used to give a ranking, and the performance is measured with the AUC and the ROC curve. How can I ensemble the different models to obtain a better AUC and ROC curve?
Category: Data Science
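
One technique sometimes used when model scores live on different scales is rank averaging: convert each model's scores to ranks, then average the ranks. The sketch below is an illustration under that assumption, with hypothetical function names; it is not guaranteed to improve the AUC, so the result should be checked on held-out data.

```python
def average_ranks(scores):
    """Rank scores from 1 (lowest) to n (highest); tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_average(*score_lists):
    """Average the per-model ranks and scale the result into (0, 1]."""
    ranked = [average_ranks(s) for s in score_lists]
    n = len(ranked[0])
    return [sum(r[i] for r in ranked) / (len(ranked) * n) for i in range(n)]


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A typical use would be `combined = rank_average(scores_model_a, scores_model_b)` followed by `auc(y_true, combined)`, compared against the AUC of each model alone. Because only the ordering of each model's scores matters, the original scales do not need to match.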

Does a classifier threshold close to 0 make sense?

I have a ROC curve with an AUC of 0.91. I applied the following function to determine the best threshold: threshold1[np.argmin(np.abs(false_positive_rate1+true_positive_rate1-1))] and I got 0.004. Does this make sense? Does it mean that the change between the classes is very gradual, that there is not enough difference between them?
Category: Data Science

Does it make sense to repeat calculating AUC in logistic regression?

I have a question regarding logistic regression models and testing their skill. I am not quite sure if I understand correctly how the ROC curve is established. When calculating the ROC curve, is a train-test split happening, so that the skill of a model fitted on the training split is tested on the test split? Or is a model fitted on the ENTIRE data just tested on the ENTIRE data? If the first is the case, would it make …
Category: Data Science

Uncertainty about shape of ROC curve

I am working on a binary classification problem, and the plotted ROC curves that I am using for evaluation, together with the AUC, have seemed strange to me. Here is an example. I understand that the ROC is a visual representation of the true positive rate versus the false positive rate. When plotting the confusion matrix I can see there is a significant number of false negatives and false positives alike: I fail to understand how it is possible that the ROC curve only …
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.