From a conceptual standpoint I understand the trade-off involved with the ROC curve: you can increase the true positive rate, but you will take on more false positives, and vice versa. I am wondering how one would target a specific point on the curve for a logistic regression model. Would you just raise the probability threshold for what constitutes a 0 or a 1 in the regression? (Like shifting at what probability predictions start to get …
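One way to do this in scikit-learn is to keep the predicted probabilities, pick the threshold from roc_curve that lands closest to the operating point you want, and apply it yourself instead of the default 0.5 cutoff. A minimal sketch, assuming a binary problem and toy data (the target FPR of 0.10 is just a placeholder):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_curve
    from sklearn.model_selection import train_test_split

    # toy data, purely illustrative
    X, y = make_classification(n_samples=1000, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]          # P(y = 1)

    fpr, tpr, thresholds = roc_curve(y_te, proba)
    # pick the threshold whose FPR is closest to some target operating point
    target_fpr = 0.10
    t = thresholds[np.argmin(np.abs(fpr - target_fpr))]

    # classify with the chosen cutoff instead of the default 0.5
    y_pred = (proba >= t).astype(int)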
Say I have a multiclass classification problem with N classes. I have trained a classifier on a training set, and I use a validation set and one-vs-rest ROC curves to give me N ROC curves. Since each ROC curve is created from different thresholds for when we classify a sample as $C_i$ or not $C_i$, we can then choose our preferred FPR/TPR trade-off and read off the threshold $t$; e.g. if $t = 0.6$, we classify a sample as $C_i$ if model_score >= 0.6, else …
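A minimal sketch of that per-class thresholding with scikit-learn, assuming y_val, X_val, scores = model.predict_proba(X_val) and N are the names from the question (Youden's J is used here only as one possible choice of operating point):

    import numpy as np
    from sklearn.metrics import roc_curve
    from sklearn.preprocessing import label_binarize

    # assumed inputs: y_val (validation labels 0..N-1) and
    # scores = model.predict_proba(X_val), shape (n_samples, N)
    classes = np.arange(N)
    y_bin = label_binarize(y_val, classes=classes)

    chosen_t = {}
    for i in classes:
        fpr, tpr, thr = roc_curve(y_bin[:, i], scores[:, i])
        # pick the operating point you prefer, e.g. max(TPR - FPR) (Youden's J)
        j = np.argmax(tpr - fpr)
        chosen_t[i] = thr[j]

    # then flag a sample as class i, one-vs-rest style, when scores[:, i] >= chosen_t[i]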
So I have a multiclass problem and have successfully computed the micro- and macro-averaged curves; how do I calculate the weighted average of the per-class TPR and FPR?
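One possible reading of "weighted" here is the same weighting scikit-learn uses elsewhere: weight each class's curve by its support. A hedged sketch under that assumption, interpolating every class's TPR onto a common FPR grid (y_bin and scores are placeholder names for the one-hot validation labels and the predicted probabilities):

    import numpy as np
    from sklearn.metrics import roc_curve

    # assumed inputs: y_bin (one-hot labels, shape (n, N)) and
    # scores (predicted probabilities, same shape)
    n_classes = y_bin.shape[1]
    support = y_bin.sum(axis=0)                      # class counts
    weights = support / support.sum()

    grid = np.linspace(0.0, 1.0, 101)                # common FPR grid
    weighted_tpr = np.zeros_like(grid)
    for i in range(n_classes):
        fpr, tpr, _ = roc_curve(y_bin[:, i], scores[:, i])
        weighted_tpr += weights[i] * np.interp(grid, fpr, tpr)

    # (grid, weighted_tpr) is a support-weighted analogue of the macro-average curve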
I would like to make a graph like the following in Python: that is, one ROC curve for each fold. I have the following code, where I use an SVM model to classify some data:

    kf = KFold(n_splits=10)
    a, fma, fmi = [], [], []
    for train, eval in kf.split(x_train):
        x_train_i, x_eval_i, y_train_i, y_eval_i = x_train[train], x_train[eval], y_train[train], y_train[eval]
        c = svm.SVC(kernel='rbf', gamma='scale', C=40).fit(x_train_i, y_train_i)
        p = c.predict(x_eval_i)
        acc = c.score(x_eval_i, y_eval_i)
        f1ma = f1_score(y_eval_i, p, average='macro')
        f1mi = …
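A sketch of how the per-fold ROC curves could be added to that loop, assuming a binary target and using decision_function so that roc_curve gets continuous scores rather than hard predictions:

    import matplotlib.pyplot as plt
    from sklearn import svm
    from sklearn.metrics import auc, roc_curve
    from sklearn.model_selection import KFold

    # assumes the same x_train / y_train arrays as in the snippet above, binary labels
    kf = KFold(n_splits=10)
    fig, ax = plt.subplots()
    for k, (train, eval_) in enumerate(kf.split(x_train)):
        c = svm.SVC(kernel='rbf', gamma='scale', C=40).fit(x_train[train], y_train[train])
        scores = c.decision_function(x_train[eval_])       # continuous scores, not labels
        fpr, tpr, _ = roc_curve(y_train[eval_], scores)
        ax.plot(fpr, tpr, label=f'fold {k} (AUC = {auc(fpr, tpr):.2f})')
    ax.plot([0, 1], [0, 1], 'k--')                         # chance line
    ax.set_xlabel('False positive rate')
    ax.set_ylabel('True positive rate')
    ax.legend()
    plt.show()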
I run the code below:

    import pandas as pd
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn import linear_model
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LogisticRegression
    from numpy import sqrt
    from numpy import argmax
    from sklearn.metrics import roc_curve
    from sklearn.preprocessing import StandardScaler

    def standardize(variable):
        return (variable - np.mean(variable)) / np.std(variable)

    def normalize(x):
        return (x - x.min()) / (x.max() - x.min())

    data.columns = np.arange(len(data.columns))
    trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2, stratify=y)

    # fit a model
    model = LogisticRegression(solver='lbfgs')
    model.fit(trainX, trainy)
    #yhat = model.predict_proba(testX)
    yhat = …
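Judging by the sqrt and argmax imports, the snippet appears to be heading towards picking the G-mean-optimal threshold; a minimal sketch of that final step, assuming yhat holds the positive-class probabilities for testX:

    from numpy import argmax, sqrt
    from sklearn.metrics import roc_curve

    # assumes yhat = model.predict_proba(testX)[:, 1] from the snippet above
    fpr, tpr, thresholds = roc_curve(testy, yhat)
    gmeans = sqrt(tpr * (1 - fpr))          # geometric mean of TPR and (1 - FPR)
    ix = argmax(gmeans)
    print('Best threshold = %.3f, G-mean = %.3f' % (thresholds[ix], gmeans[ix]))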
I have a binary response variable (label) in a dataset with around 50,000 observations. The training set is somewhat imbalanced, with label = 1 making up about 33% of the observations and label = 0 making up about 67%. Right now with XGBoost I'm getting a ROC-AUC score of around 0.67. The response variable is binary, so the baseline is 50% in terms of chance, but at the same time the data is imbalanced, so if the model just guessed =0 …
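As a sanity check on the chance baseline: a no-skill model has an AUC of about 0.5 regardless of the class balance, because ROC-AUC measures ranking rather than accuracy. A toy illustration (synthetic data, not the actual dataset):

    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    y = (rng.random(50_000) < 0.33).astype(int)      # ~33% positives, as in the post
    X = rng.normal(size=(50_000, 5))                 # features carry no signal

    dummy = DummyClassifier(strategy='prior').fit(X, y)
    print(roc_auc_score(y, dummy.predict_proba(X)[:, 1]))   # ~0.5 despite the imbalance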
If we have 4 different notebooks with results for different ML models, and we have to plot one ROC graph that shows the ROC of all 4 models, how can we do this? This is my code in every notebook to plot the ROC:

    import sklearn.metrics as metrics
    # calculate the fpr and tpr for all thresholds of the classification
    fpr, tpr, threshold = metrics.roc_curve(y_true1, y_pred1)
    roc_auc = metrics.auc(fpr, tpr)

    # method I: plt
    import matplotlib.pyplot as plt
    plt.title('Receiver Operating …
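One simple way to combine the four notebooks is to save each model's fpr/tpr arrays to disk and plot them together in a single notebook. A sketch assuming hypothetical file names model1_roc.npz … model4_roc.npz:

    # in each of the four notebooks, after computing fpr/tpr:
    import numpy as np
    np.savez('model1_roc.npz', fpr=fpr, tpr=tpr)     # use a different file name per model

    # then in one "plotting" notebook:
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.metrics import auc

    plt.title('Receiver Operating Characteristic')
    for name in ['model1_roc.npz', 'model2_roc.npz', 'model3_roc.npz', 'model4_roc.npz']:
        d = np.load(name)
        plt.plot(d['fpr'], d['tpr'], label='%s (AUC = %.2f)' % (name, auc(d['fpr'], d['tpr'])))
    plt.plot([0, 1], [0, 1], 'r--')                  # chance line
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.legend(loc='lower right')
    plt.show()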
I have used a MARS model (multivariate adaptive regression splines) and k-fold cross-validation to evaluate it, obtaining the following graph. How should this model be interpreted? I understand that in fold 6 the model obtains a better AUC, but why? What is the interpretation of this? Thanks to all.
I am dealing with the issue of fairness in machine learning models. One of the group fairness criteria is separation. I have read that it only makes sense to show the separation criterion using ROC curves, but I don't understand why. Could someone explain it to me? Thank you very much.
On the test set of a binary classification problem, the p25, p50 and p75 of the predictions are very close to each other (e.g. 0.123). Is it possible that my model can achieve a high AUC-ROC (e.g. 0.85) despite giving nearly the same probability prediction for almost all the rows? The data is imbalanced.
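Yes, this is possible, because ROC-AUC depends only on how the scores rank the rows, not on their absolute values. A toy illustration where all the percentiles sit near 0.123 yet the AUC is high:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    y = (rng.random(10_000) < 0.05).astype(int)              # imbalanced labels

    # scores all clustered near 0.123, but positives get a slightly higher value
    scores = 0.123 + rng.normal(scale=1e-4, size=y.size) + 2e-4 * y
    print(np.percentile(scores, [25, 50, 75]))               # all ~0.123
    print(roc_auc_score(y, scores))                          # still well above 0.5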
I am dealing with a database that records the credit scores of each race group. I have made the following graphs, where you can see the relative benefit of each group with respect to a threshold on its scores, together with the ROC curve of each group. My question is: using these graphs, can dependencies between the groups be identified? Thank you and best regards.
I use the DeLong method to compare two ROC AUCs; the result is a Z-score. Both ROC AUCs were obtained from LDA (linear discriminant analysis) in the sklearn package: the first uses the eigen solver and the second the svd solver. The dotted line is my data; the red line is N(0, 1). Note: there is a minor jump at the point Z = 0. Z = 0 means that the classifiers did their job equally well. Z > 0 (Z …
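For context, a minimal sketch of how the two AUCs could be produced with the two solvers on toy data; the DeLong test itself is not part of scikit-learn and would be applied separately to the two score vectors on the same test set:

    from sklearn.datasets import make_classification
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)   # toy data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for solver in ('eigen', 'svd'):
        lda = LinearDiscriminantAnalysis(solver=solver).fit(X_tr, y_tr)
        print(solver, roc_auc_score(y_te, lda.predict_proba(X_te)[:, 1]))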
I am working with a data set and I have obtained the following ROC curves. As you can see, the Black and Asian ethnicity curves cross at one point (green and purple lines). Does this have any significance? Could any conclusion be drawn from this? Note that I am dealing with the following datasets:

- transrisk_performance_by_race_ssa
- transrisk_cdf_by_race_ssa.csv
- totals.csv

The goal is to observe whether fairness affects profits.
I have trained two different models, each of which gives a score to every data point. The scores of the two models are not necessarily comparable. The scores are used to produce a ranking, and performance is measured with the AUC and ROC curve. How can I ensemble the different models to obtain a better AUC and ROC curve?
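One common approach when the scores live on different scales is rank averaging: convert each model's scores to ranks and average those. A sketch with placeholder names score_a, score_b and y_true:

    import numpy as np
    from scipy.stats import rankdata
    from sklearn.metrics import roc_auc_score

    # assumed inputs: score_a and score_b are the two models' scores on the
    # same samples, and y_true are the labels; all three are placeholders
    def rank_average(score_a, score_b):
        # map each model's scores to ranks in (0, 1] so the original scales no longer matter
        ra = rankdata(score_a) / len(score_a)
        rb = rankdata(score_b) / len(score_b)
        return (ra + rb) / 2.0

    # ensembled = rank_average(score_a, score_b)
    # roc_auc_score(y_true, ensembled)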
I have a binary classification problem with a large dataset of dimensions (1155918, 55). The dataset is fairly balanced: 67% Class 0, 33% Class 1. I am getting a test accuracy of 73% on the test set, but the AUC score is 50% and recall is 0.02 for Class 1. I am using logistic regression and have also tried pycaret's classification algorithms.
I have a ROC curve with an AUC of 0.91. I applied the following function to determine the best threshold: threshold1[np.argmin(np.abs(false_positive_rate1 + true_positive_rate1 - 1))] and I got 0.004. Does that make sense? Does it mean that the transition between the classes is very gradual, i.e. that there is not enough separation between them?
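For reference, a sketch of what that criterion is doing (it picks the point where sensitivity ≈ specificity) next to the more common Youden's J statistic, assuming the same arrays as in the question, taken from sklearn's roc_curve:

    import numpy as np
    # assumes false_positive_rate1, true_positive_rate1, threshold1 come from
    # sklearn.metrics.roc_curve(y_true, y_score), as in the question

    # point where sensitivity ~= specificity (the criterion in the question)
    i_eer = np.argmin(np.abs(false_positive_rate1 + true_positive_rate1 - 1))

    # alternative: Youden's J statistic, max(TPR - FPR)
    i_j = np.argmax(true_positive_rate1 - false_positive_rate1)

    print('sens ~= spec threshold:', threshold1[i_eer])
    print("Youden's J threshold:", threshold1[i_j])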
I have a question regarding logistic regression models and testing their skill. I am not quite sure whether I understand correctly how the ROC curve is established. When calculating the ROC curve, is a train/test split happening, so that a model fitted on the training split is tested on the test split? Or is a model fitted on the ENTIRE data just tested on the ENTIRE data? If the first is the case, would it make …
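The usual workflow is the first one: fit on the training split only and build the ROC curve from the held-out test predictions. A minimal sketch on toy data:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)   # toy data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)    # fit on train only
    proba = model.predict_proba(X_te)[:, 1]                      # score the held-out test set

    fpr, tpr, _ = roc_curve(y_te, proba)                         # ROC from test predictions
    print('test AUC:', roc_auc_score(y_te, proba))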
I am working on a binary classification problem, and the ROC curves I am plotting for evaluation, together with the AUC, have seemed strange to me. Here is an example. I understand that the ROC curve is a visual representation of the true positive rate versus the false positive rate. When plotting the confusion matrix I can see there is a significant number of false negatives and false positives alike. I fail to understand how it is possible that the ROC curve only …
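One thing worth checking is that a confusion matrix describes a single threshold (usually 0.5), while the ROC curve sweeps over all thresholds, so the two can look very different. A toy sketch contrasting them:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)  # imbalanced toy data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

    print(roc_auc_score(y_te, proba))                            # summarises all thresholds
    print(confusion_matrix(y_te, (proba >= 0.5).astype(int)))    # one specific threshold
    print(confusion_matrix(y_te, (proba >= 0.2).astype(int)))    # different threshold, different FP/FN mix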