Hey guys, I'm currently reading about AUC-ROC. I understand the binary case, and I think I understand the multiclass case, but I'm a bit confused about how to generalize it to the multi-label case, and I can't find any intuitive explanatory texts on the matter. I'd like to check whether my intuition is correct with an example: let's assume a scenario with three classes (c1, c2, c3). Let's start with the multiclass case: when we're considering …
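In case it helps, here is a minimal sketch of what multi-label ROC AUC usually means in practice: each label gets its own binary ROC curve against its own score column, and the per-label AUCs are either averaged (macro) or the label/score pairs are pooled first (micro). The toy arrays below are made up purely to illustrate the three classes c1, c2, c3.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Hypothetical multi-label ground truth (one column per class c1, c2, c3)
    # and per-label scores from some classifier.
    Y_true = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 1, 0],
                       [0, 0, 1]])
    Y_score = np.array([[0.9, 0.2, 0.8],
                        [0.1, 0.7, 0.3],
                        [0.8, 0.6, 0.2],
                        [0.3, 0.1, 0.9]])

    # One binary ROC AUC per label, then averaged (macro),
    # or all label/score pairs pooled first (micro).
    print("macro:", roc_auc_score(Y_true, Y_score, average="macro"))
    print("micro:", roc_auc_score(Y_true, Y_score, average="micro"))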
I'm developing image classifiers on a dataset of 25k images and 50 classes. The dataset is imbalanced. Some papers recommend AUC-PR for comparing classifiers in this setting, but I was not able to find any implementation that calculates this metric in a multiclass context. How can I calculate it? If you have a piece of code that implements this, it would be very helpful.
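One way to compute this, sketched below under one-vs-rest assumptions rather than as a drop-in for your pipeline, is to binarize the labels, compute average precision (the usual AUC-PR estimate) per class from the predicted probabilities, and then macro-average. The toy data and RandomForestClassifier are placeholders for your images and classifiers.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import label_binarize
    from sklearn.metrics import average_precision_score

    # Toy imbalanced multiclass data standing in for the 25k-image / 50-class set.
    X, y = make_classification(n_samples=3000, n_classes=5, n_informative=10,
                               weights=[0.5, 0.3, 0.1, 0.07, 0.03], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    proba = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict_proba(X_te)
    y_bin = label_binarize(y_te, classes=np.arange(proba.shape[1]))

    # One-vs-rest average precision per class, then macro-averaged.
    per_class = [average_precision_score(y_bin[:, i], proba[:, i])
                 for i in range(proba.shape[1])]
    print("per-class AUC-PR:", np.round(per_class, 3))
    print("macro AUC-PR:", average_precision_score(y_bin, proba, average="macro"))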
I have the following Precision-Recall curve for a classifier I built using AutoML. Most Precision-Recall curves tend to start at (0, 1) and go towards (1, 0), but mine is the opposite. Still, I feel that, similar to the ROC curve, it is actually good to get a PR curve that goes towards (1, 1). Is this understanding wrong? If you get a PR curve like this, how would you interpret the results? Is it a good model? If …
I use DeLong's method to compare two ROC AUCs; its result is a Z-score. Both ROC AUCs were obtained from LDA (linear discriminant analysis) in the sklearn package: the first uses the eigen solver inside LDA and the second uses the svd solver. The dotted line is my data; the red line is N(0, 1). Note: there is a minor jump at the point Z = 0. Z = 0 means that the classifiers did their job equally well. Z > 0 (Z …
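For what it's worth, here is a small sketch of how the Z-score from DeLong's test is usually converted into a two-sided p-value under the N(0, 1) null; this is just the standard normal tail calculation, not a reimplementation of DeLong's variance estimate.

    from scipy.stats import norm

    def delong_z_to_pvalue(z):
        # Two-sided p-value under the N(0, 1) null: a large |Z| suggests the two
        # ROC AUCs (eigen vs. svd solver here) differ by more than chance.
        return 2 * norm.sf(abs(z))

    print(delong_z_to_pvalue(0.0))   # 1.0  -> no evidence of a difference
    print(delong_z_to_pvalue(2.5))   # ~0.012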
I have trained two different models, each of which gives a score to each data point. The scores of the two models are not necessarily comparable. The score is used to produce a ranking, and performance is measured with AUC and the ROC curve. How can I ensemble the different models to obtain a better AUC and ROC curve?
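One common approach, sketched below on toy data, is to average the models' ranks rather than their raw scores before computing AUC: since ROC AUC depends only on the ordering of the scores, rank averaging sidesteps the fact that the two scales are not comparable. The labels and score scales here are made up.

    import numpy as np
    from scipy.stats import rankdata
    from sklearn.metrics import roc_auc_score

    # Toy labels and two score vectors on deliberately different scales.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=500)
    scores_a = y + rng.normal(scale=1.0, size=500)        # model A
    scores_b = 5 * y + rng.normal(scale=6.0, size=500)    # model B, different scale

    # Convert each model's scores to ranks so the scales no longer matter,
    # then average the ranks to form the ensemble score.
    ensemble = (rankdata(scores_a) + rankdata(scores_b)) / 2

    for name, s in [("A", scores_a), ("B", scores_b), ("rank ensemble", ensemble)]:
        print(name, round(roc_auc_score(y, s), 4))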
I am a newbie in the deep learning field and still trying to understand how this works. I am now working on fashion compatibility prediction. The most common performance evaluations in this task are fill-in-the-blank (FITB) and compatibility prediction (AUC). I am trying to modify some of the existing models, and it turns out that mine performs very well on AUC but very poorly on FITB: AUC ~0.98 while FITB is only ~0.30. I know it might be difficult to point …
This is one of my model variants; it achieves an AUC score of 0.73. Another of my model variants achieves an AUC score of 0.7265. Below is the confusion matrix. Like in many problems, the minority (positive) class represents the customers I'd like to target, but having many false positives is going to cost me money. Q: how do I select a model, and how can such a massive difference in the confusion matrices give similar AUC scores?
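On the second part of the question, a small sketch of why the two things can disagree: AUC depends only on how the scores rank positives against negatives, while the confusion matrix also depends on where the scores sit relative to the 0.5 cutoff. Any monotone rescaling of the scores leaves the AUC unchanged but can shift the confusion matrix drastically (toy data below, not your model's output).

    import numpy as np
    from sklearn.metrics import roc_auc_score, confusion_matrix

    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=1000)
    scores = 0.5 * y + rng.normal(scale=0.6, size=1000)   # toy raw scores

    # A monotone squashing: the ranking (and hence the AUC) is identical,
    # but far fewer points now clear the 0.5 cutoff.
    squashed = 1 / (1 + np.exp(-(scores - 2.0)))

    for name, s in [("raw", scores), ("squashed", squashed)]:
        print(name, "AUC =", round(roc_auc_score(y, s), 4))
        print(confusion_matrix(y, (s >= 0.5).astype(int)))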
I have a ROC curve with an AUC of 0.91. I applied the following function to determine the best threshold: threshold1[np.argmin(np.abs(false_positive_rate1 + true_positive_rate1 - 1))] and I got 0.004. Does this make sense? Does it mean that the transition between the classes is very gentle, i.e. that there is not enough difference between them?
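As a side note, that criterion picks the point where FPR + TPR ≈ 1, i.e. where the ROC curve crosses the anti-diagonal. A more common choice is the point that maximizes Youden's J = TPR − FPR; a sketch assuming the same arrays returned by sklearn's roc_curve:

    import numpy as np

    # Assuming false_positive_rate1, true_positive_rate1, threshold1 come from
    # sklearn.metrics.roc_curve(y_true, y_score):
    j = true_positive_rate1 - false_positive_rate1   # Youden's J at each threshold
    best = threshold1[np.argmax(j)]                  # threshold maximizing TPR - FPR
    print("Youden-optimal threshold:", best)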
I have a question regarding logistic regression models and testing their skill. I am not quite sure I understand correctly how the ROC curve is established. When calculating the ROC curve, is a train/test split happening, so that the skill of a model fitted on the training split is tested on the test split? Or is a model fitted on the ENTIRE data just tested on the ENTIRE data? If the first is the case, would it make …
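For concreteness, a minimal sketch of the first setup you describe: fit the logistic regression on a training split and compute the ROC curve only from the predicted probabilities on the held-out test split (synthetic data stands in for the real dataset).

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_curve, roc_auc_score

    # Toy data standing in for the real dataset.
    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # The ROC curve is computed only on the held-out test split, from probabilities.
    proba_test = model.predict_proba(X_test)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_test, proba_test)
    print("test AUC:", roc_auc_score(y_test, proba_test))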
Question: Does the AUC metric calculate the area under the ROC curve or the PR curve? Background: tf.keras.metrics.AUC says: "This value is ultimately returned as auc, an idempotent operation that computes the area under a discretized curve of precision versus recall values (computed using the aforementioned variables)." Therefore, it should be calculating the area under PR, not ROC. However, it also says: "Approximates the AUC (Area under the curve) of the ROC or PR curves." If it calculates the area under PR, then why …
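For reference, tf.keras.metrics.AUC takes a curve argument that selects which area is computed: 'ROC' (the default) or 'PR'. A quick sketch:

    import tensorflow as tf

    y_true = [0, 0, 1, 1]
    y_pred = [0.1, 0.4, 0.35, 0.9]

    roc_auc = tf.keras.metrics.AUC(curve="ROC")   # default: area under the ROC curve
    pr_auc = tf.keras.metrics.AUC(curve="PR")     # area under the precision-recall curve

    roc_auc.update_state(y_true, y_pred)
    pr_auc.update_state(y_true, y_pred)
    print("ROC AUC:", float(roc_auc.result()))
    print("PR  AUC:", float(pr_auc.result()))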
I'm training an XGBoost multiclass model, but I have doubts about my evaluation metrics. Here is my code + output:

    import matplotlib.pylab as plt
    from sklearn import metrics
    from matplotlib import pyplot
    from sklearn.model_selection import GridSearchCV
    import xgboost as xgb
    from statistics import mean
    %matplotlib inline
    import numpy as np  # needed for np.unique / np.arange below
    from sklearn.preprocessing import label_binarize
    from itertools import cycle
    from sklearn.metrics import roc_curve, auc

    def plot_roc_curve(y_test, y_pred):
        n_classes = len(np.unique(y_test))
        y_test = label_binarize(y_test, classes=np.arange(n_classes))
        # y_pred = label_binarize(y_pred, classes=np.arange(n_classes))
        # Compute ROC curve and ROC area for …
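For comparison, here is a hedged sketch (not your pipeline) of how per-class and macro-averaged one-vs-rest ROC AUC is usually computed for a multiclass XGBoost model; note that the ROC curve needs continuous scores from predict_proba, not the hard label predictions that label_binarize(y_pred, ...) would give.

    import numpy as np
    import xgboost as xgb
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import label_binarize
    from sklearn.metrics import roc_curve, auc, roc_auc_score

    # Toy multiclass data standing in for the real problem.
    X, y = make_classification(n_samples=2000, n_classes=4, n_informative=8, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = xgb.XGBClassifier().fit(X_tr, y_tr)

    # ROC needs continuous scores (predict_proba), not hard label predictions.
    proba = clf.predict_proba(X_te)
    y_bin = label_binarize(y_te, classes=np.arange(proba.shape[1]))

    for i in range(proba.shape[1]):
        fpr, tpr, _ = roc_curve(y_bin[:, i], proba[:, i])
        print(f"class {i}: AUC = {auc(fpr, tpr):.3f}")

    print("macro one-vs-rest AUC:", roc_auc_score(y_te, proba, multi_class="ovr"))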
This was asked in the viva of my ML course. I answered yes but could not precisely explain why. By 'better' I mean whether the geometric interpretation gives more information than just the numeric score.
I do not understand why, in uplift modeling (the Class Transformation approach), the ROC AUC score is not used for the transformed target Z. I have a problem with a task where I tried to use this approach, but the ROC AUC score has a dramatically low value. At the same time, I could not find any mention of using the ROC AUC score to evaluate the quality of a model that uses the Class Transformation approach for uplift prediction. As a result, I do not understand if …
I have a binary classifier. When I used my model to make predictions, about 4k out of 10k were predicted to be "Rich" (I am predicting affluence). Normally in classification the cutoff for predicting class 1 is 0.5. I have been asked to lower this to 0.4, but lowering it means the FPs increase and so do the TPs. How can I numerically display the 'cost' of moving the threshold lower than 0.5?
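One way to put a number on that cost, sketched below with made-up per-case values and placeholder names (y_true, proba), is to compare the confusion matrices at the two cutoffs and weigh each extra false positive against the value of each extra true positive.

    import numpy as np
    from sklearn.metrics import confusion_matrix

    def counts_at(y_true, proba, t):
        # Confusion-matrix counts when predicting "Rich" for proba >= t.
        tn, fp, fn, tp = confusion_matrix(y_true, (proba >= t).astype(int)).ravel()
        return {"threshold": t, "TP": tp, "FP": fp, "FN": fn, "TN": tn}

    # Hypothetical per-case economics: the cost of one false positive vs. the
    # value of one correctly targeted "Rich" customer.
    COST_FP, VALUE_TP = 2.0, 10.0

    # With your own y_true and predicted probabilities proba:
    # at_05 = counts_at(y_true, proba, 0.5)
    # at_04 = counts_at(y_true, proba, 0.4)
    # net = (VALUE_TP * (at_04["TP"] - at_05["TP"])
    #        - COST_FP * (at_04["FP"] - at_05["FP"]))
    # print("net value of moving the cutoff from 0.5 to 0.4:", net)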
I am experimenting with a logistic model using 2 features, either independently or in a linear combination. In the linear combination, combining these features reverses their importance (as judged by significance levels and regression coefficients) and reduces prediction accuracy (AUC). For example, with a logistic regression where each predictor was scaled with MinMaxScaler, when using each feature independently: Feature A: coefficient = 0.5, P=0.005, AUC=0.59; Feature B: coefficient = 0.1, P=0.5, AUC=0.502. When using the linear combination, the statistics are: Feature A: …