I am using machine learning models to predict an ordinal variable (values: 1, 2, 3, 4, and 5) from 7 different features. I posed this as a regression problem, so the final outputs of a model are continuous. An evaluation box plot looks like this: I experiment with both linear models (linear regression, linear SVMs) and nonlinear models (SVMs with RBF kernel, random forest, gradient boosting machines). The models are trained using cross-validation (~1600 samples), and 25% of the dataset is used …
I am using XGBoost's XGBClassifier with a binary 0-1 target, and I am trying to define a custom metric function. According to the XGBoost tutorials, it receives an array of predictions and a DMatrix with the training set. I have used objective='binary:logistic' in order to get probabilities, but the prediction values passed to the custom metric function are not between 0 and 1. They can lie anywhere between, say, -3 and 5, and the range of values seems to grow …
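For illustration, a minimal sketch of one common workaround, assuming the low-level xgb.train interface and made-up data: the scores handed to a custom metric via feval are typically raw margins (log-odds) even with objective='binary:logistic', so the sigmoid has to be applied inside the metric (newer XGBoost versions expose custom_metric instead, which may already pass transformed probabilities).

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import log_loss

def custom_logloss(preds, dtrain):
    """Custom eval metric: convert raw margins to probabilities, then score."""
    labels = dtrain.get_label()
    probs = 1.0 / (1.0 + np.exp(-preds))   # sigmoid maps margins into (0, 1)
    return 'custom_logloss', log_loss(labels, probs)

# Made-up data just to make the sketch runnable
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

# feval is the older argument name; recent versions prefer custom_metric
bst = xgb.train({'objective': 'binary:logistic'}, dtrain,
                num_boost_round=20, evals=[(dtrain, 'train')],
                feval=custom_logloss)
```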
I have created three different models using deep learning for multi-class classification, and each model gave me a different accuracy and loss value. The test results are as follows: First model: accuracy 98.1%, loss 0.1882. Second model: accuracy 98.5%, loss 0.0997. Third model: accuracy 99.1%, loss 0.2544. My questions are: What is the relationship between the loss and accuracy values? Why is the loss of the third model the highest even though its accuracy is also the highest?
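For intuition, a toy example with made-up probabilities: cross-entropy loss measures how confident (and how well calibrated) the predicted probabilities are, while accuracy only counts whether the argmax is right, so two models can have the same accuracy and very different losses.

```python
import numpy as np

def cross_entropy(p_class1, y):
    """Binary cross-entropy from predicted P(class 1) and true labels."""
    p_correct = np.where(y == 1, p_class1, 1 - p_class1)
    return -np.mean(np.log(p_correct))

y_true = np.array([0, 1, 1, 0])
confident = np.array([0.05, 0.95, 0.90, 0.10])   # all correct, high confidence
borderline = np.array([0.45, 0.55, 0.52, 0.48])  # all correct, barely

print(cross_entropy(confident, y_true))    # small loss
print(cross_entropy(borderline, y_true))   # much larger loss, same 100% accuracy
```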
Hey guys, I'm currently reading about AUC-ROC. I have understood the binary case and I think I understand the multi-class case. Now I'm a bit confused about how to generalize it to the multi-label case, and I can't find any intuitive explanatory texts on the matter. I want to check whether my intuition is correct with an example: let's assume we have a scenario with three classes (c1, c2, c3). Let's start with the multi-class case: when we're considering …
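As a sketch of the multi-label case with made-up scores for three labels (c1, c2, c3): each label is treated as its own binary ROC problem, and the per-label AUCs are then averaged (macro) or all label decisions are pooled first (micro).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Multilabel indicator format: each sample can carry several labels at once
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_score = np.array([[0.8, 0.2, 0.7],
                    [0.3, 0.9, 0.1],
                    [0.6, 0.7, 0.2],
                    [0.1, 0.4, 0.9]])

print(roc_auc_score(y_true, y_score, average='macro'))  # unweighted mean of per-label AUCs
print(roc_auc_score(y_true, y_score, average='micro'))  # pool all (sample, label) pairs first
```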
Suppose I have 5 classes, denoted by 1, 2, 3, 4, and 5, used in object detection. When evaluating object detection performance, suppose classes 1, 2, and 3 are present, but classes 4 and 5 are not present in the targets. Will classes 4 and 5 each have an average precision of 0 (since their precision is zero, as no true positives can be identified)? Or perhaps there are other considerations to take …
I don't understand how computing the score after removing the sentences or words that contribute the most to the result helps to show to what extent a model is "faithful" to a reasoning process. Indeed, a faithfulness score was proposed by Du et al. in 2019 to verify the importance of the identified contributing sentences or words to a given model's outputs. It is assumed that the probability values for the predicted class will significantly drop if the truly …
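As a sketch of the idea behind such a score (with a hypothetical predict_proba callable standing in for the actual model): the prediction probability is computed on the full input and again after deleting the tokens flagged as most important, and the size of the drop is taken as evidence of faithfulness.

```python
def faithfulness_drop(predict_proba, tokens, important_tokens, target_class):
    """Drop in P(target_class) after removing the flagged tokens.

    predict_proba: hypothetical callable mapping a list of texts to an array
    of class-probability rows (e.g. a scikit-learn pipeline's predict_proba).
    """
    full_text = " ".join(tokens)
    ablated_text = " ".join(t for t in tokens if t not in important_tokens)

    p_full = predict_proba([full_text])[0][target_class]
    p_ablated = predict_proba([ablated_text])[0][target_class]

    # A large positive drop suggests the flagged words really drove the
    # prediction, i.e. the explanation is faithful to the model's reasoning.
    return p_full - p_ablated
```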
I have two machine learning models, and they produce different error rankings under MAE and MSE: M1 has the smallest MAE, but M2 has the smallest MSE. Can anyone explain to me why this happens? I have also added my actual and predicted data here. Thank you.
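For intuition, a toy example with made-up residuals: MSE squares the errors, so a model with one large miss can have the lower MAE but the higher MSE compared to a model that is consistently a little off.

```python
import numpy as np

errors_m1 = np.array([0.0, 0.0, 0.0, 4.0])   # mostly perfect, one big miss
errors_m2 = np.array([1.5, 1.5, 1.5, 1.5])   # consistently slightly off

for name, e in [("M1", errors_m1), ("M2", errors_m2)]:
    print(name, "MAE =", np.mean(np.abs(e)), "MSE =", np.mean(e ** 2))
# M1: MAE = 1.0, MSE = 4.0   |   M2: MAE = 1.5, MSE = 2.25
```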
Is SI the RMSE divided by the average of the observed values (or of the predicted values? I am confused)? And is SI = 25% acceptable, i.e. is the model good enough?
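For reference, a common convention (assuming SI here means the scatter index) normalizes the RMSE by the mean of the observed values:

$$\mathrm{SI} = \frac{\mathrm{RMSE}}{\bar{y}_{\mathrm{obs}}} \times 100\%, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$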
So I have a multiclass problem and have successfully computed the micro- and macro-averaged curves; how do I calculate the weighted average of the TPR and FPR values?
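One way to do it, sketched below with the class supports as weights (mirroring how scikit-learn's 'weighted' averaging works for scalar metrics): interpolate each per-class ROC curve onto a common FPR grid and average the TPRs weighted by class frequency. The function name and grid size are my own choices, not from any particular library.

```python
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.preprocessing import label_binarize

def weighted_average_roc(y_true, y_score, n_classes):
    """Support-weighted average ROC curve for a multiclass problem.

    y_true: integer class labels, shape (n_samples,)
    y_score: per-class scores, shape (n_samples, n_classes)
    """
    y_bin = label_binarize(y_true, classes=list(range(n_classes)))
    weights = y_bin.sum(axis=0) / y_bin.sum()      # class support as weights
    fpr_grid = np.linspace(0.0, 1.0, 101)          # common FPR grid
    mean_tpr = np.zeros_like(fpr_grid)
    for k in range(n_classes):
        fpr, tpr, _ = roc_curve(y_bin[:, k], y_score[:, k])
        mean_tpr += weights[k] * np.interp(fpr_grid, fpr, tpr)
    return fpr_grid, mean_tpr
```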
I am currently in the process of creating a recommender system. It works with a neural network and then searches for the closest neighbors, and thus gives recommendations for a user. The data is implicit: I only know which products a user has bought, and I create the recommendations on the basis of this data. What are the best metrics to evaluate this recommender system with implicit data? Can I evaluate the model and then the search …
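As a sketch of the ranking metrics that are commonly reported for implicit-feedback recommenders (precision@k, recall@k, average precision@k); the function names and inputs are my own, assuming a ranked recommendation list per user and the set of items that user actually bought in a held-out period.

```python
def precision_at_k(recommended, bought, k=10):
    hits = sum(1 for item in recommended[:k] if item in bought)
    return hits / k

def recall_at_k(recommended, bought, k=10):
    hits = sum(1 for item in recommended[:k] if item in bought)
    return hits / max(len(bought), 1)

def average_precision_at_k(recommended, bought, k=10):
    score, hits = 0.0, 0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in bought:
            hits += 1
            score += hits / rank                 # precision at each hit position
    return score / max(min(len(bought), k), 1)

# Toy usage with made-up item ids
print(precision_at_k(["a", "b", "c", "d"], {"b", "d", "x"}, k=4))  # 0.5
```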
I wanted to know if there are metrics that work well with an unbalanced dataset. I know that accuracy is a very bad metric for evaluating a classifier when the data is unbalanced, but what about, for example, the Kappa index? Best regards and thanks.
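For illustration, a toy example with made-up labels showing why Kappa is more informative than accuracy here: a classifier that always predicts the majority class still scores high accuracy, but its Kappa is zero, because Kappa corrects for chance agreement under the observed class distribution.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

y_true = np.array([0] * 95 + [1] * 5)       # 95:5 imbalance
y_pred = np.zeros(100, dtype=int)           # always predict the majority class

print(accuracy_score(y_true, y_pred))       # 0.95
print(cohen_kappa_score(y_true, y_pred))    # 0.0
```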
How is "lift" computed? i was reading about "Gain and lift charts" in data science. I picked the following example from https://www.listendata.com/2014/08/excel-template-gain-and-lift-charts.html I am clear on how the gain values are computed. Not clear about lift values are computed? (last column in table)
I am working on two different architectures based on the LSTM model to predict the user's next action from the previous actions. I am wondering what the best way to present the results is. Is it okay to present only the prediction accuracy, or should I use other metrics? I found a paper using top_K_accuracy, whereas in a different paper I found AUC or ROC. Overall, I would like to know what is the state of the art of …
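If it helps, a small sketch of top-k accuracy with made-up scores over four possible next actions: a prediction counts as correct when the true action appears among the k highest-scoring ones, so k=1 is plain accuracy.

```python
import numpy as np
from sklearn.metrics import top_k_accuracy_score

y_true = np.array([2, 0, 3])                       # true next actions
y_score = np.array([[0.1, 0.3, 0.4, 0.2],
                    [0.5, 0.2, 0.2, 0.1],
                    [0.2, 0.1, 0.4, 0.3]])         # scores over 4 possible actions

print(top_k_accuracy_score(y_true, y_score, k=1, labels=[0, 1, 2, 3]))  # 0.667
print(top_k_accuracy_score(y_true, y_score, k=2, labels=[0, 1, 2, 3]))  # 1.0
```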
I was reading an answer on Quora about calculating the specificity of a 3-class classifier from a confusion matrix: https://www.quora.com/How-do-I-get-specificity-and-sensitivity-from-a-three-classes-confusion-matrix For the 3-class confusion matrix below (the following is a screenshot from that answer), the sensitivity and specificity would be found by calculating the following: My question is about the numerator for specificity, i.e. the true negatives: shouldn't it be 4 terms? E.g. if we are calculating with respect to class 1, then in the table n22, n33, n32 and n23 were …
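For a concrete check, a sketch with a made-up 3x3 confusion matrix (rows = actual, columns = predicted): the true negatives for class 1 are indeed the 4 cells that lie outside row 1 and column 1, i.e. n22, n23, n32 and n33.

```python
import numpy as np

cm = np.array([[50,  3,  2],      # made-up counts; rows = actual, cols = predicted
               [ 4, 45,  6],
               [ 1,  5, 40]])

def specificity(cm, k):
    # TN for class k: everything outside row k and column k (4 cells in a 3x3 matrix)
    tn = cm.sum() - cm[k, :].sum() - cm[:, k].sum() + cm[k, k]
    fp = cm[:, k].sum() - cm[k, k]
    return tn / (tn + fp)

for k in range(3):
    print(f"class {k + 1}: specificity = {specificity(cm, k):.3f}")
```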
I'm still a bit new to deep learning. What I'm still struggling with is the best practice for re-training a good model over time. I've trained a deep model for my binary classification problem (fire vs. non-fire) in Keras. I have 4K fire images and 8K non-fire images (they are video frames). I train with a 0.2/0.8 validation/training split. Now I test it on some videos, and I find some false positives. I add those to my negative (non-fire) set, …
I have a few queries. 1) Is normalization required for ANN/CNN/LSTM models? 2) If we normalize the data with a MinMax scaler, how do we denormalize it, and when should we denormalize it, so that we can get the error metrics in the original scale?
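As a sketch of the usual pattern (with made-up numbers): fit the scaler on the training targets only, let the model work in the scaled space, then call inverse_transform on the predictions before computing error metrics, so the metrics come out in the original units.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error

y_train = np.array([[10.0], [20.0], [30.0], [40.0]])   # targets in original units
y_test = np.array([[25.0], [35.0]])

scaler = MinMaxScaler()
scaler.fit(y_train)                                     # fit on training targets only

# pretend these are the model's predictions in the scaled space
y_pred_scaled = np.array([[0.48], [0.81]])

y_pred = scaler.inverse_transform(y_pred_scaled)        # back to original units
print(mean_absolute_error(y_test, y_pred))              # MAE in the original scale
```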
I developed a new route prediction algorithm and I am trying to find a metric that indicates how good a prediction was. This metric is meant to be used offline, meaning that the goal is not to measure the quality of the prediction when it is needed in real time. Instead, we are given a set $R=\{r_1,r_2,\ldots,r_{|R|}\}$ of routes that occurred in the past, and for each $r_i\in R$ we take a small prefix of $r_i$ and provide it …
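One simple candidate, sketched below under my own assumptions (routes represented as sequences of segment ids, with the prefix length known): compare the predicted remainder of the route with the true remainder using a Jaccard-style overlap, then average over all $r_i\in R$.

```python
def suffix_overlap(true_route, predicted_route, prefix_len):
    """Jaccard overlap between the true and predicted route remainders."""
    true_suffix = set(true_route[prefix_len:])
    pred_suffix = set(predicted_route[prefix_len:])
    if not true_suffix and not pred_suffix:
        return 1.0
    return len(true_suffix & pred_suffix) / len(true_suffix | pred_suffix)

# Toy usage with made-up segment ids
print(suffix_overlap(["s1", "s2", "s3", "s4"], ["s1", "s2", "s3", "s5"], prefix_len=2))
```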
I tried out a few models on a highly imbalanced sample (~2:100) where I can get a decent AUC from the ROC curve (on the test sample). But when I plot precision-recall (test sample), it looks horrible, somewhat like the worst PR curve in box (d). This article contains the picture below and describes ROC as better suited since it is invariant to class distribution. My question is whether there's anything that can be done to improve the precision-recall curve.
I think I know the answer to this question, but I am looking for a sanity check here: is it appropriate to use z-test scores to evaluate the performance of my model? I have a binary model that I developed with a NN in Keras. I know the size of my (equally balanced) training set, and it has a class proportion of 0.5 (duh!). I know that with my business use case false positives are financially expensive, so I'm …
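For reference, a minimal sketch of the kind of test I assume is meant here (a one-sample proportion z-test against the 0.5 chance level, with made-up counts), using statsmodels:

```python
from statsmodels.stats.proportion import proportions_ztest

n_correct, n_total = 540, 1000      # made-up held-out results
z_stat, p_value = proportions_ztest(count=n_correct, nobs=n_total,
                                    value=0.5, alternative='larger')
print(z_stat, p_value)              # is accuracy significantly above chance?
```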
I'm running experiments using benchmark datasets with auto-sklearn to see how its performance differs from the standard sklearn library, since AutoML does an exhaustive search over parameters whereas sklearn has to be tuned manually. What could be the essential criteria for judging the performance of these two libraries against each other?