Chi-square as an evaluation metric for nonlinear machine learning regression models

I am using machine learning models to predict an ordinal variable (values: 1, 2, 3, 4, and 5) from 7 different features. I posed this as a regression problem, so the final outputs of a model are continuous variables. An evaluation box plot looks like this (figure not included in this excerpt). I experiment with both linear models (linear regression, linear SVMs) and nonlinear models (SVMs with an RBF kernel, random forests, gradient boosting machines). The models are trained using cross-validation (~1600 samples), and 25% of the dataset is used …
Category: Data Science
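As a rough illustration of what a chi-square check could look like here, the sketch below rounds the continuous regression outputs back onto the 1-5 ordinal scale and compares the resulting class counts against the true class counts with scipy.stats.chisquare. The arrays y_true and y_pred are made-up placeholders, and note that this only compares the marginal count distributions, not per-sample agreement.

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical ordinal labels (1-5) and continuous regression outputs, not from the question.
y_true = np.array([1, 2, 2, 3, 4, 5, 3, 2, 4, 5])
y_pred = np.array([1.2, 2.4, 1.8, 3.3, 3.9, 4.6, 2.7, 2.1, 4.4, 4.9])

# Round and clip the continuous predictions back onto the ordinal scale.
levels = np.arange(1, 6)
y_pred_binned = np.clip(np.rint(y_pred), 1, 5).astype(int)

# Observed counts per level (from predictions) vs. expected counts (from labels).
observed = np.array([(y_pred_binned == k).sum() for k in levels])
expected = np.array([(y_true == k).sum() for k in levels])

# Chi-square goodness of fit between the two count distributions.
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(stat, p_value)
```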

XGBClassifier's predictions are not probabilities with objective='binary:logistic'

I am using XGBoost's XGBClassifier with a binary 0-1 target, and I am trying to define a custom metric function. According to the XGBoost tutorials, it receives an array of predictions and a DMatrix with the training set. I used objective='binary:logistic' in order to get probabilities, but the prediction values passed to the custom metric function are not between 0 and 1. They can lie roughly between -3 and 5, and the range of values seems to grow …
Category: Data Science
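Depending on the XGBoost version and whether a custom objective is also in play, the values handed to a custom metric can be raw margin scores rather than probabilities; a common workaround is to map margins through the sigmoid inside the metric. A minimal sketch (the data and the margin_logloss name are made up for illustration):

```python
import numpy as np
import xgboost as xgb

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def margin_logloss(preds, dtrain):
    """Log loss that tolerates raw margin scores as well as probabilities."""
    labels = dtrain.get_label()
    # Depending on the XGBoost version/configuration, preds may already be
    # probabilities or may be raw margins; apply the sigmoid only when needed.
    probs = preds if ((preds >= 0) & (preds <= 1)).all() else sigmoid(preds)
    probs = np.clip(probs, 1e-15, 1 - 1e-15)
    loss = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
    return 'margin_logloss', loss

# Made-up data with 7 features and a binary 0-1 target.
rng = np.random.default_rng(0)
X = rng.random((200, 7))
y = (rng.random(200) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {'objective': 'binary:logistic'}
bst = xgb.train(params, dtrain, num_boost_round=10,
                evals=[(dtrain, 'train')], custom_metric=margin_logloss)
```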

What is the relationship between the accuracy and the loss in deep learning?

I have created three different models using deep learning for multi-class classification, and each model gave me a different accuracy and loss value. The results on the test set are as follows: First model: accuracy 98.1%, loss 0.1882. Second model: accuracy 98.5%, loss 0.0997. Third model: accuracy 99.1%, loss 0.2544. My questions are: What is the relationship between the loss and accuracy values? Why is the loss of the third model the highest even though its accuracy is also the highest?
Category: Data Science
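A small made-up example of why the two can diverge: accuracy only checks whether the arg-max class is correct, while cross-entropy loss also penalises low confidence, so a model can be right more often and still have a higher loss.

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y_true = [0, 1, 2, 0]

# Model A: 3 of 4 predictions correct, all reasonably confident.
probs_a = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.7, 0.2],
                    [0.2, 0.1, 0.7],
                    [0.2, 0.7, 0.1]])   # wrong: true class is 0

# Model B: all 4 predictions correct, but only barely more confident than chance.
probs_b = np.array([[0.36, 0.33, 0.31],
                    [0.32, 0.36, 0.32],
                    [0.31, 0.33, 0.36],
                    [0.36, 0.33, 0.31]])

for name, probs in [('A', probs_a), ('B', probs_b)]:
    acc = accuracy_score(y_true, probs.argmax(axis=1))
    loss = log_loss(y_true, probs, labels=[0, 1, 2])
    print(name, acc, round(loss, 3))   # B has higher accuracy AND higher loss than A
```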

AUC-ROC for Multi-Label Classification

Hey guys, I'm currently reading about AUC-ROC. I have understood the binary case and I think I understand the multi-class case. Now I'm a bit confused about how to generalize it to the multi-label case, and I can't find any intuitive explanatory texts on the matter. I want to check whether my intuition is correct with an example. Let's assume a scenario with three classes (c1, c2, c3). Let's start with multi-class classification: When we're considering …
Category: Data Science
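For reference, scikit-learn's roc_auc_score accepts a multi-label indicator matrix directly; a minimal sketch with made-up scores for three labels, showing macro averaging (per-label AUC averaged) and micro averaging (all label decisions pooled):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical multi-label ground truth: one column per class (c1, c2, c3).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1],
                   [1, 0, 0]])

# Hypothetical per-class scores from a model.
y_score = np.array([[0.8, 0.1, 0.7],
                    [0.2, 0.9, 0.3],
                    [0.7, 0.6, 0.2],
                    [0.3, 0.2, 0.8],
                    [0.6, 0.4, 0.1]])

# Macro: compute ROC AUC per label, then average. Micro: pool all label decisions.
print(roc_auc_score(y_true, y_score, average='macro'))
print(roc_auc_score(y_true, y_score, average='micro'))
```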

Average Precision if Target Class is Not in Evaluation

Suppose I have 5 classes, denoted by 1, 2, 3, 4, and 5, used in object detection. When evaluating object detection performance, suppose classes 1, 2, and 3 are present but classes 4 and 5 are not present in the target values. Will classes 4 and 5 each have an average precision of 0 (because their precision is zero, as no true positives can be identified)? Or perhaps there are other considerations to take …
Category: Data Science

Why does measuring the score after removing the sentences with the most contributing words show that a model is "faithful"?

I don't understand how computing the score after removing the sentences whose words contribute most to the result helps to show to what extent a model is "faithful" to a reasoning process. A faithfulness score was proposed by Du et al. in 2019 to verify the importance of the identified contributing sentences or words to a given model's outputs. It is assumed that the probability value for the predicted class will drop significantly if the truly …
Category: Data Science
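A toy sketch of the idea behind such a faithfulness score (the classifier and attribution scores below are stand-ins, not Du et al.'s actual setup): remove the k most contributing tokens and measure how much the predicted-class probability drops; a large drop suggests the attributions really were driving the prediction.

```python
import numpy as np

# Toy stand-in for a text classifier: the positive-class probability grows with the
# number of "important" words present. Purely illustrative, not a real model.
IMPORTANT = {'great', 'excellent', 'love'}

def predict_proba(tokens):
    score = sum(tok in IMPORTANT for tok in tokens)
    p_pos = 1 / (1 + np.exp(-(score - 1)))
    return np.array([1 - p_pos, p_pos])

def faithfulness_drop(tokens, importances, target_class, k=2):
    """Drop in predicted-class probability after removing the k most contributing tokens."""
    p_full = predict_proba(tokens)[target_class]
    top_k = set(np.argsort(importances)[-k:])
    ablated = [t for i, t in enumerate(tokens) if i not in top_k]
    return p_full - predict_proba(ablated)[target_class]

tokens = ['i', 'love', 'this', 'great', 'phone']
importances = [0.0, 0.9, 0.1, 0.8, 0.2]   # e.g. from a saliency/attribution method
print(faithfulness_drop(tokens, importances, target_class=1))   # large drop -> faithful
```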

Which metrics for evaluating a recommender system with implicit data?

I am currently in the process of creating a recommender system. It works with a neural network and then searches for the nearest neighbors to give recommendations to a user. The data is implicit: I only know which products a user has bought, and I create the recommendations on the basis of this data. What are the best metrics to evaluate this recommender system with implicit data? Can I evaluate the model and then the search …
Category: Data Science
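Commonly used offline metrics for implicit feedback are precision@k, recall@k, MAP and NDCG, computed against a held-out portion of each user's purchases. A minimal sketch of precision@k and recall@k for one user (the item ids are made up):

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for a single user.

    recommended: ranked list of recommended item ids
    relevant: set of item ids the user actually bought (held out)
    """
    top_k = recommended[:k]
    hits = len(set(top_k) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: items the model ranked vs. items the user later bought.
recommended = ['i3', 'i7', 'i1', 'i9', 'i4']
relevant = {'i1', 'i4', 'i8'}
print(precision_recall_at_k(recommended, relevant, k=5))   # (0.4, 0.666...)
```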

What metrics work well with unbalanced datasets?

I wanted to know if there are metrics that work well when working with an unbalanced dataset. I know that accuracy is a very bad metric for evaluating a classifier when the data is unbalanced, but what about, for example, the Kappa index? Best regards and thanks.
Category: Data Science
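For reference, a few metrics generally considered more informative than accuracy on unbalanced data are available directly in scikit-learn; a small sketch with made-up predictions where class 1 is rare:

```python
from sklearn.metrics import (balanced_accuracy_score, cohen_kappa_score,
                             f1_score, matthews_corrcoef)

# Hypothetical imbalanced ground truth and predictions (class 1 is the rare class).
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 85 + [1] * 5 + [0] * 6 + [1] * 4

print(cohen_kappa_score(y_true, y_pred))       # the Kappa index from the question
print(balanced_accuracy_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(matthews_corrcoef(y_true, y_pred))
```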

What is the correct way to compute lift in lift charts

How is "lift" computed? i was reading about "Gain and lift charts" in data science. I picked the following example from https://www.listendata.com/2014/08/excel-template-gain-and-lift-charts.html I am clear on how the gain values are computed. Not clear about lift values are computed? (last column in table)
Category: Data Science
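Using the usual definition (not checked against the linked spreadsheet), lift for a decile is the cumulative gain, i.e. the percentage of all positives captured up to that decile, divided by the cumulative percentage of the population contacted. A sketch with synthetic scores and outcomes:

```python
import numpy as np
import pandas as pd

# Hypothetical scores and binary outcomes, sorted by descending model score.
rng = np.random.default_rng(0)
score = rng.random(1000)
target = (rng.random(1000) < score * 0.3).astype(int)

df = pd.DataFrame({'score': score, 'target': target}).sort_values('score', ascending=False)
df['decile'] = np.repeat(np.arange(1, 11), len(df) // 10)

summary = df.groupby('decile')['target'].agg(['count', 'sum'])
summary['cum_events'] = summary['sum'].cumsum()
summary['cum_pop_pct'] = summary['count'].cumsum() / summary['count'].sum()

# Gain: cumulative % of all positives captured up to this decile.
summary['gain'] = summary['cum_events'] / summary['sum'].sum()

# Lift: gain divided by the cumulative % of the population contacted.
summary['lift'] = summary['gain'] / summary['cum_pop_pct']
print(summary[['gain', 'lift']])
```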

Metrics for presenting RNN/LSTM result

I am working on two different architectures based on the LSTM model to predict a user's next action from their previous actions. I am wondering what the best way to present the results is. Is it okay to present only the prediction accuracy, or should I use other metrics? I found one paper using top-k accuracy, whereas a different paper used AUC/ROC. Overall, I would like to know what the state of the art is of …
Category: Data Science
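If top-k accuracy is the metric of interest, scikit-learn's top_k_accuracy_score is one readily available implementation; a minimal sketch with made-up next-action scores over 4 possible actions:

```python
import numpy as np
from sklearn.metrics import top_k_accuracy_score

# Hypothetical next-action ground truth (4 possible actions) and model scores.
y_true = np.array([2, 0, 3, 1, 2])
y_score = np.array([[0.1, 0.2, 0.6, 0.1],
                    [0.5, 0.2, 0.2, 0.1],
                    [0.1, 0.4, 0.2, 0.3],
                    [0.3, 0.3, 0.2, 0.2],
                    [0.2, 0.1, 0.3, 0.4]])

# Top-1 vs. top-3 accuracy: "was the true action among the k highest-scored actions?"
print(top_k_accuracy_score(y_true, y_score, k=1, labels=np.arange(4)))
print(top_k_accuracy_score(y_true, y_score, k=3, labels=np.arange(4)))
```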

Specificity for a 3-class classifier

I was reading an answer on Quora about calculating the specificity of a 3-class classifier from a confusion matrix: https://www.quora.com/How-do-I-get-specificity-and-sensitivity-from-a-three-classes-confusion-matrix. For the 3-class confusion matrix below (a screenshot from that answer, not reproduced here), the sensitivity and specificity would be found by calculating the following: My question is about the numerator for specificity (the true negatives): shouldn't it have 4 terms? For example, if we are calculating with respect to class 1, then in the table n22, n33, n32 and n23 were …
Category: Data Science
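A small sketch of one-vs-rest sensitivity and specificity from a 3-class confusion matrix (rows = actual, columns = predicted; the matrix values are made up). For class i the true negatives are all cells outside row i and column i, which is exactly 4 cells in the 3-class case.

```python
import numpy as np

# Hypothetical 3x3 confusion matrix: rows = actual class, columns = predicted class.
cm = np.array([[50,  3,  2],
               [ 4, 45,  6],
               [ 1,  5, 40]])

total = cm.sum()
for i in range(3):
    tp = cm[i, i]
    fp = cm[:, i].sum() - tp    # predicted class i but actually another class
    fn = cm[i, :].sum() - tp    # actually class i but predicted another class
    tn = total - tp - fp - fn   # everything outside row i and column i (4 cells here)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"class {i + 1}: sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
```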

How to re-train a model from false positives

I'm still a bit new to deep learning. What I'm still struggling with is: what is the best practice for re-training a good model over time? I've trained a deep model for my binary classification problem (fire vs. non-fire) in Keras. I have 4K fire images and 8K non-fire images (they are video frames), and I train with a 0.2/0.8 validation/training split. Now I test it on some videos, and I found some false positives. I add those to my negative (non-fire) set, …
Category: Data Science

How to measure accuracy of a route prediction

I developed a new route prediction algorithm and I am trying to find a metric that indicates how good a prediction was. This metric is meant to be used offline, meaning that the goal is not to measure the quality of the prediction when it is needed in real time. Instead, we are given a set $R=\{r_1,r_2,\dots,r_{|R|}\}$ of routes that occurred in the past, and for each $r_i\in R$ we take a small prefix of $r_i$ and provide it …
Category: Data Science
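One possible offline metric, offered only as an illustration and not as the author's method: treat each route as a sequence of segment ids and score the predicted continuation by the longest common subsequence it shares with the actual suffix, normalised by the suffix length.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def route_score(predicted_suffix, actual_suffix):
    """Fraction of the actual suffix recovered by the prediction (0..1)."""
    if not actual_suffix:
        return 1.0
    return lcs_length(predicted_suffix, actual_suffix) / len(actual_suffix)

# Hypothetical road-segment ids: the prediction continues a given route prefix.
actual = ['s1', 's2', 's3', 's4', 's5', 's6']
predicted = ['s1', 's2', 's7', 's4', 's5', 's8']
print(route_score(predicted, actual))   # 4/6 ~ 0.67
```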

Can Precision-Recall be improved for imbalanced sample?

I tried out a few models on a highly imbalanced sample (~2:100) where I can get a decent AUC from the ROC curve (test sample). But when I plot precision-recall (test sample), it looks horrible, similar to the worst PR curve in panel (d). This article contains the picture below (not reproduced in this excerpt) and describes that ROC is better suited since it is invariant to class distribution. My question is whether there is anything that can be done to improve precision-recall?
Category: Data Science
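Things that are sometimes tried in this situation include class weighting, resampling and threshold tuning, none of which is guaranteed to help. A sketch comparing average precision with and without class weights on a synthetic ~2:100 problem (illustrative only, not a recipe):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic ~2:100 imbalanced binary problem.
X, y = make_classification(n_samples=5000, weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Compare the area under the precision-recall curve with and without class weights.
for cw in [None, 'balanced']:
    clf = LogisticRegression(class_weight=cw, max_iter=1000).fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]
    print(cw, average_precision_score(y_te, scores))
```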

Using Z-test score to evaluate model performance

I think I know the answer to this question, but I am looking for a sanity check here: is it appropriate to use z-test scores to evaluate the performance of my model? I have a binary model that I developed with a NN in Keras. I know the size of my (equally balanced) training set, and it has a positive proportion of 0.5 (duh!). I know that with my business use case false positives are financially expensive, so I'm …
Category: Data Science
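If the intent is to compare an observed success proportion (say, precision on flagged cases) against a fixed baseline, a one-sample proportion z-test is one option; a sketch using statsmodels with entirely hypothetical counts:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical: the model flagged 200 cases, 130 were true positives, and we ask
# whether that precision is significantly above a 0.5 baseline.
count, nobs, baseline = 130, 200, 0.5
z_stat, p_value = proportions_ztest(count, nobs, value=baseline, alternative='larger')
print(z_stat, p_value)
```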

What would be the main criteria for evaluating the auto-sklearn library?

I'm running experiments using benchmark datasets with auto-sklearn to see how its performance differs from the standard sklearn library, since AutoML does an exhaustive search over parameters while sklearn has to be tuned manually. What could be the essential criteria for judging the performance of these two libraries?
Category: Data Science
