Validating classification results

I created a model for only 2 classes and the classification report was: Although the accuracy looks good, I don't think this model is good. The original data has 522 records of class 1 and 123 of class 2, so I think the model is mostly guessing the majority class (class 1). When I applied the model to the original data, it predicted 585 records as class 1 and 60 as class 2. When I balanced the classes, the results were: The …
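A minimal sketch of the kind of check being described: compare the trained model against a majority-class baseline and report a class-balanced metric. The arrays below only reuse the counts mentioned in the question; their row-by-row alignment is an assumption for illustration.

```python
# Minimal sketch (not the asker's code): contrast plain accuracy with a
# majority-class baseline and balanced accuracy on an imbalanced 2-class task.
import numpy as np
from sklearn.metrics import balanced_accuracy_score, classification_report

# Illustrative arrays built from the counts in the question; the alignment
# of predictions to true labels is assumed, not taken from the real data.
y_true = np.array([1] * 522 + [2] * 123)
y_pred = np.array([1] * 585 + [2] * 60)

# Plain accuracy can look good just by favouring class 1 ...
majority_baseline = np.full_like(y_true, 1)
print("majority-baseline accuracy:", (majority_baseline == y_true).mean())

# ... so also report a metric that weights both classes equally.
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, digits=3))
```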
Category: Data Science

Cluster Evaluation with Jaccard and Rand Index

I've clustered my data according to 3 criteria into 3 groups. I used k-means to obtain those clusters, so the label of each cluster is arbitrary and changes on each script run. To evaluate the consistency of my clusters I decided to use the Jaccard index, but I can't understand how to apply it properly. Let's say I have this data, where alpha, beta, and gamma are the 3 methods, and the Cluster Index is the value returned by K-means for …
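Since the k-means label IDs are arbitrary, pair-counting indices are a natural fit: they only ask whether two points land in the same cluster, so relabelling has no effect. A small sketch with assumed label vectors:

```python
# Minimal sketch (assumed label vectors, not the asker's data): compare two
# clusterings with pair-counting indices, which ignore the arbitrary cluster
# IDs that k-means assigns on each run.
import numpy as np
from scipy.special import comb
from sklearn.metrics import adjusted_rand_score, rand_score
from sklearn.metrics.cluster import contingency_matrix

labels_alpha = np.array([0, 0, 1, 1, 2, 2, 2, 0])
labels_beta  = np.array([2, 2, 0, 0, 1, 1, 1, 2])  # same partition, relabelled

def pair_jaccard(a, b):
    """Jaccard index over pairs of points placed in the same cluster."""
    c = contingency_matrix(a, b)
    same_both = comb(c, 2).sum()              # pairs together in both clusterings
    same_a = comb(c.sum(axis=1), 2).sum()     # pairs together in a
    same_b = comb(c.sum(axis=0), 2).sum()     # pairs together in b
    return same_both / (same_a + same_b - same_both)

print("Rand index:", rand_score(labels_alpha, labels_beta))            # 1.0
print("Adjusted Rand:", adjusted_rand_score(labels_alpha, labels_beta))
print("Pairwise Jaccard:", pair_jaccard(labels_alpha, labels_beta))    # 1.0
```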
Category: Data Science

Song playlist recommendation system

I want to build a recommender system to suggest similar songs to continue a playlist (similar to what Spotify does by recommending similar songs at the end of a playlist). I want to build two models: one based on collaborative filtering and another one, a content-based model, to compare their results and choose the best one. Now, I have two questions: Where can I find a dataset with useful data for this type of work? How can I measure the …
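A hedged sketch of the content-based half of the comparison, under assumed data: each song is a vector of audio features and the playlist is continued with the songs closest to the playlist centroid. All names and shapes here are hypothetical.

```python
# Minimal content-based sketch (assumed data): recommend the songs most
# similar to the centroid of the current playlist.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
song_features = rng.random((1000, 8))        # 1000 songs x 8 assumed audio features
playlist_ids = [3, 17, 256]                  # songs already in the playlist

centroid = song_features[playlist_ids].mean(axis=0, keepdims=True)
scores = cosine_similarity(centroid, song_features).ravel()
scores[playlist_ids] = -np.inf               # don't recommend what is already there

top_k = np.argsort(scores)[::-1][:10]
print("recommended song indices:", top_k)
```

For measuring results, a common setup is to hold out the last few songs of each playlist and check how often they appear in the top-k recommendations (precision@k / recall@k style), which works for both the collaborative and the content-based model.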
Category: Data Science

Meaningfully compare target vs observed TPR & FPR

Suppose I have a binary classifier $f$ which acts on an input $x$. Given a threshold $t$, the predicted binary output is defined as: $$ \widehat{y} = \begin{cases} 1, & f(x) \geq t \\ 0, & f(x) < t \end{cases} $$ I then compute the $TPR$ (true positive rate) and $FPR$ (false positive rate) metrics on the hold-out test set (call it $S_1$): $TPR_{S_1} = \Pr(\widehat{y} = 1 | y = 1, S_1)$ $FPR_{S_1} = \Pr(\widehat{y} = 1 | y …
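A small sketch of the definitions above, computed on a hold-out set $S_1$; the score and label arrays are placeholders.

```python
# Sketch of TPR/FPR at a fixed threshold t on a hold-out set S1 (placeholder data).
import numpy as np

def tpr_fpr(scores, y, t):
    """TPR = P(y_hat=1 | y=1), FPR = P(y_hat=1 | y=0) at threshold t."""
    y_hat = (scores >= t).astype(int)
    tpr = y_hat[y == 1].mean()   # fraction of positives predicted positive
    fpr = y_hat[y == 0].mean()   # fraction of negatives predicted positive
    return tpr, fpr

scores_s1 = np.array([0.9, 0.8, 0.35, 0.6, 0.2, 0.1])  # f(x) on S1
y_s1      = np.array([1,   1,   1,    0,   0,   0  ])
print(tpr_fpr(scores_s1, y_s1, t=0.5))   # -> (0.666..., 0.333...)
```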
Category: Data Science

Bias-variance trade-off and model evaluation

Suppose that we have trained a model (as defined by its hyperparameters) and we evaluated it on a test set using some performance metric (say $R^2$). If we now train the same model (as defined by its hyperparameters) on different training data, we will (probably) get a different value for $R^2$. If $R^2$ depends on the training set, then we will obtain a normal distribution of $R^2$ values around some mean. Shouldn't we therefore average the $R^2$ from the various …
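A quick sketch of the experiment the question describes, on synthetic data: refit the same model specification on different training draws and look at the spread of the test-set $R^2$.

```python
# Sketch (synthetic data): same hyperparameters, different training sets,
# collect the distribution of test R^2.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

r2s = []
for seed in range(30):                                  # 30 different splits
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = Ridge(alpha=1.0).fit(X_tr, y_tr)            # same hyperparameters each time
    r2s.append(r2_score(y_te, model.predict(X_te)))

print(f"R^2 mean = {np.mean(r2s):.3f}, std = {np.std(r2s):.3f}")
```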
Category: Data Science

Is data leakage giving me misleading results? Independent test set says no!

TLDR: I evaluated a classification model using 10-fold CV with data leakage between the training and test folds. The results were great. I then fixed the data leakage and the results were garbage. I then tested the model on an independent new dataset and the results were similar to the evaluation performed with data leakage. What does this mean? Was my data leakage not relevant? Can I trust my model evaluation and report that performance? Extended version: I'm developing …
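For context, a sketch of one very common leakage pattern in CV (not necessarily the one in this question): preprocessing fitted on the whole dataset before cross-validation versus refitted inside each fold through a pipeline.

```python
# Sketch of a typical CV leakage pattern: scaler fitted on all data vs.
# scaler refit inside each fold via a Pipeline.
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Leaky: the scaler sees the test folds before cross-validation starts.
X_leaky = StandardScaler().fit_transform(X)
leaky = cross_val_score(SVC(), X_leaky, y, cv=cv)

# Leak-free: scaling is refit on the training portion of every fold.
clean = cross_val_score(make_pipeline(StandardScaler(), SVC()), X, y, cv=cv)

print("leaky CV accuracy:", leaky.mean())
print("pipeline CV accuracy:", clean.mean())
```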
Category: Data Science

How is model evaluation and re-training done after deployment without ground truth labels?

Suppose I deployed a model after manually labeling the ground truth for my training data, as the use case is such that there's no way to get ground truth labels without humans. Once the model is deployed, if I want to evaluate how the model is doing on live data, how can I evaluate it without sampling some of that live data (which doesn't come with ground truth labels) and manually labeling it? And …
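A minimal sketch of the sampling approach the question mentions, with placeholder arrays: label only a small random sample of live predictions and estimate live accuracy from it, with a rough confidence interval.

```python
# Sketch (placeholder data): estimate live accuracy from a small
# human-labeled random sample of live predictions.
import numpy as np

rng = np.random.default_rng(0)
live_predictions = rng.integers(0, 2, size=10_000)   # model outputs on live data

sample_idx = rng.choice(len(live_predictions), size=200, replace=False)
human_labels = rng.integers(0, 2, size=200)           # stand-in for manual labels

acc = (live_predictions[sample_idx] == human_labels).mean()
stderr = np.sqrt(acc * (1 - acc) / len(sample_idx))
print(f"estimated live accuracy: {acc:.3f} +/- {1.96 * stderr:.3f} (95% CI)")
```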
Category: Data Science

Is there a Mean Average Recall for Item Retrieval/ Recommendation Systems?

Mean Average Precision for Information retrieval is computed using Average Precision @ k (AP@k). AP@k is measured by first computing Precision @ k (P@k) and then averaging the P@k only for the k's where the document in position k is relevant. I still don't understand why the remaining P@k's are not used, but that is not my question. My question is: is there an equivalent Mean Average Recall (MAR) and Average Recall @ k (AR@k)? Recall @ k (R@k) is …
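A sketch of AP@k alongside a symmetric AR@k, using one common convention (averaging only over the positions where the item is relevant, as described above); the relevance list is illustrative.

```python
# Sketch: AP@k and a symmetric "AR@k", averaging P@k / R@k over the
# positions k where the ranked item is relevant.
def precision_at_k(rel, k):
    return sum(rel[:k]) / k

def recall_at_k(rel, k, n_relevant):
    return sum(rel[:k]) / n_relevant

def average_precision_at_k(rel, k, n_relevant):
    hits = [precision_at_k(rel, i + 1) for i in range(k) if rel[i]]
    return sum(hits) / min(n_relevant, k) if hits else 0.0

def average_recall_at_k(rel, k, n_relevant):
    hits = [recall_at_k(rel, i + 1, n_relevant) for i in range(k) if rel[i]]
    return sum(hits) / min(n_relevant, k) if hits else 0.0

rel = [1, 0, 1, 0, 0, 1]          # relevance of the ranked results
print(average_precision_at_k(rel, k=6, n_relevant=3))
print(average_recall_at_k(rel, k=6, n_relevant=3))
```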
Category: Data Science

Repeatability tests for machine learning models (in the sense of measurement system analysis)

To analyze a machine learning model, we usually calculate performance metrics (such as accuracy...) and, during the validation step, make sure that the model has not overfitted. We can consider a machine learning model (for example, a machine vision model) that is deployed to an industrial system to perform a classification (e.g., defect detection) task as a measurement device. From this point of view, I would like to know whether performing "measurement system analysis", and specifically repeatability tests, is necessary. …
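A hedged sketch of what a repeatability check in the MSA sense could look like for such a classifier: the same physical parts are measured (e.g., re-imaged and re-classified) several times, and we look at how consistently each part receives the same label. The data here is hypothetical.

```python
# Sketch of a repeatability check (hypothetical data): rows are repeated
# measurement runs of the same parts, columns are parts, values are the
# predicted classes from the deployed classifier.
import numpy as np

predictions = np.array([
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 1],
])

def per_part_agreement(preds):
    """Fraction of repeats matching the most frequent label for each part."""
    agree = []
    for col in preds.T:
        counts = np.bincount(col)
        agree.append(counts.max() / len(col))
    return np.array(agree)

agreement = per_part_agreement(predictions)
print("per-part agreement:", agreement)           # e.g. [1. 1. 0.67 1. 1.]
print("repeatability score:", agreement.mean())
```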
Category: Data Science

Comparison of performance of regression models for multi-regression tasks

I have a sample time-series dataset of shape (23, 14291), a pivot table of counts over 24 hours for some users. After pre-processing, I have a dataset of shape (23, 200). I filtered out columns/features that don't have a time-series nature, keeping meaningful ones either with PCA (to retain those carrying most of the variance) or with a correlation matrix (to exclude highly correlated columns/features). I took advantage of MultiOutputRegressor() and predicted all columns for a certain range of time …
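A sketch of one way to compare candidate regressors for such a multi-output task, on synthetic data: wrap each base estimator in MultiOutputRegressor, fit on the same split, and report per-output and mean R². The estimators and shapes are placeholders.

```python
# Sketch (synthetic data): compare base estimators wrapped in
# MultiOutputRegressor using per-output R^2 on a shared split.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

X, Y = make_regression(n_samples=200, n_features=30, n_targets=5, noise=5.0,
                       random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)

candidates = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

for name, base in candidates.items():
    model = MultiOutputRegressor(base).fit(X_tr, Y_tr)
    per_output = r2_score(Y_te, model.predict(X_te), multioutput="raw_values")
    print(name, "per-output R^2:", np.round(per_output, 3),
          "mean:", round(per_output.mean(), 3))
```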
Category: Data Science

Uncertainty about shape of ROC curve

I am working on a binary classification problem, and the ROC curves that I am plotting for evaluation, together with AUC, seem strange to me. Here is an example. I understand that the ROC curve is a visual representation of the true positive rate versus the false positive rate. When plotting the confusion matrix I can see there is a significant number of false negatives and false positives alike: I fail to understand how it is possible that the ROC curve only …
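For reference, a sketch (synthetic data) of how the curve is normally built: it is traced by sweeping a threshold over continuous scores (predict_proba / decision_function), so it has many points; building it from hard 0/1 predictions collapses it to very few points, which can make the plot look odd. Whether that applies here is only a guess.

```python
# Sketch (synthetic data): ROC from continuous scores vs. from hard labels.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

fpr, tpr, thresholds = roc_curve(y_te, scores)
print("points on the curve:", len(thresholds), "AUC:", roc_auc_score(y_te, scores))

# Using hard predictions instead collapses the curve to very few points.
fpr_h, tpr_h, thr_h = roc_curve(y_te, clf.predict(X_te))
print("points with hard labels:", len(thr_h))
```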
Category: Data Science

Baseline result is much better than state-of-the-art model

I am researching Deep Learning based Intrusion Detection Systems. I found a paper in a well-known journal that is considered a state-of-the-art method in this research area because it has many citations. In the paper, they proposed using Inception Resnet v4 to solve the problem and obtained the lowest error rate compared to other studies. I am developing a new method using their data pre-processing idea. First, I built a baseline, which is a very simple and shallow …
Category: Data Science

Evaluation Metric for Imbalanced and Ordinal Classification

I'm looking for an ML evaluation metric that would work well with imbalanced and ordinal multiclass datasets. Imagine you want to predict the severity of a disease that has 4 grades of severity, where 1 is mild and 4 represents the worst outcome. Realistically, this dataset would have the vast majority of patients in the mild zone (classes 1 or 2) and fewer in classes 3 and 4 (an imbalanced/skewed dataset). Now in the example, a classifier that predicts a …
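One candidate in this family (not the only option) is Cohen's kappa with quadratic weights: it respects the ordering by penalising a 1-vs-4 confusion far more than a 2-vs-3 confusion, and it corrects for chance agreement on a skewed label distribution. A tiny sketch with illustrative labels:

```python
# Sketch of one candidate metric for ordered severity grades:
# quadratically weighted Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

y_true = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
y_pred = [1, 1, 2, 2, 2, 1, 3, 4, 3, 4]   # illustrative predictions

print("quadratic weighted kappa:",
      cohen_kappa_score(y_true, y_pred, weights="quadratic"))
```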
Category: Data Science

Choose ROC/AUC vs. precision/recall curve?

I am trying to get a clear understanding on various classification metrics, including knowing when to choose ROC/AUC as opposed to opting for the Precision/Recall curve. I am reading Aurélien Géron's Hands-On Machine Learning with Scikit-Learn and TensorFlow book (page 92), where the following is stated: Since the ROC curve is so similar to the precision/recall (or PR) curve, you may wonder how to decide which one to use. As a rule of thumb, you should prefer the PR curve …
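To see the contrast the book refers to, it can help to compute both curves on the same imbalanced problem; a sketch on synthetic data:

```python
# Sketch (synthetic, imbalanced data): ROC AUC vs. PR-based average precision
# for the same classifier and test set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, precision_recall_curve,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

fpr, tpr, _ = roc_curve(y_te, scores)
prec, rec, _ = precision_recall_curve(y_te, scores)
print("ROC AUC:", roc_auc_score(y_te, scores))                 # can look optimistic
print("average precision (PR AUC):", average_precision_score(y_te, scores))
```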
Category: Data Science

How to evaluate model accuracy at tail of empirical distribution?

I am fitting a nonlinear regression on a stationary dependent variable and I want to precisely forecast extreme values of this variable. So when my model predicts extreme values, I want them to be highly accurate. Less extreme forecasts (e.g., those near the mean) don't need to be as accurate. What are some useful metrics with favorable statistical properties for comparing multiple models when tail accuracy matters?
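A sketch of one possible tail-focused metric (an assumption, not something from the question): an error measure restricted to observations beyond a high quantile of the target, so that only the extreme region contributes.

```python
# Sketch (synthetic data): RMSE computed only on the upper tail of the target.
import numpy as np

def tail_rmse(y_true, y_pred, q=0.95):
    """RMSE restricted to observations above the q-th quantile of y_true."""
    cutoff = np.quantile(y_true, q)
    mask = y_true >= cutoff
    return np.sqrt(np.mean((y_true[mask] - y_pred[mask]) ** 2))

rng = np.random.default_rng(0)
y_true = rng.standard_t(df=3, size=5000)            # heavy-tailed target
y_pred = y_true + rng.normal(scale=0.5, size=5000)  # stand-in forecasts

print("overall RMSE:", np.sqrt(np.mean((y_true - y_pred) ** 2)))
print("tail RMSE (top 5%):", tail_rmse(y_true, y_pred))
```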
Category: Data Science

Quantitative measure of the smoothness of learning curves

$\DeclareMathOperator{\loss}{loss}$ $\DeclareMathOperator{\AvgVar}{AvgVar}$ Let's say we have some deep learning task. We have our model and two sets of hyperparameters $A$ and $B$. We train both systems for 10000 mini-batches and obtain two learning curves (losses on these training batches). Is there any quantitative measure of the smoothness of a learning curve? I have seen a few times in articles that the authors just overlay the two curves to show that one is smoother than the other, but obviously it would be …
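A sketch of one simple, assumed smoothness measure in the spirit of the $\AvgVar$ operator above: the average variance of the loss inside a sliding window, compared across the two runs (lower means smoother). The curves below are synthetic.

```python
# Sketch: average sliding-window variance of the training loss as a
# smoothness measure for two synthetic learning curves.
import numpy as np

def avg_var(losses, window=50):
    """Mean of the loss variance over a sliding window; lower = smoother."""
    losses = np.asarray(losses, dtype=float)
    return np.mean([losses[i:i + window].var()
                    for i in range(len(losses) - window + 1)])

rng = np.random.default_rng(0)
steps = np.arange(10_000)
curve_a = np.exp(-steps / 3000) + rng.normal(scale=0.02, size=steps.size)
curve_b = np.exp(-steps / 3000) + rng.normal(scale=0.10, size=steps.size)

print("AvgVar A:", avg_var(curve_a))   # smoother run
print("AvgVar B:", avg_var(curve_b))   # noisier run
```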
Category: Data Science

How can I adapt the accuracy metric for multiclass classification?

I have a multiclass problem with 4 classes. I would like a custom metric to assess the model where predicting class 3 as class 2, or class 2 as class 3 (i.e., the classes in the middle), is penalized less than other errors. How can I do this by adapting the sklearn accuracy_score metric or similar? e.g. comparing: predicted_labels = [1,3,0,0,2..] actual = [0,0,2,1,3,3...]
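A sketch of one way to do this (the weighting scheme is an assumption): score each prediction through a credit matrix that gives full credit for exact matches and partial credit only for 2↔3 confusions, then average. The label lists below are illustrative, not the truncated ones from the question.

```python
# Sketch: accuracy with partial credit for confusing classes 2 and 3.
import numpy as np
from sklearn.metrics import make_scorer

# credit[i, j] = score awarded when true class i is predicted as class j
credit = np.eye(4)
credit[2, 3] = credit[3, 2] = 0.5      # the "middle" confusion is penalised less

def soft_accuracy(y_true, y_pred):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return credit[y_true, y_pred].mean()

predicted_labels = [1, 3, 0, 0, 2, 3]  # illustrative labels
actual           = [0, 0, 2, 1, 3, 3]
print(soft_accuracy(actual, predicted_labels))   # 0.25 for these labels

# Can be plugged into sklearn model selection if needed:
soft_accuracy_scorer = make_scorer(soft_accuracy)
```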
Category: Data Science

How to calculate mAP for multi-label classification using output predictions?

I have a model which predicts the actions happening in a video clip. Once I get these predictions, I use some rules (a set of if-else conditions) to come up with composite labels, e.g. action1_before_action2, action4_during_action5, etc. I also have the ground truth for these composite labels. How do I calculate the mAP score using my composite predictions? Notice that for my composite predictions I do not have sigmoid values. More details: I have an action classification model that outputs the …
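For reference, a sketch (assumed data) of how mAP is usually computed for multi-label outputs: the mean of per-label average precision, which requires a confidence score per composite label (e.g., some combination of the underlying action probabilities) rather than hard 0/1 rule outputs.

```python
# Sketch (assumed data): mAP as the mean of per-label average precision,
# using hypothetical confidence scores for each composite label.
import numpy as np
from sklearn.metrics import average_precision_score

# rows = clips, columns = composite labels (e.g. action1_before_action2, ...)
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_score = np.array([[0.9, 0.2, 0.6],
                    [0.1, 0.8, 0.3],
                    [0.7, 0.4, 0.2],
                    [0.2, 0.1, 0.9]])   # hypothetical per-label confidences

ap_per_label = [average_precision_score(y_true[:, j], y_score[:, j])
                for j in range(y_true.shape[1])]
print("AP per composite label:", np.round(ap_per_label, 3))
print("mAP:", np.mean(ap_per_label))
```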
Category: Data Science

Assess feature importance in Keras for one-hot-encoded categorical features

An important aspect of tuning a model is assessing feature importance. In Keras, how can I assess the importance of a categorical feature that is one-hot encoded? E.g., if a categorical feature is ice_cream_colour with a cardinality of 12, then I can assess the individual importances of ice_cream_colour_blue, ice_cream_colour_red, etc., but how do I do it for the entire ice_cream_colour feature? A naïve approach would be to sum all individual importances, but this assumes that the relationship between distinct feature importances is …
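One model-agnostic option is group permutation importance: shuffle all one-hot columns of the categorical feature together and measure the drop in the evaluation metric. A sketch under assumed data; `predict_fn` can be any model's predict (e.g., a Keras model's `model.predict` with an argmax), while a sklearn classifier stands in here so the snippet runs without TensorFlow, and the column indices are hypothetical.

```python
# Sketch: permutation importance for a whole group of one-hot columns.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def group_permutation_importance(predict_fn, X, y, group_cols, metric,
                                 n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict_fn(X))
    drops = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        perm = rng.permutation(len(X))
        # permute the whole group of columns together, keeping rows intact
        X_perm[:, group_cols] = X_perm[perm][:, group_cols]
        drops.append(baseline - metric(y, predict_fn(X_perm)))
    return np.mean(drops)

X, y = make_classification(n_samples=500, n_features=15, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

one_hot_cols = [3, 4, 5]   # hypothetical columns of one one-hot-encoded feature
imp = group_permutation_importance(clf.predict, X, y, one_hot_cols, accuracy_score)
print("importance of the whole categorical feature:", imp)
```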
Category: Data Science
