Measuring performance of customer purchase predictions

My goal is to develop a model that predicts each customer's next purchase amount in USD. (Update: if a customer made no purchase during the time period covered by the dataset, the next-purchase label is set to zero.) I am trying to determine the most effective metric for measuring the model's performance.

The results look like this:

y_true_usd  y_predicted_usd
1.2         0.8
0           0.3
0           1.1
0           0
0           0.1
5.3         4.3

At first I considered RMSE, but since most of my customers do not place an order, RMSE tends to obscure errors because paying users are rare (the model predicted mostly 0 and did a poor job of predicting purchases). My next step was to bin the customers into 5 groups and use the quadratic-weighted Cohen's kappa to measure performance. The kappa metric worked well and exposed poorly performing models; however, it forced me to bin the customers.
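
To see how the zero-purchase majority can dominate RMSE, it may help to compute it separately for buyers and non-buyers. The following is a minimal sketch using only the six sample rows above; the helper name `rmse` and the buyer/non-buyer split are illustrative assumptions, not part of the original analysis.

```python
import math

# Sample rows from the table above (true vs. predicted purchase in USD).
y_true = [1.2, 0.0, 0.0, 0.0, 0.0, 5.3]
y_pred = [0.8, 0.3, 1.1, 0.0, 0.1, 4.3]


def rmse(true_vals, pred_vals):
    """Root mean squared error over paired values."""
    squared = [(t - p) ** 2 for t, p in zip(true_vals, pred_vals)]
    return math.sqrt(sum(squared) / len(squared))


# Split rows by whether the customer actually purchased (label > 0).
buyers = [(t, p) for t, p in zip(y_true, y_pred) if t > 0]
non_buyers = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]

overall_rmse = rmse(y_true, y_pred)
buyer_rmse = rmse(*zip(*buyers))          # error on paying customers only
non_buyer_rmse = rmse(*zip(*non_buyers))  # error on zero-purchase customers

print(f"overall RMSE:   {overall_rmse:.3f}")
print(f"buyers RMSE:    {buyer_rmse:.3f}")
print(f"non-buyer RMSE: {non_buyer_rmse:.3f}")
```

The overall mean squared error is a count-weighted average of the two segments, so on a heavily imbalanced dataset it is pulled toward the non-buyer figure. Reporting the segments side by side keeps weak purchase predictions visible.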

Which would be a good metric for measuring the model's performance without binning the customers?

Update: I am looking for a single metric that emphasises the accuracy of predicting the right amount of USD within an imbalanced dataset and helps me decide whether a new model is better than the previous one.



Very interesting question.

Brian provides a good insight about stacking a classification and a regression model. You may:

  1. tune your classification model (with an appropriate metric such as PR AUC)
  2. set thresholds for classification
  3. train a regression model on the positively classified customers to predict the size of their purchases - if that matters.

In this context, you may look at the ability of your model to reconstruct the total (dollar?) volume of sales - and start iterating. It could be a sensible, easy-to-communicate metric to start with.
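
As a rough illustration of the total-volume idea, the sketch below compares predicted and actual total sales on the six sample rows from the question. The function name `total_volume_error` and the choice of a signed relative error are assumptions made for this example.

```python
# Sample rows from the question (true vs. predicted purchase in USD).
y_true = [1.2, 0.0, 0.0, 0.0, 0.0, 5.3]
y_pred = [0.8, 0.3, 1.1, 0.0, 0.1, 4.3]


def total_volume_error(true_vals, pred_vals):
    """Signed relative error between predicted and actual total sales."""
    actual_total = sum(true_vals)
    predicted_total = sum(pred_vals)
    return (predicted_total - actual_total) / actual_total


rel_error = total_volume_error(y_true, y_pred)
print(f"relative total-volume error: {rel_error:+.1%}")
```

Note that over- and under-predictions on individual customers cancel out in the total, so this number is probably best read alongside a per-customer error rather than on its own.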

You may still do binning at the end of the process (total sales reconstruction error) to deliver metrics on different subcategories (e.g. inexpensive items).


One way to frame the problem is as a hierarchical series of separate models.

First - fit a binary classification model that predicts purchase / not purchase. Those classes might be imbalanced so use precision, recall, or F score (do not use accuracy).

Second - if the first model predicts purchase, then fit a separate regression model for the amount of the purchase. Mean absolute error (MAE) is often used for price since it is more interpretable than root mean square error (RMSE).
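
The two stages above could be evaluated along these lines. This is only a sketch on the six sample rows from the question: it assumes a single prediction column stands in for both models, and the cut-off `PURCHASE_THRESHOLD = 0.5` is a hypothetical value chosen for illustration.

```python
# Sample rows from the question (true vs. predicted purchase in USD).
y_true = [1.2, 0.0, 0.0, 0.0, 0.0, 5.3]
y_pred = [0.8, 0.3, 1.1, 0.0, 0.1, 4.3]

PURCHASE_THRESHOLD = 0.5  # hypothetical cut-off, an assumption for this sketch

# Stage 1: treat the task as purchase / no purchase.
actual_buy = [t > 0 for t in y_true]
predicted_buy = [p >= PURCHASE_THRESHOLD for p in y_pred]

tp = sum(a and p for a, p in zip(actual_buy, predicted_buy))
fp = sum((not a) and p for a, p in zip(actual_buy, predicted_buy))
fn = sum(a and (not p) for a, p in zip(actual_buy, predicted_buy))

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else 0.0)

# Stage 2: MAE restricted to customers that stage 1 flagged as purchasers.
flagged = [(t, p) for t, p, flag in zip(y_true, y_pred, predicted_buy) if flag]
mae_flagged = sum(abs(t - p) for t, p in flagged) / len(flagged)

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
print(f"MAE on flagged customers: {mae_flagged:.2f} USD")
```

In a real pipeline the classifier and the regressor would each produce their own predictions; thresholding one column here just keeps the example self-contained.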


Generally, the right loss function is the one that is in dollars: the actual value to your employer of the prediction error. So the answer to your question depends on what use will be made of your model.

How can the "next customer purchase" be 0? There's a difference between "customer will never buy" and "customer has not bought yet". Maybe it is more useful to predict time to next customer purchase than amount.
