Measuring performance of customer purchase predictions
My goal is to develop a model that predicts each customer's next purchase amount in USD. (Update: if a customer made no purchase during the time period covered by the dataset, the next-purchase label is set to zero.) I am trying to determine the most effective metric for measuring the model's performance.
The results look like this:
| y_true_usd | y_predicted_usd |
|---|---|
| 1.2 | 0.8 |
| 0 | 0.3 |
| 0 | 1.1 |
| 0 | 0 |
| 0 | 0.1 |
| 5.3 | 4.3 |
At first I thought about going with RMSE, but since most of my customers do not place an order, RMSE tends to obscure errors on the rare paying users (the model predicted mostly 0 and did a poor job of predicting purchases). My next step was to bin the customers into 5 groups and use quadratic weighted Cohen's kappa to measure performance. The kappa metric worked well and exposed poorly performing models; however, it forced me to bin the customers.
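To make the two approaches concrete, here is a minimal sketch in plain Python that computes both on the six sample rows from the table. The bin edges (`BIN_EDGES`) are made up for illustration and are not the ones I actually used. The helper names (`rmse`, `to_bin`, `quadratic_weighted_kappa`) are likewise just for this example. The RMSE is also split into buyers and non-buyers, only to show that the overall figure blends the two groups.

```python
import bisect
import math

# Sample rows from the table: (y_true_usd, y_predicted_usd)
rows = [(1.2, 0.8), (0.0, 0.3), (0.0, 1.1), (0.0, 0.0), (0.0, 0.1), (5.3, 4.3)]

# Hypothetical bin edges in USD (an assumption for this sketch); 4 edges give 5 bins.
BIN_EDGES = [0.5, 1.0, 2.0, 5.0]


def rmse(pairs):
    """Root mean squared error over (true, predicted) pairs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))


def to_bin(value):
    """Map a USD amount to a bin index in 0..len(BIN_EDGES)."""
    return bisect.bisect_right(BIN_EDGES, value)


def quadratic_weighted_kappa(true_bins, pred_bins, n_bins):
    """Cohen's kappa with quadratic disagreement weights on ordinal bins."""
    n = len(true_bins)
    observed = [[0] * n_bins for _ in range(n_bins)]
    for t, p in zip(true_bins, pred_bins):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_bins)) for j in range(n_bins)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_bins):
        for j in range(n_bins):
            weight = (i - j) ** 2 / (n_bins - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if true and predicted bins were independent.
            weighted_expected += weight * true_hist[i] * pred_hist[j] / n
    # Undefined (division by zero) if everything falls into a single bin.
    return 1.0 - weighted_observed / weighted_expected


rmse_all = rmse(rows)
rmse_buyers = rmse([r for r in rows if r[0] > 0])
rmse_non_buyers = rmse([r for r in rows if r[0] == 0])

true_bins = [to_bin(t) for t, _ in rows]
pred_bins = [to_bin(p) for _, p in rows]
kappa = quadratic_weighted_kappa(true_bins, pred_bins, len(BIN_EDGES) + 1)

print(f"RMSE (all customers): {rmse_all:.3f}")
print(f"RMSE (buyers only):   {rmse_buyers:.3f}")
print(f"RMSE (non-buyers):    {rmse_non_buyers:.3f}")
print(f"Quadratic weighted kappa on binned values: {kappa:.3f}")
```

Note that the kappa value depends on where the bin edges are placed, which is exactly the part I would like to avoid.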
Which would be a good metric for measuring the model's performance without binning the customers?
Update: I am looking for a single metric that emphasises how accurately the model predicts the USD amount on an imbalanced dataset, and that helps me decide whether a new model is better than the previous one.
Topic rmse imbalanced-data metric predictive-modeling
Category Data Science