Combine several performance metrics from several datasets
We are developing and evaluating a multi knee/elbow point detection algorithm. For our evaluation, we have 200 sequences of real data. These sequences were annotated manually.
For each algorithm and sequence, we computed four different performance metrics: two variations of MSE and two custom cost functions.
The question is: how can we combine these results into a summary that identifies the overall best-performing model?
Our current solution uses two simple counting/voting systems. The first is binary: the model that performs best on a sequence (for a specific metric) wins a vote, and the others receive 0. The second uses weighted votes: as in the first, the best algorithm wins a full vote and the worst receives 0, while the others receive a fraction of a vote based on how close their result is to the best one.
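To make the two schemes concrete, here is a minimal sketch for a single metric. It rests on several assumptions that are not spelled out above: lower scores are better (true for MSE, assumed for the custom cost functions), the fractional vote is a linear rescaling between the best and worst score on each sequence, and ties split the vote evenly. The function names `binary_votes` and `weighted_votes` and the example numbers are purely illustrative.

```python
import numpy as np

def binary_votes(scores):
    """Binary voting for one metric.

    scores: array of shape (n_sequences, n_algorithms), lower is better.
    Returns the total number of votes per algorithm.
    """
    scores = np.asarray(scores, dtype=float)
    best = scores.min(axis=1, keepdims=True)
    wins = (scores == best).astype(float)
    # Assumption: algorithms tied for best share the single vote equally.
    wins /= wins.sum(axis=1, keepdims=True)
    return wins.sum(axis=0)

def weighted_votes(scores):
    """Weighted voting for one metric.

    The best algorithm gets 1, the worst gets 0, and the others get
    (worst - score) / (worst - best). The linear form is an assumption.
    """
    scores = np.asarray(scores, dtype=float)
    best = scores.min(axis=1, keepdims=True)
    worst = scores.max(axis=1, keepdims=True)
    spread = worst - best
    # Avoid division by zero when every algorithm has the same score.
    safe_spread = np.where(spread > 0, spread, 1.0)
    votes = np.where(
        spread > 0,
        (worst - scores) / safe_spread,
        1.0 / scores.shape[1],  # assumption: equal share if all scores tie
    )
    return votes.sum(axis=0)

# Illustrative data: 3 sequences (rows) x 3 algorithms (columns).
example_scores = [
    [0.10, 0.20, 0.30],
    [0.50, 0.40, 0.80],
    [0.30, 0.30, 0.90],
]
print("binary:  ", binary_votes(example_scores))
print("weighted:", weighted_votes(example_scores))
```

In this sketch the votes are tallied per metric; summing the tallies over the four metrics gives one overall count per algorithm.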
Is this the best way to combine the results? What other solutions exist? Is there a statistical significance test that can be applied to this counting system?
Topic mse evaluation statistics performance
Category Data Science