Ensembling expressions

I have two models, $m_1$ and $m_2$, and I want to ensemble them into a final model. I want to be able to weight one or the other more according to a grid search. There are two main ideas that come to my mind when doing so:

  • Define a family of models $m_1 \cdot a + m_2 \cdot (1 - a)$, where $0 \le a \le 1$, and find the $a$ that gives the best score.
  • Define a family of models $m_1^a \cdot m_2^{1 - a}$, where $0 \le a \le 1$, and find the $a$ that gives the best score.
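Both one-parameter families can be tuned with the same grid search. A minimal sketch, using synthetic held-out predictions (the arrays `y`, `m1`, `m2` and the MSE metric are illustrative assumptions, not from the question; predictions are clipped positive so the geometric family is well defined):

```python
import numpy as np

# Synthetic targets and held-out predictions from two hypothetical models
rng = np.random.default_rng(0)
y = rng.random(100)
m1 = np.clip(y + rng.normal(0, 0.1, 100), 1e-6, None)  # predictions of model 1
m2 = np.clip(y + rng.normal(0, 0.2, 100), 1e-6, None)  # predictions of model 2

def mse(pred):
    return np.mean((pred - y) ** 2)

grid = np.linspace(0, 1, 101)
# Arithmetic (weighted-mean) family: a*m1 + (1 - a)*m2
best_arith = min(grid, key=lambda a: mse(a * m1 + (1 - a) * m2))
# Geometric family: m1**a * m2**(1 - a)
best_geom = min(grid, key=lambda a: mse(m1 ** a * m2 ** (1 - a)))
```

In practice the search would be run on out-of-fold predictions rather than on the data the base models were trained on.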

However, in certain cases, I've seen top models in Kaggle competitions doing fairly different things, like having a final model of the form $m_1^a + m_2^b$.

My question is, what are the advantages and disadvantages of every solution? When do they work better and when do they work worse? When is the third kind of ensemble suitable, and is there any heuristic to tune $a$ and $b$?
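The third form drops the constraint that the weights sum to one, so $a$ and $b$ must be tuned jointly; a 2-D grid search is the straightforward way. A sketch under the same synthetic-data assumptions as above (grid range and data are illustrative; predictions are clipped positive so the powers are well defined):

```python
import numpy as np
from itertools import product

# Synthetic targets and held-out predictions from two hypothetical models
rng = np.random.default_rng(1)
y = rng.random(200) + 0.5  # keep targets away from zero
m1 = np.clip(y + rng.normal(0, 0.1, 200), 1e-6, None)
m2 = np.clip(y + rng.normal(0, 0.2, 200), 1e-6, None)

def mse(pred):
    return np.mean((pred - y) ** 2)

# Joint 2-D grid search over the exponents a and b of m1**a + m2**b
grid = np.linspace(0.05, 1.5, 30)
a, b = min(product(grid, grid), key=lambda ab: mse(m1 ** ab[0] + m2 ** ab[1]))
```

With two free parameters the grid grows quadratically, which is one reason this form is usually tried only after the simpler one-parameter families.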

Topic ensemble ensemble-modeling machine-learning

Category Data Science


You can ask the same question about every Machine Learning algorithm, and the answer will remain much the same.

What's the advantage of Linear Regression over Decision Trees? To answer this you could define them mathematically. In your case, the mathematical definitions are easy: a weighted mean or a geometric mean.

When does one model work better than another? Give it a try in cross-validation.

Sadly, scientific methodology in Machine Learning is trial and error. Predicting the best value of a hyperparameter before fitting the model is unreliable.

You "prove" that an algorithm works in ML when you run it to through a set of datasets and it performs better than the rest.

Coming back to your question: what happens on Kaggle tends to be the most technically advanced practice. So if it's there, it's worth giving it a try.


I agree with the previous answer. The solution that works better is the one that fits your data better.

Please note that if you have only one parameter, you can often derive the optimal value analytically instead of doing a grid search. Your family of solutions is very restricted, so I don't expect a significant gain, but there is no reason not to use it.
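For the weighted-mean family under squared error this closed form is easy to write down: minimising $\|y - (a\,m_1 + (1-a)\,m_2)\|^2$ over $a$ and setting the derivative to zero gives $a^* = \langle y - m_2,\; m_1 - m_2\rangle / \|m_1 - m_2\|^2$, clipped to $[0, 1]$. A sketch on synthetic data (array names are illustrative assumptions):

```python
import numpy as np

# Synthetic targets and held-out predictions; model 1 is less noisy
rng = np.random.default_rng(2)
y = rng.random(500)
m1 = y + rng.normal(0, 0.1, 500)
m2 = y + rng.normal(0, 0.3, 500)

# Closed-form minimiser of ||y - (a*m1 + (1-a)*m2)||^2:
# a = <y - m2, m1 - m2> / ||m1 - m2||^2, clipped into [0, 1]
d = m1 - m2
a = float(np.clip(np.dot(y - m2, d) / np.dot(d, d), 0.0, 1.0))
```

Here the less noisy model ends up with the larger weight, as expected; for the geometric family no such simple closed form exists and a 1-D search is the usual fallback.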


That is an empirical question. The answer will change for different models and different datasets.

The best approach would be to use cross-validation to see which ensembling technique achieves the best score on the evaluation metric for the given data.
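A minimal sketch of that comparison, assuming you already have out-of-fold predictions from both base models (all data and names below are synthetic and illustrative): tune the weight on each fold's training portion, score on its held-out portion, and compare the mean scores.

```python
import numpy as np

# Synthetic targets and out-of-fold predictions from two hypothetical models
rng = np.random.default_rng(3)
n = 300
y = rng.random(n)
m1 = np.clip(y + rng.normal(0, 0.1, n), 1e-6, None)
m2 = np.clip(y + rng.normal(0, 0.2, n), 1e-6, None)

def mse(pred, target):
    return np.mean((pred - target) ** 2)

grid = np.linspace(0, 1, 51)
folds = np.array_split(rng.permutation(n), 5)
scores = {"arithmetic": [], "geometric": []}
for k, test_idx in enumerate(folds):
    tr = np.concatenate([f for i, f in enumerate(folds) if i != k])
    # Tune the weight a on the training part of the fold
    a_ar = min(grid, key=lambda a: mse(a * m1[tr] + (1 - a) * m2[tr], y[tr]))
    a_ge = min(grid, key=lambda a: mse(m1[tr] ** a * m2[tr] ** (1 - a), y[tr]))
    # Score each tuned ensemble on the held-out part
    scores["arithmetic"].append(
        mse(a_ar * m1[test_idx] + (1 - a_ar) * m2[test_idx], y[test_idx]))
    scores["geometric"].append(
        mse(m1[test_idx] ** a_ge * m2[test_idx] ** (1 - a_ge), y[test_idx]))

mean_scores = {name: float(np.mean(v)) for name, v in scores.items()}
```

Whichever family has the lower mean held-out score on your data is the one to keep; on a different dataset the ranking may flip.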
