Does ensembling (bagging, boosting, stacking, etc.) always at least increase performance?

Ensembling is getting more and more popular. I understand that there are, in general, three big families of ensembling: bagging, boosting and stacking.

My question is: does ensembling always at least increase performance in practice? I guess mathematically it is not true; I am just asking about real-life situations.

For example, I could train 10 base learners and then stack them with another learner at the 2nd level. Does this 2nd-level learner always outperform the best of the base learners in practice?
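For concreteness, here is a minimal sketch of the setup I have in mind, using scikit-learn's StackingClassifier (the base learners and the data are just placeholders):

```python
# Sketch of the setup: 10 base learners stacked with a 2nd-level learner.
# Model choices and the synthetic data are placeholders, not a recommendation.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ten base learners (a mix of simple models with different settings).
base_learners = (
    [(f"tree_{d}", DecisionTreeClassifier(max_depth=d, random_state=0)) for d in (2, 4, 6, 8)]
    + [(f"knn_{k}", KNeighborsClassifier(n_neighbors=k)) for k in (3, 5, 11)]
    + [("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
       ("logreg", LogisticRegression(max_iter=1000)),
       ("stump", DecisionTreeClassifier(max_depth=1))]
)

# 2nd-level (meta) learner fitted on out-of-fold predictions of the base learners.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)
print("stacked accuracy:", stack.score(X_test, y_test))
```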

Topic: ensemble-modeling, self-study

Category: Data Science


In practice, when it comes to stacking, it works almost all the time (I explain the reasons in: Why does stacking work?). A summary would be the following:

The reason is that, if you have one model which is good and all the others are bad, your "second stage" model will put most of the weight on the best model and possibly ignore the others.

Of course, it does not always happen like this, and I have seen cases where more feature extraction worked better than stacking models (the number of classes was particularly high, though).
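One rough way to see that weighting behaviour directly is to use a linear model as the second stage and look at its coefficients. A sketch with scikit-learn's StackingClassifier (pairing one strong model with two dummy models is purely for illustration):

```python
# Illustration: with a linear 2nd-stage model, its coefficients show how much weight
# each base learner's predictions receive. One strong learner among weak ones
# typically ends up with a much larger coefficient.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import StackingClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)

estimators = [
    ("good", GradientBoostingClassifier(random_state=1)),            # reasonably strong model
    ("bad_1", DummyClassifier(strategy="uniform", random_state=1)),  # random guessing
    ("bad_2", DummyClassifier(strategy="most_frequent")),            # constant prediction
]
stack = StackingClassifier(estimators=estimators,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X, y)

# One coefficient per base learner (binary case): the strong model dominates.
for (name, _), coef in zip(estimators, stack.final_estimator_.coef_[0]):
    print(name, round(coef, 3))
```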

The mathematical reasons making stacking so effective are:

  • Convexity of the loss (penalty) functions. If they are convex (as is often the case), Jensen's inequality gives a good intuition for why the mean of the predictions of several models reduces the error (a small numeric sketch follows the figures below).
  • More general decision boundaries. The graphs below show the decision boundaries of two models, and of the stacked version of these two models:

[Figures: decision boundaries of the two individual models, followed by the decision boundary of the stacked model]

As you can see, the last boundary appears to come from a more general class of functions than the first two, allowing the stacked model to learn more general boundaries.
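To make the convexity point concrete: for a convex loss, the loss of the averaged prediction is never larger than the average of the individual losses. A small numeric sketch with squared error (the numbers are arbitrary):

```python
# Jensen's inequality for a convex loss: loss(mean of predictions) <= mean of losses.
# Squared error is used here; the predictions and target are arbitrary illustrative numbers.
import numpy as np

y_true = 1.0
predictions = np.array([0.2, 0.7, 1.6, 1.1])   # predictions from several models

loss_of_mean = (predictions.mean() - y_true) ** 2       # loss of the ensemble (mean) prediction
mean_of_losses = ((predictions - y_true) ** 2).mean()   # average loss of the individual models

print(loss_of_mean, "<=", mean_of_losses)   # 0.010 <= 0.275
```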


If your individual classifiers are better than random guessing, i.e. their error rate is less than 0.5, and their errors are reasonably independent, then a majority-vote ensemble of these classifiers will lead to an increase in performance, i.e. a drop in error rate.

You can refer to the chapter "Combining Different Models for Ensemble Learning" in Sebastian Raschka's Python Machine Learning book for a mathematical treatment of this.
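Under the independence assumption, the majority-vote error is just a binomial tail probability. A short sketch of that calculation:

```python
# Ensemble error of a majority vote over n independent classifiers that each err with
# probability eps: the ensemble is wrong when more than half of them are wrong.
from math import comb

def majority_vote_error(eps, n):
    """P(ensemble error) = sum_{k > n/2} C(n, k) * eps^k * (1 - eps)^(n - k)."""
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k) for k in range(n // 2 + 1, n + 1))

print(majority_vote_error(0.35, 11))   # ~0.15: well below the individual error of 0.35
print(majority_vote_error(0.55, 11))   # ~0.63: worse than 0.55 once learners are below chance
```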


The short answer is no.

I have worked on several projects that evaluated an ensemble of several classifiers against the classifiers themselves. In some cases the precision and recall were better with the ensemble, but more often they were not. That's not to say it's not worth investigating. Sometimes there is one model that does a reasonable job of classifying the data, but it gets drowned out in an ensemble. Perhaps a weighted ensemble might improve the results, but it's not a clear-cut way to improve performance.

In practice, I would try several models and then try an ensemble of those models. If the ensemble is the best, however you define best, go with it. But sometimes it is easier to just pick the best base model and then figure out how to tune it.
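That comparison can be as simple as cross-validating each candidate and a voting ensemble side by side; a minimal sketch (the models and data are just placeholders):

```python
# Compare each base model against a simple voting ensemble with cross-validation,
# then keep whichever does best by your chosen metric.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1500, n_features=20, random_state=2)

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "rf": RandomForestClassifier(n_estimators=200, random_state=2),
    "svc": SVC(probability=True, random_state=2),
}
# Soft-voting ensemble built from the three base models above.
candidates["ensemble"] = VotingClassifier(estimators=list(candidates.items()),
                                          voting="soft")

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```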


In an ensemble you can use majority voting, averaging, weighting, etc., to get the final outcome from the ensemble model. To understand it better you can go through this link, where it is explained well by Alexander.

Now, let us consider that you have 3 models, each with an accuracy of 65-70%. By combining these 3 models there is a very high chance that the accuracy will increase. In another scenario you have 3 models with accuracies of 95%, 55% and 45%; if you combine them, there is a very good chance the result will get worse.
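A quick way to check both scenarios is to simulate a plain majority vote, assuming the models' errors are independent (the accuracies are the ones from the example above):

```python
# Simulated majority vote under the two scenarios, assuming the models' errors are
# independent. Three ~67% models help; one strong model drowned out by two weak
# ones is dragged down.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of simulated examples

def majority_vote_accuracy(accuracies):
    # Each row: whether each model is correct on one example (independent draws).
    correct = rng.random((n, len(accuracies))) < np.array(accuracies)
    return (correct.sum(axis=1) > len(accuracies) / 2).mean()

print(majority_vote_accuracy([0.67, 0.67, 0.67]))   # ~0.74, better than any single model
print(majority_vote_accuracy([0.95, 0.55, 0.45]))   # ~0.73, worse than the 95% model
```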

In conclusion, it all depends on the individual models' performance; ensembles perform well when you combine moderately performing models.

Technically there is no proof saying that a given method is suitable for a given scenario, but trial and error might help you get good results. It is subjective to the business scenario. The same goes for bagging and boosting.

In my experience with bagging, when the model accuracy was bad I tried using bagging to fit the data better; in the end the training error did drop (from roughly 20% to 10%), but the test error got worse (from roughly 11% to 20%). So you have to decide which approach suits your business problem better and take it forward.
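For reference, that kind of check is straightforward with scikit-learn's BaggingClassifier; the base model and data below are only placeholders, not the setup from my project:

```python
# Compare a single base model against its bagged version on train and test data,
# so you can see whether bagging actually helps generalisation in your case.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

single = DecisionTreeClassifier(random_state=3).fit(X_train, y_train)
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=3),
                           n_estimators=100, random_state=3).fit(X_train, y_train)

for name, model in [("single tree", single), ("bagged trees", bagged)]:
    print(name,
          "train:", round(model.score(X_train, y_train), 3),
          "test:", round(model.score(X_test, y_test), 3))
```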


As you said, you cannot prove mathematically that ensembling always increases performance, but it generally does. That's the reason why gradient boosting and random forests are so popular in Kaggle competitions: they outperform what a single decision tree can learn in many ways.
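As a rough illustration of that gap, you can compare a single tree against the two ensembles on the same data (the dataset and hyperparameters below are arbitrary defaults):

```python
# A single decision tree versus a random forest and gradient boosting on the same data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=25, n_informative=10,
                           random_state=4)

for name, model in [("decision tree", DecisionTreeClassifier(random_state=4)),
                    ("random forest", RandomForestClassifier(random_state=4)),
                    ("gradient boosting", GradientBoostingClassifier(random_state=4))]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```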

As a curiosity, even neural networks can be used as "weak" learners, as can be seen in https://arxiv.org/abs/1704.00109. So ensembling is a very powerful technique that can be applied in many areas of machine learning. The main problem is that ensembles are not easily interpretable, being far more black-box than their weak learners.
