Output of a model as additional input to another model to solve the same task

I was wondering whether it is possible to train an ML model for a classification task on dataset D, and then train a second model to solve the same classification task, which takes as input both dataset D and the output of the first model. I am worried about the data partitioning in this case (i.e. whether the two models need to be trained and tested on different data samples). If this kind of ensemble method exists, what is it called?

As further detail, I have a KNN model and a random forest that predict the same label, but they are trained on different variables.

Topic ensemble-modeling machine-learning



Stacking comes closest to what you describe. From the sklearn docs:

Stacked generalization consists in stacking the outputs of individual estimators and using a classifier to compute the final prediction. Stacking allows you to use the strength of each individual estimator by using their output as input to a final estimator.

Stacking is one type of "ensemble learning", where you combine several models into one final classifier. The idea is to take advantage of the ability of different models to capture different details in the data.
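A minimal sketch with scikit-learn's StackingClassifier is below, assuming a toy dataset and hypothetical column indices (0-1 for the KNN, 2-4 for the random forest); swap in your own data and columns. It also addresses your data-partition worry: StackingClassifier trains the final estimator on cross-validated (out-of-fold) predictions of the base models, so you don't need a separate hold-out split for the meta-learner.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Placeholder data standing in for "dataset D".
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each base model sees only its own (hypothetical) subset of columns,
# matching the "trained on different variables" setup.
knn = make_pipeline(
    ColumnTransformer([("knn_cols", "passthrough", [0, 1])]),
    KNeighborsClassifier(),
)
rf = make_pipeline(
    ColumnTransformer([("rf_cols", "passthrough", [2, 3, 4])]),
    RandomForestClassifier(random_state=0),
)

stack = StackingClassifier(
    estimators=[("knn", knn), ("rf", rf)],
    final_estimator=LogisticRegression(),
    cv=5,              # meta-learner is fit on out-of-fold base predictions
    passthrough=True,  # also feed the original features to the meta-learner
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```

Note that `passthrough=True` gives the final estimator both the original features and the base models' outputs, which is exactly the "dataset D plus the output of the first model" setup you describe.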


Do you mean using an ML model to make predictions, and then feeding the original features plus those predictions as new features into another ML model?

Conceptually that's a bit unusual, unless you think the first model has some limitation (e.g. it is systematically biased) and you want to correct for it with the second one. But in that case, why not just use the second architecture on its own?

A much more common type of ensembling is simply training both models and then applying some sort of aggregation (e.g. averaging) to their predictions.
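As a minimal sketch of this averaging approach, scikit-learn's VotingClassifier with `voting="soft"` averages the predicted class probabilities of the base models; the dataset here is again a hypothetical placeholder.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "soft" voting averages predict_proba outputs across the two models,
# then picks the class with the highest mean probability.
ensemble = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print(ensemble.score(X_test, y_test))
```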
