Ensembling/combining models weighted by number of observations?
Across a few different projects, I have hit a problem where I have two (or more) models:
- General-Purpose Model: A model trained on a large amount of data that is not specifically relevant to my current classification target, but which predicts other labels using similar features.
- Cold-Start Model: A model trained on data specifically related to my current label/task, which starts with zero observations and accumulates more over time.
So then, my question: what is an appropriate way to handle this problem, assuming I need to produce a model that leverages all the available data effectively? At a heuristic level, it seems fairly clear that when the number of label-specific observations is very low, the general-purpose model should be trusted. Likewise, once the cold-start model has accumulated many observations, it should be trusted instead.
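To make the heuristic concrete, here is a minimal sketch of the kind of blending I have in mind. The function name and the shrinkage constant `k` are placeholders I picked for illustration, not something I have validated:

```python
import numpy as np

def blended_proba(p_general: np.ndarray, p_cold_start: np.ndarray,
                  n_label_obs: int, k: float = 50.0) -> np.ndarray:
    """Blend two models' predicted probabilities by observation count.

    The weight on the cold-start model is n / (n + k): 0 when there are no
    label-specific observations, approaching 1 as they accumulate. k is an
    arbitrary placeholder for "how many observations before a 50/50 split".
    """
    w = n_label_obs / (n_label_obs + k)
    return w * p_cold_start + (1.0 - w) * p_general
```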
However, I am not clear on which approaches (e.g., weighting, stacking, etc.) are reasonable for building an effective ensemble between those two extremes. The problem is that most ensemble approaches seem to combine models based on estimated label quality or variance, but in this case the number of label-specific observations starts out too small for those estimates to be representative. My feeling is therefore that the number of observations is an essential piece of information for combining the models properly.
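One idea I have toyed with, purely as a sketch, is a stacking layer whose meta-features include the observation count, so the meta-learner can learn when to shift trust from one base model to the other. The helper name and the use of `log1p(n)` as the feature are my own guesses, not an established recipe:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_meta_learner(p_general, p_cold_start, n_obs, y):
    """p_general, p_cold_start: out-of-fold scores from the two base models;
    n_obs: number of label-specific observations behind each cold-start score;
    y: the true labels for the meta-training rows."""
    # Meta-features: the two base-model scores plus log(1 + n), so the
    # meta-learner can condition its weighting on how much data exists.
    X_meta = np.column_stack([p_general, p_cold_start, np.log1p(n_obs)])
    meta = LogisticRegression()
    meta.fit(X_meta, y)
    return meta
```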
Does anyone know of an approach specifically designed to incorporate the number of observations supporting a model into its ensemble weight / combination? I am also open to sample-weighting approaches that train a single model (or add a layer to a model), provided they are memory-efficient enough that thousands of them could be kept in memory on normal hardware (e.g., to enable multi-label classification on the same new observation).
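To illustrate the kind of sample-weighting approach I mean: pseudo-label the general-purpose data with the general model, down-weight those rows, and fit one small model per label. The `pseudo_weight` of 0.1 and the helper name are arbitrary placeholders, and I have no idea whether this is a sound way to do it:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_single_label_model(X_specific, y_specific,
                           X_general, pseudo_labels, pseudo_weight=0.1):
    # Stack the label-specific rows (full weight) with rows pseudo-labelled
    # by the general-purpose model (small, arbitrary weight).
    X = np.vstack([X_specific, X_general])
    y = np.concatenate([y_specific, pseudo_labels])
    w = np.concatenate([np.ones(len(y_specific)),
                        np.full(len(pseudo_labels), pseudo_weight)])
    # A linear model per label is small enough to keep thousands in memory.
    clf = LogisticRegression()
    clf.fit(X, y, sample_weight=w)
    return clf
```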
Topic weighted-data multilabel-classification ensemble-modeling
Category Data Science