Multi-country model or single model

Question

Multi-country model or single model

David Masip

2020年8月1日 08:19

I am working on a ML model to be deployed in a product operating in many countries.

The issue that I am having is the following: should I

train one model and serve it for all countries?
train a model per country and serve each model in its country?

I've faced this problem several times, and to me, there's a trade-off in the learning: in the first case, the model has more data to learn, and it'll be more robust (also, the solution is simpler). In the second case, I'll have a more tailored model to each country, and will be able to see effects that are specific to that country.

I'm very interested in knowing if there's an intermediate solution - a general model with some country-specific fine-tuning that can see all the data but also specialize in each specific country. If I were to use Neural Networks, this fine-tuning is natural - you train some epochs with all the data, then the last epochs with each specific country. I am wondering if something similar can be done in Linear regression models and Xgboost, which are the models I generally use.

Is there any literature on this? I think it is kind of a generic topic and there should be some.

Topic data-product xgboost linear-regression deep-learning machine-learning

Category Data Science

etiennedm · Accepted Answer · 2020年7月25日 14:17

I don't think there is a unique rule to answer that. It strongly depends on how pertinent is the country information regarding other input data and what you want to predict.

It is possible to face cases where similar input data in different countries lead to different outputs. In that case, it would be mandatory either to add the country as input information or to create a model per country.

In other cases, the country information would not lead to any improvement in the model (so no need to do a specific model per country).

Finally, there are cases for which you will find global information (whatever country) and specific information per country. In that case, there are multiple approaches to deal with it. The first and most common is to include the country as input of your global model. As @Fnguyen mentioned, why dealing with the country differently than other inputs?

Update

If you think that the country has a specific impact on the prediction, here are a non-exhaustive list of how you could create models that deal with your assumption:

Using transfer learning, train a global model to capture general trends, then train the same model on the specific countries starting from the global. You may still not be able to capture specific country effects.
Using boosting method: train a first classifier on all countries and then train a model per country that does boosting on the output of the global trained classifier. In that way you will keep the global trends and then use specific country information.
Using bagging method: train some classifier(s) on all countries, others on a specific country, and then you can combine them parellel to each others in one big model per country.
A specific example using NNs: train a global model and one model per country. Then use per country a model that combines both the global and the specific you trained before and only retrain the 'head'. For instance if using DNNs / CNNs, you only retrain the green part of the final model:

The list is non-exhaustive and you must have a good reason to use such approaches which give more importance to the country information. Normally, the machine learning algorithms would do it on their own.

noe · Accepted Answer · 2020年7月21日 15:51

I think the only objective criterion to decide this is to simply compare the performance of the candidate approaches over the validation data.

That being said, if I were to blindly choose the approach upfront without any other information, I would choose a single model, where the model is aware of the country of each piece of data. This would let it model the peculiarities of each country while profiting from the combined training data.

If you have reasons to believe this is harming the global performance because of the intrinsic difference of some countries, you can apply boosting and therefore let the performance of the classifiers speak by itself.

David Masip · Accepted Answer · 2020年7月21日 10:14

I think that @mirimo's idea of having a regularized model as an offset is very interesting.

My proposal is a slight variation where you ensure you don't overfit.

The idea is, to obtain the model for group $j$, train a model with all groups except $j$ and use that model as an offset to the model for group $j$. This way, we can have a complex model for the general behavior and still not train on the same target twice, thus having a more stable model.

The downside is that this is way slower, as, if there are $J$ groups, it takes around $J$ times more than the regular training.

Edit

On top of @Carlos Mougan proposal, we can:

Train a global model
Train a specific model for each country
Ensemble both models The ensemble can have some shrinkage, like: $$y_{final} = \frac{y_{global} \cdot m + y_{country} \cdot n_{country}}{m + n_{country}} $$ where $y_{country}$ is the prediction of the country-specific model, $y_{global}$ the global prediction, $n_{country}$ the number of samples in a country and $m$ an hyperparameter to tune, the higher the $m$ the more we trust on the global model.

I think this shrinkage is very relevant to the problem.

Gaurav Koradiya · Accepted Answer · 2020年7月20日 11:22

First of all in this use case i see we want our model to learn data or understand the data frist. This is a similar problem of natural language processing like one always try to make model to learn it from data. Here we can do little tricky things where we can declare country as target variable and rest of features as input variable. We can at least train model to learn mapping for input features to country so model might have an understanding of the representation of input with respect to the country. we can use this model as ensemble modeling suggested above. I think it would give little accuracy increment and cost-effective solution also.

Carlos Mougan · Accepted Answer · 2020年7月20日 09:48

In the paper of Hinton - Distilling the knowledge of Neural Networks, the following is mentioned (Section 5) when defining specialist models:

When the number of classes is very large, it makes sense for the cumbersome model to be an ensemble that contains one generalist model trained on all the data and many “specialist” models, each of which is trained on data that is highly enriched in examples from a very confusable subset of the classes (like different types of mushroom).

What they do is they use a general model first and then a specialist model each to focus on a different subset of the classes.

You could consider your problem something similar, instead of a specialist to classes, and specialist in countries. This way you could build a country(cluster of countries) specific ensemble of models.

Fnguyen · Accepted Answer · 2020年7月20日 08:08

I think the most important thing you can do to bridge both assumptions is to include the country as a variable in the global model.

Should there be any country-specific effects they will simply be modeled as interactions in the global model. This is how the model deals with any other variable anyway and why should country be any different?

I think the problem is much more complicated if the data is heavily imbalanced e.g. some products are only sold in one country, etc. However this only becomes a problem at a point when training a global model is infeasible anyway.

mirimo · Accepted Answer · 2020年7月10日 12:46

I don't have the theoretical ressources to confirm this but I think it's possible to train a first model on the whole dataset with a limited degree of freedom (high regularization) and with the commmon features that will allow you to capture the global trends and then train local models on the residuals.

Multi-country model or single model

About