Minimum number of samples to train XGBoost without overfitting

When using Neural Networks for image processing, I learned a rule of thumb: to avoid overfitting, supply at least 10 training examples for every neuron.

Is there a similar rule of thumb for classifiers such as XGBoost, presumably taking into account the number of features and estimators?

And, considering the 'curse of dimensionality', shouldn't the rule of thumb be that n_training grows geometrically in n_dimensions rather than linearly?
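To make the contrast concrete, here is a small illustrative sketch (my own placeholder constants, not an established rule) comparing a linear heuristic of c samples per feature with a geometric one of k resolved values per dimension:

```python
# Illustrative only: comparing a linear heuristic (c samples per feature)
# with a geometric one (k resolved values per dimension, i.e. k**d samples).
# The constants c and k are placeholders, not established recommendations.
def linear_rule(n_features, c=10):
    return c * n_features

def geometric_rule(n_features, k=10):
    return k ** n_features

for d in (2, 5, 10):
    print(f"d={d:2d}  linear={linear_rule(d):6d}  geometric={geometric_rule(d):,}")
```

With k = 10 the geometric estimate already demands 10^10 samples at d = 10, which is exactly the 'curse of dimensionality' concern behind the question.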

Topic overfitting xgboost neural-network classification

Category Data Science


This is not only a question of the number of samples; it is also a question of tree depth.

The greater the depth, the more likely you are to overfit.

You can reduce overfitting by using a large number of trees, which helps "steady" the algorithm.
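As a rough sketch of that trade-off (the hyperparameters are the standard xgboost scikit-learn ones; the dataset and the specific values are placeholders, not recommendations), you can compare cross-validated scores across depths and tree counts:

```python
# Sketch: how max_depth and n_estimators interact with overfitting.
# Uses the scikit-learn API of xgboost; the dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for max_depth in (2, 6, 10):
    for n_estimators in (50, 500):
        model = XGBClassifier(
            max_depth=max_depth,
            n_estimators=n_estimators,
            learning_rate=0.1,
            eval_metric="logloss",
        )
        score = cross_val_score(model, X, y, cv=5).mean()
        print(f"max_depth={max_depth:2d}  n_estimators={n_estimators:3d}  cv_accuracy={score:.3f}")
```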


It's true that the number of examples should be related to the number of features. But it is not only the number of features that matters: the range of each feature (max minus min, and the number of distinct values) is also important. On the other hand, if your data are noisy you need more examples, so the answer also depends on your particular dataset; a learning curve is one way to check this empirically, as sketched below.
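A minimal sketch of such a check on synthetic data (flip_y adds label noise to mimic a noisy dataset; all values are placeholders): if the validation score is still rising at the largest training size, more samples would likely help, while a plateau suggests extra data adds little.

```python
# Sketch: empirical check for "enough data" via a learning curve.
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from xgboost import XGBClassifier

# flip_y introduces label noise to mimic a noisy real-world dataset
X, y = make_classification(n_samples=5000, n_features=30, flip_y=0.05, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    XGBClassifier(max_depth=4, n_estimators=200, eval_metric="logloss"),
    X, y, cv=5, train_sizes=[0.1, 0.25, 0.5, 1.0],
)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n_train={n:5d}  train={tr:.3f}  validation={va:.3f}")
```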
