Why would one cross-validate the random state number?

Still learning about machine learning, I've stumbled across a Kaggle kernel (link) which I cannot understand.

Here are lines 72 and 73 (with the imports they rely on):

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

parameters = {'solver': ['lbfgs'],
              'max_iter': [1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000],
              'alpha': 10.0 ** -np.arange(1, 10),
              'hidden_layer_sizes': np.arange(10, 15),
              'random_state': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
clf = GridSearchCV(MLPClassifier(), parameters, n_jobs=-1)

As you can see, the random_state parameter is being tested across 10 values.

What is the point of doing this?

If one model performs better with a particular random_state, does it make any sense to reuse that value for other models?

Topic mlp randomized-algorithms scikit-learn python

Category Data Science


I personally think that optimising your model over different random seeds is not a good idea. There are many other, more important aspects of the modelling process that you can worry about, tweak, and compare before spending time on the effects of random initialisation.

That being said, if you just want to test the effect of the random initialisation of model weights on a final validation metric, this could be one way to do so; it is kind of the reverse of my point above. If you can show that, for different random seeds (ceteris paribus: all other parameters equal), the final model performs differently, it suggests there is either an instability in the model or even a bug in the code. I would not expect a well-validated model to give hugely different results when run with a different random seed, so if it does, that tells me something weird is going on!
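For what it's worth, a minimal sketch of that check might look like the following (the synthetic dataset and the fixed hyperparameter values here are placeholders, not from the kernel): fit the same MLPClassifier with everything held equal except random_state, and look at the spread of the cross-validated scores.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Placeholder data; substitute your own X, y.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

scores = []
for seed in range(10):
    # Only the seed changes between runs, so any score differences come
    # from weight initialisation (and other solver-level randomness).
    clf = MLPClassifier(solver='lbfgs', max_iter=1000, alpha=1e-4,
                        hidden_layer_sizes=(12,), random_state=seed)
    scores.append(cross_val_score(clf, X, y, cv=5).mean())

print(f"mean={np.mean(scores):.3f}  std={np.std(scores):.3f}")

If the standard deviation across seeds is large relative to the differences between your candidate hyperparameter settings, the seed is dominating the comparison, which points to instability rather than to a parameter worth tuning.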
