Why does hyperparameter tuning happen on the validation set rather than at the very beginning?

Despite having done it a few times, I'm still slightly confused by the use of a validation set for hyperparameter tuning.

As far as I can tell, the workflow is: choose a model, train it on the training data, assess its performance on the training data, then tune hyperparameters by assessing performance on the validation data, and finally pick the best model and evaluate it on the test data.

To do this, I basically have to pick a model more or less at random to train first. What I don't understand is that I don't know which model is going to be best at the start anyway. Say I think neural networks and random forests might both be useful for my problem. Why don't I start with a general neural network architecture and a general random forest, and from the very beginning assess which model is best on a small portion of the data while varying all the hyperparameters?

Basically, why start from a human guess for training and only tune hyperparameters in the validation phase? Why not start with total uncertainty and do a broad search from the very beginning, assessing the performance of a wide range of hyperparameters across general neural network, random forest, or other architectures, roughly like the sketch below?
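Concretely, something like this rough sketch is what I have in mind (the model families, hyperparameter values, and toy data are just placeholders I made up):

```python
# Rough sketch of what I mean by a "broad search from the start":
# try several model families and hyperparameter settings, score each
# on one held-out slice of the data, and keep whichever does best.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_held_out, y_train, y_held_out = train_test_split(
    X, y, test_size=0.3, random_state=0)

candidates = [
    RandomForestClassifier(n_estimators=50, max_depth=5, random_state=0),
    RandomForestClassifier(n_estimators=300, max_depth=None, random_state=0),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0),
]

# Fit each candidate and score it on the held-out slice.
scores = [m.fit(X_train, y_train).score(X_held_out, y_held_out) for m in candidates]
best_model = candidates[scores.index(max(scores))]
print(best_model, max(scores))
```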

Thanks!

Tags: hyperparameter-tuning, hyperparameter, deep-learning, neural-network, machine-learning

Category: Data Science


The model's parameters are fitted on the training set, and the validation set is what you use to compare candidate models and hyperparameter settings and to check that the trained model is not overfitting. Nothing stops you from making that comparison as broad as you like, including different model families from the very beginning; that broad search *is* hyperparameter tuning, and the held-out slice you score it on is the validation set. The catch is that, by picking whichever candidate scores best on the validation set, the selection process has effectively "seen" that data, so the chosen model's validation score is optimistically biased and may not reflect how it performs on genuinely new data. That's why you need one more untouched dataset, namely the test set, as in the sketch below.
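A minimal sketch of the three roles, using scikit-learn on toy data (the models, hyperparameter values, and split sizes here are arbitrary assumptions, not a prescription):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=3000, random_state=0)

# 60% train / 20% validation / 20% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# The "broad search" from the question: any model family can join the comparison.
candidates = [
    RandomForestClassifier(n_estimators=200, max_depth=10, random_state=0),
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
]

# Fit parameters on the training set, compare candidates on the validation set.
best_model, best_val = None, -1.0
for model in candidates:
    model.fit(X_train, y_train)
    val_score = model.score(X_val, y_val)
    if val_score > best_val:
        best_model, best_val = model, val_score

# best_model was chosen *because* it maximised the validation score, so that
# score is optimistically biased; the untouched test set gives the honest number.
print("validation score of chosen model:", best_val)
print("test score of chosen model:", best_model.score(X_test, y_test))
```

In practice you would usually let a utility such as scikit-learn's GridSearchCV or RandomizedSearchCV automate the selection loop, but the logic (and the reason the test set must stay untouched until the end) is the same.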
