Hyperparameter tuning of neural networks using Bayesian Optimization

One of the assumptions behind finding good hyperparameters with Bayesian optimization (using a Gaussian-process surrogate) is that the unknown objective function is smooth. Is this assumption valid for neural networks, or at least for most neural networks? Is there a reference for this?

Topic gaussian-process hyperparameter-tuning bayesian bayesian-networks hyperparameter

Category Data Science


Neural networks are trained by gradient descent, which requires the loss to be a differentiable, and therefore smooth, function of the network parameters. The function that Bayesian optimization models is a different one: the validation performance as a function of the hyperparameters. For continuous hyperparameters such as the learning rate or weight decay, this mapping typically changes gradually rather than erratically, so a Gaussian-process surrogate with a smooth kernel is a reasonable model, and Bayesian optimization is widely and successfully used for neural-network hyperparameter tuning. In fact, some hyperparameters can themselves be learned by gradient descent, as shown in the paper "Gradient Descent: The Ultimate Optimizer", which treats optimizer hyperparameters such as the learning rate as differentiable quantities.
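
Below is a minimal sketch of such a GP-based tuning loop, assuming scikit-learn is available. It tunes two illustrative hyperparameters (log learning rate and hidden-layer width) of a small MLPClassifier on the digits dataset, using a Matern-kernel Gaussian process as the surrogate and expected improvement as the acquisition function; the ranges, dataset, and candidate-sampling acquisition optimizer are assumptions for the example, not a prescribed recipe.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_digits
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def objective(params):
    """Validation error of an MLP for a given (log10 learning rate, hidden units) pair."""
    log_lr, hidden = params
    clf = MLPClassifier(hidden_layer_sizes=(int(round(hidden)),),
                        learning_rate_init=10 ** log_lr,
                        max_iter=200, random_state=0)
    clf.fit(X_tr, y_tr)
    return 1.0 - clf.score(X_val, y_val)

# Illustrative search space: log10 learning rate and hidden-layer width.
bounds = np.array([[-4.0, -1.0],
                   [8.0, 128.0]])

rng = np.random.default_rng(0)
# Start with a few random evaluations to seed the surrogate.
X_obs = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))
y_obs = np.array([objective(x) for x in X_obs])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(15):
    gp.fit(X_obs, y_obs)
    # Expected improvement, maximized over a random candidate set
    # (a deliberately simple acquisition optimizer for the sketch).
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2000, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    best = y_obs.min()
    imp = best - mu
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)]
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective(x_next))

best_idx = int(np.argmin(y_obs))
print("best hyperparameters (log10 lr, hidden units):", X_obs[best_idx])
print("best validation error:", y_obs[best_idx])
```

Note that the discrete hidden-layer width is treated as a continuous variable and rounded inside the objective; this is a common simplification when the surrogate assumes a smooth input space.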
