What is the most efficient method for hyperparameter optimization in scikit-learn?

An overview of the hyperparameter optimization process is given in the scikit-learn documentation.

Exhaustive grid search will find the best set of hyperparameters among those in the grid. The downside is that it is slow: the number of fits grows multiplicatively with the number of values tried per parameter.
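For concreteness, this is the kind of grid-search baseline I mean (a minimal sketch; the SVC estimator, dataset, and grid values are just illustrative placeholders):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1e-4, 1e-3, 1e-2],
}

# Tries every combination in param_grid: 4 * 3 = 12 candidates,
# each refit once per CV fold.
search = GridSearchCV(SVC(), param_grid, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```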

Random search reaches comparable results faster than grid search, but its outcome has unnecessarily high variance from run to run.
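And the random-search equivalent (again a sketch; the distributions and budget of 20 iterations are illustrative, and `loguniform` assumes a reasonably recent SciPy):

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-5, 1e-1),
}

# Samples a fixed budget of 20 candidates from the distributions
# instead of enumerating a full grid.
search = RandomizedSearchCV(
    SVC(), param_distributions, n_iter=20, cv=5, n_jobs=-1, random_state=0
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```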

There are also additional strategies in other packages, including scikit-optimize, auto-sklearn, and scikit-hyperband.

What is the most efficient (finding reasonably performant parameters quickly) method for hyperparameter optimization in scikit-learn?

Ideally, I would like working code examples with benchmarks.



Optimization isn't my field, but as far as I know, efficient and effective hyper-parameter optimization these days revolves heavily around building a surrogate model. As models increase in complexity, they become more opaque black boxes; this is the case for deep neural nets and presumably for complex tree ensembles as well. A surrogate model attempts to regress the response surface hidden inside that black box: using a variety of sampling techniques, it probes the hyper-parameter space and builds a function that approximates the true underlying objective.

Bayesian optimization is built around the surrogate model, and how that model is constructed is crucial to BO. Equally crucial is the choice of acquisition function, which decides where to sample next by trading off exploration against exploitation.

I think the relative performance of random search and Bayesian search varies from dataset to dataset and from model to model. Bergstra & Bengio (2012) make a strong argument for random search over grid search, and Shahriari et al. (2016) make a strong case for BO. Bandit-based Hyperband strategies can potentially perform better than BO, especially in high dimensions; however, Hyperband is purely exploratory, with no exploitation, so it can easily stop promising configurations too early. There have been efforts to combine Hyperband and BO to address this.
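Recent scikit-learn versions ship successive halving, the core subroutine of Hyperband, as an experimental search class. A minimal sketch, assuming such a version is installed (the random forest, parameter ranges, and budget are illustrative):

```python
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

X, y = load_digits(return_X_y=True)

param_distributions = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
}

# Gives many candidates a small budget (few trees), then promotes
# only the best performers to successively larger budgets.
search = HalvingRandomSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    resource="n_estimators",
    max_resources=200,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```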

I've had good success with scikit-optimize, despite quite a bit of it still being unimplemented. It's easy to prototype with and interfaces cleanly with scikit-learn.
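As a sketch of what that interface looks like, scikit-optimize's BayesSearchCV is a drop-in replacement for GridSearchCV; the search spaces below are illustrative:

```python
from skopt import BayesSearchCV
from skopt.space import Real
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

search_spaces = {
    "C": Real(1e-2, 1e2, prior="log-uniform"),
    "gamma": Real(1e-5, 1e-1, prior="log-uniform"),
}

# Fits a surrogate model (a Gaussian process by default) over the
# scores observed so far and picks each next candidate by maximizing
# an acquisition function.
search = BayesSearchCV(SVC(), search_spaces, n_iter=32, cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```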


Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb), 281-305.

Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., & De Freitas, N. (2016). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1), 148-175.


You can take a look at auto-sklearn, an automated machine learning toolkit that is a direct extension of scikit-learn.
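A minimal sketch of how it is used, searching over both model families and their hyperparameters under a wall-clock budget (the 300-second budget is an illustrative value):

```python
import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Searches over preprocessing, model families, and hyperparameters,
# stopping after the given time budget (in seconds).
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300
)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))
```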
