In machine learning, a model $M$ with parameters and hyper-parameters looks like,
$Y \approx M_{\mathcal{H}}(\Phi | D)$
where $\Phi$ are the parameters and $\mathcal{H}$ are the hyper-parameters. $D$ is the training data and $Y$ is the output data (class labels in the case of a classification task).
The objective during training is to find an estimate of the parameters, $\hat{\Phi}$, that optimizes some loss function $\mathcal{L}$ we have specified. Since the model $M$ and the loss function $\mathcal{L}$ are based on $\mathcal{H}$, the resulting parameter estimate $\hat{\Phi}$ is also dependent on the hyper-parameters $\mathcal{H}$.
The hyper-parameters $\mathcal{H}$ are not 'learnt' during training, but that does not mean their values are immutable. Typically, the hyper-parameters are fixed, and we think simply of the model $M$ instead of $M_{\mathcal{H}}$. In this sense, the hyper-parameters can also be considered a priori parameters.
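A minimal sketch of this setup, using scikit-learn's `Ridge` purely as an illustrative model (not something prescribed above): the hyper-parameter `alpha` is fixed before training, while the parameters (the coefficients) are estimated from the data.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # training data D
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

alpha = 1.0                      # hyper-parameter in H, chosen a priori
model = Ridge(alpha=alpha)       # M_H: the model is defined given H
model.fit(X, y)                  # training produces the parameter estimate Phi-hat

print(model.coef_, model.intercept_)   # the learned parameters Phi-hat
```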
The source of confusion stems from the use of $M_{\mathcal{H}}$ and the modification of the hyper-parameters $\mathcal{H}$ during the training routine, in addition to, obviously, the parameters $\hat{\Phi}$. There are several potential motivations for modifying $\mathcal{H}$ during training. An example would be changing the learning rate during training to improve the speed and/or stability of the optimization routine.
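As a minimal sketch of that example, here is plain gradient descent on a squared loss where the learning rate (a hyper-parameter) is decayed mid-training; the halving schedule is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([3.0, -1.0]) + rng.normal(scale=0.1, size=200)

phi = np.zeros(2)    # parameters Phi, learned from the data
lr = 0.1             # learning rate: a hyper-parameter in H, yet changed during training
for step in range(500):
    grad = 2 * X.T @ (X @ phi - y) / len(y)   # gradient of the mean squared loss
    phi -= lr * grad                          # parameter update uses the current lr
    if step % 100 == 99:
        lr *= 0.5                             # decay schedule: H is modified mid-training

print(phi)           # predictions will be made with phi, not with the final value of lr
```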
The important point of distinction is that the result, say a label prediction $Y_{pred}$, is based on the model parameters $\Phi$ and not on the hyper-parameters $\mathcal{H}$.
The distinction, however, has caveats, and consequently the lines are blurred. Consider, for example, the task of clustering, specifically Gaussian Mixture Modelling (GMM). The parameter set here is $\Phi = \{\bar{\mu}, \bar{\sigma} \}$, where $\bar{\mu}$ is the set of $N$ cluster means and $\bar{\sigma}$ is the set of $N$ standard deviations, for $N$ Gaussian kernels.
You may have intuitively recognized the hyper-parameter here: it is the number of clusters $N$, so $\mathcal{H} = \{N \}$. Typically, cluster validation is used to determine $N$ a priori, using a small sub-sample of the data $D$. However, I could also modify my learning algorithm for Gaussian Mixture Models to adapt the number of kernels $N$ during training, based on some criterion. In that scenario, the hyper-parameter $N$ becomes part of the set of parameters $\Phi = \{\bar{\mu}, \bar{\sigma}, N \}$.
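A minimal sketch of the "fix $N$ a priori" route, using scikit-learn's `GaussianMixture` and BIC on a sub-sample as one possible validation criterion (an illustrative choice on my part, not the only one):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
D = np.concatenate([rng.normal(-3, 1, size=(300, 1)),    # toy 1-D data with two clusters
                    rng.normal(+3, 1, size=(300, 1))])

# Choose N on a small sub-sample of D, before the actual training run.
subsample = D[rng.choice(len(D), size=200, replace=False)]
candidate_N = list(range(1, 6))
bics = [GaussianMixture(n_components=k, random_state=0).fit(subsample).bic(subsample)
        for k in candidate_N]
N = candidate_N[int(np.argmin(bics))]     # hyper-parameter H = {N}, now fixed

gmm = GaussianMixture(n_components=N, random_state=0).fit(D)   # training estimates Phi
print(N, gmm.means_.ravel())              # Phi: the fitted means (and covariances)
```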
Nevertheless, it should be pointed out that the result, or predicted value, for a data point $d$ in the data $D$ is based on $GMM(\bar{\mu}, \bar{\sigma})$ and not on $N$. That is, each of the $N$ Gaussian kernels will contribute some likelihood value for $d$, based on the distance of $d$ from its respective $\mu$ and on its own $\sigma$. The 'parameter' $N$ is not explicitly involved here, so it's arguably not 'really' a parameter of the model.
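To make that concrete, here is a minimal sketch with illustrative (made-up) fitted values for a 1-D mixture; I include mixing weights for completeness, even though the answer above lists only $\bar{\mu}$ and $\bar{\sigma}$ in $\Phi$. Note that $N$ enters only implicitly, as the length of the parameter arrays.

```python
import numpy as np
from scipy.stats import norm

# Illustrative fitted parameters Phi from some GMM fit (not real results):
mus = np.array([-3.0, 3.0])
sigmas = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

d = 0.5
per_kernel = weights * norm.pdf(d, loc=mus, scale=sigmas)  # each kernel's contribution to d
likelihood = per_kernel.sum()                              # N appears only as the array length
print(per_kernel, likelihood)
```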
Summary: the distinction between parameters and hyper-parameters is nuanced because of the way practitioners use them when designing the model $M$ and the loss function $\mathcal{L}$. I hope this helps disambiguate the two terms.