What Equation is model.coef_ Derived From? (SKLearn)

Fairly simple question, but something I've been unable to pin down by scouring the interwebs. After fitting an LR model using SKlearn, one of the key outputs is coef_, along with intercept_.

I understand that coef_ holds the coefficients that describe the relationships learned by the model, and that taking the dot product of the input data with coef_ and adding intercept_ will produce the predicted values for your inputs.
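For example, here's a toy sketch (the data and names are just illustrative, not my real use case) of the behaviour I'm describing:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(100, 3)                        # 100 samples, 3 features (toy data)
y = 4.0 + X @ np.array([1.5, -2.0, 0.7])    # a known linear relationship

model = LinearRegression().fit(X, y)

# predict(X) is exactly the dot product with coef_ plus intercept_
manual = X @ model.coef_ + model.intercept_
print(np.allclose(manual, model.predict(X)))  # True
```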

My question is: What is the equation that defines coef_ for a 1st-order model? How does this change with a 2nd-order model? How does this equation change with a multi-variate model that has n-features?

I've gathered that it's something along the lines of $b_0 + b_1 x + b_2 x^2$, but I don't understand how it evolves with the introduction of additional feature variables and for higher-order polynomial models.

Topic: machine-learning-model, linear-regression, scikit-learn

Category: Data Science


A linear regression line has an equation of the form $y = \beta_0 + \beta_1 x$, where $x$ is the explanatory variable and $y$ is the dependent variable. The slope of the line is $\beta_1$ (the coefficient), and $\beta_0$ is the intercept (the value of $y$ when $x = 0$). For a single feature, $\beta_1$ is what sklearn returns in coef_ and $\beta_0$ is what it returns in intercept_.
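As a minimal sketch (with made-up data; the values 2 and 3 are just illustrative), for a single feature coef_ recovers the slope $\beta_1$ and intercept_ the intercept $\beta_0$:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
x = rng.rand(50, 1)                              # one explanatory variable
y = 2.0 + 3.0 * x[:, 0] + 0.01 * rng.randn(50)   # beta_0 = 2, beta_1 = 3 (plus noise)

model = LinearRegression().fit(x, y)
print(model.coef_)       # ~[3.0]  -> beta_1, the slope
print(model.intercept_)  # ~2.0    -> beta_0
```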

The most common method for fitting a regression line is the method of least-squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies on the fitted line exactly, then its vertical deviation is 0). Because the deviations are first squared, then summed, there are no cancellations between positive and negative values.
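To illustrate the minimization (again on synthetic data of my own), the sum of squared residuals of the fitted line is smaller than that of a line with a perturbed slope:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
x = rng.rand(50, 1)
y = 2.0 + 3.0 * x[:, 0] + 0.1 * rng.randn(50)

model = LinearRegression().fit(x, y)

# Sum of squared vertical deviations (residuals) for the fitted line...
sse_fit = np.sum((y - model.predict(x)) ** 2)
# ...and for a line with the same intercept but a perturbed slope.
sse_perturbed = np.sum((y - (model.intercept_ + (model.coef_[0] + 0.5) * x[:, 0])) ** 2)
print(sse_fit < sse_perturbed)  # True: the least-squares line has the smaller SSE
```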

Ordinary least squares (OLS) is the most common estimator. OLS estimates are commonly used to analyze both experimental and observational data. The OLS method minimizes the sum of squared residuals and leads to a closed-form expression for the estimated value of the unknown parameter vector $\beta$: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf{T}}\mathbf{X})^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{Y}$, where $Y$ is a vector whose $i$th element is the $i$th observation of the dependent variable, and $X$ is a matrix whose $(i, j)$ element is the $i$th observation of the $j$th independent variable.
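Here is a sketch (toy data of my own) that evaluates this closed-form expression directly in NumPy, using a leading column of ones so the intercept is estimated together with the coefficients, and compares the result with sklearn's coef_ and intercept_. (In practice np.linalg.lstsq or np.linalg.solve is numerically preferable to an explicit inverse; the inverse is used here only to mirror the formula.)

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = 1.0 + X @ np.array([2.0, -3.0]) + 0.05 * rng.randn(200)

# Design matrix with a leading column of ones so the intercept is estimated too.
X_design = np.column_stack([np.ones(len(X)), X])
beta_hat = np.linalg.inv(X_design.T @ X_design) @ X_design.T @ y

model = LinearRegression().fit(X, y)
print(beta_hat)                       # ~[1.0, 2.0, -3.0]
print(model.intercept_, model.coef_)  # the same values, split by sklearn
```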

NOTE: Higher-order polynomial regression can be treated as an extension of linear regression by treating $x^2$, $x^3$, … as extra features and using linear least squares with the formula above. That is, one applies linear least squares to $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots$, where $x$, $x^2$, $x^3$, … are treated as separate features.

In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modelled as an nth degree polynomial in x. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x). Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function E(y | x) is linear in the unknown parameters that are estimated from the data. For this reason, polynomial regression is considered to be a special case of multiple linear regression.
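A sketch (synthetic data; the degree 3 is chosen arbitrarily) using scikit-learn's PolynomialFeatures to build the $x$, $x^2$, $x^3$ columns and then fitting an ordinary LinearRegression, so that coef_ holds one $\beta$ per polynomial term:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
x = rng.uniform(-2, 2, size=(100, 1))
y = 1.0 - 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 3 + 0.1 * rng.randn(100)

# Expand x into the columns x, x^2, x^3 and fit a plain linear model on them.
X_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)
model = LinearRegression().fit(X_poly, y)

print(model.intercept_)  # ~1.0              -> beta_0
print(model.coef_)       # ~[-2.0, 0.0, 0.5] -> beta_1, beta_2, beta_3
```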

References:

  1. http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm
  2. https://en.wikipedia.org/wiki/Linear_least_squares
  3. https://en.wikipedia.org/wiki/Polynomial_regression
