What is the Time Complexity of Linear Regression?

I am working with linear regression and would like to know its time complexity in big-O notation. Without an optimisation algorithm (such as gradient descent), the cost function of linear regression has to be evaluated over every combination of weight values (a brute-force approach). Computation time therefore depends on the number of weights and, of course, on the number of training examples.

Let $n$ be the number of training examples, $W$ the number of weights, and $m$ the resolution of the weight space, meaning each weight iterates through $m$ possible values. Then the time complexity of this linear regression is

$O(m^Wn)$

Is this correct?
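For concreteness, here is a minimal sketch of the brute-force approach the question describes: every one of the $m^W$ weight combinations is scored against all $n$ examples, giving the $O(m^W n)$ cost. The function name and toy data are illustrative, not from any library:

```python
import itertools
import numpy as np

def brute_force_linear_regression(X, y, grid):
    """Exhaustively search a grid of weight values for the lowest MSE.

    X    : (n, W) design matrix
    y    : (n,) targets
    grid : 1-D array of m candidate values tried for every weight
    """
    best_w, best_mse = None, np.inf
    # m**W candidate weight vectors, each scored in O(n * W) time
    for w in itertools.product(grid, repeat=X.shape[1]):
        w = np.asarray(w)
        mse = np.mean((X @ w - y) ** 2)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w, best_mse

# toy data: y = 1*x1 + 2*x2, with the true weights lying on the grid
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = X @ np.array([1.0, 2.0])
w, mse = brute_force_linear_regression(X, y, np.arange(-3.0, 3.5, 0.5))
```

With $m = 13$ grid values and $W = 2$ weights, this loop evaluates $13^2 = 169$ candidates, which is exactly why the approach blows up exponentially in $W$.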

Topic cost-function linear-regression regression statistics machine-learning

Category Data Science


Just to be clear, in linear regression you have to compute $$ (X'X)^{-1}X'Y, $$ where $X$ is an $n\times p$ matrix. In general, the complexity of the matrix product $AB$ is $O(abc)$ whenever $A$ is $a\times b$ and $B$ is $b\times c$. We can therefore evaluate the following complexities:

a) the matrix product $X'X$ with complexity $O(p^2n)$.

b) the matrix-vector product $X'Y$ with complexity $O(pn)$.

c) the inverse $(X'X)^{-1}$ with complexity $O(p^3)$.

Therefore the complexity is $O(np^2 + p^3)$.
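The three steps above map directly onto NumPy calls; a minimal sketch (the dimensions `n` and `p` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
true_beta = np.arange(1.0, p + 1.0)
y = X @ true_beta  # noiseless targets so the solution is exact

# a) X'X is p x p, costing O(n p^2)
XtX = X.T @ X
# b) X'y costs O(n p)
Xty = X.T @ y
# c) solving the p x p system costs O(p^3); np.linalg.solve is
#    preferred in practice over forming the explicit inverse
beta = np.linalg.solve(XtX, Xty)
```

Since usually $n \gg p$, the $O(np^2)$ product in step (a), not the $O(p^3)$ inversion, dominates the total cost.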


It highly depends on the "solver" you are using.

Calling $n$ the number of observations and $p$ the number of weights, the overall complexity is $O(np^2 + p^3)$.

Indeed, when performing a linear regression via the normal equations you carry out a matrix multiplication of complexity $O(np^2)$ (when evaluating $X'X$) and then invert the result. Since $X'X$ is a square matrix with $p$ rows, the usual complexity of matrix inversion is $O(p^3)$ (though it can be lowered).

Hence a theoretical complexity of $O(np^2 + p^3)$.

Side notes

However, numerical simulations (using Python's scikit-learn library) suggest a time complexity closer to $n^{0.72} p^{1.3}$.

This may be because no implementation actually performs the full inversion (the system can instead be solved with gradient descent), or because there are other ways to calibrate the weights of a linear regression.
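To illustrate the gradient-descent alternative mentioned above, here is a hedged sketch of batch gradient descent on the MSE loss; the learning rate and iteration count are illustrative choices, not tuned values:

```python
import numpy as np

def gd_linear_regression(X, y, lr=0.1, n_iters=500):
    """Fit weights by batch gradient descent on the MSE loss.

    Each iteration costs O(n p) (one matrix-vector product and its
    transpose), so the O(p^3) inversion term never appears.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = (2.0 / X.shape[0]) * X.T @ (X @ w - y)  # gradient of the MSE
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w  # noiseless, so gradient descent converges to true_w
w = gd_linear_regression(X, y)
```

At $O(np)$ per iteration, the total cost is $O(npk)$ for $k$ iterations, which explains why measured exponents can fall well below the theoretical $O(np^2 + p^3)$.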

Source

An article from my blog: computational complexity of machine learning algorithms
