Why is the transpose of the independent feature matrix necessary in linear regression?
I can follow the classical linear regression steps (treating $X$ as if it were invertible):
$Xw=y$
$X^{-1}Xw=X^{-1}y$
$Iw=X^{-1}y$
$w=X^{-1}y$
However, when implementing this in Python (with inv from numpy.linalg), I see that instead of simply using
w = inv(X).dot(y)
they apply
w = inv(X.T.dot(X)).dot(X.T).dot(y)
What is the explanation for the transpositions and the two extra multiplications here? I'm confused...
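For concreteness, here is a minimal runnable sketch of the comparison; the data shape, seed, and variable names are illustrative assumptions on my part, not taken from any particular implementation:

import numpy as np
from numpy.linalg import inv

# Illustrative shapes: 5 samples, 2 features, so X is 5x2 (not square)
# and inv(X) raises LinAlgError -- X has no ordinary inverse.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
y = X.dot(np.array([2.0, -1.0]))

# The version I keep seeing: X.T.dot(X) is 2x2, so it can be inverted.
w = inv(X.T.dot(X)).dot(X.T).dot(y)
print(w)  # approximately [ 2. -1.]

# Sanity check: NumPy's least-squares solver returns the same solution.
w_check = np.linalg.lstsq(X, y, rcond=None)[0]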