Linear Regression Coefficient Calculation

import numpy as np

class LR:

    def __init__(self, x, y):
        self.x = np.array(x)
        self.y = np.array(y)
        self.xmean = np.mean(x)
        self.ymean = np.mean(y)
        self.x_xmean = self.x - self.xmean
        self.y_ymean = self.y - self.ymean
        # Sums of cross-products and squared deviations; the 1/n factors
        # cancel when the slope is computed, so they are omitted here.
        self.covariance = sum(self.x_xmean * self.y_ymean)
        self.variance = sum(self.x_xmean * self.x_xmean)

    def getYhat(self, input_x):
        # Requires getCoefficients() to have been called first, since
        # that is what sets self.slope and self.intercept.
        input_x = np.array(input_x)
        return self.intercept + self.slope * input_x

    def getCoefficients(self):
        self.slope = self.covariance / self.variance
        self.intercept = self.ymean - (self.xmean * self.slope)
        return self.intercept, self.slope
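
For context, the class is used roughly like this (the numbers are illustrative; note that getCoefficients must be called before getYhat, since getYhat reads self.intercept and self.slope):

x = [1, 2, 3, 4, 5]   # illustrative inputs
y = [2, 4, 5, 4, 6]   # illustrative targets

model = LR(x, y)
intercept, slope = model.getCoefficients()  # sets self.intercept and self.slope
predictions = model.getYhat([6, 7])         # now safe to call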

I am using the above class to calculate the intercept and slope for a simple linear regression. However, I would like to tweak it so that it also works for multiple linear regression, but WITHOUT using the matrix formula $(X^TX)^{-1}X^TY$.

Any suggestions?



What I have found with this kind of exercise is that it is very beneficial to code it directly in NumPy at least once and really try to understand what is going on.

I solved this (for my own learning) in a Kaggle kernel.
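
The idea is to fit the coefficients by gradient descent on the mean squared error instead of the closed-form matrix solution. For reference, the gradients used in the derivative function below come from differentiating the MSE (a standard derivation, spelled out here for clarity):

$$E = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad \frac{\partial E}{\partial w} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_i, \qquad \frac{\partial E}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)$$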

The code I used is:

import numpy as np

def predict(my_X, my_W, my_B):
    # my_X has shape (n_features, n_samples); my_W has shape (n_features,)
    return np.dot(my_W, my_X) + my_B


def error(y, y_hat):
    # Mean squared error: the average of the squared residuals
    n = len(y)
    squared_diff = (y - y_hat) ** 2
    return np.sum(squared_diff) / n


def derivative(X, w, b, y):
    # Gradients of the MSE with respect to the weights and the bias
    n = len(y)
    y_hat = predict(X, w, b)

    w_derivative = (2 / n) * np.sum((y_hat - y) * X, axis=1)
    b_derivative = (2 / n) * np.sum(y_hat - y)

    return w_derivative, b_derivative
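
The training loop below assumes X, y, w, and b are already defined. A minimal setup might look like this (the data is synthetic, purely illustrative):

# Synthetic data: 3 features, 50 samples (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))
true_w = np.array([1.5, -2.0, 0.5])
y = np.dot(true_w, X) + 4.0 + rng.normal(scale=0.1, size=50)

w = np.zeros(3)   # initial weights
b = 0.0           # initial bias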


lr = 0.01  # learning rate
for iteration in range(100):
    y_hat = predict(X, w, b)

    if iteration % 10 == 0:
        print("Iteration", iteration, "Error", error(y, y_hat))

    w_derivative, b_derivative = derivative(X, w, b, y)

    # gradient descent update step
    w = w - (lr * w_derivative)
    b = b - (lr * b_derivative)

Then you can inspect the w and b variables and see for yourself what is going on :)
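
One way to sanity-check the learned coefficients is to compare them against sklearn's LinearRegression (note that sklearn expects samples in rows, hence the transpose):

from sklearn.linear_model import LinearRegression

check = LinearRegression().fit(X.T, y)  # sklearn wants shape (n_samples, n_features)
print("gradient descent:", w, b)
print("sklearn:         ", check.coef_, check.intercept_)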


While I am not sure whether you need the calculations done within the class specifically, there is a simpler way to extract the intercept and slope coefficients using linear_model from sklearn together with pandas, if that is of use to you.

Suppose we have the following variables:

y: [-0.006,-0.001,0.015,0.017,-0.0019,-0.005]
x1: [-0.018,-0.008,0.011,0.017,-0.008,-0.002]
x2: [-0.04,-0.003,0.012,0.011,-0.004,-0.009]
x3: [-0.06,-0.007,0.3,0.09,-0.005,-0.006]

Now, let's run a linear regression using sklearn:

from pandas import DataFrame
from sklearn import linear_model

dataset = {'y': [-0.006,-0.001,0.015,0.017,-0.0019,-0.005],
           'x1': [-0.018,-0.008,0.011,0.017,-0.008,-0.002],
           'x2': [-0.04,-0.003,0.012,0.011,-0.004,-0.009],
           'x3': [-0.06,-0.007,0.3,0.09,-0.005,-0.006]
           }

df = DataFrame(dataset, columns=['y','x1','x2','x3'])


X = df[['x1','x2','x3']]
Y = df['y']

# Regression Model
regr = linear_model.LinearRegression()
regr.fit(X, Y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)

Running this prints the intercept and slope coefficients:

>>> print('Intercept: \n', regr.intercept_)
Intercept: 
 0.0022491408670789535
>>> print('Coefficients: \n', regr.coef_)
Coefficients: 
 [ 0.62742415 -0.06618899  0.02384715]
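
If you also want standard errors and p-values for the coefficients, statsmodels produces the same estimates; a minimal sketch using the X and Y above (sm.add_constant adds the intercept column explicitly):

import statsmodels.api as sm

X_const = sm.add_constant(X)   # prepend a constant column for the intercept
model = sm.OLS(Y, X_const).fit()
print(model.params)            # intercept followed by x1, x2, x3 coefficients
print(model.summary())         # adds standard errors, t-stats, p-values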

Hope you find this of use if you are simply looking to extract the intercept and slope coefficients.
