Linear Regression Coefficient Calculation

import numpy as np

class LR:

    def __init__(self, x, y):
        self.x = np.array(x)
        self.y = np.array(y)
        self.xmean = np.mean(x)
        self.ymean = np.mean(y)
        self.x_xmean = self.x - self.xmean
        self.y_ymean = self.y - self.ymean
        # Sums of cross-products and squared deviations; the 1/n factors
        # cancel when the slope is computed, so they are omitted here.
        self.covariance = sum(self.x_xmean * self.y_ymean)
        self.variance = sum(self.x_xmean * self.x_xmean)

    def getYhat(self, input_x):
        # Requires getCoefficients() to have been called first, since
        # that is what sets self.slope and self.intercept.
        input_x = np.array(input_x)
        return self.intercept + self.slope * input_x

    def getCoefficients(self):
        self.slope = self.covariance / self.variance
        self.intercept = self.ymean - (self.xmean * self.slope)
        return self.intercept, self.slope
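
For context, the class is used roughly like this (the numbers are illustrative; note that getCoefficients must be called before getYhat, since getYhat reads self.intercept and self.slope):

x = [1, 2, 3, 4, 5]   # illustrative inputs
y = [2, 4, 5, 4, 6]   # illustrative targets

model = LR(x, y)
intercept, slope = model.getCoefficients()  # sets self.intercept and self.slope
predictions = model.getYhat([6, 7])         # now safe to call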

I am using the above class to calculate the intercept and slope for a simple linear regression. However, I would like to tweak it so that it also works for multiple linear regression, but WITHOUT using the matrix formula $(X^TX)^{-1}X^TY$.

Any suggestions?



What I have found with this kind of exercise is that it is very beneficial to code it directly in NumPy at least once and really try to understand what is going on.

I solved this (for my own learning) in a Kaggle kernel.
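
The idea is to fit the coefficients by gradient descent on the mean squared error instead of the closed-form matrix solution. For reference, the gradients used in the derivative function below come from differentiating the MSE (a standard derivation, spelled out here for clarity):

$$E = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad \frac{\partial E}{\partial w} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)\,x_i, \qquad \frac{\partial E}{\partial b} = \frac{2}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)$$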

The code I used is:

import numpy as np

def predict(my_X, my_W, my_B):
    # my_X has shape (n_features, n_samples); my_W has shape (n_features,)
    return np.dot(my_W, my_X) + my_B


def error(y, y_hat):
    # Mean squared error: the average of the squared residuals
    n = len(y)
    squared_diff = (y - y_hat) ** 2
    return np.sum(squared_diff) / n


def derivative(X, w, b, y):
    # Gradients of the MSE with respect to the weights and the bias
    n = len(y)
    y_hat = predict(X, w, b)

    w_derivative = (2 / n) * np.sum((y_hat - y) * X, axis=1)
    b_derivative = (2 / n) * np.sum(y_hat - y)

    return w_derivative, b_derivative
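
The training loop below assumes X, y, w, and b are already defined. A minimal setup might look like this (the data is synthetic, purely illustrative):

# Synthetic data: 3 features, 50 samples (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 50))
true_w = np.array([1.5, -2.0, 0.5])
y = np.dot(true_w, X) + 4.0 + rng.normal(scale=0.1, size=50)

w = np.zeros(3)   # initial weights
b = 0.0           # initial bias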


lr = 0.01  # learning rate
for iteration in range(100):
    y_hat = predict(X, w, b)

    if iteration % 10 == 0:
        print("Iteration", iteration, "Error", error(y, y_hat))

    w_derivative, b_derivative = derivative(X, w, b, y)

    # gradient descent update step
    w = w - (lr * w_derivative)
    b = b - (lr * b_derivative)

Then you can inspect the w and b variables and see for yourself what is going on :)
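
One way to sanity-check the learned coefficients is to compare them against sklearn's LinearRegression (note that sklearn expects samples in rows, hence the transpose):

from sklearn.linear_model import LinearRegression

check = LinearRegression().fit(X.T, y)  # sklearn wants shape (n_samples, n_features)
print("gradient descent:", w, b)
print("sklearn:         ", check.coef_, check.intercept_)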


While I am not sure whether you need the calculations done within the class specifically, there is a simpler way to extract the intercept and slope coefficients using linear_model from sklearn together with pandas, if that is of use to you.

Suppose we have the following variables:

y: [-0.006,-0.001,0.015,0.017,-0.0019,-0.005]
x1: [-0.018,-0.008,0.011,0.017,-0.008,-0.002]
x2: [-0.04,-0.003,0.012,0.011,-0.004,-0.009]
x3: [-0.06,-0.007,0.3,0.09,-0.005,-0.006]

Now, let's run a linear regression using sklearn:

from pandas import DataFrame
from sklearn import linear_model

dataset = {'y': [-0.006,-0.001,0.015,0.017,-0.0019,-0.005],
           'x1': [-0.018,-0.008,0.011,0.017,-0.008,-0.002],
           'x2': [-0.04,-0.003,0.012,0.011,-0.004,-0.009],
           'x3': [-0.06,-0.007,0.3,0.09,-0.005,-0.006]
           }

df = DataFrame(dataset, columns=['y','x1','x2','x3'])


X = df[['x1','x2','x3']]
Y = df['y']

# Regression Model
regr = linear_model.LinearRegression()
regr.fit(X, Y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)

Running this prints the intercept and slope coefficients:

>>> print('Intercept: \n', regr.intercept_)
Intercept: 
 0.0022491408670789535
>>> print('Coefficients: \n', regr.coef_)
Coefficients: 
 [ 0.62742415 -0.06618899  0.02384715]
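
If you also want standard errors and p-values for the coefficients, statsmodels produces the same estimates; a minimal sketch using the X and Y above (sm.add_constant adds the intercept column explicitly):

import statsmodels.api as sm

X_const = sm.add_constant(X)   # prepend a constant column for the intercept
model = sm.OLS(Y, X_const).fit()
print(model.params)            # intercept followed by x1, x2, x3 coefficients
print(model.summary())         # adds standard errors, t-stats, p-values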

Hope you find this of use if you are simply looking to extract the intercept and slope coefficients.
