Why are my ridge regression coefficients completely different from ordinary linear regression coefficients in MATLAB?

I am implementing my own ridge regression algorithm and trying to reproduce coefficients similar to those in a MATLAB tutorial on regression.

Specifically, on the MATLAB tutorial page you will see:

load carsmall
x1 = Weight;
x2 = Horsepower;    % Contains NaN data
y = MPG;

X = [ones(size(x1)) x1 x2 x1.*x2];
b = regress(y,X)    % Removes NaN data

b = 4×1

   60.7104
   -0.0102
   -0.1882
    0.0000

Above, you can see that the first coefficient is about 60 and the rest are close to 0. I am trying to achieve similar results using ridge regression on the exact same carsmall data set that ships with MATLAB.

The following is MATLAB code I have written:

load carsmall
x1 = Weight;
x2 = Horsepower;    % Contains NaN data
y = MPG;
x3 = x1.*x2;

% remove NaN data
y_nan = find (isnan(y));
x2_nan = find(isnan(x2));
all_nan = [y_nan; x2_nan];

counter = 1;
for m=1:length(y)
    flag=0;
    for j=1:length(all_nan)
        if m == all_nan(j)
            flag = 1;
        end
    end
    if flag ~= 1
        y_clean(counter) = y(m);
        x1_clean(counter) = x1(m);
        x2_clean(counter) = x2(m);
        x3_clean(counter) = x3(m);
        counter = counter+1;
    end

end


clear x1 x2 x3 y
x1 = x1_clean;
x2 = x2_clean;
x3 = x3_clean;
y = y_clean;
n = length(y);

% at this point, x1,x2,x3, and y should not have any NaN data (i.e. clean)
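(As an aside, the same cleanup can be done with logical indexing instead of the nested loops; a sketch operating on the original vectors, assuming, as the loop above does, that only y and x2 contain NaN:)

% keep only the rows where neither y nor x2 is NaN
keep = ~isnan(y) & ~isnan(x2);
x1 = x1(keep);
x2 = x2(keep);
x3 = x3(keep);
y  = y(keep);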

% normalize the clean data
x1 = x1 / max(x1);
x2 = x2 / max(x2);
x3 = x3 / max(x3);
y  = y / max(y);
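(Note: after this max-scaling, the fitted coefficients live on the normalized scale, not the scale of the regress output above. A sketch of mapping them back after the fit, assuming the maxima were saved before the divisions, e.g. max_y = max(y) and so on; max_y, max_x1, max_x2, max_x3 below are those hypothetical saved values:)

% map coefficients fitted on the scaled data back to the original scale;
% max_y, max_x1, max_x2, max_x3 are assumed saved before normalizing
b0 = max_y * y_int;
b1 = max_y / max_x1 * B1;
b2 = max_y / max_x2 * B2;
b3 = max_y / max_x3 * B3;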



% gradient descent iterates this many times
max_iterations = 10;

% this is the variable used for penalty in the cost function for Ridge
% Regression
lambda = .1;

% gradient descent uses this to compute a step size
learning_rate = .001;

% initialize parameters
y_int = 10;
B1 = .1;
B2 = .1;
B3 = 0;

% begin gradient descent iterations
thres_y_int = .01;  % -- used for stopping condition of gradient descent

for i=1:max_iterations
    
    dJ_d_y_int = 0;    
    dJ_d_B1 = 0;
    dJ_d_B2 = 0;
    %dJ_d_B3 = 0;
    for j=1:n
        % these are actually partial derivatives of cost function with
        % respect to the 3 params (y_intercept, B1, and B2)
        dJ_d_y_int = dJ_d_y_int -2 * (   y(j) - y_int -B1*x1(j) - B2*x2(j)- B3*x3(j)   );
    
        dJ_d_B1  = dJ_d_B1 -2 * x1(j) * (y(j) - y_int -B1*x1(j) - B2*x2(j)- B3*x3(j));

        dJ_d_B2  = dJ_d_B2 -2 * x2(j) * (y(j) - y_int -B1*x1(j) - B2*x2(j)- B3*x3(j));

        %dJ_d_B3  = dJ_d_B3 -2 * x3(j) * (y(j) - y_int -B1*x1(j) - B2*x2(j)- B3*x3(j));
    end

    % add the gradient of the ridge penalty (only the slopes are
    % penalized, not the intercept)
    dJ_d_B1 = dJ_d_B1 + 2*lambda*B1;
    dJ_d_B2 = dJ_d_B2 + 2*lambda*B2;
    %dJ_d_B3 = dJ_d_B3 + 2*lambda*B3;

    % step size
    delta_y_int = dJ_d_y_int * learning_rate;
    delta_B1 = dJ_d_B1 * learning_rate;
    delta_B2 = dJ_d_B2 * learning_rate;
    %delta_B3 = dJ_d_B3 * learning_rate;

    % stopping condition
    if ( abs(delta_y_int) < thres_y_int )
        disp('breaking')
        break
    end

    % update parameters
    y_int = y_int - delta_y_int;
    B1 = B1 - delta_B1;
    B2 = B2 - delta_B2;
   % B3 = B3 - delta_B3;

end
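(For reference, this loop is descending the ridge cost implied by the gradients above; computing it each iteration is a quick way to check that the cost actually decreases. A sketch:)

% ridge cost implied by the gradients above: squared residuals plus an
% L2 penalty on the slopes only (the intercept and B3 are not penalized)
J = sum((y - y_int - B1*x1 - B2*x2 - B3*x3).^2) + lambda*(B1^2 + B2^2);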

Running the above program results in the following coefficients:

B1 =

  -3.348401550938010

B2 =

  -2.504364991046751

y_int =

   4.206818888998534

These numbers look nothing like the coefficients from the MATLAB tutorial, so I think I am doing something wrong. What am I missing?

Topic ridge-regression linear-regression matlab

Category Data Science


Ridge regression is conceptually different from OLS (ordinary least squares, a.k.a. multiple linear regression, which I believe is what the tutorial uses).

[Image: the ridge objective, minimize RSS + λ · Σ βj², i.e. the least-squares loss plus an L2 penalty on the coefficients]

With ridge you have an additional "penalty term" on top of the RSS (residual sum of squares) being minimized. See also "An Introduction to Statistical Learning", Chapter 6.2.1.

Thus you would not expect OLS and ridge to yield the same coefficients. Ridge regression "shrinks" coefficients that contribute little to the prediction, so they end up close to zero, though not exactly zero; exact zeros are possible with the lasso, which uses an L1 penalty instead.
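For a concrete reference point, the ridge objective has the closed-form minimizer (X'X + λI)\(X'y) in MATLAB notation. A minimal sketch on the cleaned data from the question, centering the predictors first so that the intercept is left unpenalized (as is conventional); gradient descent should approach this solution as it converges:

% closed-form ridge fit (a sketch); predictors only, no column of ones
Xp = [x1(:) x2(:) x3(:)];
yv = y(:);
mu = mean(Xp);  my = mean(yv);
Z  = Xp - mu;                                  % centered predictors
b  = (Z'*Z + lambda*eye(3)) \ (Z'*(yv - my));  % penalized slopes
b0 = my - mu*b;                                % unpenalized intercept

MATLAB's Statistics and Machine Learning Toolbox also provides a ridge function; b = ridge(y, Xp, lambda, 0) returns the coefficients restored to the original data scale, with the intercept as the first element.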

However, I cannot say whether your code is correct, since I don't use MATLAB.
