How does the alpha parameter change SGDRegressor behavior with outliers?

I am using SGDRegressor with a constant learning rate and the default loss function. I am curious how changing the alpha parameter from 0.0001 to 100 will change the regressor's behavior. Below is the sample code I have:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor

def abline(slope, intercept):
    """Plot the line y = slope * x + intercept across the current axes."""
    axes = plt.gca()
    x_vals = np.array(axes.get_xlim())
    plt.plot(x_vals, intercept + slope * x_vals, '--')

# The original dataset is an ellipse; a, b and phi were not defined in the
# question, so these are example values.
a, b = 1, 2
phi = np.linspace(0, 2 * np.pi, 100)

out = [(0, 2), (21, 13), (-23, -15), (22, 14), (23, 14)]  # outliers to add
alpha = [0.0001, 1, 100]
N = len(out)
plt.figure(figsize=(20, 15))
j = 1

for i in alpha:
    # Since for every alpha we want to start with the original dataset,
    # X and Y are reset here
    X = b * np.sin(phi)
    Y = a * np.cos(phi)
    for num in range(N):
        plt.subplot(3, N, j)
        X = np.append(X, out[num][0])  # appending outlier to main X
        Y = np.append(Y, out[num][1])  # appending outlier to main Y
        j = j + 1                      # increasing j so we move on to the next plot
        model = SGDRegressor(alpha=i, eta0=0.001, learning_rate='constant', random_state=0)
        model.fit(X.reshape(-1, 1), Y)  # fitting the model

        plt.scatter(X, Y)
        plt.title("alpha = " + str(i) + " | Slope: " + str(round(model.coef_[0], 4)))

        abline(model.coef_[0], model.intercept_)  # plotting the fitted regression line

plt.show()

As shown above, I have a main dataset of X and Y, and in each iteration I add one point as an outlier to the main dataset, train the model, and plot the regression line (hyperplane). Below you can see the results for different values of alpha:

Looking at the results, I am still confused and can't draw a solid conclusion about how the alpha parameter changes the model. What is the effect of alpha? Is it causing overfitting? Underfitting?


The alpha parameter controls the strength of the regularization penalty in SGDRegressor. The default penalty is L2, so your loss function is basically:

$$ \frac{1}{n}\sum_{i=1}^{n} \big(y_i - (W x_i + b)\big)^2 + \alpha \sum_j W_j^{2} $$

where $W$ are the weights and $b$ is the intercept to be optimized. The first term is the standard mean squared error (the default squared loss), and the second is the L2 regularization penalty. Note that scikit-learn applies the penalty only to the weights $W$, not to the intercept $b$.
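For concreteness, here is a minimal sketch of that objective written out in NumPy (ignoring sklearn's internal scaling constants; the function name sgd_objective is just for illustration):

import numpy as np

def sgd_objective(W, b, X, y, alpha):
    # Mean squared error plus the L2 penalty on the weights only;
    # the intercept b is left unregularized, as in scikit-learn.
    residuals = y - (X @ W + b)
    return np.mean(residuals ** 2) + alpha * np.sum(W ** 2)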

By increasing alpha, you penalize large values of the model parameters to control overfitting: a small alpha lets the line chase individual points (including your outliers), while a large alpha shrinks the slope toward zero and pushes the model toward underfitting. You should be able to see that if you make alpha very large (say 10 million), the "optimal" $W$ is just 0, so you get a horizontal line at the intercept (roughly the mean of $y$).
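A quick way to see this (a minimal sketch on synthetic data, reusing the constant learning rate settings from your code) is to watch the fitted slope shrink as alpha grows:

import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0 + rng.normal(scale=0.5, size=200)  # true slope 2, intercept 1

for a in [0.0001, 1, 100]:
    model = SGDRegressor(alpha=a, eta0=0.001, learning_rate='constant', random_state=0)
    model.fit(X, y)
    print("alpha =", a, "-> slope:", round(model.coef_[0], 4),
          "intercept:", round(model.intercept_[0], 4))

The slope should come out near the true value of 2 for the smallest alpha and shrink toward 0 as alpha grows, which is exactly the flattening you see in your plots.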
