SKLearn - Different Results B/w Default Linear Model and 1st Order Polynomial Linear Model

SUMMARY

I'm building a linear regression model with scikit-learn and noticing that the model performance (RMSE and max error, specifically) varies depending on whether I use the default LinearRegression or whether I first apply PolynomialFeatures(degree=1).

My understanding is that these outcomes should be identical, since both are first-order linear models; however, my error is consistently lower when using the PolynomialFeatures version.
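For reference, with its default include_bias=True, PolynomialFeatures(degree=1) leaves the input columns untouched and simply prepends a constant column of ones, which is why the fitted models should match:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])

# degree=1 keeps the original features and adds a bias column of ones
X_transf = PolynomialFeatures(degree=1).fit_transform(X)
print(X_transf)  # [[1. 2. 3.]]
```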

TLDR

When I run the code below, the second chunk (polynomial with degree 1) is consistently more accurate than the default LR model. I expect these models to be identical, so can anyone explain why they differ?

Code

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

model = LinearRegression()

# DEFAULT LR MODEL
# Perform a train/test split on the untransformed X data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_ratio)

# Fit the model and evaluate performance
model.fit(X_train, y_train)  # Feed it our input matrix and known outputs, after the t/t split
y_predicted = model.predict(X_test)  # Feed test data back into the newly generated model
rmse = np.sqrt(mean_squared_error(y_test, y_predicted))
max_error = np.max(abs(y_predicted - y_test))
r2 = model.score(X_train, y_train)  # Note: R^2 computed on the training set, not the test set
print(model.coef_)
print('   RMSE: ', rmse)
print('   Max Error: ', max_error)
print('   R2: ', r2, '\n')

# ---------------------------------------------------
# 1ST ORDER POLYNOMIAL MODEL
# Create a polynomial transformation matrix of X
poly = PolynomialFeatures(degree=1)
X_transf = poly.fit_transform(X)  # Transformation matrix to increase power of regressive model

# Perform a train/test split with the transformed X data
X_train, X_test, y_train, y_test = train_test_split(X_transf, y, test_size=test_ratio)

# Fit the polynomial model and evaluate performance
model.fit(X_train, y_train)  # Feed it our input matrix and known outputs, after the t/t split
y_predicted = model.predict(X_test)  # Feed test data back into the newly generated model
rmse = np.sqrt(mean_squared_error(y_test, y_predicted))
max_error = np.max(abs(y_predicted - y_test))
r2 = model.score(X_train, y_train)  # Note: R^2 computed on the training set, not the test set
print(model.coef_)
print('   RMSE: ', rmse)
print('   Max Error: ', max_error)
print('   R2: ', r2, '\n')
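For what it's worth, here is a minimal self-contained sketch (the synthetic data is my own, purely for illustration) showing that when both models see the exact same train/test rows, the plain LinearRegression and the degree-1 polynomial pipeline produce matching predictions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # synthetic inputs
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Split *indices* once with a fixed random_state so both models
# train and test on exactly the same rows.
train, test = train_test_split(np.arange(len(X)), test_size=0.25,
                               random_state=42)

plain = LinearRegression().fit(X[train], y[train])

X_transf = PolynomialFeatures(degree=1).fit_transform(X)  # [1, x1, x2, x3]
poly = LinearRegression().fit(X_transf[train], y[train])

# Same rows in, same predictions out (to floating-point precision).
print(np.allclose(plain.predict(X[test]), poly.predict(X_transf[test])))  # True
```

Note that without a random_state, each call to train_test_split draws a fresh random split, so the two chunks above are evaluated on different test rows.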

Topic machine-learning-model python-3.x linear-regression scikit-learn

Category Data Science
