SKLearn - Different Results B/w Default Linear Model and 1st Order Polynomial Linear Model
SUMMARY
I'm building a linear regression model with scikit-learn and noticing that the model's performance (RMSE and max error, specifically) varies depending on whether I use the default LinearRegression or first apply PolynomialFeatures(degree=1).
My understanding is that these outcomes should be identical, since both are fitting a first-order linear model; however, the error is consistently lower with the PolynomialFeatures version.
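To double-check what the transform actually does, printing PolynomialFeatures(degree=1) on a tiny made-up matrix (not my real data) just shows the original columns plus a leading bias column of ones:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])                      # tiny made-up input
print(PolynomialFeatures(degree=1).fit_transform(X_demo))
# [[1. 1. 2.]
#  [1. 3. 4.]]   <- column of ones (include_bias=True) plus the original columns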
TLDR
When I run the code below, the second chunk (the degree-1 polynomial model) is consistently more accurate than the default LR model. I expect these models to be identical, so can anyone explain why that is the case?
Code
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# X, y, and test_ratio are defined earlier (omitted here)
model = LinearRegression()
# DEFAULT LR MODEL
# Perform a test/train split on the raw X data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_ratio)
# Fit the model and evaluate its performance
model.fit(X_train, y_train) # Feed it our input matrix and known outputs, after the t/t split
y_predicted = model.predict(X_test) # Feed test data back into the newly generated model
rmse = np.sqrt(mean_squared_error(y_test, y_predicted))
max_error = np.max(abs(y_predicted - y_test))
r2 = model.score(X_train, y_train)
print(model.coef_)
print(' RMSE: ', rmse)
print(' Max Error: ', max_error)
print(' R2: ', r2, '\n')
# ---------------------------------------------------
# 1ST ORDER POLYNOMIAL MODEL
# Create a polynomial transformation matrix of X
poly = PolynomialFeatures(degree=1)
X_transf = poly.fit_transform(X) # Adds a bias column of ones; degree > 1 would also add higher-order terms
# Perform a test/train split with the transformed X data
X_train, X_test, y_train, y_test = train_test_split(X_transf, y, test_size=test_ratio)
# Fit the polynomial model and evaluate its performance
model.fit(X_train, y_train) # Feed it our input matrix and known outputs, after the t/t split
y_predicted = model.predict(X_test) # Feed test data back into the newly generated model
rmse = np.sqrt(mean_squared_error(y_test, y_predicted))
max_error = np.max(abs(y_predicted - y_test))
r2 = model.score(X_train, y_train)
print(model.coef_)
print(' RMSE: ', rmse)
print(' Max Error: ', max_error)
print(' R2: ', r2, '\n')
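For reference, here is the sanity check I'd expect to pass: on synthetic data (made-up X, y, seed, and split ratio, not my real dataset) and the same fixed split for both fits, the two approaches should give matching predictions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))                    # synthetic features
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Same fixed split for both fits, so the random partition cannot differ between them
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

plain = LinearRegression().fit(Xtr, ytr)

poly = PolynomialFeatures(degree=1)                   # include_bias=True by default -> extra column of ones
poly_fit = LinearRegression().fit(poly.fit_transform(Xtr), ytr)

rmse_plain = np.sqrt(mean_squared_error(yte, plain.predict(Xte)))
rmse_poly = np.sqrt(mean_squared_error(yte, poly_fit.predict(poly.transform(Xte))))
print(rmse_plain, rmse_poly)                          # I expect these to match (up to floating point)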
Topic machine-learning-model python-3.x linear-regression scikit-learn
Category Data Science