SKlearn PolynomialFeatures R^2 score

I'm trying to build a linear regression model using PolynomialFeatures, but when I evaluate it I get really strange scores. I know that R^2 can be applied to this model, and I think I've tried everything. I'd really appreciate some good advice. Here is my code.

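from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# df_all is the DataFrame holding the data (sample rows are shown further down)
X = df_all[['Elevation_gain', 'Distance']]
y = df_all['Avg_tempo_in_seconds']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)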

for n in range(2, 10):
    # Expand the two inputs into all polynomial terms up to degree n
    poly_feat = PolynomialFeatures(degree=n, include_bias=True)
    X_poly_train = poly_feat.fit_transform(X_train)
    X_poly_test = poly_feat.transform(X_test)

    lin_reg_2 = LinearRegression()
    lin_reg_2.fit(X_poly_train, y_train)
    test_pred_2 = lin_reg_2.predict(X_poly_test)

    # Test-set evaluation
    r2 = metrics.r2_score(y_true=y_test, y_pred=test_pred_2)
    mse = metrics.mean_squared_error(y_true=y_test, y_pred=test_pred_2)
    print(round(r2, 2))
    # print(round(mse, 2))

And this is the output I get:

0.36
-3.99
-59.96
-1299.38
-627.37
-1773329.36
-19673802.94
-23125681.65

Here is the sample data:

Elevation_gain Distance Avg_tempo_in_seconds
70 6.13 290.1
135 9.27 301.0
10 4.94 287.5
270 15.74 310.2
120 8.11 298.5



$$R^2_{out}=1-\dfrac{\sum \big( y_i-\hat y_i \big)^2 }{ \sum\big( y_i-\bar y_{in} \big)^2 } $$

If your out-of-sample performance (measured by the sum of squared residuals) is worse (bigger) than that of a naïve model that always predicts the in-sample mean of $y$, then the ratio of the two sums exceeds 1 and $R^2_{out}<0$. This is not unique to polynomial regression.
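To make the sign behaviour concrete, here is a minimal plain-Python sketch of an out-of-sample $R^2$ computed as one minus the ratio of the model's squared residuals to those of the mean-predicting baseline. The function name `r2_out` and all numbers are made up for illustration and are not taken from the question's data.

```python
def r2_out(y_true, y_pred, baseline):
    """Out-of-sample R^2: 1 - SSE(model) / SSE(constant baseline)."""
    sse_model = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    sse_baseline = sum((y - baseline) ** 2 for y in y_true)
    return 1.0 - sse_model / sse_baseline


# Hypothetical numbers: the training-set mean and three held-out targets.
y_train_mean = 300.0
y_test = [290.0, 300.0, 310.0]

print(r2_out(y_test, y_test, y_train_mean))                 # perfect predictions -> 1.0
print(r2_out(y_test, [300.0, 300.0, 300.0], y_train_mean))  # same as the baseline -> 0.0
print(r2_out(y_test, [310.0, 300.0, 290.0], y_train_mean))  # worse than the baseline -> -3.0
```

Note that `sklearn.metrics.r2_score` uses the mean of the `y_true` values it is given (here, the test-set mean) as the baseline rather than the in-sample mean. The two usually differ only slightly, and the interpretation of a negative score is the same.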


The scores you are seeing indicate that a linear regression with multiple polynomial features does not fit the data well, with performance on new data decreasing drastically when using polynomial features of degree 5 or 6 and higher (likely because of overfitting and/or multicollinearity). R-squared can be negative; for what exactly this means, see for example the related question on stats.stackexchange.com.
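The overfitting effect can be reproduced on a small synthetic example. The sketch below assumes made-up data (a straight line plus alternating +1/-1 noise, eight training points), not the asker's dataset. It compares a degree-1 fit with a degree-7 fit that has enough terms to pass through every training point.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic training data (illustrative assumption): y = x plus alternating +1/-1 noise.
x_train = np.arange(8, dtype=float).reshape(-1, 1)
y_train = x_train.ravel() + np.where(np.arange(8) % 2 == 0, 1.0, -1.0)

# Held-out points halfway between the training points, without noise.
x_test = (np.arange(7) + 0.5).reshape(-1, 1)
y_test = x_test.ravel()

scores = {}
for degree in (1, 7):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(x_train, y_train)
    r2_train = r2_score(y_train, model.predict(x_train))
    r2_test = r2_score(y_test, model.predict(x_test))
    scores[degree] = (r2_train, r2_test)
    print(f"degree={degree}: train R^2={r2_train:.3f}, test R^2={r2_test:.3f}")
```

In this setup the degree-7 model reaches a training $R^2$ of essentially 1 because it interpolates the noise, while its test $R^2$ is negative. The degree-1 model has a lower training score but a test score close to 1. Comparing train and test scores per degree in this way, or using cross-validation, is one common way to pick a degree that generalizes.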
