GridSearchCV best coefficients do not match the expected line

I wrote a program to find the best combination of coefficients to describe a variable. However, the coefficients obtained from GridSearchCV do not reproduce the expected line. This is a sample of my data:

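from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

# process (the preprocessing step), model and params are defined elsewhere
pipe = make_pipeline(process, SelectKBest(f_regression), model)
gs = GridSearchCV(pipe, params, n_jobs=-1, cv=5, return_train_score=False)
gs.fit(x_train, y_train)
# Third pipeline step: the fitted regression model
fin = gs.best_estimator_.steps[2][1]
coef = fin.coef_
intercept = fin.intercept_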

and these are the coefficients given:

Then if I plot the line with the coefficients:

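import matplotlib.pyplot as plt
# Values computed by hand from the reported intercept and coefficients
xplot = 16.15589 + 1.13934372 * df_loc.ChargeAmount + 1.605411 * df_loc.PatientPrice + 6.81365603 * df_loc.LastCost
plt.scatter(xplot, df_locpre.MSRP, color='black')
# Reference line y = x: a perfect fit would put every point on it
plt.plot(df_locpre.MSRP, df_locpre.MSRP, color='blue')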

I obtain the following figure:

I think that either the scale is wrong, or some coefficients might be negative. Could you help me figure out where I am going wrong?



I would just rely on the pipeline: call predict on the fitted estimator instead of applying the coefficients by hand.

xplot = gs.predict(df_locpre)  # predict with the fitted search object; pipe itself is never fitted by GridSearchCV
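
A minimal sketch of this suggestion on synthetic data. It assumes the preprocessing step is a StandardScaler and the model is a LinearRegression, neither of which is stated in the question; the data and the parameter grid are made up for illustration. It compares predictions made through the fitted search object with predictions made by applying coef_ directly to the raw features:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): three features on very different scales
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=[5.0, 20.0, 2.0], size=(200, 3))
y = 3.0 + 0.5 * X[:, 0] + 1.5 * X[:, 1] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

# StandardScaler stands in for the unknown "process" step
pipe = make_pipeline(StandardScaler(), SelectKBest(f_regression), LinearRegression())
params = {"selectkbest__k": [1, 2, 3]}  # hypothetical grid
gs = GridSearchCV(pipe, params, cv=5)
gs.fit(X, y)

# Prediction through the pipeline: scaling and feature selection are applied automatically
pred_pipeline = gs.predict(X)

# Naive prediction: coefficients learned on scaled features applied to raw features
fin = gs.best_estimator_.steps[-1][1]
mask = gs.best_estimator_.named_steps["selectkbest"].get_support()
pred_naive = fin.intercept_ + X[:, mask] @ fin.coef_

print("max abs difference:", np.max(np.abs(pred_pipeline - pred_naive)))
```

The two sets of predictions disagree because coef_ lives in the scaled feature space, which is the kind of mismatch that shows up in the scatter plot from the question.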

After some digging, it seems that the coefficients are simply expressed in the scaled feature space produced by the scaler I use, so I have to reverse that transformation before applying them to the raw columns. I have not found a good way to do this automatically, so I followed this: https://www.tutorialguruji.com/python/how-linear-regression-coefficients-are-stored-in-sklearn-pipelines/amp/
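
The reverse transformation can be sketched as follows, assuming the scaler is a StandardScaler (the post does not say which scaler is used) and the final step is a LinearRegression. The helper name unscale_coefficients and the data are hypothetical. For a standardized feature z = (x - mean) / scale, the coefficient in original units is coef / scale, and the intercept is reduced by the sum of coef * mean / scale over the selected features:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def unscale_coefficients(fitted_pipe):
    """Return (coef, intercept, mask) with coef and intercept in original feature units.

    Hypothetical helper: assumes the steps are
    StandardScaler -> SelectKBest -> linear model, in that order.
    """
    scaler = fitted_pipe.steps[0][1]
    selector = fitted_pipe.steps[1][1]
    model = fitted_pipe.steps[2][1]

    mask = selector.get_support()   # boolean mask of the features kept
    scale = scaler.scale_[mask]     # per-feature standard deviation
    mean = scaler.mean_[mask]       # per-feature mean

    coef = model.coef_ / scale
    intercept = model.intercept_ - np.sum(model.coef_ * mean / scale)
    return coef, intercept, mask


# Synthetic data (assumption), only to check the algebra
rng = np.random.default_rng(1)
X = rng.normal(loc=[10.0, 100.0, 1.0], scale=[2.0, 30.0, 0.5], size=(150, 3))
y = 4.0 + 1.2 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.05, size=150)

pipe = make_pipeline(StandardScaler(), SelectKBest(f_regression, k=2), LinearRegression())
pipe.fit(X, y)

coef, intercept, mask = unscale_coefficients(pipe)
manual = intercept + X[:, mask] @ coef  # prediction from raw features
```

With a grid search, the same helper would be called on gs.best_estimator_. A different scaler, such as MinMaxScaler, would need its own inverse formula.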

If you know of anything that would spare me from computing the values by hand, please let me know!
