Can anyone tell me why my pipeline is wrong?
I am trying to build a pipeline so that I can run GridSearchCV to find the best hyperparameters. I have already split the data into train and validation sets and have the following code:
column_transformer = make_pipeline(
    OneHotEncoder(categories=cols),
    OrdinalEncoder(categories=X[grade]),
    passthrough)
imputer = SimpleImputer(strategy='median')
scaler = StandardScaler()
model = SGDClassifier(loss='log', random_state=42, n_jobs=-1, warm_start=True)
pipeline_sgdlogreg = make_pipeline(imputer, column_transformer, scaler, model)
When I run GridSearchCV I get the following error:
cannot use median strategy with non-numeric data (...)
I do not understand why I am getting this error. None of the categorical variables have missing values.
The order of steps I am aiming for is: imputation, encoding, scaling, modeling.
Can anyone shed some light?