How to use SMOTENC inside a Pipeline?
I would greatly appreciate it if you could tell me how to use SMOTENC. I wrote:
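```python
import numpy as np
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

# X is a pandas DataFrame with 160 columns; MultiColumn (a custom
# column-selection transformer) and the classifier rg are defined elsewhere
num_indices1 = list(X.iloc[:, np.r_[0:94, 95, 97, 100:123]].columns.values)
cat_indices1 = list(X.iloc[:, np.r_[94, 96, 98, 99, 123:160]].columns.values)
print(len(num_indices1))
print(len(cat_indices1))

pipeline = Pipeline(steps=[
    ('feature_processing', FeatureUnion(transformer_list=[
        # Categorical features
        ('categorical', MultiColumn(cat_indices1)),
        # Numeric features
        ('numeric', Pipeline(steps=[
            ('select', MultiColumn(num_indices1)),
            ('scale', StandardScaler()),
        ])),
    ])),
    ('clf', rg),
])
```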
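If SMOTENC is placed right before `('clf', rg)`, its `categorical_features` would have to index the output of the `feature_processing` step, not the columns of `X`. `FeatureUnion` concatenates its transformers' outputs left to right in `transformer_list` order. Because `'categorical'` is listed first, the categorical columns come first. The toy sketch below illustrates this. `select` is a hypothetical stand-in for `MultiColumn`, and it assumes `MultiColumn` returns the requested columns in the given order. Under that assumption, the 41 categorical columns of the real data would occupy positions 0 to 40 of the union's output.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical stand-in for MultiColumn: returns the given DataFrame columns
def select(columns):
    return FunctionTransformer(lambda df: df[columns].to_numpy())

X = pd.DataFrame({
    "n1": [1.0, 2.0, 3.0, 4.0],
    "c1": [0, 1, 0, 1],
    "n2": [10.0, 20.0, 30.0, 40.0],
    "c2": [2, 2, 1, 0],
})
cat_cols = ["c1", "c2"]
num_cols = ["n1", "n2"]

union = FeatureUnion(transformer_list=[
    ("categorical", select(cat_cols)),
    ("numeric", Pipeline(steps=[
        ("select", select(num_cols)),
        ("scale", StandardScaler()),
    ])),
])

Xt = union.fit_transform(X)

# Outputs are stacked in transformer_list order, so the categorical
# columns sit at positions 0 .. len(cat_cols) - 1 of Xt
cat_positions_after_union = list(range(len(cat_cols)))
print(Xt[:, cat_positions_after_union])
```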
As the index lists show, I have 5 categorical features. Columns 94, 96, 98 and 99 are one feature each. Columns 123 to 159 (`np.r_[123:160]` excludes 160) all belong to a single categorical feature with 37 possible values, which I converted into 37 columns with `get_dummies`.
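The index arithmetic behind these counts can be checked directly with NumPy. This sketch uses only the positions from the question and assumes `X` has exactly 160 columns. It confirms that there are 119 numeric and 41 categorical columns and that the one-hot block ends at column 159. It also builds a boolean mask of the kind SMOTENC's `categorical_features` accepts; a list of integer positions is accepted as well.

```python
import numpy as np

n_columns = 160  # assumption: X has exactly 160 columns (0..159)

# Positions copied from the question
num_pos = np.r_[0:94, 95, 97, 100:123]
cat_pos = np.r_[94, 96, 98, 99, 123:160]

# Slices exclude their stop value, so 123:160 covers columns 123..159
dummy_block = np.r_[123:160]

print(len(num_pos), len(cat_pos), len(dummy_block))  # 119 41 37

# 4 single-column categorical features + 1 feature spread over 37 dummy columns
n_categorical_features = len(cat_pos) - len(dummy_block) + 1

# Boolean mask over all columns: True where the column is categorical
cat_mask = np.zeros(n_columns, dtype=bool)
cat_mask[cat_pos] = True
```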
I think `SMOTENC` should be inserted before the classifier `('clf', rg)`, but I don't know how to define `categorical_features` in `SMOTENC`. Also, could you please tell me where `imblearn.pipeline` should be used?
Thanks in advance.
Topic smotenc imbalanced-learn scikit-learn python
Category Data Science