When I only call 'fit', my class also runs 'transform'
I have created two classes. The first is:
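from sklearn.base import TransformerMixin
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, must come before the IterativeImputer import
from sklearn.impute import IterativeImputer

away_defencePressure_idx = 15  # position of 'away_defencePressure' in X

class IterImputer(TransformerMixin):
    def __init__(self):
        self.imputer = IterativeImputer(max_iter=10)

    def fit(self, X, y=None):
        self.imputer.fit(X)
        return self

    def transform(self, X, y=None):
        imputed = self.imputer.transform(X)
        # writes the imputed column back into the DataFrame that was passed in
        X['away_defencePressure'] = imputed[:, away_defencePressure_idx]
        return X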
The second is:
from sklearn.impute import KNNImputer

home_chanceCreationPassing_idx = 3  # position of 'home_chanceCreationPassing' in X

class KneighborImputer(TransformerMixin):
    def __init__(self):
        self.imputer = KNNImputer(n_neighbors=1)

    def fit(self, X, y=None):
        self.imputer.fit(X)
        return self

    def transform(self, X, y=None):
        imputed = self.imputer.transform(X)
        # writes the imputed column back into the DataFrame that was passed in
        X['home_chanceCreationPassing'] = imputed[:, home_chanceCreationPassing_idx]
        return X
When I put IterImputer() in a pipeline and call fit_transform, the outcome is:
******************** Before Imputing ********************
7856 49.166667
12154 44.666667
10195 48.333333
18871 57.333333
267 48.833333
Name: home_chanceCreationPassing, dtype: float64
# of null values 70
******************** After Imputing ********************
7856 49.166667
12154 44.666667
10195 48.333333
18871 57.333333
267 48.833333
Name: home_chanceCreationPassing, dtype: float64
# of null values 0
It works fine.
But then if I put the two imputers into one pipeline as follows and fit:
p = Pipeline([
    ('imputerA', IterImputer()),
    ('imputerB', KneighborImputer())
])
X = X_train.copy()
p.fit(X)
Then, without calling transform at all, I inspect X:
display(X.head())
print('# of null values', X.isnull().sum())
The outcome is:
home_buildUpPlaySpeed home_buildUpPlayDribbling home_buildUpPlayPassing home_chanceCreationPassing home_chanceCreationCrossing home_chanceCreationShooting home_defencePressure home_defenceAggression home_defenceTeamWidth away_buildUpPlaySpeed away_buildUpPlayDribbling away_buildUpPlayPassing away_chanceCreationPassing away_chanceCreationCrossing away_chanceCreationShooting away_defencePressure away_defenceAggression away_defenceTeamWidth
7856 50.833333 44.5 37.666667 49.166667 55.000000 48.166667 49.333333 43.000000 53.166667 61.333333 56.0 51.333333 67.000000 58.333333 57.166667 55.000000 47.166667 53.000000
12154 59.333333 69.0 42.666667 44.666667 59.166667 52.333333 40.333333 41.833333 52.666667 47.000000 54.0 41.166667 60.833333 53.833333 54.833333 49.666667 47.500000 56.500000
10195 58.000000 54.0 57.666667 48.333333 53.833333 55.833333 34.833333 60.333333 53.166667 56.333333 41.5 42.333333 52.166667 51.666667 57.166667 46.333333 53.666667 53.333333
18871 61.833333 54.5 58.000000 57.333333 55.000000 49.500000 47.833333 48.000000 57.000000 59.000000 64.0 57.333333 52.500000 63.000000 58.666667 46.500000 47.666667 60.833333
267 49.166667 52.0 46.500000 48.833333 55.833333 47.666667 53.666667 53.833333 54.666667 59.666667 45.0 60.333333 54.666667 58.833333 61.333333 51.500000 57.500000 56.500000
# of null values home_buildUpPlaySpeed 0
home_buildUpPlayDribbling 0
home_buildUpPlayPassing 0
home_chanceCreationPassing 70
home_chanceCreationCrossing 0
home_chanceCreationShooting 0
home_defencePressure 0
home_defenceAggression 0
home_defenceTeamWidth 0
away_buildUpPlaySpeed 0
away_buildUpPlayDribbling 0
away_buildUpPlayPassing 0
away_chanceCreationPassing 0
away_chanceCreationCrossing 0
away_chanceCreationShooting 0
away_defencePressure 0
away_defenceAggression 0
away_defenceTeamWidth 0
dtype: int64
So just by calling 'fit', the second-to-last step (imputerA) has already been applied to X, while the last step (imputerB) is only applied when I call 'transform'.
Does anyone know why this happens?
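For reference, here is a minimal sketch that reproduces the behaviour on a toy DataFrame. It relies on the documented fact that `Pipeline.fit` fits and transforms every step except the last, and only fits the last one; combined with a `transform` that writes into the DataFrame it receives, this would explain why X changes during `fit`. The `FillColumn` class, its `copy` flag, and the helper functions are hypothetical stand-ins for the imputers above, not part of scikit-learn or of my real code.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class FillColumn(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the imputers: fills one column's NaNs
    with the mean learned in fit."""

    def __init__(self, column, copy=False):
        self.column = column
        self.copy = copy

    def fit(self, X, y=None):
        self.mean_ = X[self.column].mean()
        return self

    def transform(self, X, y=None):
        if self.copy:
            # Work on a copy so the caller's DataFrame is left untouched.
            X = X.copy()
        # Without the copy, this assignment mutates the caller's DataFrame.
        X[self.column] = X[self.column].fillna(self.mean_)
        return X


def make_frame():
    return pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 7.0]})


def nulls_after_fit(copy):
    """Fit a two-step pipeline and report the NaN counts left in the input."""
    X = make_frame()
    pipe = Pipeline([
        ("fill_a", FillColumn("a", copy=copy)),
        ("fill_b", FillColumn("b", copy=copy)),
    ])
    pipe.fit(X)  # only fit is called here, never transform
    return {col: int(n) for col, n in X.isnull().sum().items()}


in_place = nulls_after_fit(copy=False)
copied = nulls_after_fit(copy=True)
print(in_place)  # {'a': 0, 'b': 1}
print(copied)    # {'a': 1, 'b': 1}
```

With `copy=False` the first step's column is already filled after `fit`, which mirrors what I see with imputerA; with `copy=True` the input DataFrame keeps all of its missing values until I explicitly call `transform` and use the returned result.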