When I only call 'fit', my class runs 'transform' too

I have created two classes. The first is:

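from sklearn.base import TransformerMixin
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (needed before importing IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.pipeline import Pipeline

away_defencePressure_idx = 15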

class IterImputer(TransformerMixin):
    def __init__(self):
        self.imputer = IterativeImputer(max_iter=10)
        
    def fit(self, X, y=None):
        self.imputer.fit(X)
        return self
    
    def transform(self, X, y=None):
        imputed = self.imputer.transform(X)
        X['away_defencePressure'] = imputed[:,away_defencePressure_idx]
        return X

The second is:

home_chanceCreationPassing_idx = 3

class KneighborImputer(TransformerMixin):
    def __init__(self):
        self.imputer = KNNImputer(n_neighbors=1)
        
    def fit(self, X, y=None):
        self.imputer.fit(X)
        return self
    
    def transform(self, X, y=None):
        imputed = self.imputer.transform(X)
        X['home_chanceCreationPassing'] = imputed[:,home_chanceCreationPassing_idx]
        return X

When I put IterImputer() in a pipeline and fit_transform, the outcome is:

******************** Before Imputing ********************
7856     49.166667
12154    44.666667
10195    48.333333
18871    57.333333
267      48.833333
Name: home_chanceCreationPassing, dtype: float64
# of null values 70
******************** After Imputing ********************
7856     49.166667
12154    44.666667
10195    48.333333
18871    57.333333
267      48.833333
Name: home_chanceCreationPassing, dtype: float64
# of null values 0

It works fine.

But then if I put the two imputers into one pipeline as follows and fit:

p = Pipeline([
              ('imputerA', IterImputer()),
              ('imputerB', KneighborImputer())
              ])


X = X_train.copy()

p.fit(X)

then, even without calling transform,

display(X.head())
print('# of null values', X.isnull().sum())

the output looks like this:

       home_buildUpPlaySpeed  home_buildUpPlayDribbling  home_buildUpPlayPassing
7856               50.833333                       44.5                37.666667
12154              59.333333                       69.0                42.666667
10195              58.000000                       54.0                57.666667
18871              61.833333                       54.5                58.000000
267                49.166667                       52.0                46.500000

       home_chanceCreationPassing  home_chanceCreationCrossing  home_chanceCreationShooting
7856                    49.166667                    55.000000                    48.166667
12154                   44.666667                    59.166667                    52.333333
10195                   48.333333                    53.833333                    55.833333
18871                   57.333333                    55.000000                    49.500000
267                     48.833333                    55.833333                    47.666667

       home_defencePressure  home_defenceAggression  home_defenceTeamWidth
7856              49.333333               43.000000              53.166667
12154             40.333333               41.833333              52.666667
10195             34.833333               60.333333              53.166667
18871             47.833333               48.000000              57.000000
267               53.666667               53.833333              54.666667

       away_buildUpPlaySpeed  away_buildUpPlayDribbling  away_buildUpPlayPassing
7856               61.333333                       56.0                51.333333
12154              47.000000                       54.0                41.166667
10195              56.333333                       41.5                42.333333
18871              59.000000                       64.0                57.333333
267                59.666667                       45.0                60.333333

       away_chanceCreationPassing  away_chanceCreationCrossing  away_chanceCreationShooting
7856                    67.000000                    58.333333                    57.166667
12154                   60.833333                    53.833333                    54.833333
10195                   52.166667                    51.666667                    57.166667
18871                   52.500000                    63.000000                    58.666667
267                     54.666667                    58.833333                    61.333333

       away_defencePressure  away_defenceAggression  away_defenceTeamWidth
7856              55.000000               47.166667              53.000000
12154             49.666667               47.500000              56.500000
10195             46.333333               53.666667              53.333333
18871             46.500000               47.666667              60.833333
267               51.500000               57.500000              56.500000
# of null values home_buildUpPlaySpeed           0
home_buildUpPlayDribbling       0
home_buildUpPlayPassing         0
home_chanceCreationPassing     70
home_chanceCreationCrossing     0
home_chanceCreationShooting     0
home_defencePressure            0
home_defenceAggression          0
home_defenceTeamWidth           0
away_buildUpPlaySpeed           0
away_buildUpPlayDribbling       0
away_buildUpPlayPassing         0
away_chanceCreationPassing      0
away_chanceCreationCrossing     0
away_chanceCreationShooting     0
away_defencePressure            0
away_defenceAggression          0
away_defenceTeamWidth           0
dtype: int64

So just by calling 'fit', the second-to-last step has already been applied to X, while the last step is applied only when I call 'transform'.

Does anyone know why this happens?



Imputer

  • fit(): learns from the data whatever the imputer needs (its statistics); it does not fill in any values
  • transform(): fills in the missing values using what was learned in fit()
  • fit_transform(): fits to the data, then transforms it
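
As a minimal sketch of these three methods, assuming scikit-learn's KNNImputer and a small made-up array (the data is purely illustrative and not from the question):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data (hypothetical): one missing value in the second column.
X = np.array([[1.0, 2.0],
              [2.0, np.nan],
              [3.0, 6.0]])

imputer = KNNImputer(n_neighbors=1)

# fit() only stores what the imputer needs; X itself is not changed.
imputer.fit(X)
still_missing = int(np.isnan(X).sum())

# transform() returns a new array with the gap filled in.
X_filled = imputer.transform(X)

# fit_transform() performs both steps in a single call.
X_filled_again = KNNImputer(n_neighbors=1).fit_transform(X)
```

After fit() alone, the input still contains its missing value; only the array returned by transform() or fit_transform() is complete.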

A Pipeline sequentially applies a list of transforms followed by a final estimator. Intermediate steps must implement both fit and transform; the final estimator only needs to implement fit.
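
To make the order of calls concrete, here is a simplified sketch of the control flow when a pipeline is fitted. It is not scikit-learn's actual source: TinyPipeline and Recorder are hypothetical names that only show which methods get called on which step.

```python
class Recorder:
    """Hypothetical step that logs every method call made on it."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def fit(self, X, y=None):
        self.log.append((self.name, "fit"))
        return self

    def transform(self, X):
        self.log.append((self.name, "transform"))
        return X


class TinyPipeline:
    """Simplified stand-in for sklearn.pipeline.Pipeline (illustration only)."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        # Every step except the last is fitted AND applied, because the
        # next step has to be fitted on the transformed data.
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        # The final step is only fitted.
        self.steps[-1][1].fit(X, y)
        return self


log = []
TinyPipeline([("imputerA", Recorder("imputerA", log)),
              ("imputerB", Recorder("imputerB", log))]).fit([[1.0, 2.0]])
print(log)
```

The log records fit and transform for imputerA but only fit for imputerB, which is the same pattern the question observes.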

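So, to answer your question: Pipeline.fit has to transform the data with every step except the last, because each later step is fitted on the output of the steps before it. In your pipeline that means fit calls fit and then transform on IterImputer, and only fit on KneighborImputer. Because your transform methods assign the imputed column directly into the DataFrame they receive, IterImputer's transform modifies your X in place during fit. That is why away_defencePressure is already filled while home_chanceCreationPassing still has 70 nulls. KneighborImputer's transform runs only when you call transform (or fit_transform) on the pipeline.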
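
One way to avoid the side effect is to have each custom transform work on a copy instead of assigning into the DataFrame it receives. The following is a minimal sketch, assuming the goal is to leave the caller's X untouched; the class name CopyingKNNImputer, the column names a and b, and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline


class CopyingKNNImputer(BaseEstimator, TransformerMixin):
    """Hypothetical variant of KneighborImputer that never mutates its input."""

    def __init__(self, column="b"):
        self.column = column

    def fit(self, X, y=None):
        self.imputer_ = KNNImputer(n_neighbors=1).fit(X)
        # Remember the position of the target column seen during fit.
        self.col_idx_ = list(X.columns).index(self.column)
        return self

    def transform(self, X, y=None):
        imputed = self.imputer_.transform(X)
        X_out = X.copy()  # work on a copy, not on the caller's DataFrame
        X_out[self.column] = imputed[:, self.col_idx_]
        return X_out


# Toy data (hypothetical) with one missing value per column.
X = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0],
                  "b": [10.0, 20.0, np.nan, 40.0]})

p = Pipeline([("imputerA", CopyingKNNImputer(column="a")),
              ("imputerB", CopyingKNNImputer(column="b"))])

p.fit(X)
nulls_after_fit = int(X.isnull().sum().sum())  # X is left as it was
X_imputed = p.transform(X)                     # imputed values live here
```

With this change, fit still runs the first step's transform internally, but the result goes into a copy, so X keeps its missing values until you explicitly use the output of transform.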
