How do I replace NaN values using group by pivot_table in pandas DataFrame?

I am working on a machine learning practice problem, from https://datahack.analyticsvidhya.com/contest/practice-problem-big-mart-sales-iii/#ProblemStatement

I want to replace the null values in the 'Item_Weight' column. To do that, I am using a pivot_table that computes the mean of 'Item_Weight' grouped by the 'Item_Identifier' column of the dataset.

item_weight_mean = ds.pivot_table(values='Item_Weight',columns='Item_Identifier')
loc2 = ds['Item_Weight'].isnull()
ds.loc[loc2, 'Item_Weight'] = ds.loc[loc2, 'Item_Identifier'].apply(lambda x: item_weight_mean[x])

This code raises the following error:

(key)
- 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

D:\Important Applications\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
- 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: 'FDN52'

How do I remove this error?



The output of the pivot_table function is a DataFrame, which you can confirm with type(item_weight_mean).

Secondly, the syntax item_weight_mean[x] indexes columns, whereas I suspect you want to index rows.

So I believe the error message is saying that the key FDN52 isn't a column.
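As a rough sketch of the row-based lookup, assuming a small made-up frame in place of the real dataset (the weights below are invented, and mean_by_item and missing are illustrative names): passing index= instead of columns= gives one row per identifier, and Series.map leaves NaN for any identifier it cannot find rather than raising a KeyError.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Big Mart data; the values are made up for illustration.
ds = pd.DataFrame({
    "Item_Identifier": ["FDA15", "FDA15", "DRC01", "DRC01", "FDN52"],
    "Item_Weight": [9.3, np.nan, 5.0, np.nan, np.nan],
})

# index= puts one row per identifier; the result is still a DataFrame.
item_weight_mean = ds.pivot_table(values="Item_Weight", index="Item_Identifier")

# Select the single column to get a Series keyed by identifier.
mean_by_item = item_weight_mean["Item_Weight"]

missing = ds["Item_Weight"].isnull()
# map() yields NaN for identifiers with no usable mean instead of raising KeyError.
ds.loc[missing, "Item_Weight"] = ds.loc[missing, "Item_Identifier"].map(mean_by_item)

print(ds)
```

In this toy frame FDN52 has no recorded weight at all, so its row stays NaN after the lookup and would still need a separate fallback.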


Consider using df.fillna(). It offers several options for replacing NaNs with values of your choice, and you can apply it either to the whole DataFrame or to specific columns (i.e. Series).
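One possible way to combine fillna() with per-item means is sketched below, again on a made-up frame. Filling the leftover gaps with the overall mean is an assumption added here for illustration, not something the question asks for.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Big Mart data; the values are made up for illustration.
ds = pd.DataFrame({
    "Item_Identifier": ["FDA15", "FDA15", "DRC01", "DRC01", "FDN52"],
    "Item_Weight": [9.3, np.nan, 5.0, np.nan, np.nan],
})

# Per-row mean of the row's own identifier group, aligned to ds's index.
group_mean = ds.groupby("Item_Identifier")["Item_Weight"].transform("mean")

# fillna accepts a Series and fills each NaN from the matching index position.
ds["Item_Weight"] = ds["Item_Weight"].fillna(group_mean)

# Assumed fallback: identifiers with no weight at all get the overall mean.
ds["Item_Weight"] = ds["Item_Weight"].fillna(ds["Item_Weight"].mean())

print(ds)
```

Because transform returns a Series aligned to the original rows, no per-row lambda or key lookup is needed, so a missing key cannot raise an error.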
