Python Pandas agg error

I am trying to generate descriptive statistics using agg function in Pandas. I am having trouble with one line with a lambda function. They work when I run them as separate lines of code, but when I put them as a single line I get errors.

Any guidance is much appreciated.

The following two lines of codes work when I run them individually.

First line of code:

bh_df.groupby('CAT.MEDV').agg(
  avg_Nox=('NOX', 'mean'))

Second line with lambda function.

bh_df.groupby('CAT.MEDV').agg(
   rng=("NOX", lambda x: (max(x) - min(x))))

However, when I combine them into a single line of code as:

bh_df.groupby('CAT.MEDV').agg(
   avg_Nox=('NOX', 'mean'),
   rng=("NOX", lambda x: (max(x) - min(x))))

I get a whole bunch of errors:

File "", line 4, in

rng=("NOX", lambda x: (max(x) - min(x))))

File "C:\Users\pdile\Anaconda3\lib\site-packages\pandas\core\groupby\generic.py", line 1455, in aggregate return super().aggregate(arg, *args, **kwargs)

File "C:\Users\pdile\Anaconda3\lib\site-packages\pandas\core\groupby\generic.py", line 264, in aggregate result = result[order]

File "C:\Users\pdile\Anaconda3\lib\site-packages\pandas\core\frame.py", line 2986, in getitem indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)

File "C:\Users\pdile\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1285, in _convert_to_indexer return self._get_listlike_indexer(obj, axis, **kwargs)[1]

File "C:\Users\pdile\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1092, in _get_listlike_indexer keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing

File "C:\Users\pdile\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1185, in _validate_read_indexer

Final error:

raise KeyError("{} not in index".format(not_found))

KeyError: "[('NOX', '')] not in index"

Topic python-3.x aggregation pandas

Category Data Science


If I am not mistaken the dataset used is the Boston home values dataset from http://lib.stat.cmu.edu/datasets/boston

The given code works fine on google colab if the dataset used is correct. Please check the dataset before proceeding further. Here are the screenshots for the same.

Step 1: Reading the dataset

enter image description here

Step 2: Printing the first 5 rows of the dataset enter image description here

Step 3: Generating descriptive statistics using agg function in Pandas Note: I changed the column names according to the dataset

enter image description here

Here is another solution for step 3 enter image description here

Note: I have run the script on google colab. Your problem might be there due to version issues.

Here is the code used.

import pandas as pd
import numpy as np
house=pd.read_csv('Boston.csv')
house.columns
 
house.groupby('medv').agg(
  avg_Nox=('nox', 'mean'),
  rng=("nox", lambda x: (max(x) - min(x))))
 
house.groupby(['medv'])['nox'].agg(
    [('avg_nox',  np.mean), 
    ('rng', lambda x: (max(x)-min(x)))])

You can view a similar machine learning project which predicts the price of the house based on other variable here https://bit.ly/3ApVCAO


Because there is no more column "NOX"

you changed it in the previous mean aggregation. Try "avg_Nox" or whatever column name you get after applying your first aggregation function

bh_df.groupby('CAT.MEDV').agg(
  avg_Nox=('NOX', 'mean'))

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.