dataframe.columns.difference() use

I am trying to understand how dataframe.columns.difference() works, but I couldn't find a satisfactory explanation of it. Can anyone explain this method in detail?


dataframe.columns.difference() returns the column labels of the dataframe that are not among the values you pass as the argument, i.e. the set difference between the existing columns and your list. It can be used to create a new dataframe from an existing dataframe while excluding some columns. Let us look at an example:

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'))

In [5]: df
Out[5]: 
          A         B         C         D
0 -1.023134 -0.130241 -0.675639 -0.985182
1  0.270465 -1.099458 -1.114871  3.203371
2 -0.340572  0.913594 -0.387428  0.867702
3 -0.487784  0.465429 -1.344002  1.216967
4  1.433862 -0.172795 -1.656147  0.061359

In [6]: df_new = df[df.columns.difference(['B', 'D'])]

In [7]: df_new
Out[7]: 
          A         C
0 -1.023134 -0.675639
1  0.270465 -1.114871
2 -0.340572 -0.387428
3 -0.487784 -1.344002
4  1.433862 -1.656147

The method returns a new Index (not a plain list) containing the existing column labels minus the ones given as arguments; by default the result is sorted. You can check this directly:

In [8]: df.columns.difference(['B', 'D'])
Out[8]: Index(['A', 'C'], dtype='object')
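
One detail the example above hides, because its columns are already in alphabetical order: difference() sorts its result by default, so the column order of the new dataframe can differ from the original. The following is a minimal sketch, using a hypothetical frame with columns deliberately out of order, that compares the default behaviour with sort=False (available in reasonably recent pandas versions) and with DataFrame.drop:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns are deliberately NOT in alphabetical order.
df = pd.DataFrame(np.zeros((2, 4)), columns=["D", "B", "A", "C"])

# Default behaviour: the remaining labels come back sorted.
sorted_cols = df.columns.difference(["B"])

# sort=False keeps the remaining labels in their original order.
ordered_cols = df.columns.difference(["B"], sort=False)

# Labels that do not exist in the columns (here "Z") are simply ignored,
# because this is a set difference rather than a lookup.
ignoring_missing = df.columns.difference(["B", "Z"], sort=False)

# DataFrame.drop is an alternative that also preserves the column order.
dropped = df.drop(columns=["B"])

print(list(sorted_cols))       # ['A', 'C', 'D']
print(list(ordered_cols))      # ['D', 'A', 'C']
print(list(ignoring_missing))  # ['D', 'A', 'C']
print(list(dropped.columns))   # ['D', 'A', 'C']
```

If the original column order matters, either pass sort=False or use df.drop(columns=[...]). Note that drop raises a KeyError for labels that are not present unless errors='ignore' is given, whereas difference() silently skips them.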

For more details, I suggest taking a look at the official pandas documentation for Index.difference.
