Set value for column based on two other columns in pandas dataframe

I have a dataframe of contracts with different order dates, and I need to create a new column that assigns a number to each contract when it has more than one order date. For example, my sample dataframe looks something like this:

import pandas as pd

df = pd.DataFrame({'contract': ['123A','123A','123A','123A','123B','123B','123C'],
                   'prod': ['X1','M1','V1','D1','A1','B1','C1'],
                   'date': ['2019-04-17','2019-07-02','2019-04-17','2019-07-02','2019-04-17','2019-09-01','2019-08-02'],
                   'revenue': [5688,113932,5688,49157,5002,892,9000]})

I need my final table to have another column with a unique contract id for each date. My final table from above should look something like this:

contract date header_contract
123A 2019-04-17 123A_0
123A 2019-07-02 123A_1
123A 2019-04-17 123A_0
123A 2019-08-02 123A_2

I have the following code that does what I need on a smaller dataset:

contracts_num = df['contract'].unique()
for cm in contracts_num:
    # number each distinct date within this contract, starting at 0
    for idx, val in enumerate(df[df['contract'] == cm]['date'].unique()):
        df.loc[(df['contract'] == cm) & (df['date'] == val), 'header_contract'] = df['contract'] + '_' + str(idx)

I'm trying to run it on a much larger dataset (around 50,000 contracts) and it's taking a really long time. Is there any way to make it more efficient?


You can use groupby together with shift and cumsum as follows:

df['header_contract'] = df['contract'] + '_' + df.sort_values(['contract', 'date']).\
  groupby('contract')['date'].transform(lambda x: (x.shift() != x).cumsum()).astype(str)

In the transform, x.shift() != x creates a boolean series that is True wherever the date differs from the previous row within the same contract (the first row of each group is always True). cumsum then takes a cumulative sum (treating each True as 1), which yields the suffix for each distinct date. The suffix is then concatenated with the contract name to create the new column. Note that the suffixes start at 1 rather than 0. The result:


  contract prod       date  revenue header_contract
0     123A   X1 2019-04-17     5688          123A_1
1     123A   M1 2019-07-02   113932          123A_2
2     123A   V1 2019-04-17     5688          123A_1
3     123A   D1 2019-07-02    49157          123A_2
4     123B   A1 2019-04-17     5002          123B_1
5     123B   B1 2019-09-01      892          123B_2
6     123C   C1 2019-08-02     9000          123C_1
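
To make the shift/cumsum step easier to follow, here is a minimal sketch that shows the intermediate values for a single contract and then builds a zero-based suffix, which is closer to the numbering asked for in the question. The variable names (dates, changed, counter, suffix) are illustrative, and subtracting 1 is just one way to get zero-based numbering.

```python
import pandas as pd

df = pd.DataFrame({
    'contract': ['123A', '123A', '123A', '123A', '123B', '123B', '123C'],
    'date': ['2019-04-17', '2019-07-02', '2019-04-17', '2019-07-02',
             '2019-04-17', '2019-09-01', '2019-08-02'],
})

# Intermediate steps for one contract, with its dates sorted first.
dates = df.loc[df['contract'] == '123A', 'date'].sort_values()
changed = dates.shift() != dates  # True where the date differs from the previous row
counter = changed.cumsum()        # running count of distinct dates seen so far

# Zero-based variant for the whole frame: subtract 1 before building the suffix.
suffix = (
    df.sort_values(['contract', 'date'])
      .groupby('contract')['date']
      .transform(lambda x: (x.shift() != x).cumsum() - 1)
)

# Assignment aligns on the index, so the sorted order does not matter here.
df['header_contract'] = df['contract'] + '_' + suffix.astype(str)
print(df)
```

Because the numbering follows the sorted dates, the earliest date in each contract gets suffix 0 regardless of the row order in the original frame.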

