Using a user-defined function in groupby
I am trying to use the pandas groupby functionality to do the following, given this example dataframe:
import pandas as pd

dates = ['2020-03-01', '2020-03-01', '2020-03-01', '2020-03-01', '2020-03-01',
         '2020-03-10', '2020-03-10', '2020-03-10', '2020-03-10', '2020-03-10']
values = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
d = {'date': dates, 'values': values}
df = pd.DataFrame(data=d)
I want to take the largest n values grouped by date and take the sum of these values. This is how I understand I should do this: I should use groupby date, then define my own function that takes the grouped dataframes and spits out the value I need:
def myfunc(df):
    a = df.nlargest(3, 'values')['values'].sum()
    return a
data_agg = df.groupby('date').agg({'values': myfunc})
However, I am getting various errors, for example that the value of keep is not set, or that it is not set correctly even when I do specify it in myfunc.
I hope to get a dataframe with the two dates 2020-03-01 and 2020-03-10 and the values 12 and 120, respectively.
Any help/insights/remarks will be appreciated.
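For reference, here is a minimal sketch of one way the expected result might be obtained. It assumes the errors come from agg with a dict passing each group's column to the function as a Series rather than a DataFrame, so that the column name given to nlargest ends up being read as its keep argument. The name top_n_sum and its n parameter are hypothetical and only for illustration.

```python
import pandas as pd

dates = ['2020-03-01'] * 5 + ['2020-03-10'] * 5
values = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
df = pd.DataFrame({'date': dates, 'values': values})

def top_n_sum(s, n=3):
    # s is assumed to be a Series (the 'values' column of one group),
    # so nlargest is called without a column name.
    return s.nlargest(n).sum()

# Aggregate the 'values' column per date with the custom function.
data_agg = df.groupby('date').agg({'values': top_n_sum})
print(data_agg)
```

For the example data this should give one row per date, holding the sum of the three largest values in that group (3 + 4 + 5 and 30 + 40 + 50).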