How to groupby and sum values of only one column based on value of another column
I have a dataset that has the following columns: Category, Product, Launch_Year, and columns named 2010, 2011 and 2012. These 13 columns contain sales of the product in that year. The goal is to create another column Launch_Sum that calculates the sum of the Category (not the Product) for each Launch_Year:
test = pd.DataFrame({
'Category':['A','A','A','B','B','B'],
'Product':['item1','item2','item3','item4','item5','item6'],
'Launch_Year':[2010,2012,2010,2012,2010,2011],
'2010':[25,0,27,0,10,0],
'2011':[50,0,5,0,20,39],
'2012':[30,40,44,20,30,42]
)}
Category | Product | Launch_Year | 2010 | 2011 | 2012 | Launch_Sum (to be created) |
---|---|---|---|---|---|---|
A | item1 | 2010 | 25 | 50 | 30 | 52 |
A | item2 | 2012 | 0 | 0 | 40 | 114 |
A | item3 | 2010 | 27 | 5 | 44 | 52 |
B | item4 | 2012 | 0 | 0 | 20 | 92 |
B | item5 | 2010 | 10 | 20 | 30 | 10 |
B | item6 | 2011 | 0 | 39 | 42 | 59 |
I did a groupby and sum, which does have the result I want (shown in bold in the table below), but I don't know how to pull out the specific result and assign it to a new column in the original dataset:
df.groupby(['Category']).sum()
Launch_Year | 2010 | 2011 | 2012 | |
---|---|---|---|---|
Category | ||||
A | 6032 | 52 | 55 | 114 |
B | 6032 | 10 | 59 | 92 |
Sorry, I don't know how to show a groupby object in markdown, hope the table above still makes sense. The Launch_Year sums are irrelevant. The required sums are all there under the year columns. How do I get these sum values and assign them to the rows with corresponding Category and Launch_Year? I need to be able to do this effectively for a large (5000) number of categories and the year sales columns from 2010 to 2022.
Category Data Science