How to groupby and sum values of only one column based on value of another column

Question

How to groupby and sum values of only one column based on value of another column

MushyMush

2022年6月3日 10:20

I have a dataset that has the following columns: Category, Product, Launch_Year, and columns named 2010, 2011 and 2012. These 13 columns contain sales of the product in that year. The goal is to create another column Launch_Sum that calculates the sum of the Category (not the Product) for each Launch_Year:

test = pd.DataFrame({
    'Category':['A','A','A','B','B','B'],
    'Product':['item1','item2','item3','item4','item5','item6'],
    'Launch_Year':[2010,2012,2010,2012,2010,2011],
    '2010':[25,0,27,0,10,0],
    '2011':[50,0,5,0,20,39],
    '2012':[30,40,44,20,30,42]
)}

Category	Product	Launch_Year	2010	2011	2012	Launch_Sum (to be created)
A	item1	2010	25	50	30	52
A	item2	2012	0	0	40	114
A	item3	2010	27	5	44	52
B	item4	2012	0	0	20	92
B	item5	2010	10	20	30	10
B	item6	2011	0	39	42	59

I did a groupby and sum, which does have the result I want (shown in bold in the table below), but I don't know how to pull out the specific result and assign it to a new column in the original dataset:

df.groupby(['Category']).sum()

	Launch_Year	2010	2011	2012
Category
A	6032	52	55	114
B	6032	10	59	92

Sorry, I don't know how to show a groupby object in markdown, hope the table above still makes sense. The Launch_Year sums are irrelevant. The required sums are all there under the year columns. How do I get these sum values and assign them to the rows with corresponding Category and Launch_Year? I need to be able to do this effectively for a large (5000) number of categories and the year sales columns from 2010 to 2022.

Topic groupby pandas

Category Data Science

MushyMush · Accepted Answer · 2022年6月3日 10:20

I figured a way to do it, but it doesn't look efficient at all. Regardless, the following code got me the results I needed.

test = pd.DataFrame({
    'Category':['A','A','A','B','B','B'],
    'Product':['item1','item2','item3','item4','item5','item6'],
    'Launch_Year':[2010,2012,2010,2012,2010,2011],
    '2010':[25,0,27,0,10,0],
    '2011':[50,0,5,0,20,39],
    '2012':[30,40,44,20,30,42]
)}

test_grouped = test.groupby(['Category']).sum()
test_grouped.reset_index(inplace=True)

for cat in test_grouped.Category:
    for year in test.Launch_Year.unique():
        test.loc[((test.Category == cat) & (test.Launch_Year == year)), 'Annual_Sum'] = test_grouped[test_grouped.Category == cat][str(year)].values[0]

How to groupby and sum values of only one column based on value of another column

About