How to groupby and sum values of only one column based on value of another column

I have a dataset that has the following columns: Category, Product, Launch_Year, and one column per year (2010, 2011, 2012, and so on). The year columns contain the sales of the product in that year. The goal is to create another column, Launch_Sum, that calculates the sum for the Category (not the Product) in each row's Launch_Year:

test = pd.DataFrame({
    'Category': ['A','A','A','B','B','B'],
    'Product': ['item1','item2','item3','item4','item5','item6'],
    'Launch_Year': [2010,2012,2010,2012,2010,2011],
    '2010': [25,0,27,0,10,0],
    '2011': [50,0,5,0,20,39],
    '2012': [30,40,44,20,30,42]
})

Category Product Launch_Year 2010 2011 2012 Launch_Sum (to be created)
A        item1   2010        25   50   30   52
A        item2   …
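A minimal sketch of one possible approach, assuming the year columns are stored as strings exactly as in the sample frame: sum each year column within each Category, then look up, for every row, the total of its own Launch_Year.

```python
import pandas as pd

test = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Product': ['item1', 'item2', 'item3', 'item4', 'item5', 'item6'],
    'Launch_Year': [2010, 2012, 2010, 2012, 2010, 2011],
    '2010': [25, 0, 27, 0, 10, 0],
    '2011': [50, 0, 5, 0, 20, 39],
    '2012': [30, 40, 44, 20, 30, 42],
})

year_cols = ['2010', '2011', '2012']
# Per-row view of the Category totals for every year column.
category_sums = test.groupby('Category')[year_cols].transform('sum')
# Pick, for each row, the Category total of its own Launch_Year.
test['Launch_Sum'] = [category_sums.at[i, str(y)]
                      for i, y in test['Launch_Year'].items()]
print(test)
```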
Topic: groupby pandas
Category: Data Science

Normalize data from different groups

I have data that has been grouped into 27 groups by different criteria. The reason for these groupings is to show that each group has different behavior. However, I would like to normalize everything to the same scale, for example a 0-1 or 0-100 scale; that way I could say something like the $43^{rd}$ percentile and it would have the same meaning across groups. If I were to just, say, standardize each individually by subtracting …
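One hedged way to get such a comparable scale is a within-group percentile rank; the column names (group, value) below are invented for illustration.

```python
import pandas as pd
import numpy as np

# Hypothetical data: 'group' is one of the 27 groups, 'value' is the measurement.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'group': rng.integers(0, 27, size=1000),
    'value': rng.normal(size=1000),
})

# Percentile rank within each group: a 0-1 scale with the same meaning
# in every group; multiply by 100 for a 0-100 scale.
df['percentile'] = df.groupby('group')['value'].rank(pct=True) * 100
```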
Category: Data Science

Group_by 2 variables and pivot_wider distribution based on 2 others

I am performing some calculations on a dataframe and am stuck trying to calculate a few percentages. I am trying to append three additional columns for %POS/%NEG/%NEU. E.g., the sum of the Amount column for all observations with POS Direction in both Drew & A, divided by the total sum of all Amounts for Drew.

Name Rating Amount Price Rate Type  Direction
Drew A      455    99.54 4.5  white POS
Drew A      655    88.44 5.3  white NEG
Drew B      454    54.43 3.4  blue  NEU
Drew B      654    33.54 5.4  …
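The question's group_by/pivot_wider wording suggests tidyverse, so the sketch below is only a pandas analogue of the same arithmetic, built from the sample rows above (Price, Rate and Type are dropped for brevity).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Drew', 'Drew', 'Drew', 'Drew'],
    'Rating': ['A', 'A', 'B', 'B'],
    'Amount': [455, 655, 454, 654],
    'Direction': ['POS', 'NEG', 'NEU', 'NEU'],
})

# Total Amount per Name/Rating/Direction, spread so POS/NEG/NEU become columns.
wide = df.pivot_table(index=['Name', 'Rating'], columns='Direction',
                      values='Amount', aggfunc='sum', fill_value=0)

# Divide by the total Amount per Name to get %POS / %NEG / %NEU.
totals = df.groupby('Name')['Amount'].sum()
pct = wide.div(totals, axis=0, level='Name') * 100
```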
Category: Data Science

Using Pandas.groupby.agg with multiple columns and functions

I have a data frame which contains duplicates that I'd like to combine based on one column (name). In half of the other columns I'd like to keep one value (as they should all be the same), whereas I'd like to sum the others. I've tried the following code, based on an answer I found here: Pandas merge column duplicate and sum value

df2 = df.groupby(['name']).agg({'address': 'first', 'cost': 'sum'})

The only issue is I have 100 columns, so would rather not …
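With many columns, the aggregation mapping can be built programmatically instead of by hand; in this sketch, sum_cols is a placeholder for whichever columns should be summed, and the small frame only stands in for the real one.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['a', 'a', 'b'],
    'address': ['addr1', 'addr1', 'addr2'],
    'cost': [10, 5, 3],
})

# Columns that should be summed; every other non-key column keeps its first value.
sum_cols = {'cost'}
agg_map = {col: ('sum' if col in sum_cols else 'first')
           for col in df.columns if col != 'name'}
df2 = df.groupby('name', as_index=False).agg(agg_map)
```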
Category: Data Science

Group and list posts by custom taxonomy

I'm trying to list posts from a CPT named "exposition" and group them based on a custom taxonomy called "date-exposition" (whose terms are meant to be years such as 2010, 2015, 2017, ...) in the following form:

2019
  post 1
  post 2
2018
  post 3
  post 4
etc...

The closest code I managed to put together is this (it only shows one date/taxonomy value and not the posts...):

<?php
// Get current Category
$get_current_cat = get_term_by('name', single_cat_title('', false), 'category');
$current_cat = $get_current_cat->term_id;
// List …
Category: Web

How to combine rows after Pandas Groupby function

May I know how to combine several rows into one single row after using the Pandas groupby function? In the example below, I would like to group the data by Employee ID, Customer Last Name and Customer First Name, and then have all of that customer's dependents' data listed in the same row. Thanks a lot!
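A minimal sketch of one way to do this, assuming a hypothetical Dependent column holds one dependent per row (the real column names are not shown in the excerpt): group on the three keys and join the dependents' values into a single cell.

```python
import pandas as pd

# Hypothetical layout: one row per dependent of an employee/customer.
df = pd.DataFrame({
    'Employee ID': [1, 1, 2],
    'Customer Last Name': ['Smith', 'Smith', 'Lee'],
    'Customer First Name': ['Ann', 'Ann', 'Bo'],
    'Dependent': ['Tom', 'Jill', 'Kim'],
})

# Collapse each group into a single row, joining the dependents' values.
combined = (df.groupby(['Employee ID', 'Customer Last Name', 'Customer First Name'])
              ['Dependent']
              .agg(', '.join)
              .reset_index())
```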
Topic: groupby pandas
Category: Data Science

How to group posts by months and add pagination?

I want to group posts by month. Like this:

January 2014
  Post name
  Post name
December 2013
  Post name
November 2013
  Post name
  Post name
  Post name
  Post name

I need to show 10 months per page, and each month can have any number of posts. I use the WordPress Kriesi pagination to add pagination to the website.
Category: Web

Grouping by Multi-Indices of both Row and Column

I have created a table using Pandas, following material from here. The table makes use of MultiIndexes for both columns and rows. I am trying to compute descriptive statistics for each year and subject: for instance, displaying the mean of 2013 for Bob, the mean of 2013 for Guido, and the mean of 2013 for Sue, for all subjects and for all years. The means for Bob would consider the means for HR and Temp. Note: The …
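A rough sketch, assuming the table has the layout described in the question (years and visits on the row MultiIndex, subjects and HR/Temp on the column MultiIndex); the random numbers only stand in for the real measurements.

```python
import pandas as pd
import numpy as np

# Rebuild a table like the referenced one: MultiIndex rows (year, visit)
# and MultiIndex columns (subject, type) with HR and Temp measurements.
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
health_data = pd.DataFrame(np.random.randn(4, 6), index=index, columns=columns)

# Mean over visits within each year, then mean over HR/Temp within each subject:
# one value per (year, subject) pair.
yearly = health_data.groupby(level='year').mean()
per_year_subject = yearly.T.groupby(level='subject').mean().T
```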
Category: Data Science

Applied and viewed jobs ratio

I have the following data set, where the column "kind" can be V (view) or A (apply). Given a particular job id, how can I find how many applicants applied (A) to that job and how many applicants viewed (V) it? I want one column with the job id and two columns of counts, one labelled V for views and the other labelled A for applies. I am working in a Jupyter notebook with Python and pandas; if someone can initiate or show me …
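A minimal sketch of one way to get those counts, with made-up column names (job_id, kind) since the real frame is not shown: count rows per (job, kind) and pivot the kinds into columns.

```python
import pandas as pd

# Hypothetical log: one row per interaction with a job posting.
df = pd.DataFrame({
    'job_id': [10, 10, 10, 11, 11],
    'kind':   ['V', 'V', 'A', 'V', 'A'],
})

# One row per job, with a column each for views (V) and applications (A).
counts = (df.groupby(['job_id', 'kind']).size()
            .unstack(fill_value=0)
            .reset_index())
```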
Category: Data Science

How to find median/average values between data frames with slightly different columns?

I am trying to combat run-to-run variance of the data I collect by combining the data from different runs and finding the mean/average. The problem is that in each run there is a chance that some of the features may not appear:

   x  y  z
0  0  2  2
1  0  1  3
2  5  3  0
3  1  1  0
4  0  2  0

   x  y  d
0  1  0  2
1  1  1  3
2  0  4  2
…
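One possible reading, sketched with the two partial frames above: stack the runs and average rows with the same index, letting missing features fall out as NaN (use .median() instead of .mean() for the median).

```python
import pandas as pd

# Two runs that share x and y but differ in their remaining feature (z vs d).
run1 = pd.DataFrame({'x': [0, 0, 5, 1, 0], 'y': [2, 1, 3, 1, 2], 'z': [2, 3, 0, 0, 0]})
run2 = pd.DataFrame({'x': [1, 1, 0], 'y': [0, 1, 4], 'd': [2, 3, 2]})

# Stack the runs and average matching row indices; a feature missing from a run
# contributes NaN, which mean() skips, so each cell is averaged only over the
# runs in which that feature actually appeared.
averaged = pd.concat([run1, run2]).groupby(level=0).mean()
```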
Category: Data Science

Pandas grouping dates by variable with transposed value variables

I had read the post panda grouping by month with transpose and it gave me the nearest answer to my question, but not the complete solution. How would I get something like the reverse output? My target: I have a pivoted df with a grouped text variable, as in the second picture above, and the dates are my columns. But I would like to get the dates grouped by type, with the text variable values as my new columns. It …
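The referenced pictures are not included here, so the column names below (date, type, text, value) are guesses; the sketch only illustrates the general shape of "dates grouped by type as rows, text values as columns".

```python
import pandas as pd

# Hypothetical long-format data; the real column names are not visible in the post.
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-02-01', '2020-02-01']),
    'type': ['a', 'b', 'a', 'b'],
    'text': ['t1', 't2', 't2', 't1'],
    'value': [1, 2, 3, 4],
})

# Dates grouped under each type as rows, text values as the new columns.
pivoted = df.pivot_table(index=['type', 'date'], columns='text',
                         values='value', aggfunc='sum', fill_value=0)
```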
Category: Data Science

Multiple custom fields orderby: the result is different every time I reload the page

I'm writing a query_post() like this:

$args = array(
    'post_type'   => DEBBING_TYPE_SLUG,
    'post_status' => 'publish',
    'meta_query'  => array(
        array(
            'relation' => 'AND',
            'status_clause' => array(
                'key' => '_debbing_status',
                // 'type' => 'NUMERIC',
            ),
            'order_clause' => array(
                'key' => '_debbing_order',
                // 'type' => 'NUMERIC',
            ),
        )
    ),
    'orderby' => array(
        'status_clause' => 'DESC',
        'order_clause'  => 'DESC',
    )
);

Using Query Monitor, the SQL is below: SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id ) …
Category: Web

How to group by one column and count the frequency from another column for each item in the first column in Python?

I am trying to group my data by the 'ID' column and then count the frequency of 'Sequence' for each 'ID'. Here is a sample of the data frame:

ID   Sequence
101  1-2
101  3-1
101  1-2
102  4-6
102  7-8
102  4-6
102  4-6
103  1118-69
104  1-2
104  1-2

I am looking for counts like:

ID   Sequence  Count
101  1-2       2
     3-1       1
102  4-6       3
     7-8       1
103  1118-69   1
104  1-2       2
…
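A short sketch of one way to get those counts, built from the sample rows above: group on both columns and take the group sizes.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 101, 101, 102, 102, 102, 102, 103, 104, 104],
    'Sequence': ['1-2', '3-1', '1-2', '4-6', '7-8', '4-6', '4-6',
                 '1118-69', '1-2', '1-2'],
})

# Frequency of each Sequence within each ID.
counts = df.groupby(['ID', 'Sequence']).size().reset_index(name='Count')
```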
Category: Data Science

pandas groupby.count doesn't count zero occurrences

I am using groupby.count on two columns to get value occurrences under a class constraint. However, if a value $x$ of the feature never occurs with class $y$, then this pandas method returns only the non-zero frequencies. Is there any solution or alternative method? The script is like:

import numpy as np
import pandas as pd

combine = np.vstack([X_train[:, -1], y_train]).T
combine_df = pd.DataFrame(combine, columns=['feature', 'class'])
class_count_groupby = combine_df[combine_df['class'] == 2]['class'].groupby(combine_df['feature']).count()
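One possible workaround, sketched on a toy frame (the real X_train/y_train are not shown): count within the class of interest, then reindex over all observed feature values so the missing combinations appear as explicit zeros.

```python
import pandas as pd

# Toy data: feature value 3 never occurs with class 2.
combine_df = pd.DataFrame({
    'feature': [1, 1, 2, 3, 3],
    'class':   [2, 2, 2, 1, 1],
})

# Count within class 2, then reindex over *all* feature values so that
# combinations that never occur show up with a count of 0.
all_features = combine_df['feature'].unique()
counts = (combine_df[combine_df['class'] == 2]
          .groupby('feature')['class'].count()
          .reindex(all_features, fill_value=0))
```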
Topic: groupby pandas
Category: Data Science

MongoDB Groupby Rank

I'm working with MongoDB and want to write a query using the aggregate function. The query: each city has several zip codes; find the city in each state with the most zip codes and rank those cities, along with their states, by city population. The documents are in the following format:

{
  "_id": "10280",
  "city": "NEW YORK",
  "state": "NY",
  "pop": 5574,
  "loc": [ -74.016323, 40.710537 ]
}

I was able to count the number of zip codes for each state …
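A rough sketch of such a pipeline via pymongo; the database and collection names are assumptions, and the ranking step simply sorts the winning cities by population.

```python
from pymongo import MongoClient

client = MongoClient()             # assumes a local MongoDB instance
zips = client['test']['zipcodes']  # hypothetical database/collection names

pipeline = [
    # One document per (state, city): number of zip codes and total population.
    {'$group': {'_id': {'state': '$state', 'city': '$city'},
                'zip_count': {'$sum': 1},
                'pop': {'$sum': '$pop'}}},
    # Within each state, keep the city with the most zip codes.
    {'$sort': {'_id.state': 1, 'zip_count': -1}},
    {'$group': {'_id': '$_id.state',
                'city': {'$first': '$_id.city'},
                'zip_count': {'$first': '$zip_count'},
                'pop': {'$first': '$pop'}}},
    # Rank the winning cities by their population.
    {'$sort': {'pop': -1}},
]
results = list(zips.aggregate(pipeline))
```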
Category: Data Science

Using user defined function in groupby

I am trying to use the groupby functionality to do the following, given this example dataframe:

dates = ['2020-03-01','2020-03-01','2020-03-01','2020-03-01','2020-03-01',
         '2020-03-10','2020-03-10','2020-03-10','2020-03-10','2020-03-10']
values = [1,2,3,4,5,10,20,30,40,50]
d = {'date': dates, 'values': values}
df = pd.DataFrame(data=d)

I want to take the largest n values grouped by date and take the sum of these values. This is how I understand I should do this: I should use groupby on date, then define my own function that takes the grouped dataframes and spits out the …
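A minimal sketch along those lines, using the example frame above and an arbitrary n of 3: the per-group function takes the n largest values and sums them.

```python
import pandas as pd

dates = ['2020-03-01'] * 5 + ['2020-03-10'] * 5
values = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
df = pd.DataFrame({'date': dates, 'values': values})

n = 3
# Sum of the n largest values within each date.
top_n_sum = df.groupby('date')['values'].apply(lambda s: s.nlargest(n).sum())
```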
Topic: groupby pandas
Category: Data Science

Accessing group elements in groupby

I have a question about groupby operations. Let's suppose I have grouped my data based on one column and got 5 groups as output. I know that we can iterate over these groups and apply functions to them as a whole. But can I access the elements of each group? For example, if I have 5 groups, each having 5 rows, can I access those rows one by one? I want to apply a function that compares two rows of a …
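A small sketch of one way to do this, on a made-up frame: pull a single group out with get_group() and step through its rows pairwise with .iloc.

```python
import pandas as pd

df = pd.DataFrame({'key': list('aabbb'), 'val': [1, 2, 3, 4, 5]})
grouped = df.groupby('key')

# Pull one group out as an ordinary DataFrame and walk its rows pairwise.
group_b = grouped.get_group('b')
for i in range(len(group_b) - 1):
    row, next_row = group_b.iloc[i], group_b.iloc[i + 1]
    # compare the two rows here, e.g. the difference of their 'val' fields
    print(next_row['val'] - row['val'])
```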
Category: Data Science

Access keys of pandas dataframe when using groupby

I have the following database: And I would like to know how many times a combination of BirthDate and Zipcode is repeated throughout the data table: Now, my question is: How can I access the keys of this output? For instance, how can I get Birthdate=2000101 ZipCode=8002, for i = 0? The problem is that this is a 'Series' object, so I'm not able to use .columns or .loc here.
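The real table is only shown as an image in the post, so the values below are placeholders; the point of the sketch is that the keys of the grouped output live in the Series' MultiIndex rather than in columns.

```python
import pandas as pd

# Hypothetical values; the real table is not shown in the excerpt.
df = pd.DataFrame({
    'BirthDate': [20000101, 20000101, 19991231],
    'ZipCode':   [8002, 8002, 8001],
})

counts = df.groupby(['BirthDate', 'ZipCode']).size()

# The keys live in the Series' MultiIndex rather than in columns.
first_key = counts.index[0]      # a (BirthDate, ZipCode) tuple
birthdate, zipcode = first_key
count_for_key = counts.iloc[0]
```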
Category: Data Science

Pandas Groupby datetime by multiple hours

I have a log dataset which looks like the following:

Time                 event
2020-08-27 07:00:00  1
2020-08-27 08:34:00  1
2020-08-27 16:42:23  1
2020-08-27 23:19:11  1
…

I am trying to get the count of events that happened within different hourly intervals (6 hours, 8 hours, etc.). Any ideas on how I can get this done in pandas?
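A minimal sketch of one way to bucket the timestamps, built from the sample rows above: group on a pd.Grouper with the desired frequency.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2020-08-27 07:00:00', '2020-08-27 08:34:00',
                            '2020-08-27 16:42:23', '2020-08-27 23:19:11']),
    'event': [1, 1, 1, 1],
})

# Count events in 6-hour buckets; swap '6H' for '8H' (or any offset alias)
# to change the interval width.
counts = df.groupby(pd.Grouper(key='Time', freq='6H'))['event'].count()
```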
Category: Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.