I have a dataset that has the following columns: Category, Product, Launch_Year, and columns named for years (2010, 2011 and 2012 in the example below). These year columns contain the sales of the product in that year. The goal is to create another column, Launch_Sum, that calculates the sum for the Category (not the Product) in each row's Launch_Year: test = pd.DataFrame({ 'Category':['A','A','A','B','B','B'], 'Product':['item1','item2','item3','item4','item5','item6'], 'Launch_Year':[2010,2012,2010,2012,2010,2011], '2010':[25,0,27,0,10,0], '2011':[50,0,5,0,20,39], '2012':[30,40,44,20,30,42] })

Category  Product  Launch_Year  2010  2011  2012  Launch_Sum (to be created)
A         item1    2010         25    50    30    52
A         item2    …
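A minimal sketch of one way to compute this, assuming the year columns keep the string labels used above ('2010', '2011', '2012'):

```python
import pandas as pd

test = pd.DataFrame({
    'Category': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Product': ['item1', 'item2', 'item3', 'item4', 'item5', 'item6'],
    'Launch_Year': [2010, 2012, 2010, 2012, 2010, 2011],
    '2010': [25, 0, 27, 0, 10, 0],
    '2011': [50, 0, 5, 0, 20, 39],
    '2012': [30, 40, 44, 20, 30, 42],
})

# Sum every year column within each Category, then pick, for each row,
# the group sum that belongs to that row's Launch_Year.
year_cols = ['2010', '2011', '2012']
year_sums = test.groupby('Category')[year_cols].transform('sum')
test['Launch_Sum'] = [year_sums.at[i, str(y)]
                      for i, y in zip(test.index, test['Launch_Year'])]
```

For item1 (Category A, Launch_Year 2010) this yields 25 + 0 + 27 = 52, matching the expected column.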
I have data that has been grouped into 27 groups by different criteria. The reason for these groupings is to show that each group has different behavior. However, I would like to normalize everything to the same scale. For example, I would like to normalize to a 0-1 or 0-100 scale; that way I could say something like $43^{rd}$ percentile and it would have the same meaning across groups. If I were to just, say, standardize each individually by subtracting …
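As one hedged illustration of the percentile idea (the column names 'group' and 'value' and the random data below are assumptions, not the poster's data), pandas can compute a within-group percentile rank directly:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'group': np.repeat(range(27), 100),   # 27 groups of 100 observations
    'value': rng.normal(size=27 * 100),
})

# Percentile rank within each group, rescaled to 0-100 so "43rd percentile"
# means the same thing in every group.
df['pct_rank'] = df.groupby('group')['value'].rank(pct=True) * 100
```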
Performing some calculations on a dataframe and stuck trying to calculate a few percentages. Trying to append 3 additional columns for %POS/%NEG/%NEU. E.g., the sum of the Amount column for all observations with POS Direction in both Drew & A, divided by the total sum of all Amounts for Drew.

Name  Rating  Amount  Price  Rate  Type   Direction
Drew  A       455     99.54  4.5   white  POS
Drew  A       655     88.44  5.3   white  NEG
Drew  B       454     54.43  3.4   blue   NEU
Drew  B       654     33.54  5.4   …
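A sketch of one way to get that kind of percentage (the frame below re-types only the complete rows of the excerpted table; the cut-off row is omitted). The %POS/%NEG/%NEU layout could then be obtained by pivoting the result on Direction:

```python
import pandas as pd

df = pd.DataFrame({
    'Name':      ['Drew', 'Drew', 'Drew'],
    'Rating':    ['A', 'A', 'B'],
    'Amount':    [455, 655, 454],
    'Direction': ['POS', 'NEG', 'NEU'],
})

# Amount per (Name, Rating, Direction) divided by the total Amount for that Name.
per_direction = df.groupby(['Name', 'Rating', 'Direction'])['Amount'].transform('sum')
per_name = df.groupby('Name')['Amount'].transform('sum')
df['pct_of_name_total'] = per_direction / per_name * 100
```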
I have a data frame which contains duplicates I'd like to combine based on 1 column (name). In half of the other columns I'd like to keep one value (as they should all be the same), whereas I'd like to sum the others. I've tried the following code based on an answer I found here: Pandas merge column duplicate and sum value df2 = df.groupby(['name']).agg({'address': 'first', 'cost': 'sum'}) The only issue is I have 100 columns, so would rather not …
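One possible way around typing 100 entries is to build the aggregation mapping programmatically; a minimal sketch (the tiny frame and the list of columns to sum are assumptions standing in for the real data):

```python
import pandas as pd

df = pd.DataFrame({
    'name':    ['a', 'a', 'b'],
    'address': ['x', 'x', 'y'],
    'cost':    [1, 2, 3],
    'qty':     [10, 20, 30],
})

# Sum the chosen columns, take the first value of everything else.
sum_cols = ['cost', 'qty']                      # columns to sum (assumed)
agg_map = {c: ('sum' if c in sum_cols else 'first')
           for c in df.columns if c != 'name'}

df2 = df.groupby('name', as_index=False).agg(agg_map)
```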
I'm trying to list posts from a CPT named "exposition" and group them by a custom taxonomy called "date-exposition" (whose terms are meant to be years such as 2010, 2015, 2017, ...), in the following form:

2019
  post 1
  post 2
2018
  post 3
  post 4
etc...

The closest code I managed to compile is this (it only shows one date/taxonomy value and not the posts...):

<?php
// Get current Category
$get_current_cat = get_term_by('name', single_cat_title('', false), 'category');
$current_cat = $get_current_cat->term_id;
// List …
May I know how to combine several rows into one single row after I used the Pandas groupby function? In the example below, I would like to group the data by Employee ID, Customer Last Name and Customer First Name. Then I want all of that customer's dependents' data listed in the same row. Thanks a lot!
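A minimal sketch of one way to widen the dependents, assuming hypothetical dependent columns (the data and the 'Dependent Name' / 'Dependent Age' names are assumptions, since the example table is not included in the excerpt):

```python
import pandas as pd

df = pd.DataFrame({
    'Employee ID':         [1, 1, 2],
    'Customer Last Name':  ['Smith', 'Smith', 'Lee'],
    'Customer First Name': ['Ann', 'Ann', 'Bo'],
    'Dependent Name':      ['Tom', 'Jen', 'Kim'],
    'Dependent Age':       [4, 7, 9],
})

keys = ['Employee ID', 'Customer Last Name', 'Customer First Name']

# Number each dependent within its group, then unstack so every dependent
# occupies its own set of columns on a single row per customer.
dep_no = df.groupby(keys).cumcount() + 1
wide = df.set_index(keys + [dep_no]).unstack()
wide.columns = [f'{col} {n}' for col, n in wide.columns]
wide = wide.reset_index()
```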
I want to group posts by month, like this:

January 2014
  Post name
  Post name
December 2013
  Post name
November 2013
  Post name
  Post name
  Post name
  Post name

I need to have 10 months per page, and each month can have any number of posts. I use WordPress kriesi pagination to add pagination to the website.
I have created a table using Pandas following material from here. The table created makes use of Multi-Indices for both columns and rows. I am trying to compute the descriptive statistics for each year and subject, meaning, displaying for instance the mean of 2013 for Bob, the mean for 2013 for Guido, and the mean for 2013 for Sue, for all subjects, and for all years. The means for Bob would consider the means for HR and Temp. Note: The …
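A hedged sketch of the kind of aggregation described, using a rough reconstruction of the hierarchically indexed health-data table from the linked material (the numbers are random placeholders):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                     names=['subject', 'type'])
health_data = pd.DataFrame(np.random.randn(4, 6), index=index, columns=columns)

# One mean per (year, subject): collapse the visits on the rows, then collapse
# HR/Temp per subject on the columns.
per_year_subject = (health_data
                    .groupby(level='year').mean()      # mean over visits
                    .T.groupby(level='subject').mean()  # mean over HR/Temp
                    .T)
```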
I have the following data set where the column "kind" can be V (view) or A (apply). Given a particular job id, how can I find how many applicants applied (A) to that particular job and how many applicants viewed (V) it? So I want a column with the job and two columns, one labelled V (view) and the other labelled A (apply), for the job type. I am working in a Jupyter notebook with Python and pandas; if someone can initiate or show me …
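A minimal sketch, assuming hypothetical 'job_id' and 'kind' columns (the data is made up since the original table is not shown):

```python
import pandas as pd

df = pd.DataFrame({
    'job_id': [1, 1, 1, 2, 2, 3],
    'kind':   ['V', 'V', 'A', 'V', 'A', 'V'],
})

# One row per job, one column per kind, counting occurrences of each.
counts = pd.crosstab(df['job_id'], df['kind'])
```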
I am trying to combat run-to-run variance of the data I collect by combining the data from different runs and finding the mean/average. The problem is that in each run there is a chance that some of the features may not appear:

   x  y  z
0  0  2  2
1  0  1  3
2  5  3  0
3  1  1  0
4  0  2  0

   x  y  d
0  1  0  2
1  1  1  3
2  0  4  2
…
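One hedged way to average the runs row by row, using the two frames from the question (the second one cut off where the excerpt ends):

```python
import pandas as pd

run1 = pd.DataFrame({'x': [0, 0, 5, 1, 0], 'y': [2, 1, 3, 1, 2], 'z': [2, 3, 0, 0, 0]})
run2 = pd.DataFrame({'x': [1, 1, 0], 'y': [0, 1, 4], 'd': [2, 3, 2]})

# Stack the runs and average per index position; features missing from a run
# are skipped by default (NaN-aware mean), or can be filled with 0 first.
combined = pd.concat([run1, run2]).groupby(level=0).mean()
```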
I had read this post, panda grouping by month with transpose, and it gave me the nearest answer to my question but not the complete solution. How would I get something like the reverse output? My target is: I have a pivoted df with a grouped text variable like above in the second pic, and dates are my columns. But I would like to get the dates grouped by type and the text variable values as my new columns. It …
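Since the screenshots are not in the excerpt, the following is only a guess at the shape involved: a sketch that takes a frame indexed by (type, text) with dates as columns and reshapes it so dates become rows (still grouped by type) and the text values become columns. All names and values here are assumptions:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]],
    index=pd.MultiIndex.from_tuples(
        [('A', 'foo'), ('A', 'bar'), ('B', 'baz')], names=['type', 'text']),
    columns=pd.to_datetime(['2021-01-31', '2021-02-28']),
)

# Long format first, then pivot with the roles of dates and text swapped.
reversed_df = (df
               .stack()
               .rename_axis(['type', 'text', 'date'])
               .reset_index(name='value')
               .pivot(index=['type', 'date'], columns='text', values='value'))
```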
I am trying to group my data by the 'ID' column. Then I want to count the frequency of 'Sequence' for each 'ID'. Here is a sample of the data frame:

ID   Sequence
101  1-2
101  3-1
101  1-2
102  4-6
102  7-8
102  4-6
102  4-6
103  1118-69
104  1-2
104  1-2

I am looking for a count same as:

ID   Sequence  Count
101  1-2       2
     3-1       1
102  4-6       3
     7-8       1
103  1118-69   1
104  1-2       2
…
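A minimal sketch of one way to get that count, re-typing the sample rows shown above:

```python
import pandas as pd

df = pd.DataFrame({
    'ID':       [101, 101, 101, 102, 102, 102, 102, 103, 104, 104],
    'Sequence': ['1-2', '3-1', '1-2', '4-6', '7-8', '4-6', '4-6',
                 '1118-69', '1-2', '1-2'],
})

# Frequency of each Sequence within each ID.
counts = (df.groupby(['ID', 'Sequence'])
            .size()
            .reset_index(name='Count'))
```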
I am using groupby.count for 2 columns to get value occurrences under a class constraint. However, if value $x$ in feature never occurs with class $y$, then this pandas method returns only non-zero frequencies. Is there any solution or alternate method? The script is like:

combine = np.vstack([X_train[:, -1], y_train]).T
combine_df = pd.DataFrame(combine, columns=['feature', 'class'])
class_count_groupby = combine_df[combine_df['class'] == 2]['class'].groupby(combine_df['feature']).count()
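One alternative sketch that keeps the zero counts, using a tiny stand-in for X_train / y_train (shapes and values are placeholders):

```python
import numpy as np
import pandas as pd

X_train = np.array([[0.5, 1], [0.7, 1], [0.1, 3], [0.9, 3]])
y_train = np.array([2, 1, 2, 2])

combine_df = pd.DataFrame({'feature': X_train[:, -1], 'class': y_train})

# Cross-tabulate feature values against classes: (feature, class) combinations
# that never occur show up as explicit zeros instead of being dropped.
counts = pd.crosstab(combine_df['feature'], combine_df['class'])
class2_counts = counts[2]   # per-feature counts for class 2, zeros included
```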
I'm working with MongoDB and want to write a query using the aggregate function. The query is: each city has several zip codes; find the city in each state with the most zip codes and rank those cities, along with their states, by city population. The documents are in the following format:

{
  "_id": "10280",
  "city": "NEW YORK",
  "state": "NY",
  "pop": 5574,
  "loc": [ -74.016323, 40.710537 ]
}

I was able to count the number of zip codes for each state …
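A hedged sketch of one possible aggregation pipeline, written here via pymongo (the connection, database and collection names are assumptions):

```python
from pymongo import MongoClient

client = MongoClient()
zips = client['test']['zipcodes']   # assumed database/collection names

pipeline = [
    # One document per (state, city): zip count and total city population.
    {'$group': {
        '_id': {'state': '$state', 'city': '$city'},
        'zip_count': {'$sum': 1},
        'pop': {'$sum': '$pop'},
    }},
    # Within each state, keep the city with the most zip codes.
    {'$sort': {'_id.state': 1, 'zip_count': -1}},
    {'$group': {
        '_id': '$_id.state',
        'city': {'$first': '$_id.city'},
        'zip_count': {'$first': '$zip_count'},
        'pop': {'$first': '$pop'},
    }},
    # Rank those cities by population.
    {'$sort': {'pop': -1}},
]

for doc in zips.aggregate(pipeline):
    print(doc)
```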
I am trying to use the groupby functionality in order to do the following, given this example dataframe:

dates = ['2020-03-01','2020-03-01','2020-03-01','2020-03-01','2020-03-01',
         '2020-03-10','2020-03-10','2020-03-10','2020-03-10','2020-03-10']
values = [1,2,3,4,5,10,20,30,40,50]
d = {'date': dates, 'values': values}
df = pd.DataFrame(data=d)

I want to take the largest n values grouped by date and take the sum of these values. This is how I understand I should do this: I should use groupby date, then define my own function that takes the grouped dataframes and spits out the …
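A minimal sketch along those lines, reusing the example data from the question (the choice of n = 3 is an assumption for illustration):

```python
import pandas as pd

dates = ['2020-03-01'] * 5 + ['2020-03-10'] * 5
values = [1, 2, 3, 4, 5, 10, 20, 30, 40, 50]
df = pd.DataFrame({'date': dates, 'values': values})

n = 3  # how many of the largest values to keep per date (assumed)

# For each date, take the n largest values and sum them.
top_n_sum = df.groupby('date')['values'].apply(lambda s: s.nlargest(n).sum())
```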
I have a question about groupby operations. Let's suppose I have grouped my data based on one column and got 5 groups as output. I know that we can iterate over these groups and apply functions to them as a whole. But can I access the elements of each group (for example, if I have 5 groups each having 5 rows, can I access those rows one by one)? I want to apply a function that compares two rows of a …
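A minimal sketch of row-by-row access inside each group; the frame, column names and the "compare consecutive rows" function are assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', 'a', 'b', 'b'], 'val': [1, 3, 6, 2, 5]})

for key, group in df.groupby('key'):
    # Each group is a regular DataFrame, so rows are accessible by position.
    for i in range(len(group) - 1):
        row, next_row = group.iloc[i], group.iloc[i + 1]
        print(key, next_row['val'] - row['val'])   # compare two rows of a group
```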
I have the following database, and I would like to know how many times a combination of BirthDate and ZipCode is repeated throughout the data table. Now, my question is: how can I access the keys of this output? For instance, how can I get BirthDate=2000101, ZipCode=8002 for i = 0? The problem is that this is a 'Series' object, so I'm not able to use .columns or .loc here.
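Since the screenshots are not in the excerpt, here is a sketch with made-up rows showing where the keys live after a groupby-size count: they are the entries of the result's MultiIndex, and the Series can also be turned back into a plain DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'BirthDate': [2000101, 2000101, 1985115],
    'ZipCode':   [8002, 8002, 1010],
})

counts = df.groupby(['BirthDate', 'ZipCode']).size()

# The keys are the MultiIndex entries of the resulting Series.
birthdate, zipcode = counts.index[0]          # e.g. (2000101, 8002)
value = counts.loc[(2000101, 8002)]           # .loc works with an index tuple

# Or convert to a regular DataFrame with ordinary columns.
counts_df = counts.reset_index(name='count')
```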
I have a log dataset which looks like the following:

Time                 event
2020-08-27 07:00:00  1
2020-08-27 08:34:00  1
2020-08-27 16:42:23  1
2020-08-27 23:19:11  1
...

I am trying to get the count of events that happened within different hourly intervals (6 hours, 8 hours, etc.). Any ideas on how I can get this done in pandas?
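A minimal sketch using resampling, re-typing the sample rows shown above (the 6-hour window is just one choice; '8H' etc. work the same way):

```python
import pandas as pd

df = pd.DataFrame({
    'Time': pd.to_datetime(['2020-08-27 07:00:00', '2020-08-27 08:34:00',
                            '2020-08-27 16:42:23', '2020-08-27 23:19:11']),
    'event': [1, 1, 1, 1],
})

# Count events falling into fixed-width time windows.
counts = df.set_index('Time')['event'].resample('6H').count()
```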