How to reorder values across columns to match a mapping DataFrame in Python pandas
I am pretty new to Python and pandas, and I am struggling to combine a messy DataFrame imported from Excel with a mapping table. I have searched the Internet for solutions, but without success.
My first data frame, df_1, is as follows:
Product Name | Val_1 | Val_2 | Val_3 | Val_4 |
---|---|---|---|---|
Prod_1 | Level 1 | High | Yes | |
Prod_1 | Low | No | Level 2 | |
Prod_2 | Ab | Standard | No | |
Prod_2 | Bc | Non Standard | | |
Prod_2 | Non Standard | Yes | Bc | |
Prod_3 | High | Standard | | |
Prod_3 | a | Complex | Low | |
As you can see, the information in columns Val_1 to Val_4 was entered in random order. What I would like to achieve is to put all the Vals in the same order as in df_mapping, so that I can merge the two data frames using e.g. pd.merge and possibly also create a pivot table, etc.
The df_mapping table is as follows:
Product | Val_1 | Val_2 | Price |
---|---|---|---|
Prod_1 | Level 1 | High | 1 |
Prod_1 | Level 1 | Low | 2 |
Prod_1 | Level 2 | High | 3 |
Prod_1 | Level 2 | Low | 4 |
Prod_2 | Ab | Standard | 1.5 |
Prod_2 | Ab | Non Standard | 2 |
Prod_2 | Bc | Standard | 2.1 |
Prod_2 | Bc | Non Standard | 2.5 |
Prod_3 | High | Standard | 2 |
Prod_3 | High | Complex | 3 |
Prod_3 | Low | Standard | 4 |
Prod_3 | Low | Complex | 5 |
The desired df_result would be as follows:
Product Name | Val_1 | Val_2 | Val_3 | Val_4 | Val_5 | Price |
---|---|---|---|---|---|---|
Prod_1 | Level 1 | High | | | | 1 |
Prod_1 | Level 2 | Low | | | | 4 |
Prod_2 | Ab | Standard | | | | 1.5 |
Prod_2 | Bc | Non Standard | | | | 2.5 |
Prod_2 | Bc | Non Standard | | | | 2.5 |
Prod_3 | High | Standard | | | | 2 |
Prod_3 | Low | Complex | | | | 5 |
Val data that is not in the mapping can be dropped from df_result. So far I have worked around the problem by manually creating every possible ordering of the values in the mapping and then merging the data frames, but the number of products and possible combinations keeps growing. What is more, my current df_result is still messy.
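To make the goal concrete, here is a minimal sketch of one possible direction, not a definitive solution. It reshapes df_1 to long form, tags each value with the mapping column it belongs to, reshapes back, and then merges in the price. It assumes that the mapping's first column is named `Product`, that within one product each value occurs in only one mapping column (otherwise the target column would be ambiguous), and that each row of df_1 has at most one value per mapping column. The names `lookup`, `long_df`, `matched`, `ordered`, `row_id` and `target_col` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the tables above (empty cells become None).
df_1 = pd.DataFrame(
    [
        ["Prod_1", "Level 1", "High", "Yes", None],
        ["Prod_1", "Low", "No", "Level 2", None],
        ["Prod_2", "Ab", "Standard", "No", None],
        ["Prod_2", "Bc", "Non Standard", None, None],
        ["Prod_2", "Non Standard", "Yes", "Bc", None],
        ["Prod_3", "High", "Standard", None, None],
        ["Prod_3", "a", "Complex", "Low", None],
    ],
    columns=["Product Name", "Val_1", "Val_2", "Val_3", "Val_4"],
)

df_mapping = pd.DataFrame(
    [
        ["Prod_1", "Level 1", "High", 1.0],
        ["Prod_1", "Level 1", "Low", 2.0],
        ["Prod_1", "Level 2", "High", 3.0],
        ["Prod_1", "Level 2", "Low", 4.0],
        ["Prod_2", "Ab", "Standard", 1.5],
        ["Prod_2", "Ab", "Non Standard", 2.0],
        ["Prod_2", "Bc", "Standard", 2.1],
        ["Prod_2", "Bc", "Non Standard", 2.5],
        ["Prod_3", "High", "Standard", 2.0],
        ["Prod_3", "High", "Complex", 3.0],
        ["Prod_3", "Low", "Standard", 4.0],
        ["Prod_3", "Low", "Complex", 5.0],
    ],
    columns=["Product", "Val_1", "Val_2", "Price"],
)

# Use the same product column name on both sides to keep the merges simple.
mapping = df_mapping.rename(columns={"Product": "Product Name"})

# 1) Lookup: for each product, which mapping column does each value belong to?
lookup = mapping.melt(
    id_vars="Product Name",
    value_vars=["Val_1", "Val_2"],
    var_name="target_col",
    value_name="value",
).drop_duplicates()

# 2) Long form of df_1, remembering which original row each value came from.
long_df = (
    df_1.rename_axis("row_id")
    .reset_index()
    .melt(id_vars=["row_id", "Product Name"], value_name="value")
    .dropna(subset=["value"])
)

# 3) The inner merge keeps only values known to the mapping
#    and tags each one with its target column.
matched = long_df.merge(lookup, on=["Product Name", "value"], how="inner")

# 4) Back to wide form: one row per original row, columns in mapping order.
#    pivot raises if a row has two values for the same target column.
ordered = (
    matched.pivot(index=["row_id", "Product Name"], columns="target_col", values="value")
    .reset_index()
    .rename_axis(columns=None)
)

# 5) Attach the price from the mapping and restore the original row order.
df_result = (
    ordered.merge(mapping, on=["Product Name", "Val_1", "Val_2"], how="left")
    .sort_values("row_id")
    .drop(columns="row_id")
    .reset_index(drop=True)
)

print(df_result)
```

With the sample data, values the mapping does not know (Yes, No, a) fall out in step 3, so df_result contains only Product Name, Val_1, Val_2 and Price. A row of df_1 with no matching value at all would also disappear at that step, which may or may not be what is wanted.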
I would be very grateful for any support.