How can I speed up reshaping a Pandas DataFrame?
I have a DataFrame that looks like this:
date | person | value |
---|---|---|
2022-05-01 | A | 5 |
2022-05-01 | B | 4 |
2022-05-02 | A | 5 |
2022-05-02 | B | 9 |
I want to convert it to this form:
date | person A | person B |
---|---|---|
2022-05-01 | 5 | 4 |
2022-05-02 | 5 | 9 |
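This reshape (dates as rows, one column per person) is what pandas calls a pivot. Here is a minimal sketch on the sample data from the question. It assumes the column names are exactly `date`, `person` and `value`, and that each (date, person) pair occurs only once. Dates are kept as plain strings for simplicity.

```python
import pandas as pd

# Sample data copied from the question
raw_data = pd.DataFrame({
    'date': ['2022-05-01', '2022-05-01', '2022-05-02', '2022-05-02'],
    'person': ['A', 'B', 'A', 'B'],
    'value': [5, 4, 5, 9],
})

# One row per date, one column per person, cells taken from 'value'
new_data = raw_data.pivot(index='date', columns='person', values='value')

# Optional: rename the columns to match the 'person A' / 'person B' headers
new_data = new_data.add_prefix('person ')
print(new_data)
```

`pivot` does the whole reshape in one call, with no Python-level loop over cells.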
I implemented the following code to solve the task:
```python
import pandas as pd

raw_data = ...  # data in the original form (columns: date, person, value)
people = raw_data['person'].unique()
dates = raw_data['date'].unique()
new_data = pd.DataFrame(columns=people, index=dates)
for person in people:
    for date in dates:
        # Select the row for this (person, date) pair; column 2 is 'value'
        mask = (raw_data['person'] == person) & (raw_data['date'] == date)
        new_data.loc[date, person] = raw_data[mask].values[0][2]
```
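The loop is slow mainly because every cell triggers a full boolean scan of `raw_data`, so the work grows with (number of dates) × (number of people) × (number of rows). The sketch below runs a corrected version of the loop alongside a loop-free `set_index` + `unstack` reshape on the sample data and checks that both produce the same values. It assumes the same column names as the question.

```python
import pandas as pd

raw_data = pd.DataFrame({
    'date': ['2022-05-01', '2022-05-01', '2022-05-02', '2022-05-02'],
    'person': ['A', 'B', 'A', 'B'],
    'value': [5, 4, 5, 9],
})

# Loop-based version from the question: one full scan of raw_data per cell
people = raw_data['person'].unique()
dates = raw_data['date'].unique()
loop_result = pd.DataFrame(columns=people, index=dates)
for person in people:
    for date in dates:
        mask = (raw_data['person'] == person) & (raw_data['date'] == date)
        loop_result.loc[date, person] = raw_data[mask].values[0][2]

# Vectorized version: move (date, person) into a MultiIndex, then turn
# the 'person' level into columns in a single operation
fast_result = raw_data.set_index(['date', 'person'])['value'].unstack('person')

# The loop fills an object-dtype frame, so cast it before comparing;
# align row/column order explicitly with .loc
same = (loop_result.astype('int64').values
        == fast_result.loc[dates, people].values).all()
print(same)
```

`unstack` is essentially what `pivot` does internally, so either call should avoid the per-cell scans.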
It works correctly. However, on a larger dataset it is so slow that it becomes impractical, partly because it only uses one CPU core.
How can I speed it up?
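One caveat when replacing the loop: `pivot` raises a `ValueError` if any (date, person) pair appears more than once. The loop silently takes the first match because of `.values[0]`. `pivot_table` with `aggfunc='first'` keeps that behavior. The data below is hypothetical (the question's sample has no duplicates) and only exists to show the difference. As a general expectation, not a measurement on the asker's data, a single vectorized reshape like this is usually fast enough that spreading the loop over several CPU cores is unnecessary.

```python
import pandas as pd

# Hypothetical data: the pair (2022-05-01, A) appears twice
raw_data = pd.DataFrame({
    'date': ['2022-05-01', '2022-05-01', '2022-05-01', '2022-05-02', '2022-05-02'],
    'person': ['A', 'A', 'B', 'A', 'B'],
    'value': [5, 7, 4, 5, 9],
})

try:
    raw_data.pivot(index='date', columns='person', values='value')
except ValueError as err:
    # pivot refuses to reshape when an index/column pair is not unique
    print('pivot failed:', err)

# pivot_table aggregates duplicates; 'first' mirrors the loop's .values[0]
new_data = raw_data.pivot_table(index='date', columns='person',
                                values='value', aggfunc='first')
print(new_data)
```

If duplicates should be combined in some other way, a different `aggfunc` such as `'sum'` or `'mean'` can be passed instead.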
Tags: pandas, python, performance
Category: Data Science