Efficient way to create a matrix that shows whether data exists per day
So I have a dataset containing different IDs and the time each row was created.
ID Date
0 123123 2021-03-24 12:43:13.494000+00:00
1 123412 2021-03-24 12:43:13.494000+00:00
2 123123 2021-03-24 12:43:15.935000+00:00
3 234234 2021-03-24 12:43:15.935000+00:00
4 432424 2021-03-24 12:43:13.494000+00:00
The goal is to validate that there is at least one data row for every ID on every given day. So far I have converted the timestamps to dates like this:
0 2021-03-24
1 2021-03-24
2 2021-03-24
3 2021-03-24
4 2021-03-24
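For reference, the conversion can be sketched roughly as follows. This is a minimal reproduction, assuming the columns are named `id` and `date` (as in the DataFrame call further down) and that the timestamps arrive as strings; the frame name `raw` is only used for this illustration.

```python
import pandas as pd

# Sample rows copied from the table above; column names "id" and "date" are assumed
raw = pd.DataFrame({
    "id": [123123, 123412, 123123, 234234, 432424],
    "date": [
        "2021-03-24 12:43:13.494000+00:00",
        "2021-03-24 12:43:13.494000+00:00",
        "2021-03-24 12:43:15.935000+00:00",
        "2021-03-24 12:43:15.935000+00:00",
        "2021-03-24 12:43:13.494000+00:00",
    ],
})

# Parse the timezone-aware timestamps, then keep only the calendar date
raw["date"] = pd.to_datetime(raw["date"], utc=True).dt.date
print(raw["date"])
```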
Now I am missing a solution to create a matrix that tells me whether a data row exists for a given ID on a given date. The data frame creation looks like this:
df = pd.DataFrame(index=df['id'].unique(),columns=df['date'].sort_values().unique())
which creates this matrix:
2021-03-19 2021-03-20 2021-03-21 2021-03-22 2021-03-23 2021-03-24 2021-03-25
12341 NaN NaN NaN NaN NaN NaN NaN
12312 NaN NaN NaN NaN NaN NaN NaN
12324 NaN NaN NaN NaN NaN NaN NaN
12345 NaN NaN NaN NaN NaN NaN NaN
12345 NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ...
12399 NaN NaN NaN NaN NaN NaN NaN
12394 NaN NaN NaN NaN NaN NaN NaN
34567 NaN NaN NaN NaN NaN NaN NaN
98764 NaN NaN NaN NaN NaN NaN NaN
10023 NaN NaN NaN NaN NaN NaN NaN
Sure, I could just use loops to fill in the values now, but that would be a really inefficient way to do it. I think there is a better way, and I am sure somebody can tell me how so I can learn it for the future.
Thank you very much!
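One loop-free direction that might fit the matrix described above is `pd.crosstab`, which counts rows per (ID, date) pair; comparing the counts with zero then gives a True/False presence matrix. The sketch below uses small made-up sample data and assumes the columns are called `id` and `date`; the names `presence` and `missing` are only illustrative.

```python
import pandas as pd

# Hypothetical sample data, not the real dataset
df = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "date": ["2021-03-23", "2021-03-24", "2021-03-24", "2021-03-23"],
})
df["date"] = pd.to_datetime(df["date"]).dt.date

# Count rows per (id, date) pair, then turn the counts into True/False
presence = pd.crosstab(df["id"], df["date"]) > 0
print(presence)

# IDs that have no data row on at least one of the days
missing = presence.index[~presence.all(axis=1)].tolist()
print(missing)
```

Note that this only produces columns for dates that occur somewhere in the data. A day on which no ID has any row would not show up, so the columns would probably need to be reindexed against the full expected date range to catch that case.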