Pandas Groupby datetime by multiple hours

I have a log dataset which looks like the following:

```
Time                  event
2020-08-27 07:00:00       1
2020-08-27 08:34:00       1
2020-08-27 16:42:23       1
2020-08-27 23:19:11       1
         ...
```

I am trying to get the count of events that happened within different hourly intervals (6 hours, 8 hours, etc.). Any ideas on how I can get this done in pandas?

Topic groupby time-series pandas

Category Data Science


Alternatively, you can use pd.cut to create your desired bins and then count your observations grouped by the created bins.

```
from faker import Faker
from datetime import datetime as dt
import pandas as pd

# Create sample dataframe
fake = Faker()
n = 100
start = dt(2020, 1, 1, 7, 0, 0)
end = dt(2020, 1, 1, 23, 0, 0)
df = pd.DataFrame({"datetime": [fake.date_time_between(start_date=start, end_date=end) for _ in range(n)]})

# Create bins
bins = pd.date_range(start, end, freq='2H')
df["bins"] = pd.cut(df.datetime, bins=bins)

# Count grouped observations
df.groupby("bins").count().rename(columns={"datetime": "counts"})
```
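As a usage note, the same counts can be read off the cut column directly with `value_counts`, without a `groupby`. A minimal sketch with fixed, made-up timestamps (so it runs without Faker); the bin edges are assumed for illustration:

```python
import pandas as pd

# Hypothetical sample timestamps (illustrative only)
times = pd.to_datetime([
    "2020-01-01 07:30:00",
    "2020-01-01 08:15:00",
    "2020-01-01 13:45:00",
    "2020-01-01 22:10:00",
])
df = pd.DataFrame({"datetime": times})

# 2-hour bin edges between 07:00 and 23:00
bins = pd.date_range("2020-01-01 07:00:00", "2020-01-01 23:00:00", freq="2H")
df["bins"] = pd.cut(df["datetime"], bins=bins)

# Count observations per bin; sort=False keeps the bins in time order
counts = df["bins"].value_counts(sort=False)
print(counts)
```

Note that `pd.cut` produces right-closed intervals by default, so a timestamp exactly on a left edge falls into the previous bin.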

Another solution is to specify the dates between which you want to count the events, extract the matching indexes, and sum the events.

Here is a code sample:

```
import pandas as pd
import numpy as np

# This is only for fake dataframe generation
def pp(start, end, n):
    start_u = start.value//10**9
    end_u = end.value//10**9

    return pd.DatetimeIndex((10**9*np.random.randint(start_u, end_u, n, dtype=np.int64)).view('M8[ns]'))

start = pd.to_datetime('2015-01-01 00:00:00')
end = pd.to_datetime('2015-01-01 23:59:59')
fake_date = pp(start, end, 100)

# Fake dataframe
df = pd.DataFrame(data=np.random.randint(0, 5, 100), index=fake_date, columns=['event'])

# Date between which you want to compute the number of event
start_d = pd.to_datetime('2015-01-01 06:00:00')
end_d = pd.to_datetime('2015-01-01 12:00:00')

indexNames = df[(df.index >= start_d) & (df.index <= end_d)].index
print(df['event'].loc[indexNames].sum())
```
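Since the index is a `DatetimeIndex`, the same sum can also be obtained with label-based slicing, which is inclusive on both endpoints. A sketch with made-up random data (the seed and column name are assumptions); note the index must be sorted before slicing a range:

```python
import pandas as pd
import numpy as np

# Hypothetical data: 100 events at random times during one day
rng = np.random.default_rng(0)
idx = pd.to_datetime("2015-01-01") + pd.to_timedelta(rng.integers(0, 86400, 100), unit="s")
df = pd.DataFrame({"event": rng.integers(0, 5, 100)}, index=idx)

# .loc range slicing requires a sorted DatetimeIndex
total = df.sort_index().loc["2015-01-01 06:00:00":"2015-01-01 12:00:00", "event"].sum()
print(total)
```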

You can use `pd.Grouper`. With the `freq` argument you can set the time interval; the example below uses 6 hours. This requires the DataFrame to have a `DatetimeIndex` (or you can pass `key=` to point at a datetime column instead).

```
dfcounts = df.groupby(pd.Grouper(freq='6H')).count()
```
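A self-contained sketch of this approach, using the question's four timestamps (by default the 6-hour bins are anchored at midnight, so they start at 00:00, 06:00, 12:00, 18:00):

```python
import pandas as pd

# The four event timestamps from the question
times = pd.to_datetime([
    "2020-08-27 07:00:00",
    "2020-08-27 08:34:00",
    "2020-08-27 16:42:23",
    "2020-08-27 23:19:11",
])
df = pd.DataFrame({"event": [1, 1, 1, 1]}, index=times)

# Count events per 6-hour bucket; the index must be a DatetimeIndex
counts = df.groupby(pd.Grouper(freq="6H")).count()
print(counts)
```

Here the 07:00 and 08:34 events fall into the 06:00 bucket, 16:42 into the 12:00 bucket, and 23:19 into the 18:00 bucket.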
