Interrupted Time Series with Unevenly Distributed Samples

I'm working on causal inference using an interrupted time series (ITS) design. I have multiple samples per day and select my analysis bandwidth by minimizing the pre-treatment RMSE under leave-one-out cross-validation. I have both a treatment and a control group, which I use to obtain the baseline trends. The time axis is already zero-centered, with day 0 being the date on which treatment/placebo administration began.
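As an illustration of the bandwidth selection described above, here is a minimal sketch for a single series (one group at a time). It assumes that "bandwidth" means the width bw of the pre-treatment window [-bw, 0) and that the baseline is a straight line in time. For linear least squares, the leave-one-out residual of sample i equals e_i / (1 - h_ii), where e_i is the ordinary residual and h_ii the i-th diagonal entry of the hat matrix, so no refitting is needed. The names loo_rmse_linear and select_bandwidth are hypothetical.

```python
import numpy as np

def loo_rmse_linear(t, y):
    """Leave-one-sample-out RMSE of the OLS line y ~ 1 + t, without refitting."""
    X = np.column_stack([np.ones(len(t)), t])
    # Leverages h_ii = x_i^T (X^T X)^{-1} x_i (diagonal of the hat matrix).
    leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # For least squares, the LOO residual of sample i is resid_i / (1 - h_ii).
    # Requires h_ii < 1: the window must still contain at least two distinct
    # days after any single sample is removed.
    loo_resid = resid / (1.0 - leverage)
    return float(np.sqrt(np.mean(loo_resid ** 2)))

def select_bandwidth(t, y, candidates):
    """Hypothetical helper: pick the pre-treatment window [-bw, 0) with the
    lowest leave-one-out RMSE."""
    scores = {}
    for bw in candidates:
        mask = (t >= -bw) & (t < 0)
        scores[bw] = loo_rmse_linear(t[mask], y[mask])
    best = min(scores, key=scores.get)
    return best, scores
```

Because several samples share each day, dropping a single sample leaves its same-day neighbours in the training set. Leaving out a whole day at a time is a variant worth comparing; the single-sample shortcut above does not cover it.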

The catch is that both groups have an uneven number of samples per day, and the two groups' daily sample distributions are also markedly different, as shown in the plot below:

How should I handle building the ITS regression model? Is it valid to ignore the difference in sampling frequency and simply use all samples? Should I instead downsample every day to match the day with the fewest samples? Or should I collapse each day to a single sample by taking the daily mean (or median)?
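The data-preparation options listed above can be sketched as follows. This assumes the raw data are NumPy arrays `group` (0 = control, 1 = treatment), integer `day`, and outcome `y`; the helper names are hypothetical, and a pandas `groupby` would do the same job. Using all samples needs no transformation, so only downsampling and daily collapsing are shown.

```python
import numpy as np
from collections import defaultdict

def cell_indices(group, day):
    """Map each (group, day) cell to the row indices it contains."""
    cells = defaultdict(list)
    for i, key in enumerate(zip(group.tolist(), day.tolist())):
        cells[key].append(i)
    return cells

def downsample_to_min(group, day, y, rng):
    """Keep k random samples per cell, where k is the smallest cell size."""
    cells = cell_indices(group, day)
    k = min(len(idx) for idx in cells.values())
    keep = np.sort(np.concatenate(
        [rng.choice(idx, size=k, replace=False) for idx in cells.values()]))
    return group[keep], day[keep], y[keep]

def collapse_daily(group, day, y, stat=np.mean):
    """One row per cell, summarised by `stat` (e.g. np.mean or np.median)."""
    cells = cell_indices(group, day)
    keys = sorted(cells)
    g = np.array([k[0] for k in keys])
    d = np.array([k[1] for k in keys])
    ysum = np.array([stat(y[cells[k]]) for k in keys])
    n = np.array([len(cells[k]) for k in keys])  # samples per cell
    return g, d, ysum, n
```

`collapse_daily` also returns the per-cell count `n`, which can be passed as regression weights if the daily summaries should carry the same total weight as the raw samples.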

My ITS model is (I believe) the standard one: a single dependent variable regressed on time, exposed (a dummy for treatment vs. control), interrupted (a dummy for pre- vs. post-treatment), and all of their interaction terms.
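Written out, the model described above is y = b0 + b1*t + b2*E + b3*I + b4*t*E + b5*t*I + b6*E*I + b7*t*E*I + error, where t is time, E is exposed, and I is interrupted. Because t is centred at the start of treatment, b6 is the treatment-versus-control difference in the level change at day 0 and b7 is the difference in the slope change. The sketch below builds this design matrix, fits it by (weighted) least squares, and includes a helper that collapses the data to one mean per group-day. It assumes I = 1 from day 0 onward; the helper names are hypothetical.

```python
import numpy as np

def its_design(time, exposed, interrupted):
    """Design matrix: intercept, t, E, I and all their interactions."""
    t = np.asarray(time, dtype=float)
    e = np.asarray(exposed, dtype=float)
    i = np.asarray(interrupted, dtype=float)
    # Column order matches b0..b7.
    return np.column_stack([np.ones_like(t), t, e, i, t * e, t * i, e * i, t * e * i])

def fit_wls(X, y, w=None):
    """Weighted least squares via sqrt-weight rescaling; w=None is plain OLS."""
    w = np.ones(len(y)) if w is None else np.asarray(w, dtype=float)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def cell_means(time, exposed, y):
    """Collapse to one mean per (day, group) cell; also return cell counts."""
    cells = {}
    for t, e, v in zip(np.asarray(time).tolist(), np.asarray(exposed).tolist(),
                       np.asarray(y).tolist()):
        cells.setdefault((t, e), []).append(v)
    keys = sorted(cells)
    t = np.array([k[0] for k in keys], dtype=float)
    e = np.array([k[1] for k in keys], dtype=float)
    ybar = np.array([np.mean(cells[k]) for k in keys])
    n = np.array([len(cells[k]) for k in keys], dtype=float)
    return t, e, ybar, n
```

Because every regressor in this model is constant within a group-day, OLS on all raw samples gives exactly the same coefficients as WLS on the group-day means weighted by their sample counts, whereas unweighted daily means give every day equal influence. The point estimates coincide, but the standard errors do not: the pooled fit treats same-day samples as independent observations.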

