Interrupted Time Series with Unevenly Distributed Samples

I'm working on causal inference using Interrupted Time Series Design. I have multiple samples per day and am selecting my analysis bandwidth based on pre-treatment RMSE under leave-one-out cross-validation. I have both a treatment and a control group, which I use to obtain the baseline trends. The data is already zero-centered, with 0 being the date on which treatment/placebo administration began. The catch is that for both of my groups, I have an uneven number of samples each …
Category: Data Science

Is it possible to use the ROC AUC metric in uplift modeling (class transformation approach)?

I do not understand why the ROC AUC score is not used for the transformed target Z in uplift modeling (Class Transformation approach). I have a task where I tried this approach, but the ROC AUC score is dramatically low. At the same time, I could not find any mention of using the ROC AUC score to evaluate the quality of a model that uses the Class Transformation approach for uplift prediction. As a result I do not understand if …
Category: Data Science
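
For readers unfamiliar with the transformed target Z mentioned in the question above, here is a minimal sketch of how it is commonly constructed: Z is 1 when a treated unit responds or a control unit does not respond. The function names `class_transform` and `uplift_from_proba` are illustrative and not from the original post, and the conversion 2·P(Z=1|X) − 1 assumes treatment was assigned with probability 0.5.

```python
import numpy as np

def class_transform(y, w):
    """Build the transformed target Z from outcome y and treatment flag w.

    Z = 1 for (treated, responded) or (control, did not respond), else 0.
    Both inputs are expected to be 0/1 arrays of the same length.
    """
    y = np.asarray(y)
    w = np.asarray(w)
    return y * w + (1 - y) * (1 - w)

def uplift_from_proba(p_z):
    """Convert P(Z=1 | X) into an uplift estimate.

    Assumption: treatment and control are balanced (P(W=1) = 0.5).
    """
    return 2.0 * np.asarray(p_z) - 1.0

# Toy data: four units covering every (y, w) combination.
y = np.array([1, 0, 1, 0])
w = np.array([1, 1, 0, 0])
z = class_transform(y, w)
```

Any binary classifier can then be trained on Z. Whether a given AUC on Z is "low" depends on how strong the uplift signal is, which this sketch does not address.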

Causal Inference where the treatment assignment is randomized

I have mostly worked with observational data where the treatment assignment was not randomized. In the past, I have used PSM and IPTW to balance the groups and then calculate the ATE. My problem is this: I am now working on a problem where the treatment assignment is randomized, meaning there won't be a confounding effect. But the treatment and control groups have different sizes, so there is a bucket imbalance. Should I just analyze the data as it is and run statistical significance and statistical power …
Category: Data Science

Identify causal feature in a classification model

Assume I have a model $f(x;b_1,b_2,b_3,b_4)$ which maps a 4-dimensional vector to a binary class, e.g. a logistic regression with 4 parameters used as a churn classifier. Say, for instance, that $x_1 =\text{time spent on site (in minutes)}$ and $b_1=0.3$ (with no intercept). That means "when time on site increases by 1 minute, the probability of churning increases to ~0.57, keeping all other variables fixed". But that does not mean that we can, the other way round, reduce the chance of people …
Category: Data Science
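
The ~0.57 figure in the question above can be checked with a short calculation. This is a sketch under the question's stated setup (coefficient 0.3 on minutes, no intercept) plus one added assumption: all other features are held at zero. With a logistic model, a one-minute increase multiplies the odds by a constant factor, while the change in probability depends on the starting point.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

b1 = 0.3  # coefficient on time spent on site, taken from the question

# Assumption: no intercept and all other features fixed at 0.
p_at_0_min = sigmoid(b1 * 0)   # probability of churn at 0 minutes
p_at_1_min = sigmoid(b1 * 1)   # probability of churn at 1 minute
p_at_2_min = sigmoid(b1 * 2)   # probability of churn at 2 minutes

# The odds ratio per extra minute is constant, unlike the probability change.
odds_ratio = math.exp(b1)

print(round(p_at_0_min, 3), round(p_at_1_min, 3), round(p_at_2_min, 3))
print(round(odds_ratio, 3))
```

Under these assumptions the probability moves from 0.5 to about 0.574 over the first minute, so 0.57 is the resulting probability at one minute and not the size of the increase.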

Google's Bayesian Structural Time-Series

I am attempting to get my head around Google's Causal Impact paper, which isn't completely clear to me. In the methodology part of the paper, the authors say: "The framework of our model allows us to choose from among a large set of potential controls by placing a spike-and-slab prior on the set of regression coefficients and by allowing the model to average over the set of controls". My question is the following: For the synthetic control variables, I …
Category: Data Science

Quantifying treatment effect in Interrupted Time Series

I have a multivariate time series dataset, from which I am building an ITS (Interrupted Time Series) model by using Facebook's Prophet to construct the counterfactual. Let's say I have a y variable that's affected by x1, x2, ..., xn. How do I quantify the treatment effect, that is, the difference between the counterfactual and the post-interruption slope? I read a couple of papers and posts stating that we need to evaluate: change in level; change in level at a later time point …
Category: Data Science

Cross correlation

I am trying to find a good low-latency algorithm that takes two time series and determines which one leads the other, if any. The time series do not necessarily have the same timestamps. There is a thing called the Granger 'causality' test that gives an idea, but in my case (think of the trades from a financial asset traded on two different exchanges) I would like to think there is a player …
Category: Data Science
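
As a rough illustration of the lead/lag idea in the question above, the sketch below scans a window of lags and returns the one with the highest correlation. It assumes both series have already been resampled onto the same regular time grid, which the question says is not the case for the raw data, and it is a brute-force scan, not a low-latency method. The function name `best_lag` is hypothetical.

```python
import numpy as np

def best_lag(a, b, max_lag):
    """Return (lag, correlation) maximizing the mean of a[t] * b[t + lag].

    A positive lag means `a` leads `b` by that many steps.
    Both inputs must be equal-length arrays on a shared regular grid.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Standardize so the products behave like correlation coefficients.
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    best_k, best_c = 0, -np.inf
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            c = np.mean(a[: n - k] * b[k:])
        else:
            c = np.mean(a[-k:] * b[: n + k])
        if c > best_c:
            best_k, best_c = k, c
    return best_k, best_c

# Synthetic check: b is a copy of a delayed by 3 steps, so a leads b.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = np.roll(a, 3)
lag, corr = best_lag(a, b, max_lag=10)
```

For price data, the scan would usually be run on returns instead of raw prices, since trending levels can produce high correlation at every lag.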

Treatment and Control selection in A/B Testing

I'm hoping to get a better understanding of A/B Testing design. In particular, I'm interested in understanding how treatment and control units are selected. I read that these two groups are selected randomly (for example, here), but there are also approaches where, after picking the treatment group (either randomly or not), the control is selected based on "similarity" to the treatment group. Are both approaches valid, and what's the rationale for picking one or the other? For example, Alteryx has …
Category: Data Science

How far from or close to causal diagrams is the feature importance information from an ML model?

The title pretty much covers my question, but to elaborate: given data (let's assume, for simplicity, it is a good enough representation of the underlying distribution) for a binary classification problem (again, for simplicity, and to give a 'feel' of treatment and control groups), when we employ a machine learning model such as random forest, we eventually obtain feature importance from the trained model. The training has taken care of data imbalance using up or down sampling or some other …
Category: Data Science

Exploratory statistics: how to identify and remove a driver (bias)

I am looking at customer data, and created frequency tables (+histograms) for customers with different professional statuses and the best time to reach them. The statuses are employed, retired, self-employed, unemployed, and blank. For each of these statuses, I expected some variation in the best time to reach each type of customer. Intuitively and from experience, e.g. employed people, on average, should be available early in the morning or early evening, while unemployed are …
Category: Data Science

Difference between feature interactions and confounding variables

Let me define the problem space. I am working on a binary classification problem. I am trying to build a causal model as well as a predictive model. My aim is to find a list of significant features (based on the causal model) and use that to build a predictive model. I referred to the suggestions provided in this post and they were very useful, but I have a few more questions due to my limited knowledge of the ML field. I understood from literature that …
Category: Data Science

Which are valid covariates in CausalImpact?

I have lately been working with CausalImpact, developed by Google. The paper describing it is Inferring Causal Impact Using Bayesian Structural Time-Series Models. In short, what you can do with CausalImpact is study the effect of a specific event on your time series. In the above paper, they mention the importance of covariates or control groups that you use together with your time series. The first is the time-series behaviour of the response itself, prior to the intervention. The …
Topic: causalimpact
Category: Data Science
