Test-and-control analysis to measure the impact of a sales-rep change on territory performance
I hope you all are doing well.
Before I proceed with my problem statement, a few terms for reference:
- Territory = Sales Territory. Think of it as a county/region assigned to a particular rep, with no overlap of area or customers between two reps
- Rep / Sales Rep = Sales Representative who visits customers to convert sales
- Calls = Number of times a customer is visited by the rep in a month
- Goal Attainment = % of target achieved for the month, e.g. if the target was 500 units and total sales were 600, attainment is 120%
I am working on a statistical/data-science problem and would like to get some thoughts on how to approach the hypothesis testing.
We have some attributes at Territory level, viz. Total Sales, New to Brand Sales, Total Calls (by Reps) and % Goal Attainment, all rolled up at Territory-Month level. I have data for 2 years, Jan 2018 to Jan 2020.
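To make the data layout concrete, here is a minimal Python sketch of the roll-up to Territory-Month level. The raw record layout, the `targets` lookup, and all names and numbers are hypothetical and chosen only for illustration; the Goal Attainment formula follows the definition given in the terms above.

```python
from collections import defaultdict

# Hypothetical raw records: (territory, month, units_sold, new_to_brand_units, calls)
records = [
    ("T1", "2018-01", 300, 40, 5),
    ("T1", "2018-01", 300, 20, 3),
    ("T2", "2018-01", 200, 10, 4),
]

# Hypothetical monthly targets per (territory, month), in units
targets = {("T1", "2018-01"): 500, ("T2", "2018-01"): 400}


def roll_up(records, targets):
    """Aggregate raw records into one row per (territory, month)."""
    totals = defaultdict(
        lambda: {"total_sales": 0, "ntb_sales": 0, "total_calls": 0}
    )
    for territory, month, units, ntb_units, calls in records:
        row = totals[(territory, month)]
        row["total_sales"] += units
        row["ntb_sales"] += ntb_units
        row["total_calls"] += calls
    # Goal Attainment = % of target achieved for the month
    for key, row in totals.items():
        row["goal_attainment_pct"] = 100.0 * row["total_sales"] / targets[key]
    return dict(totals)


panel = roll_up(records, targets)
for key in sorted(panel):
    print(key, panel[key])
```

With these made-up numbers, territory T1 sells 600 units against a target of 500, which gives the 120% attainment used in the definition.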
The problem I want to solve is a test-and-control analysis to see whether a Sales Rep change has any impact on territory performance (sales). The test group would be a set of 30 Territories that have undergone a Rep change in the last 2 years (the Rep changed at least 6 months ago, i.e. no later than July 2019), and the control group would be similar Territories without any change in Sales Rep over the last 2 years.
I want to get some thoughts on how to find a matching control pair for each test territory. I have a list of 107 Territories, 30 of which had a rep change (the test group), leaving 77 available to form a control group. Since Sales is my target variable, I'm thinking of creating a composite score from normalized Calls and Goal Attainment, calculating the distance from the mean (or the mean-squared value), and pairing each test territory with the control territory that gives the least distance for the pair.
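One possible reading of this matching idea, sketched in Python with the standard library only: z-score Calls and Goal Attainment across all territories, then greedily pair each test territory with the nearest unused control by Euclidean distance in that normalized space. The greedy 1:1 matching without replacement, the function names, and the sample numbers are all assumptions made for illustration, not a settled method.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def match_controls(territories, test_ids):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    territories: dict of id -> (avg_calls, avg_goal_attainment_pct)
    test_ids:    set of ids that had a rep change
    Returns a dict of test id -> matched control id.
    """
    ids = list(territories)
    calls_z = zscores([territories[i][0] for i in ids])
    goal_z = zscores([territories[i][1] for i in ids])
    z = {i: (c, g) for i, c, g in zip(ids, calls_z, goal_z)}

    available = [i for i in ids if i not in test_ids]
    pairs = {}
    for t in sorted(test_ids):
        # Closest remaining control in normalized (Calls, Goal Attainment) space
        best = min(available, key=lambda c: math.dist(z[t], z[c]))
        pairs[t] = best
        available.remove(best)
    return pairs


# Made-up data: "T*" had a rep change, "C*" did not
territories = {
    "T1": (100.0, 95.0),
    "T2": (180.0, 120.0),
    "C1": (105.0, 97.0),
    "C2": (175.0, 118.0),
    "C3": (140.0, 105.0),
}
pairs = match_controls(territories, {"T1", "T2"})
print(pairs)
```

Greedy matching depends on the order in which test territories are processed, so with 30 test and 77 control territories the pairs may differ slightly from an optimal assignment.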
After my test and control groups are formed, I want to conduct hypothesis testing, with the null hypothesis being that a Rep change doesn't impact sales. For this, I'm planning to use a two-tailed t-test (n = 30) at the 95% confidence level (5% significance level). I would really appreciate your thoughts on this approach and on anything else I could do to make the testing more robust.
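As a sketch of the planned test, assuming it is run as a paired t-test on the 30 matched test/control pairs (the post does not say whether the test is paired or unpaired): the Python below computes the t statistic with the standard library and compares it with the two-tailed 5% critical value for 29 degrees of freedom, about 2.045. The sales figures are randomly generated placeholders, not real data.

```python
import math
import random
from statistics import mean, stdev


def paired_t(test_sales, control_sales):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    diffs = [t - c for t, c in zip(test_sales, control_sales)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Two-tailed critical value at alpha = 0.05 for df = 29
T_CRIT_DF29 = 2.045

# Placeholder data: 30 matched pairs of sales figures (made up)
rng = random.Random(0)
control_sales = [rng.gauss(500, 50) for _ in range(30)]
test_sales = [c + rng.gauss(0, 30) for c in control_sales]

t_stat, df = paired_t(test_sales, control_sales)
reject_null = abs(t_stat) > T_CRIT_DF29
print(f"t = {t_stat:.3f}, df = {df}, reject H0 at 5%: {reject_null}")
```

Pairing uses the matched structure directly: each difference compares a test territory with its own control, so territory-level variation shared by the pair cancels out.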