How can data science teams inside businesses measure costs and efficiency of their technical work?

How can data science teams measure and improve the costs of their technical work when they often don't know the monetary value of the datasets and insights they produce? Do they use industry-based benchmarks for technical development and some subjective measure for business-insight creation?



If you want to go hard on the numbers:

  1. A/B testing: This is very popular among e-commerce companies. The data science team deploys its new method and compares its impact with that of the old one.
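As a sketch of the statistics behind a basic A/B comparison (all conversion counts below are made-up numbers for illustration), a two-proportion z-test in plain Python:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Z-test for the difference in conversion rates between
    control (A, the old method) and treatment (B, the new method)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: 2,400 users per arm.
z, p = two_proportion_ztest(conv_a=120, n_a=2400, conv_b=150, n_b=2400)
```

In practice you would pre-register the sample size and significance level before the experiment rather than peeking at `p` as data arrives.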

A/B testing may not be possible for a variety of reasons, but there are several alternatives:

  1. Synthetic control method: After the data science product is deployed, a "fake" control group is extracted from the data to estimate the impact. One of the most famous applications is described here (https://economics.mit.edu/files/11859)

  2. Counterfactual estimation: These methods do not necessarily construct a new control group. Instead, based on a model that accounts for various biases, you can estimate the impact of several "what-if" scenarios. Many packages exist for this: Google's CausalImpact (https://google.github.io/CausalImpact/) and Uber's CausalML (https://github.com/uber/causalml). If you want to read more about this topic: (https://eng.uber.com/causal-inference-at-uber/)
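To make the counterfactual idea concrete, here is a heavily simplified sketch (not the full machinery of CausalImpact or a proper synthetic control): fit the relationship between a treated market and an untreated control market on the pre-intervention period, then use the control's post-period values to predict what the treated market "would have done" without the intervention. All numbers are invented.

```python
# Hypothetical daily revenue for a treated market and an untreated control market.
pre_control  = [100, 102, 98, 105, 101, 99, 103]
pre_treated  = [200, 205, 196, 211, 203, 198, 207]
post_control = [104, 100, 106, 102]
post_treated = [230, 222, 238, 228]   # observed after the intervention

# Fit treated ~ a + b * control on the pre-period (ordinary least squares).
n = len(pre_control)
mx = sum(pre_control) / n
my = sum(pre_treated) / n
b = sum((x - mx) * (y - my) for x, y in zip(pre_control, pre_treated)) / \
    sum((x - mx) ** 2 for x in pre_control)
a = my - b * mx

# Counterfactual: what the treated market would likely have done untouched.
counterfactual = [a + b * x for x in post_control]
impact = [obs - cf for obs, cf in zip(post_treated, counterfactual)]
avg_lift = sum(impact) / len(impact)
```

Real implementations add uncertainty intervals and multiple control series; this sketch only shows the shape of the estimate.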

With any of these techniques you can measure whatever KPI you wish: increase in revenue, customer retention, etc.


Adding to weareglenn's answer, you can also look at the number of man-hours saved if the data science project does something along the lines of automation. The estimate of XX hours (per day, etc.) comes from the people who used to perform that process manually before your project/solution.
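Turning that estimate into a dollar figure is simple arithmetic; every number below is an illustrative assumption, not a real figure:

```python
# All inputs are illustrative assumptions.
hours_saved_per_day = 3        # estimate from the people who did the task manually
working_days_per_year = 250
loaded_hourly_cost = 45.0      # fully loaded cost of an employee-hour, in dollars

annual_savings = hours_saved_per_day * working_days_per_year * loaded_hourly_cost
# 3 * 250 * 45.0 = 33,750 dollars per year
```

Use the fully loaded cost (salary plus overhead), not the base wage, or the estimate will understate the savings.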


I don't know what you mean by "measure and improve costs of their technical work" - but I'll answer in response to the question "how do you know how much value your insights are bringing the business":

To structure a successful project, it is important to understand the baseline you are competing with. For example, suppose I have a time-series forecasting problem where I'm forecasting how much cash will be in some corporate account. One baseline might be: "let's assume that tomorrow's net cash flows will be identical to yesterday's". With this benchmark in hand, you can evaluate how well your solution performs relative to it.
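The naive "tomorrow equals yesterday" baseline from the example can be scored with a few lines of Python; the cash-flow numbers and model forecasts here are made up to show the mechanics:

```python
# Hypothetical daily net cash flows (in $k) and two sets of forecasts.
actuals        = [12.0, 9.5, 11.0, 13.2, 10.1, 9.8]
# Naive baseline: tomorrow's flow equals yesterday's (shift by one day).
naive_forecast = [11.5] + actuals[:-1]
model_forecast = [11.8, 9.9, 10.6, 12.9, 10.4, 9.6]   # stand-in for your model

def mae(forecast, actual):
    """Mean absolute error of a forecast against the realized values."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

baseline_mae = mae(naive_forecast, actuals)
model_mae = mae(model_forecast, actuals)
improvement = 1 - model_mae / baseline_mae  # fraction by which the model beats the baseline
```

The headline number for management is `improvement`, not `model_mae`: a model is only worth its cost if it beats the cheapest sensible baseline.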

This gives you an insight like: "my solution outperforms the benchmark by some % and increases the detection of future overdraft fees by 90%". Convert this 90% improvement into the total number of new overdraft fees detected by your system relative to the baseline. That can be translated to a dollar amount on an annual basis, and you can say "my solution saves the business $###,### annually".
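The conversion from a relative improvement to an annual dollar amount is just arithmetic; the detection counts and fee size below are invented for illustration:

```python
# Illustrative inputs only.
baseline_detections_per_month = 40   # overdraft events caught by the old process
improvement = 0.90                   # 90% more detections than the baseline
avg_fee_avoided = 35.0               # dollars saved per additional detected event

extra_detections = baseline_detections_per_month * improvement  # 36 extra/month
annual_savings = extra_detections * avg_fee_avoided * 12
# 36 * 35.0 * 12 = 15,120 dollars per year
```

Keeping each assumption as a named variable makes the estimate easy to defend: anyone who disputes a number can swap in their own and rerun the arithmetic.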

Understanding the system you are replacing gives you something to compare against and lets you track results. It is also an effective strategy for communicating successes to management (something that is vital in larger enterprises).
