Best practices for scaling data science / engineering teams

I am trying to find best practices for scaling data science teams, i.e. an efficient workflow/methodology for dividing work between Software Engineers and Researchers working on the same product.

I’ll explain: both the SEs and the Researchers need the output produced by the other, but they don’t necessarily have the same constraints.
- What’s important for an SE: code maintainability, testing, CI/CD, refactoring the codebase for improved development velocity, and as few branches as possible in the repository
- What’s important for a Researcher: pace of experiments, experiment management and journaling, model management and versioning, and multiple Git branches for experimentation (a typical script is sketched below)
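
For concreteness, the kind of standalone experiment script I have in mind looks roughly like this (all names, paths, and hyperparameters here are made up for illustration):

```python
# Hypothetical one-off experiment script of the kind researchers iterate on quickly.
import json
from pathlib import Path

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Everything lives in one file: data loading, feature prep, training, and journaling.
df = pd.read_csv("data/training_set.csv")
X = df.drop(columns=["label"])
y = df["label"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05)
model.fit(X_train, y_train)

auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

# "Experiment journaling" is often just appending results to a local file,
# with one Git branch per experiment.
Path("experiments").mkdir(exist_ok=True)
with open("experiments/log.jsonl", "a") as f:
    f.write(json.dumps({"model": "gbdt_v3", "val_auc": auc}) + "\n")

print(f"val AUC: {auc:.4f}")
```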

How can we reconcile the two when working on the same Git repository, in a way that satisfies both stakeholders and makes the work as efficient as possible?

For example, researchers may be unhappy about a significant refactor of their experimentation script into a package that breaks the code down into smaller pieces, or may be frustrated by having to make sure their code does not break existing CI tests.
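
Roughly, the refactor that tends to frustrate them turns a script like the one above into something like this (again, the package and file names are hypothetical):

```python
# Hypothetical sketch of the refactor: the monolithic script is broken into small,
# testable functions inside a shared package ("ourproduct" is a made-up name),
# and the experiment becomes a thin entry point.

# ourproduct/training.py
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def load_dataset(path: str) -> tuple[pd.DataFrame, pd.Series]:
    """Load features and labels from a CSV file."""
    df = pd.read_csv(path)
    return df.drop(columns=["label"]), df["label"]


def train_and_evaluate(X: pd.DataFrame, y: pd.Series, **params):
    """Train a model and return it together with its validation AUC."""
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
    model = GradientBoostingClassifier(**params)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    return model, auc


# experiments/run_gbdt.py -- the researcher-facing script shrinks to a few lines,
# but any change to the shared functions now has to pass the package's CI tests.
if __name__ == "__main__":
    X, y = load_dataset("data/training_set.csv")
    model, auc = train_and_evaluate(X, y, n_estimators=300, learning_rate=0.05)
    print(f"val AUC: {auc:.4f}")
```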

Can you think of interesting patterns (or point to interesting resources, books, blogs, etc.) that help make the process smoother when both stakeholders are working on the same team/product? Thanks much.

Topic: data-science, model-management, deep-learning, machine-learning

Category: Data Science
