What's your ideal work environment?

I'm a founder of a data-science-heavy startup, and I'm currently functioning as the entire dev team. Before I know it, we'll have people working together on a project that I currently work on completely alone. So:

  • What are some must-have things data scientists need to work together in a production setting?
  • What are some things data scientists expect to have done outside their scope of work?
  • What would make their lives easier and more productive?
  • What are some things companies often do, which I should or shouldn't do?
  • Which devs who aren't data scientists do you rely on the most, and how do you work together?

These questions are meant to inspire thought; I'm open to (and would greatly appreciate) general feedback.

Edit: I'll let the community decide who gets the Rep via votes. While I think we could all benefit from the discussion here, I invite you to vote for the answer(s) you think are the most helpful such that the writer can be rewarded.



Being in a startup, you don't have the liberty to make many mistakes and lose time in experimentation. My advice would be to focus on the following aspects:

  1. Time to production is paramount: Don't wait for perfection. Implement something that starts delivering value to your customers at an early stage.
  2. Shorter iterations and early feedback during development: Keep improving, and stay in constant touch with your stakeholders.
  3. Other skills (software development, data engineering, MLOps, and cloud support): The sooner you realize their importance, the better for your growth in the medium to long term. Keep in mind that great DS people are rarely good production-level software developers. You need to build a team around the value-producing part of your startup.
  4. Documentation (version control, scientific framework, and onboarding): This helps with reproducibility, collaboration, and workflow management, and quickly brings new members up to speed.

All the very best!


To Do:

  1. Good Infrastructure: A data science team requires a lot of computing power and good infrastructure if it works on big data, ML problems, etc. You should have machines with good computing power and, ideally, access to a cloud platform for better performance. Even a good data science team may not work efficiently if it lacks resources.

  2. Data Engineering Team: Most data scientists can access data from databases, but it is not fair to expect them to build data pipelines. So hire good data engineers who can help you ingest the data and build pipelines that the data scientists can use. This may also include thorough documentation of the data to ease the process.

  3. Good MLOps Practices: Data scientists try hundreds of different ideas before implementing the final one. It becomes very important to document these ideas properly for reproducibility and collaboration. A good company should encourage the use of tools like MLflow, TensorBoard, etc. for tracking. For more details: https://towardsdatascience.com/mlops-practices-for-data-scientists-dbb01be45dd8

  4. Code Sharing & Version Control: Every data science team should have best practices around code sharing and version control. This helps teams collaborate more effectively and efficiently. Please refer to this link: https://neptune.ai/blog/version-control

  5. Problem-First Approach: It is very important for data scientists to understand what the burning business problems are. A data scientist should always be encouraged to solve business problems that bring value to the business. You may take ideas from this blog: https://www.ibm.com/garage/method/practices/discover/business-problem-to-ai-data-science-solution/

  6. Checkpoints & Discussion: Given the nature of business problems, it is very difficult for data scientists to come up with the right solution. Ensure that the data science team and the stakeholders have regular checkpoints to confirm that everything is heading in the right direction; otherwise the work can sometimes lead to solutions the business was not expecting.

  7. Help Them Improve Their Skills: Good data scientists thrive on learning. A good organisation should provide enough opportunities for them to learn and grow their skills. This will help them stay happy and grow with you.
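
To make the tracking idea in item 3 concrete, here is a minimal sketch of what an experiment log records, using only the Python standard library. The helper names (`log_run`, `load_runs`, `best_run`) and the one-JSON-object-per-line file layout are my own assumptions for illustration. This is not how MLflow or TensorBoard work internally; those tools cover the same need with far more features.

```python
import json
import time
import uuid
from pathlib import Path


def log_run(log_path, params, metrics):
    """Append one experiment run (parameters and metrics) as a JSON line."""
    record = {
        "run_id": uuid.uuid4().hex,   # hypothetical identifier scheme
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
    }
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record["run_id"]


def load_runs(log_path):
    """Read every logged run back as a list of dicts (empty if no log yet)."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(log_path, metric, higher_is_better=True):
    """Return the run with the best value for `metric`, or None if no run has it."""
    runs = [r for r in load_runs(log_path) if metric in r["metrics"]]
    if not runs:
        return None
    pick = max if higher_is_better else min
    return pick(runs, key=lambda r: r["metrics"][metric])
```

Calling `log_run("runs.jsonl", {"lr": 0.01}, {"accuracy": 0.9})` after each experiment and `best_run("runs.jsonl", "accuracy")` later is enough to answer "which settings gave us that result?", which is the question that usually gets lost without tracking.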

To Avoid:

  1. Expecting a Unicorn Data Scientist: Some companies expect one data scientist to handle data engineering, model development, and productionizing the model. For more details: https://www.infoworld.com/article/3429185/stop-searching-for-that-data-science-unicorn.html

  2. Avoid Agile for Data Science Projects: Agile methodology for data science projects works in theory, but not in reality, in my humble opinion. The agile approach has been popularized by the success of agile software development, and it relies on short-term deliverables, a.k.a. sprints, that allow teams to show progress frequently and adapt quickly. However, when we need to do research or explore data for unknown insights, the agile methodology does not work because we can neither predefine nor schedule these activities with certainty. Please refer to this link: https://towardsdatascience.com/6-reasons-why-i-think-agile-data-science-does-not-work-ee4dd680bb59#:~:text=The%20main%20problem%20with%20agile,pieces%20of%20data%20science%20code.


Interesting question, and a very wide-ranging one that cannot be answered fully and exactly. I would like to share some thoughts and experiences of my own, and I hope this helps:

1. Everything-as-code approach: Try to have as much as possible as code. It will simplify so much in the long run for everyone on the team, from dev to production (e.g. Infrastructure as Code, Environment as Code, Configuration as Code, Data Pipelines as Code, etc.).

2. Version control at the core (e.g. Git + GitHub): Once you follow the everything-as-code approach, how you handle code versioning as a team becomes extremely important. Try to find a working model and Git strategy that fit the size of your team.

3. Establish a data team: In small startups, one might end up doing a lot more than classical data scientists do in bigger companies. But try to balance the team's strengths across specific domains like data engineering, software development, and DevOps/MLOps.

4. Focus and facilitate: Once you grow, things can get very complicated (not only the code). So I would recommend focusing early on the core product you are working on and not getting distracted. Use tools and services that facilitate your work and development, e.g. consider using the cloud rather than managing everything yourself.

5. Stay agile and fast: Use appropriate agile tools and methods so you don't lose speed, and establish a solid foundation for collaboration early on.
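
As a small illustration of the "Configuration as Code" part of point 1, here is a minimal sketch using only the Python standard library. The `TrainingConfig` class, its fields, and the default values are hypothetical, made up for this example. The idea is only that settings live in a typed, validated object kept in version control instead of being scattered through notebooks.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical training settings, versioned together with the code."""
    learning_rate: float = 0.01
    batch_size: int = 32
    data_path: str = "data/train.csv"  # placeholder path, not a real dataset

    def __post_init__(self):
        # Fail early on values that would only break a run much later.
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")


def load_config(text):
    """Build a config from JSON text; unknown keys raise TypeError."""
    return TrainingConfig(**json.loads(text))


def dump_config(config):
    """Serialize a config to JSON so it can be stored next to a run's results."""
    return json.dumps(asdict(config), sort_keys=True)
```

Because the object is frozen and validated on creation, a typo in a key or a nonsensical value is caught when the config is loaded, and `dump_config` gives a stable text form that diffs cleanly in Git.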
