What is the best practice for data folder structuring?
I work for a small data science consultancy firm and we are trying to standardize our project folder structure. We started from the Cookiecutter Data Science structure, which is a great base.
However, one of the discussion points is the subfolders of the data folder, which is structured as:
- Raw
- Interim
- Processed
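For context, here is a minimal sketch of the layout we are aiming for. The folder descriptions follow the Cookiecutter Data Science docs; the `scaffold_data_dirs` helper is just for illustration and not part of the template itself:

```python
from pathlib import Path

# Cookiecutter Data Science-style data subfolders:
#   raw       - the original, immutable data dump
#   interim   - intermediate data that has been transformed
#   processed - the final, canonical datasets for modeling
DATA_SUBFOLDERS = ["raw", "interim", "processed"]

def scaffold_data_dirs(project_root: str) -> None:
    """Create the data/ subfolder skeleton under a project root (illustrative helper)."""
    for name in DATA_SUBFOLDERS:
        (Path(project_root) / "data" / name).mkdir(parents=True, exist_ok=True)

scaffold_data_dirs(".")
```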
Let's think about the following situations:
- The client gives you a manually extracted CSV file -> this obviously goes into Raw
- You have access to SQL databases and make a no-modification extract -> still into Raw, I guess?
- Because of very large databases, you create a semi-complex SQL query as the base for a feature -> is this Raw or Interim? (see the sketch after this list)
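To make the second and third situations concrete, here is a rough sketch of what I mean, assuming a SQLAlchemy connection and pandas; the connection string, table, and file names are made up:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and table names, purely for illustration.
engine = create_engine("postgresql://user:pass@host:5432/clientdb")

# Situation 2: a no-modification extract (plain SELECT *) -- lands in data/raw.
raw = pd.read_sql("SELECT * FROM transactions", engine)
raw.to_csv("data/raw/transactions.csv", index=False)

# Situation 3: a semi-complex query that already joins/aggregates --
# arguably no longer "raw", since the data was transformed in-flight.
query = """
SELECT customer_id, COUNT(*) AS n_orders, SUM(amount) AS total_spent
FROM transactions
GROUP BY customer_id
"""
features_base = pd.read_sql(query, engine)
features_base.to_csv("data/interim/customer_order_stats.csv", index=False)
```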
What are the best practices you apply? What would you recommend?
PS: links to GitHub projects built following this kind of structure are very welcome.