Project structure - many projects share same large dataset

I have a bunch of projects for my job that are largely unrelated, except that they all use the same data, which is pretty big on disk in CSV format. I want these projects to exist separately from each other. I usually follow the Cookiecutter Data Science template for project structure and keep all my data in a data folder in the root of the project.

But because this dataset is big, I don't want ten copies of it in the roots of these ten projects. I also don't want to merge them into one big project that shares the data, because I feel like they don't belong together.

What's the best way to structure multiple different projects that all share the same large dataset?



A database is the best option for sharing data across projects: load the CSV into it once, and have each project query it instead of keeping its own copy.
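As a minimal sketch of the database idea, assuming SQLite is acceptable and the CSV has a header row, the shared file could be loaded once and then queried from any project. The function names, the `shared_data` table name, and the paths are hypothetical, chosen only for illustration; a server database such as PostgreSQL would follow the same pattern with a different driver.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table="shared_data"):
    """Load a CSV file (with a header row) into a SQLite table.

    Run this once; every project then reads from db_path.
    The table name is a hypothetical default.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row supplies the column names
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(f'DROP TABLE IF EXISTS "{table}"')
                conn.execute(f'CREATE TABLE "{table}" ({columns})')
                # Rows are streamed from the reader, not held in memory.
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
        finally:
            conn.close()


def query_shared_data(db_path, sql, params=()):
    """Run a query against the shared database and return all rows."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

Each project would then store only the path to the database file (for instance in a config file or environment variable) rather than a copy of the data. Note that this sketch stores every value as text, because the columns are created without declared types.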

Another option is version control: check the CSV into a repository of its own. That could be Git (hosted on GitHub, for example, typically with Git LFS for large files) or a data-specific version control system such as DVC.
