Best way to get data into a Colab notebook
I am working concurrently with multiple very large datasets (10s-100s of GBs). I signed up for Colab Pro+ thinking it would be the best option, but I face a significant bottleneck in getting data into Colab.
My options all seem very bad:
- Downloading from AWS (where the data is located; roughly what the sketch after this list shows) - very slow.
- Uploading the data to Google Drive and mounting Drive with the code below. This is also surprisingly slow.
from google.colab import drive
drive.mount('/content/drive')
- Paying for a persistent server, e.g. a persistent AWS SageMaker notebook instance. This is very expensive: with even a mediocre GPU it comes out to roughly $2k/mo.
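
For context, the first option is essentially the following for me (the bucket and key names here are placeholders; credentials come from the environment):

import boto3

# Rough sketch of option 1: copy objects from S3 onto the Colab VM's local disk.
# 'my-bucket' and the key are placeholders; boto3 may need a `!pip install boto3`
# first, and credentials are picked up from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY.
s3 = boto3.client('s3')
s3.download_file('my-bucket', 'datasets/shard-000.parquet', '/content/shard-000.parquet')

Even done in parallel across shards, this is capped by the Colab VM's ingress bandwidth, which seems to be where the bottleneck is.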
What is the best solution here? I am OK with paying for a good option as long as it's reasonably priced. Have any of the numerous MLOps startups built a good solution to this?
Thanks!
Topic: colab, cloud-computing, deep-learning, aws, machine-learning
Category: Data Science