Best way to get data into Colab Notebook

I am working concurrently with multiple very large datasets (tens to hundreds of GB). I signed up for Colab Pro+ thinking it would be the best option. However, I face a significant bottleneck getting data into Colab.

My options all seem very bad:

  1. Downloading from AWS (where the data is located) - very slow (see the first sketch after this list).
  2. Uploading the data to Google Drive and mounting Drive with the code below. This is also surprisingly slow (see the second sketch after this list).
from google.colab import drive
drive.mount('/content/drive')
  3. Paying for a persistent server, something like a persistent AWS SageMaker notebook. This is very expensive: with even a mediocre GPU it comes out to around $2k/month.
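
For option 1, this is roughly what the download looks like today (a minimal sketch, assuming boto3 is installed and AWS credentials are set as environment variables; the bucket and key names below are placeholders). Pulling one shard at a time onto the VM's local disk works, but at these dataset sizes the transfer itself is the bottleneck.

import boto3

# Plain S3 client; credentials come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY env vars.
s3 = boto3.client("s3", region_name="us-east-1")  # adjust region to the bucket

# Download a single shard to the Colab VM's local disk (placeholder bucket/key).
s3.download_file(
    "my-training-data-bucket",
    "datasets/images/shard-0001.tar",
    "/content/shard-0001.tar",
)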
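
For option 2, the per-file reads through the Drive mount are what seem slow; copying an archive from Drive onto the VM's local disk once and reading it locally helps somewhat, but the initial copy is still slow at this scale (a minimal sketch; the Drive path is a placeholder).

import shutil
from google.colab import drive

# Mount Drive (prompts for authorization in the notebook).
drive.mount('/content/drive')

# One-time copy of an archive to the VM's local disk, which is much faster to read
# during training than the mounted Drive path. The source path is a placeholder.
shutil.copy(
    "/content/drive/MyDrive/datasets/train_shard_000.tar",
    "/content/train_shard_000.tar",
)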

What is the best solution here? I am OK with paying for a good option as long as it's reasonably priced. Have any of the numerous MLOps startups developed a good solution to this?

Thanks!

Tags: colab, cloud-computing, deep-learning, aws, machine-learning

Category: Data Science


Try Faculty Platform. I had a good experience with large datasets (tens of GB) on it.

