How to load numerous files from Google Drive into Colab

I am trying to load 30k images (600 MB) from Google Drive into Google Colaboratory to process them further with Keras/PyTorch.

To do so, I first mounted my Google Drive using:

from google.colab import drive
drive.mount('/content/gdrive')

Next, I unzipped the image archive using:

!unzip -uq "/content/gdrive/My Drive/path.zip" -d "/content/gdrive/My Drive/path/"

Counting how many files are located in the directory using:

import os

len(os.listdir(path_to_train_images))

I only find 13k images (whereas I should find 30k). According to the output of unzip, the files appear to be unzipped correctly.
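(For completeness: os.listdir only counts entries at the top level of a directory, so anything unpacked into subfolders would be missed. A recursive count, sketched below with the same placeholder path, rules that out.)

import os

path_to_train_images = "/content/gdrive/My Drive/path/"  # placeholder path

# os.walk visits every subdirectory, so this counts files at any depth
total = sum(len(files) for _, _, files in os.walk(path_to_train_images))
print(total)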

Also, I found that there are known issues with loading many files from a Google Drive directory: https://github.com/googlecolab/colabtools/issues/510.

Does anyone know where I am going wrong, or whether there is a workaround?

Topic kaggle dataset google machine-learning

Category Data Science


One possible option would be to operate directly on the zip file using zipfile.ZipFile, rather than unzipping everything back onto Google Drive first.

Counting the number of items in a zip file:

from zipfile import ZipFile

# ZipFile is its own context manager, so contextlib.closing is not needed
with ZipFile("/content/gdrive/My Drive/path.zip") as zip_file:
    # infolist() also lists directory entries; keep only actual files
    count = len([info for info in zip_file.infolist() if not info.is_dir()])
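Going a step further, the images themselves can be read straight out of the archive into memory and decoded there, so nothing has to be extracted to Drive at all. A minimal sketch, assuming the archive contains standard image formats and using Pillow (preinstalled in Colab); the variable names are illustrative:

import io
from zipfile import ZipFile

from PIL import Image

zip_path = "/content/gdrive/My Drive/path.zip"  # same archive as above

with ZipFile(zip_path) as zip_file:
    # Directory entries in a zip end with a trailing slash; skip them
    names = [n for n in zip_file.namelist() if not n.endswith("/")]
    # Read one image fully into memory, then decode it with Pillow
    data = zip_file.read(names[0])
    image = Image.open(io.BytesIO(data))
    print(image.size, image.mode)

From here, the decoded images can be converted to arrays or tensors and fed to Keras/PyTorch without ever touching the Drive filesystem for individual files.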
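If you do need the files extracted, another common workaround is to unzip to the Colab VM's local disk instead of back into Drive, since the Drive mount struggles with directories containing many files. For example, using a hypothetical local destination (note that local disk contents are lost when the runtime is recycled):

!unzip -uq "/content/gdrive/My Drive/path.zip" -d "/content/images/"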
