Do I load all files at once or one at a time?
I currently have $1700+$ CSV files, each $\approx 3.8$ MB. They all share the same format and structure, give or take a row or possibly an extra column at the end.
I need to perform two transformations on each file:
- Extract one data set, perform a summation and a column select, then store the result inside a folder.
- Extract an array (no need for column names here), then store it inside a folder.
From an algorithmic POV, is it better to load each file, perform the actions needed and then move on to the next file?
OR
Do I load all files, perform the action on all files and then store to hard drive?
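To make the first option concrete, the one-file-at-a-time version I have in mind looks roughly like this (a minimal sketch assuming CSV.jl, DataFrames.jl and Tables.jl; the folder names, column names, and column range are placeholders, not my actual data):

```julia
using CSV, DataFrames, Tables

indir  = "raw"        # placeholder input folder
outdir = "processed"  # placeholder output folder
mkpath(outdir)

for path in filter(p -> endswith(p, ".csv"), readdir(indir; join=true))
    df = CSV.read(path, DataFrame)   # load ONE file (~3.8 MB)

    # 1) summation + column select, stored with headers
    summary = combine(df, :value => sum => :total)  # placeholder column name
    CSV.write(joinpath(outdir, "summary_" * basename(path)), summary)

    # 2) raw array, written without column names
    arr = Matrix(df[:, 1:2])         # placeholder column range
    CSV.write(joinpath(outdir, "array_" * basename(path)),
              Tables.table(arr); writeheader=false)

    # df goes out of scope at the end of each iteration, so peak
    # memory stays around one file's worth rather than all 1700+
end
```

The all-at-once alternative would replace the loop body's write steps with an append to an in-memory collection and write everything at the end, at the cost of holding all $\approx 6.5$ GB of parsed data in memory at once.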
I know how to do the actual processing; I am after a 20,000-foot view of the dynamic programming/optimisation aspects.
Thanks.
Topic: julia, optimization, dataset processing
Category: Data Science