How to work with hundreds of CSVs, each with millions of rows?
So I'm doing a project on the COVID-19 Tweets dataset from IEEE DataPort, and I plan to analyse the tweets over the period from March 2020 to date. The thing is, there are more than 300 CSVs, each with millions of rows. I need to hydrate all of these tweets before I can filter through them, and hydrating just one CSV took more than two hours today.
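For reference, my current per-file workflow is roughly this (a sketch using twarc v1; I'm assuming the tweet IDs sit in the first CSV column and that API credentials are already set up via `twarc configure`):

```python
import csv
import json
from twarc import Twarc

t = Twarc()  # reads Twitter API credentials from the twarc config file

def hydrate_file(csv_path, out_path):
    """Extract tweet IDs from one CSV and write hydrated tweets as JSONL."""
    with open(csv_path, newline="") as f, open(out_path, "w") as out:
        ids = (row[0] for row in csv.reader(f) if row)  # assumed: ID in column 0
        for tweet in t.hydrate(ids):  # yields full tweet objects from the API
            out.write(json.dumps(tweet) + "\n")

hydrate_file("corona_tweets_01.csv", "corona_tweets_01.jsonl")  # example filename
```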
I wanted to know if there's a more efficient way to go about this. Is it possible to combine all the CSVs into one and hydrate that single file, however long it takes, or do I have to settle for a smaller dataset if each file takes this long?
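To make the idea concrete, this is roughly the merge I had in mind (a sketch; the `corona_tweets_*.csv` filename pattern and the ID column position are assumptions about the dataset layout):

```python
import csv
import glob

# Concatenate the ID columns of all the daily CSVs into one plain-text
# ID list, which could then be hydrated in a single long run.
with open("all_tweet_ids.txt", "w") as out:
    for path in sorted(glob.glob("corona_tweets_*.csv")):
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if row:
                    out.write(row[0] + "\n")
```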
I'm just starting out with real-life data, so you can consider me a true beginner; any help would be appreciated. Thanks!
Topic text-filter csv sentiment-analysis databases machine-learning
Category Data Science