Exploratory data analysis (EDA) on a large dataset
I am working with a lot of data (we have a table that grows by 30 million rows daily). What is the best way to explore it (to do EDA)? Should I take a random fractional sample of the data (say 100,000 rows), select the first 100,000 rows, or work with the entire dataset?
Thanks!
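For the random-sampling option, this is roughly what I had in mind (a minimal PySpark sketch; the table name `events` is a placeholder for our actual source):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("eda-sample").getOrCreate()

# Hypothetical table name; replace with the real source table.
df = spark.table("events")

# Draw a random fraction rather than the first N rows, so the sample
# is not biased by ingestion order. The seed makes it reproducible,
# and the row count is only approximately 100,000.
target_rows = 100_000
sample = df.sample(withReplacement=False,
                   fraction=target_rows / df.count(),
                   seed=42)

# Pull the (now small) sample into pandas for plotting and summary stats.
pdf = sample.toPandas()
print(pdf.describe())
```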
Topic pyspark deep-learning scikit-learn pandas machine-learning
Category Data Science