PySpark for Big Data and RAM usage
I'm trying to figure out the most efficient way to handle ETL operations on big data. My question is this:
Say I have a table that is ~50 GB in size. To transfer this table's data from one data source to another using PySpark, do I need more than 50 GB of RAM?
Thanks for your help.
Topic: dataframe, etl, memory, pyspark
Category: Data Science