PySpark DataFrames to pandas and ML operations: does parallel execution hold?
If I convert a Spark DataFrame into a pandas DataFrame and subsequently apply pandas operations and sklearn models to the dataset in Databricks, will the pandas and sklearn operations be distributed across the cluster? Or do I have to use PySpark DataFrame operations and PySpark ML packages for the work to be distributed?
Tags: pyspark, apache-spark, pandas, dataset, machine-learning
Category: Data Science