Spark DataFrame APIs vs Spark SQL

I have a relatively complex query that runs against a database and contains multiple joins, lead/lag window functions, subqueries, etc. The tables are available as individual files in my object store, and I am trying to run a Spark job that performs the same query. Is it advisable to convert the SQL query into Spark SQL (which I was able to do by making a few changes), or is it better to reconstruct the query with the DataFrame API and execute it that way? Are there any considerations for choosing one over the other? A minimal sketch of the two approaches is shown below.
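For illustration, here is a minimal PySpark sketch of the two approaches; the table names, columns, and object-store paths are hypothetical and stand in for my actual data. Both forms go through the same Catalyst optimizer, so the comparison is mainly about readability and composability rather than the generated plan.

```python
# Hypothetical example: the same join + lag() query written once as Spark SQL
# and once with the DataFrame API. Paths, tables, and columns are placeholders.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("sql-vs-dataframe").getOrCreate()

# Assume the tables live as Parquet files in the object store.
orders = spark.read.parquet("s3://my-bucket/orders/")
customers = spark.read.parquet("s3://my-bucket/customers/")

# Option 1: register temp views and run the (slightly adapted) SQL directly.
orders.createOrReplaceTempView("orders")
customers.createOrReplaceTempView("customers")
sql_result = spark.sql("""
    SELECT c.customer_id,
           o.amount,
           LAG(o.amount) OVER (PARTITION BY c.customer_id ORDER BY o.order_date) AS prev_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
""")

# Option 2: express the same logic with the DataFrame API.
w = Window.partitionBy("customer_id").orderBy("order_date")
df_result = (
    orders.join(customers, "customer_id")
          .select("customer_id", "amount", "order_date")
          .withColumn("prev_amount", F.lag("amount").over(w))
)

# explain() prints the optimized plan for each, which is how I have been
# comparing what the two versions actually execute.
sql_result.explain()
df_result.explain()
```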

Topic: data-engineering, scala, pyspark, apache-spark, sql

Category: Data Science
