Does storing a file in HDFS parallelize it for Spark?
To use Spark's RDD operations, the data must be an RDD, for example one created by parallelizing an existing collection:
ParallelizedData = sc.parallelize(data)
My question is: if I store data in HDFS, does it get parallelized automatically, or do I need to use the code above before I can use the data in Spark? Does storing data in HDFS turn it into an RDD?
Topic apache-spark apache-hadoop bigdata
Category Data Science