Is it possible to implement an RDD version of a for loop with map and reduce in PySpark?
I need to test an algorithm that computes a function on a dataframe, where in each execution I drop a column and compute the function again. Here is an example in Python/PySpark, but without using RDD operations:
from pyspark.sql import Row

df2581 = spark.sparkContext.parallelize([Row(a=1, b=3, c=5, d=7, e=9)]).toDF()
df2581.show()

# prints the type of each value from the second column onward; the map itself only yields None
wo = df2581.rdd.flatMap(lambda x: x[1:]).map(lambda a: print(type(a)))
wo.collect()
def f(x):
    # for each position, keep a copy of the row values with that element removed
    list3 = []
    for index in range(len(x)):
        values = list(x)
        del values[index]
        list3.append(values)
    return list3
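If I read the intent correctly, f should return one copy of the row values per dropped position; a quick check on the sample row's values:

print(f((1, 3, 5, 7, 9)))
# [[3, 5, 7, 9], [1, 5, 7, 9], [1, 3, 7, 9], [1, 3, 5, 9], [1, 3, 5, 7]]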
colu = df2581.columns

def add(x, y):
    return x + y

arr = []
for i in range(0, len(colu)):
    # note: x[i:] keeps the columns from i onward, it does not drop only column i
    words = df2581.rdd.map(lambda x: x[i:]).reduce(lambda a, b: a + b)
    # sum the remaining values with a second reduce
    total = spark.sparkContext.parallelize(words).reduce(add)
    arr.append(total)
I need to know whether it is possible to express this algorithm with RDD operations (map and reduce) rather than a Python for loop in PySpark.
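One way this could be expressed without the driver-side loop is a minimal sketch like the one below, assuming the goal is, for every column, the sum of the row values after dropping that column (the names n and sums_without_column are mine). Each row emits one (dropped_column_index, partial_sum) pair per column via flatMap, and reduceByKey aggregates the partial sums across rows:

from pyspark.sql import Row

df2581 = spark.sparkContext.parallelize([Row(a=1, b=3, c=5, d=7, e=9)]).toDF()
n = len(df2581.columns)

sums_without_column = (
    df2581.rdd
    # one (dropped_column_index, partial_sum) pair per column and per row
    .flatMap(lambda row: [(i, sum(v for j, v in enumerate(row) if j != i))
                          for i in range(n)])
    # add up the partial sums of all rows for each dropped column
    .reduceByKey(lambda a, b: a + b)
    .sortByKey()
    .values()
    .collect()
)
# for the single sample row this gives [24, 22, 20, 18, 16]

This computes the result for all columns in a single pass over the data instead of launching separate Spark jobs per column, which is usually why it scales better than the loop version.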
Topic pyspark apache-spark python bigdata
Category Data Science