PySpark: How do I specify dropna axis in PySpark transformation?
I would like to drop columns that contain all null values using dropna(). With Pandas you can do this by setting the keyword argument axis='columns' in dropna(). Here is an example in a GitHub post.
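For reference, a minimal sketch of the Pandas behavior described above (the data here is illustrative, mirroring the dataframe further down):

```python
import numpy as np
import pandas as pd

# Drop columns whose values are all NaN/null.
df = pd.DataFrame({
    'furniture': [np.nan, np.nan, np.nan],
    'myid': ['1-12', '0-11', '2-12'],
})
df = df.dropna(axis='columns', how='all')  # 'furniture' is removed
print(list(df.columns))  # → ['myid']
```

With how='all' a column is dropped only when every value is missing; the default how='any' would also drop columns containing a single NaN.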
How do I do this in PySpark? dropna() is available as a transformation in PySpark, but axis is not an available keyword argument.

Note: I do not want to transpose my dataframe for this to work.
How would I drop the furniture column from this dataframe?
import numpy as np
import pandas as pd

data_2 = {
    'furniture': [np.nan, np.nan, np.nan],
    'myid': ['1-12', '0-11', '2-12'],
    'clothing': ['pants', 'shoes', 'socks'],
}
df_1 = pd.DataFrame(data_2)
ddf_1 = spark.createDataFrame(df_1)  # assumes an active SparkSession `spark`
ddf_1.show()
Topic pyspark python data-cleaning
Category Data Science