How to compute the median of a Date type of column in Spark (JAVA)
I have extracted a column from a dataset that contains Date type of values:
+-------------------+
| Created_datetime  |
+-------------------+
|2019-10-12 17:09:18|
|2019-12-03 07:02:07|
|2020-01-16 23:10:08|
+-------------------+
The type of the column is StringType in Spark.
I want to compute the median of these dates; in the example above it would be 2019-12-03 07:02:07,
since that is the middle of the three dates.
How can I achieve this in Spark with Java?
I tried using
dataset.select(org.apache.spark.sql.functions.avg(dataset.col("Created_datetime").cast("timestamp"))).first().getDouble(0)
but this returns a double value, not a date.
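As a sketch of one possible direction (an assumption on my part, not a verified solution): cast the column to a timestamp and then to a numeric type so the values become epoch seconds, aggregate with a median-style function such as Spark SQL's percentile_approx, and then convert the resulting double back to a date string in plain Java. The class and method names below are illustrative; only the epoch-seconds-to-string conversion is shown, since that is the part the aggregation leaves missing:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class MedianDate {
    // Converts an epoch-seconds value (the kind of double a Spark
    // aggregation over a timestamp cast to a numeric type returns)
    // back into a "yyyy-MM-dd HH:mm:ss" string. UTC is assumed here;
    // adjust the zone to match how the timestamps were parsed.
    static String epochSecondsToDateString(double epochSeconds) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                .withZone(ZoneOffset.UTC)
                .format(Instant.ofEpochSecond((long) epochSeconds));
    }

    public static void main(String[] args) {
        // 1575356527 is the epoch-second value of 2019-12-03 07:02:07 UTC,
        // the middle date from the example column above.
        System.out.println(epochSecondsToDateString(1575356527d));
        // prints "2019-12-03 07:02:07"
    }
}
```

The Spark side would then only need to hand the double from the aggregation (for example, the 0.5 percentile) to a conversion like this.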
Thanks for the help.
Topic java apache-spark
Category Data Science