Hive query to get all rows where a particular column value lies in a particular precentile
I am trying to filter my rows in hive table named id_counts
based on percentile values.
Lets considers the following table.
+------+----------+
| id | quantity |
+------+----------+
| a01 | 234 |
| a02 | 345 |
| a03 | 23 |
+------+----------+
now lets say I want to get the rows which have quantity in the 90th percentile then what query should i give. I tried the following:
select * from id_counts having quantity= percentile(quantity, 0.9);
But it gives the error:
FAILED: SemanticException [Error 10025]: Line 1:54 Expression not in GROUP BY key '0.9'
Update: I am able to resolve this using the following query but is there a simpler method?
select * from id_count a cross join
(select percentile(quantity, 0.9) as top_ones
from id_count) aa
where a.quantity = aa.top_ones;
A simpler query would be appreciated. Is it possible that no join is required?
Topic hive
Category Data Science