Find outliers in Hive - SemanticException

I'm trying to find outliers in my database with Hive, using the standard deviation technique. My query is:

SELECT ID
FROM data
WHERE ID < (AVG(ID) + STDDEV(ID))
  AND ID > (AVG(ID) - STDDEV(ID));

When I run this code I'm getting the following error:

 Error while compiling statement: FAILED: SemanticException [Error 10128]: Line 3:12 Not yet supported place for UDAF 'AVG'

How can I solve this problem? Many thanks!



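Hive doesn't let you use aggregate functions (UDAFs) such as AVG or STDDEV in a WHERE clause, because WHERE filters individual rows before any aggregation happens. You can solve this by computing the aggregates in a subquery and filtering on its output.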

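-- OVER () turns each aggregate into a window function over the whole table,
-- so every row carries the table-wide mean and standard deviation.
-- Hive also requires an alias (t) on the subquery.
SELECT id
FROM
    (SELECT id, AVG(id) OVER () AS avg_id, STDDEV(id) OVER () AS stddev_id FROM data) t
WHERE id < avg_id + stddev_id AND id > avg_id - stddev_id;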
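
Note that the filter above keeps the rows that lie within one standard deviation of the mean. If what you want is the list of outliers themselves, the condition has to be inverted. Here is a minimal sketch of that, assuming the same `data` table and `id` column; the threshold of 2 standard deviations and the aliases `d` and `s` are my own choices for illustration, not something from the question.

```sql
-- Sketch: return rows that lie more than k standard deviations from the mean.
-- k = 2 is an assumed threshold; adjust it to your data.
SELECT d.id
FROM data d
CROSS JOIN (
    -- One-row subquery holding the table-wide statistics.
    SELECT AVG(id) AS avg_id, STDDEV(id) AS stddev_id
    FROM data
) s
WHERE ABS(d.id - s.avg_id) > 2 * s.stddev_id;
```

The aggregates are computed once in the one-row subquery and attached to every row by the cross join, so the WHERE clause only ever compares plain columns. Depending on your Hive settings, strict mode may reject cartesian products, in which case you would need to relax that check or use a different formulation.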
