Error when using MAX in Apache Pig (Hadoop)

I am trying to calculate maximum values for different groups in a relation in Pig. The relation has three columns patientid, featureid and featurevalue (all int). I group the relation based on featureid and want to calculate the max feature value of each group, heres the code:

grpd = GROUP features BY featureid;
DUMP grpd;
temp = FOREACH grpd GENERATE $0 as featureid, MAX($1.featurevalue) as val;

Its giving me Invalid scalar projection: grpd Exception. I read on different forums that MAX takes in a "bag" format for such functions, but when I take the dump of grpd, it shows me a bag format. Here's a small part of the output from the dump:

(5662,{(22579,5662,1)})
(5663,{(28331,5663,1),(2624,5663,1)})
(5664,{(27591,5664,1)})
(5665,{(30217,5665,1),(31526,5665,1)})
(5666,{(27783,5666,1),(30983,5666,1),(32424,5666,1),(28064,5666,1),(28932,5666,1)})
(5667,{(31257,5667,1),(27281,5667,1)})
(5669,{(31041,5669,1)})

Whats the issue ?

Topic apache-pig apache-hadoop

Category Data Science


The issue was with column addressing, heres the correct working code:

grpd = GROUP features BY featureid;
temp = FOREACH grpd GENERATE group as featureid, MAX(features.featurevalue) as val;

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.