KMeans using MapReduce in Python
I wrote MapReduce code in Python that works locally: running cat test_mapper | python mapper.py,
sorting the result, and then running cat sorted_map_output | python reducer.py
produces the desired result.
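For reference, the local check described above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name run_streaming_locally is hypothetical, and the in-memory line sort only approximates the sort command and Hadoop's shuffle phase (locale-specific ordering and partitioning are ignored).

```python
import subprocess
import sys


def run_streaming_locally(mapper_cmd, reducer_cmd, input_text):
    """Emulate `cat input | mapper | sort | reducer` and return the reducer's stdout.

    Hypothetical helper for illustration; it is not part of Hadoop.
    """
    # Feed the input to the mapper on stdin, as Hadoop Streaming would.
    # check=True raises CalledProcessError if the script exits non-zero.
    mapped = subprocess.run(
        mapper_cmd, input=input_text, capture_output=True, text=True, check=True
    ).stdout

    # Approximate the sort step with a plain lexicographic sort of the lines.
    lines = sorted(mapped.splitlines())
    shuffled = "".join(line + "\n" for line in lines)

    # Feed the sorted mapper output to the reducer.
    reduced = subprocess.run(
        reducer_cmd, input=shuffled, capture_output=True, text=True, check=True
    ).stdout
    return reduced


# Example call, assuming mapper.py, reducer.py and test_mapper sit in the
# current directory:
#
#   with open("test_mapper") as f:
#       result = run_streaming_locally(
#           [sys.executable, "mapper.py"],
#           [sys.executable, "reducer.py"],
#           f.read(),
#       )
#   print(result, end="")
```

Because of check=True, a script that exits with a non-zero status raises subprocess.CalledProcessError here, and the exception's stderr attribute holds the script's traceback. Note that this harness runs the scripts with the local interpreter and local files, so it can pass even when the environment on the cluster nodes differs.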
As soon as the same mapper and reducer are submitted to the MapReduce engine, however, the job fails:
code
21/08/09 11:03:11 INFO mapreduce.Job: map 50% reduce 0%
21/08/09 11:03:11 INFO mapreduce.Job: Task Id : attempt_1628505794323_0001_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
    ...
21/08/09 11:03:21 INFO mapreduce.Job: map 100% reduce 100%
21/08/09 11:03:22 INFO mapreduce.Job: Job job_1628505794323_0001 failed with state FAILED due to: Task failed task_1628505794323_0001_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
/code
Topic map-reduce apache-hadoop bigdata
Category Data Science