KMeans using Mapreduce in Python

I wrote MapReduce code in Python that works locally: running cat test_mapper | python mapper.py, sorting the result, and then running cat sorted_map_output | python reducer.py produces the desired result.
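For illustration, the local test described above (mapper, then sort, then reducer) can be mimicked in a single Python script. This is only a minimal sketch: the actual mapper.py and reducer.py are not shown in this post, so the input format (one x,y point per line), the CENTROIDS list, and the function names mapper, reducer and run_local are all assumptions.

```python
from itertools import groupby

# Hypothetical centroids; the real job's centroids are not shown in the post.
CENTROIDS = [(0.0, 0.0), (10.0, 10.0)]


def mapper(lines):
    """Emit 'centroid_index<TAB>x,y' for each input point given as 'x,y'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of crashing on them
        x, y = (float(v) for v in line.split(","))
        # Index of the centroid with the smallest squared Euclidean distance.
        nearest = min(
            range(len(CENTROIDS)),
            key=lambda i: (x - CENTROIDS[i][0]) ** 2 + (y - CENTROIDS[i][1]) ** 2,
        )
        yield f"{nearest}\t{x},{y}"


def reducer(sorted_lines):
    """Average the points assigned to each centroid; input must be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        points = [tuple(float(v) for v in value.split(",")) for _, value in group]
        n = len(points)
        mean_x = sum(p[0] for p in points) / n
        mean_y = sum(p[1] for p in points) / n
        yield f"{key}\t{mean_x},{mean_y}"


def run_local(lines):
    """Mimic 'cat input | mapper | sort | reducer' inside one process."""
    return list(reducer(sorted(mapper(lines))))
```

Calling run_local(["1,1", "3,1", "9,9", "11,11"]) performs one K-means update step on four points. A run like this only checks the logic; it does not reproduce the environment the scripts see on the cluster nodes.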

As soon as the job is submitted to the MapReduce engine, it fails:

```
21/08/09 11:03:11 INFO mapreduce.Job:  map 50% reduce 0%
21/08/09 11:03:11 INFO mapreduce.Job: Task Id : attempt_1628505794323_0001_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
...
21/08/09 11:03:21 INFO mapreduce.Job:  map 100% reduce 100%
21/08/09 11:03:22 INFO mapreduce.Job: Job job_1628505794323_0001 failed with state FAILED due to: Task failed task_1628505794323_0001_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0
```

Topic map-reduce apache-hadoop bigdata

Category Data Science
