What is the MapReduce application master?

From Hadoop The Definitive Guide

The whole process is illustrated in Figure 7-1. At the highest level, there are five independent entities:

• The client, which submits the MapReduce job.

• The YARN resource manager, which coordinates the allocation of compute resources on the cluster.

• The YARN node managers, which launch and monitor the compute containers on machines in the cluster.

• The MapReduce application master, which coordinates the tasks running the MapReduce job. The application master and the MapReduce tasks run in containers that are scheduled by the resource manager and managed by the node managers.

What is the MapReduce application master?

In a MapReduce program written in Java, we need three things: a map function, a reduce function, and some code with a main() function to run the job. Is the MapReduce application master the code with the main() function that runs the MapReduce job?

Thanks



Here is the life cycle of the MapReduce Application Master (AM):

  • Each application running on the Hadoop cluster has its own, dedicated Application Master instance, which runs in a container on a slave node. One Application Master per application.

  • Throughout its life (while the application is running), the Application Master sends heartbeat messages to the Resource Manager with its status and the state of the application’s resource needs.

  • The Application Master oversees the full life cycle of an application, all the way from requesting the needed containers from the Resource Manager to submitting container lease requests to the Node Manager.

  • Each application framework that’s written for Hadoop must have its own Application Master implementation. For example, MapReduce has its own Application Master, which is designed to execute map tasks and reduce tasks in sequence.
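The life cycle described above can be sketched as a toy, in-memory model. This is only an illustration under simplifying assumptions: `ToyResourceManager`, `ToyAppMaster` and `ToyLifecycleDemo` are hypothetical names, not the YARN API, and the model runs every map task before any reduce task, which is stricter than a real cluster needs to be.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every class here is hypothetical and is NOT the YARN API.
class ToyLifecycleDemo {
    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager(2); // cluster with 2 free containers
        ToyAppMaster am = new ToyAppMaster("app-1", rm);   // one master per application
        am.run(3, 1);                                      // 3 map tasks, then 1 reduce task
        System.out.println(am.log);                        // [map-0, map-1, map-2, reduce-0]
    }
}

class ToyResourceManager {
    private int freeContainers;
    final List<String> heartbeats = new ArrayList<>();

    ToyResourceManager(int freeContainers) {
        this.freeContainers = freeContainers;
    }

    // Grants up to `wanted` containers, limited by what is currently free.
    int allocate(int wanted) {
        int granted = Math.min(wanted, freeContainers);
        freeContainers -= granted;
        return granted;
    }

    void release(int count) {
        freeContainers += count;
    }

    // The application master reports its status and its outstanding resource needs.
    void heartbeat(String appId, String status, int stillNeeded) {
        heartbeats.add(appId + ":" + status + ":needs=" + stillNeeded);
    }
}

class ToyAppMaster {
    private final String appId;
    private final ToyResourceManager rm;
    final List<String> log = new ArrayList<>();

    ToyAppMaster(String appId, ToyResourceManager rm) {
        this.appId = appId;
        this.rm = rm;
    }

    // Runs all map tasks, then all reduce tasks, then reports completion.
    void run(int mapTasks, int reduceTasks) {
        runPhase("map", mapTasks);
        runPhase("reduce", reduceTasks);
        rm.heartbeat(appId, "FINISHED", 0);
    }

    private void runPhase(String phase, int tasks) {
        int done = 0;
        while (done < tasks) {
            int needed = tasks - done;
            rm.heartbeat(appId, "RUNNING-" + phase, needed);
            int granted = rm.allocate(needed);
            if (granted == 0) {
                throw new IllegalStateException("no containers available");
            }
            for (int i = 0; i < granted; i++) {
                log.add(phase + "-" + (done + i)); // stands in for launching a task in a container
            }
            done += granted;
            rm.release(granted); // tasks finished, containers go back to the cluster
        }
    }
}
```

With only two free containers, the three map tasks run in two rounds, and the master sends one heartbeat per round plus a final one when the application finishes.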


The MapReduce Application Master coordinates the tasks running the MapReduce job. It runs in a container of its own and is responsible for requesting, launching and monitoring the resources the job needs. It negotiates resources from the ResourceManager and works with the NodeManagers to execute and monitor the granted containers.

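A MapReduce program does not need both a map function and a reduce function. You can have map-only jobs, where the number of reduce tasks is set to zero, and jobs that supply only reduce logic, where the mapper is left at the default identity mapper that passes records through unchanged.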
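The difference between those two kinds of job can be sketched with a toy in-memory model. This is not the Hadoop API: `ToyJob`, `IDENTITY` and `run` are hypothetical names, a `null` reducer stands in for a job with zero reduce tasks, and the reducer is simplified to a function that folds two values for the same key into one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Toy in-memory model; hypothetical names, not the Hadoop API.
class ToyJob {
    // Stand-in for an identity mapper: each record passes through unchanged.
    static final Function<Map.Entry<String, Integer>, List<Map.Entry<String, Integer>>> IDENTITY =
            record -> List.of(record);

    // reducer == null models a map-only job: the map output is the final output.
    static List<Map.Entry<String, Integer>> run(
            List<Map.Entry<String, Integer>> input,
            Function<Map.Entry<String, Integer>, List<Map.Entry<String, Integer>>> mapper,
            BinaryOperator<Integer> reducer) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (Map.Entry<String, Integer> record : input) {
            mapped.addAll(mapper.apply(record));
        }
        if (reducer == null) {
            return mapped; // no shuffle, no reduce
        }
        // Shuffle and reduce: group by key (sorted) and fold the values of each key.
        Map<String, Integer> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapped) {
            grouped.merge(pair.getKey(), pair.getValue(), reducer);
        }
        return new ArrayList<>(grouped.entrySet());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input =
                List.of(Map.entry("b", 1), Map.entry("a", 2), Map.entry("b", 3));

        // Map-only: double every value, no reducer.
        System.out.println(run(input,
                r -> List.of(Map.entry(r.getKey(), r.getValue() * 2)), null)); // [b=2, a=4, b=6]

        // Reduce logic only: identity mapper, values summed per key.
        System.out.println(run(input, IDENTITY, Integer::sum)); // [a=2, b=4]
    }
}
```

Note that even the second case still has a map phase in this model. It just does nothing to the records, which is why "reduce-only" is best read as "no custom mapper".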

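The code in main() is not the application master. It is the client (driver) that configures and submits the job. The MapReduce Application Master is a separate Java process provided by the framework (its main class is MRAppMaster), which YARN launches in a container to coordinate the tasks of the submitted job.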
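To make the relationship between main() and the application master more concrete, here is a toy sketch. It assumes that main() plays the role of the client from the question's quoted list and that the cluster, not the client, starts the application master. `ToyDriver`, `ToyCluster` and `ToyJobMaster` are hypothetical names and none of this is the YARN or MapReduce API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model; hypothetical names, not the YARN or MapReduce API.
class ToyDriver {
    // Plays the role of the user-written main(): submit the job, nothing more.
    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster();
        cluster.submit("wordcount"); // the client's part ends here
        cluster.schedule();          // happens on the cluster side, outside the client's code
        cluster.events.forEach(System.out::println);
    }
}

class ToyCluster {
    final List<String> events = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private int nextId = 1;

    // Called by the client: records the application and returns immediately.
    String submit(String jobName) {
        String appId = "app-" + nextId++;
        pending.add(appId);
        events.add("client submitted " + jobName + " as " + appId);
        return appId;
    }

    // Cluster-side step: allocate a container and start one master per pending application.
    void schedule() {
        for (String appId : pending) {
            events.add("resource manager allocated a container for " + appId);
            new ToyJobMaster(appId, events).coordinate();
        }
        pending.clear();
    }
}

class ToyJobMaster {
    private final String appId;
    private final List<String> events;

    ToyJobMaster(String appId, List<String> events) {
        this.appId = appId;
        this.events = events;
    }

    // Stands in for requesting containers and running the map and reduce tasks.
    void coordinate() {
        events.add("application master " + appId + " is coordinating map and reduce tasks");
    }
}
```

The point of the split is that `submit` returns before any master exists. The master only appears when the cluster schedules the application, so it is never the same code as the driver's main().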
