Word count with map reduce


def __init__

October 4, 2021, 12:02

Suppose we use an input file that contains the following lyrics from a famous song:

We’re up all night till the sun

We’re up all night to get some

The input pairs for the Map phase will be the following:

(0, We’re up all night to the sun)
(31, We’re up all night to get some)

The key is the byte offset from the beginning of the file. While we won’t need this value in Word Count, the Hadoop framework always passes it to the Mapper. The byte offset can be a large number if the file contains many lines.
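To see where the (byte offset, line) input pairs come from, here is a minimal plain-Python sketch of a line-oriented record reader. It is not Hadoop code; the function name `read_records` is hypothetical. The key is the offset of each line's first byte, so it counts bytes (not characters) and includes the line terminator of every earlier line.

```python
from typing import List, Tuple


def read_records(data: bytes) -> List[Tuple[int, str]]:
    """Split raw file bytes into (byte_offset, line) pairs.

    Hypothetical, simplified stand-in for a line-oriented record reader:
    the key is the offset of the line's first byte, the value is the
    line text without its line terminator.
    """
    records = []
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # Drop "\n" or "\r\n" from the value, but still count those bytes
        # when advancing the offset for the next line.
        text = raw_line.rstrip(b"\r\n").decode("utf-8")
        records.append((offset, text))
        offset += len(raw_line)
    return records


lyrics = "We’re up all night till the sun\nWe’re up all night to get some\n"
for key, value in read_records(lyrics.encode("utf-8")):
    print(key, value)
```

With the curly apostrophe (3 bytes in UTF-8) and "\n" line endings, the second key in this sketch comes out as 34, not 31. The exact offset depends on the file's encoding, line endings, and text, which is one reason the key is a byte count rather than a line number.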

What will the output pairs look like?
What will be the types of keys and values of the input and output pairs in the Map phase?

My solution: the output will look like

(We’re,1) (up,1) (all,1) (night,1) (till,1) (the,1) (sun,1)

(We’re,1) (up,1) (all,1) (night,1) (to,1) (get,1) (some,1)

The input key type is: INT

The input value type is: VARCHAR
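For comparison with the types above: in Hadoop's Java API, the default TextInputFormat supplies each input key as a `LongWritable` (the byte offset, a 64-bit value because offsets can be large) and each value as a `Text` (the line), and a word-count mapper typically emits `Text` words with `IntWritable` counts. The sketch below is only a plain-Python simulation of that Map phase, using `int`/`str` in and `str`/`int` out. The function name `word_count_map`, whitespace tokenization, and keeping the original case are assumptions of this sketch.

```python
from typing import Iterator, Tuple


def word_count_map(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Map phase of word count: emit (word, 1) for every token in the line.

    `key` is the line's byte offset (ignored, as the exercise says);
    `value` is the line itself. Splitting on whitespace and preserving
    case are assumptions of this hypothetical sketch.
    """
    for word in value.split():
        yield (word, 1)


# Input pairs taken from the exercise: (byte offset, line).
records = [
    (0, "We’re up all night to the sun"),
    (31, "We’re up all night to get some"),
]
for offset, line in records:
    print(list(word_count_map(offset, line)))
```

Each line yields seven pairs, one per word, and because case is preserved both lines emit ("We’re", 1). Lowercasing or stripping punctuation would be a separate design choice made inside the mapper.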

Could you please tell me whether my approach is correct? I know it's a very small question, but your answer would give me a confidence boost and help me move forward with my learning.

Topic sentiment-analysis map-reduce apache-hadoop bigdata

Category Data Science
