Reduce in MapReduce … Unwinding

In our previous blogs, we studied Big Data and Hadoop. We also explained the internal workings of MapReduce, including how Map works with sort and shuffle. This blog is dedicated to Reduce in MapReduce.

Once this shuffling is completed, Reduce in MapReduce comes into action.

Its task is to process the input provided by SHUFFLE so that the user can understand the result of the file processed by Hadoop.

Reduce in MapReduce:

After shuffling is completed, each word is processed by only one DN and not by multiple DNs. Hence, to find the count of a particular word, Reduce in MapReduce has to search for that word on one specific DN only.
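This one-word/one-DN guarantee comes from the way shuffle partitions keys. As an illustration (a sketch, not this blog's code), Hadoop's default HashPartitioner computes the same partition for every occurrence of the same key, so all counts of a word land on one node:

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Equivalent of Hadoop's default HashPartitioner: the same word
// always hashes to the same partition, i.e. the same reduce node.
public class WordPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```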

The task of Reduce is to pick a word on a DN, search for all occurrences of that word on that DN, and club their counts together.

Elegance of Reduce:

We will explain this with the help of tables. If we look at the tables below, we can easily understand how elegantly Reduce in MapReduce executes this task.

Reduce picks a word; for example, we have taken “can” on DN-1. It searches for all occurrences of “can” on this node and clubs them together in one place.

Reduce Explained:

In the table below, Reduce found a total of 4 “can” entries on DN-1. It marked 3 of them in red (to be removed) and added their counts to the one remaining “can”, whose value list becomes 1,1,1,1. Since the colours of the original picture cannot be shown here, the red-marked cells appear with an asterisk (*) in the tables below.

Each cell marked with * will be reduced and clubbed into the unmarked cell for the same word.
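To make the clubbing concrete, here is a small plain-Java model (not the Hadoop API; the word list is a made-up fragment of DN-1) of how duplicate entries in the shuffled [ L(K, L(V)) ] list are merged into one entry per word:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClubbingSketch {
    public static void main(String[] args) {
        // Made-up fragment of the shuffled list on DN-1:
        // four separate "can" entries, as in the marking table below.
        List<Map.Entry<String, List<Integer>>> shuffled = new ArrayList<>();
        shuffled.add(Map.entry("by", List.of(1, 1, 1)));
        shuffled.add(Map.entry("can", List.of(1)));
        shuffled.add(Map.entry("can", List.of(1)));
        shuffled.add(Map.entry("can", List.of(1)));
        shuffled.add(Map.entry("can", List.of(1)));

        // Clubbing: value lists of the same word are merged into one
        // entry; the duplicate (marked) entries disappear.
        Map<String, List<Integer>> clubbed = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffled) {
            clubbed.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .addAll(e.getValue());
        }

        System.out.println(clubbed); // {by=[1, 1, 1], can=[1, 1, 1, 1]}
    }
}
```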

Marking for Clubbing of Words on DN-1:

NODE – 1 [ L (K, L(V)) ]

| (K) | V | (K) | V |
| --- | --- | --- | --- |
| achieved | 1 | accommodate | 1 |
| adding | 1 | also | 1 |
| allow | 1 | an | 1 |
| and | 1,1 | analysis | 1 |
| any | 1 | and * | 1 |
| at | 1 | be * | 1 |
| be | 1,1 | but | 1 |
| best | 1 | By * | 1 |
| by | 1,1,1 | can * | 1 |
| By * | 1 | can * | 1 |
| can | 1,1,1,1 | can * | 1 |
| capability | 1 | commodity * | 1 |
| capture | 1 | configured | 1 |
| commodity | 1,1 | cost | 1 |
| data | 1,1,1,1,1 | data * | 1 |
| data * | 1 | data * | 1 |
| decentralize | 1,1 | data * | 1 |
| design | 1 | decentralized * | 1 |
| distributed | 1,1 | enabling | 1 |
| distributed * | 1 | failure | 1 |
| enables | 1 | falut | 1 |
| ETL | 1 | for | 1,1 |
| every | 1 | for * | 1 |

Marking for Clubbing of Words on DN-2:

NODE – 2 [ L (K, L(V)) ]

Marking for Clubbing of Words on DN-3:

NODE – 3 [ L (K, L(V)) ]

| (K) | V | (K) | V |
| --- | --- | --- | --- |
| get | 1 | granular | 1 |
| Hadoop | 1,1,1 | hardware * | 1 |
| Hadoop * | 1 | horizontal | 1,1 |
| Hadoop * | 1 | Horizontal * | 1 |
| hardware | 1,1 | hours | 1 |
| harnessing | 1 | in | 1 |
| hit | 1 | it | 1 |
| huge | 1 | its * | 1 |
| is | 1 | less | 1 |
| its | 1,1,1 | level | 1 |
| its * | 1 | low * | 1 |
| limitations | 1 |  |  |
| low | 1,1 |  |  |

Marking for Clubbing of Words on DN-4:

NODE – 4 [ L (K, L(V)) ]

| (K) | V | (K) | V |
| --- | --- | --- | --- |
| , | 1,1,1 | , * | 1 |
| . | 1,1,1 | . * | 1 |
| . * | 1 | machine * | 1 |
| . * | 1 | machines * | 1 |
| machines | 1,1,1 | maximum | 1 |
| minute | 1 | more | 1 |
| much | 1 | new | 1 |
| of | 1,1,1,1,1 | not | 1 |
| of * | 1 | on * | 1 |
| of * | 1 | only | 1 |
| of * | 1 | or | 1 |
| of * | 1 | organization * | 1 |
| on | 1,1 | overcome | 1 |
| one | 1 | parallel | 1,1 |
| organization | 1,1 | parallel * | 1 |
| part | 1 | performance | 1 |
|  | 1,1,1 | processing | 1,1,1 |
|  | 1 | processing * | 1 |
|  | 1 | processing * | 1 |

Marking for Clubbing of Words on DN-5:

NODE – 5 [ L (K, L(V)) ]

| (K) | V | (K) | V |
| --- | --- | --- | --- |
| speeds | 1 | relies | 1 |
| the | 1,1,1,1 | scaling | 1,1 |
| the * | 1 | scaling * | 1 |
| the * | 1 | scenarios | 1 |
| the * | 1 | technique | 1 |
| them | 1 | these | 1 |
| this | 1,1,1 | This * | 1 |
| This * | 1 | to * | 1 |
| to | 1,1,1 | tolerent | 1 |
| to * | 1 | tradition | 1 |
| true | 1 | type | 1 |

Marking for Clubbing of Words on DN-6:

NODE – 6 [ L (K, L(V)) ]

| (K) | V | (K) | V |
| --- | --- | --- | --- |
| up | 1 | using | 1 |
| use | 1 | very | 1 |
| With | 1,1,1 | waiting | 1 |
| with * | 1 | was | 1 |
| within | 1 | which | 1 |
| without | 1,1 | with * | 1 |
| without * | 1 |  |  |

The next step is to remove the words that were clubbed (the cells marked with *).

The output would be [ L<K, L<V>> ].

Reducing of words on DN-1 & DN-2: you can see that the marked cells have now been removed.

Reduce on DN-1:

NODE – 1 [ L (K, L(V)) ]

| (K) | V | (K) | V |
| --- | --- | --- | --- |
| accommodate | 1 | also | 1 |
| achieved | 1 | an | 1 |
| adding | 1 | analysis | 1 |
| allow | 1 | but | 1 |
| and | 1,1 | configured | 1 |
| any | 1 | cost | 1 |
| at | 1 | decentralize | 1,1 |
| be | 1,1 | design | 1 |
| best | 1 | distributed | 1,1 |
| by | 1,1,1 | enables | 1 |
| can | 1,1,1,1 | enabling | 1 |
| capability | 1 | ETL | 1 |
| capture | 1 | every | 1 |
| commodity | 1,1 | failure | 1 |
| data | 1,1,1,1,1 | falut | 1 |
| for | 1,1 |  |  |

Reduce on DN-2:

NODE – 2 [ L (K, L(V)) ]

Reduce on DN-3:

NODE – 3 [ L (K, L(V)) ]

| (K) | V | (K) | V |
| --- | --- | --- | --- |
| get | 1 | granular | 1 |
| Hadoop | 1,1,1 | horizontal | 1,1 |
| hardware | 1,1 | hours | 1 |
| harnessing | 1 | in | 1 |
| hit | 1 | it | 1 |
| huge | 1 | less | 1 |
| is | 1 | level | 1 |
| its | 1,1,1 | limitations | 1 |
| low | 1,1 |  |  |

Reduce on DN-4:

NODE – 4 [ L (K, L(V)) ]

| (K) | V | (K) | V |
| --- | --- | --- | --- |
| , | 1,1 | maximum | 1 |
| . | 1,1,1,1 | more | 1 |
| machines | 1,1,1 | new | 1 |
| minute | 1 | not | 1 |
| much | 1 | only | 1 |
| of | 1,1,1,1,1 | or | 1 |
| on | 1,1 | overcome | 1 |
| one | 1 | parallel | 1,1 |
| organization | 1,1 | performance | 1 |
| part | 1 | processing | 1,1,1 |
|  | 1,1,1 |  |  |

Reduce on DN-5:

NODE – 5 [ L (K, L(V)) ]

| (K) | V | (K) | V |
| --- | --- | --- | --- |
| speeds | 1 | relies | 1 |
| the | 1,1,1,1 | scaling | 1,1 |
| them | 1 | scenarios | 1 |
| this | 1,1,1 | technique | 1 |
| to | 1,1,1 | these | 1 |
| tolerent | 1 | true | 1 |
| tradition | 1 | type | 1 |

Reduce on DN-6:

NODE – 6 [ L (K, L(V)) ]

| (K) | V | (K) | V |
| --- | --- | --- | --- |
| up | 1 | using | 1 |
| use | 1 | very | 1 |
| With | 1,1,1 | waiting | 1 |
| within | 1 | was | 1 |
| without | 1,1 | which | 1 |

Calculating word counts:

Now Reduce works on the output above.

The list of counts (L<V>), written in the form 1,1,1 and so on, is converted into a single number (<V>).

The output would be [ L<K, V> ].
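In Hadoop's Java API, this summing of each value list is exactly what the reduce() method does. Below is a minimal sketch of the classic word-count reducer (the class name is ours; the Reducer API is Hadoop's):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// For each word (K), Hadoop hands reduce() the clubbed list of
// counts (L<V>); we sum it into a single value (V).
public class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();       // e.g. "can" -> 1,1,1,1 -> 4
        }
        result.set(sum);
        context.write(key, result); // emits (K, V), e.g. ("can", 4)
    }
}
```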

Finally, Reduced output will be provided to the user.

| WORD | COUNT | WORD | COUNT | WORD | COUNT | WORD | COUNT | WORD | COUNT | WORD | COUNT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| , | 2 | by | 3 | failure | 1 | its | 3 | only | 1 | these | 1 |
| . | 4 | can | 4 | falut | 1 | less | 1 | or | 1 | this | 3 |
| accommodate | 1 | capability | 1 | for | 2 | level | 1 | organization | 2 | to | 3 |
| achieved | 1 | capture | 1 | get | 1 | limitations | 1 | overcome | 1 | tolerent | 1 |
| adding | 1 | commodity | 2 | granular | 1 | low | 2 | parallel | 2 | tradition | 1 |
| allow | 1 | configured | 1 | Hadoop | 3 | machines | 3 | part | 1 | true | 1 |
| also | 1 | cost | 1 | hardware | 2 | maximum | 1 | performance | 1 | type | 1 |
| an | 1 | data | 5 | harnessing | 1 | minute | 1 | processing | 3 | up | 1 |
| analysis | 1 | decentralize | 2 | hit | 1 | more | 1 | relies | 1 | use | 1 |
| and | 2 | design | 1 | horizontal | 2 | much | 1 | scaling | 2 | using | 1 |
| any | 1 | distributed | 2 | hours | 1 | new | 1 | scenarios | 1 | very | 1 |
| at | 1 | enables | 1 | huge | 1 | not | 1 | speeds | 1 | waiting | 1 |
| be | 2 | enabling | 1 | in | 1 | of | 5 | technique | 1 | was | 1 |
| best | 1 | ETL | 1 | is | 1 | on | 2 | the | 4 | which | 1 |
| but | 1 | every | 1 | it | 1 | one | 1 | them | 1 | With | 3 |
| within | 1 | without | 2 |  | 3 |  |  |  |  |  |  |

Conclusion:

We have just seen a “word count” program executed by Hadoop using its MAP and REDUCE methods, with SORT and SHUFFLE as intermediate steps: the processing load is distributed across all available DataNodes, coordinated among the DNs, and finally reduced into output the user can understand.
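For completeness, here is a hedged sketch of how such a word-count job is typically wired together with Hadoop's Job API. The mapper below plays the Map role described in our earlier blog, WordCountReducer is the reducer sketched above, and the input/output paths are assumed to come from the command line:

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    // Map step: emit (word, 1) for every token in the input split.
    public static class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class); // reducer sketched earlier
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input text
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```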
