MapReduce – Sort & Shuffle

This is in continuation of MapReduce Processing
 

We are going to see how the input is provided to SORT process, how this is sorted and distributed on all available DNs and this input is taken over to the next step Shuffle.

This output will be input for next process which is SORT. Sort takes this [L<K,V>] and sorts all the words in order of alpha bates (a to z) on each DN.
 
Sorted arrangement on DNs :
 

SORT

DN -1:

NODE – 1 [ L (K, V)]

PKT-1(K)

V

 

PKT-6(K)

V

With

1

 

much

1

the

1

 

performance

1

use

1

 

.

1

of

1

 

hit

1

parallel

1

 

By

1

processing

1

 

harnessing

1

design

1

 

the

1

Hadoop

1

 

true

1

overcome

1

 

capability

1

distributed

1

 

of

1

   

distributed

1

   

parallel

1

 
DN -2:
Node not available
 
DN -3:

NODE – 3 [ L (K, V)]

PKT-4(K)

V

 

PKT-9(K)

V

,

1

 

get

1

,

1

 

Hadoop

1

achieved

1

 

huge

1

adding

1

 

its

1

an

1

 

limitations

1

analysis

1

 

machines

1

any

1

 

minute

1

best

1

 

new

1

by

1

 

of

1

By

1

 

organization

1

can

1

 

part

1

commodity

1

 

the

1

data

1

 

this

1

enabling

1

 

with

1

   

within

1

   

without

1

    

1

   

for

1

 
DN -4:

NODE – 4 [ L (K, V)]

PKT-3(K)

V

 

PKT-8(K)

V

.

1

 

level

1

also

1

 

low

1

be

1

 

maximum

1

but

1

 

of

1

can

1

 

on

1

data

1

 

on

1

decentralized

1

 

processing

1

granular

1

 

relies

1

hardware

1

 

scaling

1

horizontal

1

 

scaling

1

Horizontal

1

 

speeds

1

its

1

 

technique

1

   

the

1

   

to

1

   

up

1

 
DN -5:

NODE – 5 [ L (K, V)]

PKT-2(K)

V

 

PKT-7(K)

V

.

1

 

enables

1

.

1

 

every

1

allow

1

 

Hadoop

1

and

1

 

its

1

at

1

 

low

1

be

1

 

machines

1

can

1

 

not

1

capture

1

 

only

1

commodity

1

 

organization

1

configured

1

 

processing

1

cost

1

 

scenarios

1

data

1

 

them

1

data

1

 

these

1

decentralize

1

 

This

1

   

This

1

   

to

1

   

to

1

   

using

1

 
DN -6:

NODE – 6 [ L (K, V)]

PKT-5(K)

V

 

PKT-10(K)

V

accommodate

1

 

data

1

and

1

 

ETL

1

can

1

 

for

1

failure

1

 

hours

1

falut

1

 

in

1

hardware

1

 

it

1

is

1

 

less

1

machine

1

 

tradition

1

more

1

 

very

1

of

1

 

waiting

1

one

1

 

was

1

or

1

 

which

1

tolerent

1

 

with

1

type

1

  

1

without

1

  

1

Once SORT completes it task of sorting on MAP output, next process SHUFFLE takes the charge.

SHUFFLE negotiates among DN and finds which DN is actually processing (counting in our scenario) which words.

Once this negotiation is done on all the nodes, shuffling of words starts among DNs. The scenario for shuffling in our case are:

DNs

Word Start from

DN-1

a-f

DN-2

Node
not available

DN-3

g-l

DN-4

m-p . , “”

DN-5

q-t

DN-6

u-z

Colors are to denote and clarify the shuffling processing. 

Lets see the shuffling on all DNs.

DN-1 says that it will process all words start with a to f, Hence all other nodes who contains words starts with letter a to f will assign back to DN-1. DN -1, in return, will assign all other node back their respective words.

This exchange process is known as shuffling.

The same happens at each DN i.e. DN -1 to DN -6. To make a better understanding we have used different colors for each DN so that representation should bring clear picture.

We have braked the process in understandable manner and represented in form of images which are indicating the way entire process progresses.


Let see one by one how shuffling takes place at each node for all other nodes.

Initially we are marking the word in DN’s color (as mentioned in above picture) which are going to pass to another DN in order to show the more clear picture of this process.

Shuffle


The output would be [L<K,V>]
Marking for shuffling on DN-1:

NODE – 1 [ L (K, V)]

PKT-1(K)

V

PKT-6(K)

V

.

1

of

1

By

1

of

1

capability

1

overcome

1

design

1

parallel

1

distributed

1

parallel

1

distributed

1

performance

1

Hadoop

1

processing

1

harnessing

1

the

1

hit

1

the

1

much

1

true

1

  

use

1

  

With

1

Marking for shuffling on DN-2: Since, it is not available, nothing assigned

NODE – 2 [ L (K, V)]

    
    
Marking for shuffling on DN-3:

NODE – 3 [ L (K, V)]

PKT-4(K)

V

PKT-9(K)

V

,

1

get

1

,

1

Hadoop

1

achieved

1

huge

1

adding

1

its

1

an

1

limitations

1

analysis

1

machines

1

any

1

minute

1

best

1

new

1

by

1

of

1

By

1

organization

1

can

1

part

1

commodity

1

the

1

data

1

this

1

enabling

1

with

1

for

1

within

1

  

without

1

   

1

Marking for shuffling on DN-4:

NODE – 4 [ L (K, V)]

PKT-3(K)

V

PKT-8(K)

V

.

1

level

1

also

1

low

1

be

1

maximum

1

but

1

of

1

can

1

on

1

data

1

on

1

decentralized

1

processing

1

granular

1

relies

1

hardware

1

scaling

1

horizontal

1

scaling

1

Horizontal

1

speeds

1

its

1

technique

1

  

the

1

  

to

1

  

up

1

Marking for shuffling on DN-5:

NODE – 5 [ L (K, V)]

PKT-2(K)

V

PKT-7(K)

V

.

1

enables

1

.

1

every

1

allow

1

Hadoop

1

and

1

its

1

at

1

low

1

be

1

machines

1

can

1

not

1

capture

1

only

1

commodity

1

organization

1

configured

1

processing

1

cost

1

scenarios

1

data

1

them

1

data

1

these

1

decentralize

1

This

1

  

This

1

  

to

1

  

to

1

  

using

1

Marking for shuffling on DN-6:

NODE – 6 [ L (K, V)]

PKT-5(K)

V

PKT-10(K)

V

accommodate

1

more

1

and

1

of

1

can

1

one

1

data

1

or

1

ETL

1

tolerent

1

failure

1

tradition

1

falut

1

type

1

for

1

very

1

hardware

1

waiting

1

hours

1

was

1

in

1

which

1

is

1

with

1

it

1

without

1

less

1

 

1

machine

1

 

1

Shuffling of Words on DN-1:

NODE – 1 [ L (K, V)]

(K)

V

 

(K)

V

achieved

1

 

accommodate

1

adding

1

 

also

1

allow

1

 

an

1

and

1

 

analysis

1

any

1

 

and

1

at

1

 

be

1

be

1

 

but

1

best

1

 

By

1

by

1

 

can

1

By

1

 

can

1

can

1

 

can

1

capability

1

 

commodity

1

capture

1

 

configured

1

commodity

1

 

cost

1

data

1

 

data

1

data

1

 

data

1

decentralize

1

 

data

1

design

1

 

decentralized

1

distributed

1

 

enabling

1

distributed

1

 

failure

1

enables

1

 

falut

1

ETL

1

 

for

1

every

1

 

for

1

Shuffling of Words on DN-2:

NODE – 2 [ L (K, V)]

 

 

 

 

 

 

 

 

 

 

Shuffling of Words on DN-3:

NODE – 3 [ L (K, V)]

(K)

V

 

(K)

V

get

1

 

granular

1

Hadoop

1

 

hardware

1

Hadoop

1

 

horizontal

1

Hadoop

1

 

Horizontal

1

hardware

1

 

hours

1

harnessing

1

 

in

1

hit

1

 

it

1

huge

1

 

its

1

is

1

 

less

1

its

1

 

level

1

its

1

 

low

1

limitations

1

 

 

 

low

1

 

 

 

Shuffling of Words on DN-4:

NODE – 4 [ L (K, V)]

(K)

V

 

(K)

V

,

1

 

,

1

.

1

 

.

1

.

1

 

machine

1

.

1

 

machines

1

machines

1

 

maximum

1

minute

1

 

more

1

much

1

 

new

1

of

1

 

not

1

of

1

 

on

1

of

1

 

only

1

of

1

 

or

1

of

1

 

organization

1

on

1

 

overcome

1

one

1

 

parallel

1

organization

1

 

parallel

1

part

1

 

performance

1

 

1

 

processing

1

 

1

 

processing

1

 

1

 

processing

1

Shuffling of Words on DN-5:

NODE – 5 [ L (K, V)]

(K)

V

 

(K)

V

speeds

1

 

relies

1

the

1

 

scaling

1

the

1

 

scaling

1

the

1

 

scenarios

1

the

1

 

technique

1

them

1

 

these

1

this

1

 

This

1

This

1

 

to

1

to

1

 

tolerent

1

to

1

 

tradition

1

true

1

 

type

1

Shuffling of Words on DN-6:

NODE – 6 [ L (K, V)]

(K)

V

 

(K)

V

up

1

 

using

1

use

1

 

very

1

With

1

 

waiting

1

with

1

 

was

1

within

1

 

which

1

without

1

 

with

1

without

1

 

 

 

With this, we have all the DNs the words which they are going to process for our word count program.

Coming up next ……… Reduce

1 thought on “MapReduce – Sort & Shuffle

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.