Reduce in MapReduce … Unwinding

In our previous blogs we have studied Big Data and Hadoop. We have also explained MapReduce's internal workings, such as how Map works using sort and shuffle. This blog is dedicated to Reduce in MapReduce. Once the shuffling is completed, Reduce in MapReduce comes into action. Its task is to process the input provided Read more about Reduce in MapReduce … Unwinding[…]

MapReduce – Sort & Shuffle

This is a continuation of MapReduce Processing. We are going to see how the input is provided to the SORT process, how it is sorted and distributed across all available DataNodes (DNs), and how it is taken over to the next step, Shuffle. This output will be the input for the next process, which is SORT. Sort takes Read more about MapReduce – Sort & Shuffle[…]

MapReduce – Unwinding Map

In the last discussion on MapReduce, we discussed the algorithm that Hadoop uses for data processing with MapReduce. In this blog, we will discuss the MAP section of MapReduce and its functionality. Unwinding Map: We will explain this in detail, with an example. Example: Let's consider our scenario. The Scenario: Read more about MapReduce – Unwinding Map[…]

MapReduce : Fault Tolerance

The Fault Tolerance: Before we see the intermediate data produced by the mapper, it is worth looking at the fault-tolerance aspects of Hadoop with respect to MapReduce processing. The Replication Factor: Once the Name node (NN) has received the data files that have to be processed, it splits the data files to assign them to Data Read more about MapReduce : Fault Tolerance[…]