MapReduce – Sort & Shuffle

This post continues the MapReduce Processing discussion. We are going to see how the input is provided to the Sort process, how it is sorted and distributed across all available DataNodes (DNs), and how it is handed over to the next step, Shuffle. This output will be the input for the next process, which is Sort. Sort takes […]
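As a rough illustration of the sort and shuffle idea in this excerpt, the sketch below sorts each mapper's key/value pairs, assigns every key to a reducer, and groups the values per key. It is a single-process toy, not Hadoop's implementation: the names `partition` and `sort_and_shuffle`, the character-sum partitioner, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def partition(key, num_reducers):
    # Toy deterministic partitioner (an assumption, not Hadoop's own):
    # sum of character codes modulo the number of reducers.
    return sum(ord(ch) for ch in key) % num_reducers


def sort_and_shuffle(map_outputs, num_reducers):
    """Group mapper output by reducer and by key.

    map_outputs: one list of (key, value) pairs per mapper.
    Returns one sorted list of (key, [values]) per reducer.
    """
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for mapper_output in map_outputs:
        # Sort each mapper's pairs, then route every pair to its reducer.
        for key, value in sorted(mapper_output):
            partitions[partition(key, num_reducers)][key].append(value)
    # Each reducer sees its keys in sorted order.
    return [sorted(p.items()) for p in partitions]


# Hypothetical output of two mappers.
shuffled = sort_and_shuffle(
    [[("b", 1), ("a", 1)], [("a", 1), ("c", 1)]],
    num_reducers=2,
)
```

Every value for a given key ends up with the same reducer, which is what lets the reduce step work on one key at a time.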

MapReduce – Unwinding Map

In our last discussion on MapReduce, we covered the algorithm that Hadoop uses for data processing with MapReduce. In this blog, we will discuss the Map section of MapReduce and its functionality. Unwinding Map: We will explain this in detail, with an example. Example: Let’s consider our scenario. The Scenario: […]

MapReduce – Unwinding Algorithm

Having discussed “How Hadoop manages Fault Tolerance” within its cluster while processing data in my last blog, it is now time to discuss the algorithm that MapReduce uses. Name Node (NN): The Name Node (NN) is where a user submits a request to process data, along with the data files. As soon as the NN receives data […]

MapReduce Internals: Philosophy

In our last few blogs we explained what Big Data is, how Hadoop evolved, and how MapReduce works. In this blog we will look at the philosophy of MapReduce. The Philosophy: The philosophy behind MapReduce’s internal workings is straightforward and can be summarized in 6 steps. The smaller, the better, the quicker: Whatever data we provide […]

MapReduce : Internals

The MapReduce Framework: MapReduce is a programming paradigm that provides an interface for developers to map end-user requirements (any type of analysis on data) to code. This framework is one of the core components of Hadoop. The capabilities: The way it provides fault tolerance and massive scalability across hundreds or thousands of servers in a cluster […]
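To make "mapping a requirement to code" concrete, here is a minimal word-count sketch of the paradigm in plain Python. It runs in a single process and does not use Hadoop's API; the function names `map_words`, `reduce_counts`, and `run_job`, and the sample input, are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_words(line):
    # Map step: emit a (word, 1) pair for every word in the line.
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    # Reduce step: combine all counts emitted for one word.
    return word, sum(counts)


def run_job(lines):
    # Group the mapper output by key, then reduce each group.
    grouped = defaultdict(list)
    for line in lines:
        for word, one in map_words(line):
            grouped[word].append(one)
    return dict(reduce_counts(w, c) for w, c in sorted(grouped.items()))


# Hypothetical input: two lines of text.
word_counts = run_job(["big data", "Big cluster"])
```

The developer writes only the map and reduce functions; in a real cluster the framework would take over the grouping step and run the functions across many servers.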