Reduce in MapReduce … Unwinding

In our previous blogs we studied Big Data and Hadoop. We have also explained the internal workings of MapReduce, such as how Map output is handled by sort and shuffle. This blog is dedicated to Reduce in MapReduce. Once shuffling is complete, Reduce in MapReduce comes into action. Its task is to process the input provided…
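To go with this post, here is a minimal sketch of what a Reduce step looks like in the classic Hadoop Java API: a word-count style reducer that sums the grouped values for each key. The class name is illustrative, not taken from the post itself.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// After shuffle, each reducer receives a key together with all values
// emitted for that key, e.g. ("hadoop", [1, 1, 1]).
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();                   // aggregate the grouped values
        }
        context.write(key, new IntWritable(sum)); // emit ("hadoop", 3)
    }
}
```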

MapReduce – Sort & Shuffle

This is a continuation of MapReduce Processing. The output of the Map phase becomes the input of the next process, Sort. We are going to see how this input is provided to the Sort process, how it is sorted and distributed across all available Data Nodes (DNs), and how the result is taken over by the next step, Shuffle. Sort takes…
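Sort and shuffle happen inside the framework, so there is no user code to show, but a toy single-machine simulation can illustrate the effect: map output pairs get sorted by key and grouped, so each reducer sees one key with all of its values. This is plain Java, not Hadoop code.

```java
import java.util.*;

// Toy simulation of sort & shuffle (not Hadoop code): sort map output
// by key, then group all values that share a key.
public class SortShuffleDemo {
    public static void main(String[] args) {
        // Pretend map output: (word, 1) pairs in arbitrary order.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("data", 1), Map.entry("hadoop", 1),
                Map.entry("data", 1), Map.entry("big", 1));

        // TreeMap keeps keys sorted; each key accumulates its values.
        SortedMap<String, List<Integer>> shuffled = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            shuffled.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                    .add(pair.getValue());
        }

        // Each entry is what a reducer would receive, e.g. data -> [1, 1].
        shuffled.forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```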

MapReduce – Unwinding Map

In the last discussion on MapReduce, we covered the algorithm that Hadoop uses for data processing with MapReduce. In this blog, we will discuss the Map section of MapReduce and its functionality. Unwinding Map: We will explain this in detail and with an example. The Scenario:…
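As a companion to this post, here is a minimal Map-side sketch in the Hadoop Java API: a mapper that reads one line at a time and emits a (word, 1) pair per word. The class name is illustrative only.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// A mapper receives one input record at a time (here: a line of text,
// keyed by its byte offset) and emits intermediate (key, value) pairs.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE); // one pair per word occurrence
            }
        }
    }
}
```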

MapReduce – Unwinding Algorithm

Having discussed in my last blog how Hadoop manages fault tolerance within its cluster while processing data, it is now time to discuss the algorithm that MapReduce uses. Name Node (NN): The Name Node (NN) is where a user submits a request to process data and submits the data files. As soon as the NN receives the data…
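In code, "submitting a request to process data" takes the shape of a driver class that names the mapper, the reducer, and the input/output paths, then hands the job to the framework. This is a sketch only; the class names reuse the illustrative mapper and reducer from the posts above, and the paths come from command-line arguments.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Driver: describes the job and submits it to the cluster.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenMapper.class);  // illustrative mapper (see above)
        job.setReducerClass(SumReducer.class);  // illustrative reducer (see above)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input files
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // results directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```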

MapReduce : Fault Tolerance

The Fault Tolerance: Before we see the intermediate data produced by the mapper, it is quite interesting to look at the fault-tolerance aspects of Hadoop with respect to MapReduce processing. The Replication Factor: Once the Name Node (NN) receives the data files that have to be processed, it splits the data files to assign them to Data…
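The replication factor mentioned here controls how many copies of each block HDFS keeps (three by default), which is what lets processing survive the loss of a Data Node. A small sketch of reading and overriding it through the Java API; the file path is hypothetical and the file would need to exist on the cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: inspect and adjust the HDFS replication factor.
public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");        // cluster default, normally set in hdfs-site.xml

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/input.txt"); // hypothetical path
        fs.setReplication(file, (short) 3);      // override for one file
        System.out.println("Replication: " + fs.getFileStatus(file).getReplication());
    }
}
```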

MapReduce Internals: Philosophy

In our last few blogs we explained what Big Data is, how Hadoop evolved, and how MapReduce works. In this blog we will see the philosophy of MapReduce. The Philosophy: The philosophy behind MapReduce's internal workings is straightforward and can be summarized in six steps. The smaller, the better, the quicker: whatever data we provide…
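The "smaller, better, quicker" idea is plain divide and conquer: split the work into small chunks, process each independently, then combine the partial results. A toy single-machine sketch of the same idea, not Hadoop code:

```java
import java.util.Arrays;

// Toy illustration: process halves independently ("map"),
// then combine the partial results ("reduce").
public class DivideAndConquerDemo {
    public static void main(String[] args) {
        int[] data = {4, 8, 15, 16, 23, 42};

        // Each half could be handled by a different worker.
        int mid = data.length / 2;
        int left = Arrays.stream(data, 0, mid).sum();
        int right = Arrays.stream(data, mid, data.length).sum();

        // Combine the partial sums into the final answer.
        System.out.println("Total: " + (left + right)); // 108
    }
}
```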

MapReduce : Internals

The MapReduce Framework: MapReduce is a programming paradigm that provides an interface for developers to map end-user requirements (any type of analysis on data) to code. This framework is one of the core components of Hadoop. The Capabilities: The way it provides fault tolerance and massive scalability across hundreds or thousands of servers in a cluster…
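Conceptually, the interface the paradigm hands to developers boils down to two functions; everything else (distribution, sorting, retries) is the framework's job. A simplified shape of that contract, with generic types; this is not the real Hadoop class hierarchy, just an illustration.

```java
// Simplified, illustrative shape of the developer-facing contract
// (not the actual Hadoop signatures): an analysis is two functions.
interface MapReduceSpec<K1, V1, K2, V2, V3> {
    // map: one input record -> zero or more intermediate (key, value) pairs
    Iterable<java.util.Map.Entry<K2, V2>> map(K1 key, V1 value);

    // reduce: one intermediate key plus all of its values -> a final value
    V3 reduce(K2 key, Iterable<V2> values);
}
```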

Magic of Hadoop

Disadvantage of DWH: Because of the limitations of currently available enterprise data warehousing tools, organizations were not able to consolidate their data in one place while maintaining fast data processing. Here the magic of Hadoop comes to their rescue. Traditional ETL tools may take hours, days, and sometimes even weeks, and because of this, performance…

Journey of Hadoop

History of Hadoop: At the outset of the twenty-first century, somewhere around 1999-2000, with the increasing popularity of XML and Java, the internet was evolving faster than ever. This led to the invention of Hadoop. Necessity is the mother of invention: As the World Wide Web grew at a dizzying pace, even though the search engine technologies of the day were working fine, a…

Big Data: Introduction and 4V’s

Innovations in technology have made resources cheaper than before. This enables organizations to store more data at lower cost, and the size of stored data keeps growing as a result. Gradually, data has moved from megabytes (MB) to petabytes (1 PB = 10^9 MB). This huge increase in data requires a different kind of processing…