www.edureka.co/big-data-and-hadoop
EDUREKA HADOOP CERTIFICATION TRAINING

Agenda for Today's Session
1. What is Hadoop MapReduce?
2. MapReduce in a Nutshell
3. Advantages of MapReduce
4. Hadoop MapReduce Approach with an Example
5. Hadoop MapReduce/YARN Components
6. YARN with MapReduce
7. YARN Application Workflow
8. MapReduce Program with Hands-On

Hadoop Components
Hadoop has two main components:
• Storage (HDFS)
• Processing (MapReduce)

MapReduce: Data Processing Using Programming
[Figure: Big Data is processed by a MapReduce program to produce a Result]
• Hadoop MapReduce is the processing component of Apache Hadoop.
• It processes data in parallel in a distributed environment.

MapReduce in a Nutshell
• Model: a programming model for large-scale, distributed, parallel programming
• Functions: Map and Reduce
• Used in: Classification, Analytics, Recommendation, Index and Search
• Design patterns, with examples:
  » Classification, e.g. Top N records
  » Analytics, e.g. Join, Selection
  » Recommendation, e.g. Sort
  » Summarization, e.g. Inverted Index
• Implemented by: Google; Apache Hadoop (alongside HDFS, Pig, Hive and HBase)
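
To make the "Top N records" pattern concrete, here is a minimal single-machine Python sketch, not Hadoop code: each mapper keeps only the N best records of its own split, and one reducer merges those short lists into the global top N. The `(id, score)` record shape, the sample splits and the function names are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical record shape: (record id, score).
Record = Tuple[str, int]


def map_top_n(split: Iterable[Record], n: int) -> List[Record]:
    """Map side: keep only the n highest-scoring records of one input split."""
    return heapq.nlargest(n, split, key=lambda record: record[1])


def reduce_top_n(local_tops: Iterable[List[Record]], n: int) -> List[Record]:
    """Reduce side: merge every mapper's local top n into the global top n."""
    candidates = [record for top in local_tops for record in top]
    return heapq.nlargest(n, candidates, key=lambda record: record[1])


if __name__ == "__main__":
    splits = [
        [("a", 5), ("b", 42), ("c", 17)],
        [("d", 8), ("e", 99), ("f", 3)],
        [("g", 64), ("h", 1)],
    ]
    local_tops = [map_top_n(split, 2) for split in splits]
    print(reduce_top_n(local_tops, 3))  # [('e', 99), ('g', 64), ('b', 42)]
```

The design relies on one observation: a record in the global top N must also be in the top N of the split it came from, so mappers can discard everything else before any data is moved.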

Two Biggest Advantages of MapReduce

Advantage 1: Parallel Processing
• Data is processed in parallel on many slave nodes
• As a result, processing becomes fast
[Figure: a Master node distributes Data across Slaves A to E]
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING  Moving Data to processing is very costly  In MapReduce, we move processing to Data Advantage 2: Data Locality - Processing to Storage Slave A Slave B Slave C Slave D Slave E Data  Master

Traditional Way vs. MapReduce Way

Election Votes Counting: Casting the Votes
• Votes are stored at different booths
• The Result Centre has the details of all the booths
[Figure: votes held at Booths A to E; Result Centre]

Election Votes Counting – Traditional Way
Counting, traditional approach:
• Votes are moved to the Result Centre for counting
• Moving all the votes to the centre is costly
• The Result Centre is overburdened
• Counting takes time
[Figure: votes from Booths A to E moved to the Result Centre]

Hadoop MapReduce to the Rescue!
Hadoop MapReduce doesn't follow this approach.

Election Votes Counting – MapReduce Way
Counting, MapReduce approach:
• Votes are counted at the individual booths
• Booth-wise results are sent back to the Result Centre
• The final result is declared easily and quickly
[Figure: Booths A to E count their votes locally and send the totals to the Result Centre]
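
The election analogy maps directly onto code. In the Python sketch below (booth names, candidates and votes are made-up sample data, and the function names are hypothetical), both approaches produce the same totals, but the traditional way ships every vote to the Result Centre, while the MapReduce way ships only one (candidate, count) pair per candidate per booth.

```python
from collections import Counter
from typing import Dict, List, Tuple


def traditional_count(booths: Dict[str, List[str]]) -> Tuple[Counter, int]:
    """Traditional way: every single vote is shipped to the Result Centre and counted there."""
    shipped = [vote for votes in booths.values() for vote in votes]
    return Counter(shipped), len(shipped)


def mapreduce_count(booths: Dict[str, List[str]]) -> Tuple[Counter, int]:
    """MapReduce way: each booth counts locally and ships only its (candidate, count) pairs."""
    booth_results = [Counter(votes) for votes in booths.values()]  # counting at the booths
    shipped = sum(len(result) for result in booth_results)  # one pair per candidate per booth
    total: Counter = Counter()
    for result in booth_results:
        total.update(result)  # the Result Centre just adds up the booth-wise results
    return total, shipped


if __name__ == "__main__":
    booths = {
        "Booth A": ["X", "Y", "X", "X"],
        "Booth B": ["Y", "Y", "X"],
        "Booth C": ["X", "Z"],
    }
    for name, count_fn in [("traditional", traditional_count), ("MapReduce", mapreduce_count)]:
        total, shipped = count_fn(booths)
        print(name, dict(sorted(total.items())), "records shipped:", shipped)
```

With three booths the saving is small (6 records instead of 9), but in the MapReduce way the number of shipped records grows with the number of booths times candidates rather than with the number of votes cast.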

MapReduce in Detail

MapReduce Way

Anatomy of a MapReduce Program
MapReduce works on key-value pairs:
Map:    (K1, V1)       → list(K2, V2)
Reduce: (K2, list(V2)) → list(K3, V3)
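
The Map and Reduce signatures can be expressed directly as a generic, single-process Python function. This is a minimal sketch, not the Hadoop API: `run_mapreduce` applies the mapper to every (K1, V1) pair, groups the intermediate values by K2, and passes each (K2, list(V2)) to the reducer, visiting keys in sorted order much as Hadoop sorts intermediate keys before the reduce phase. The demo job is the "Inverted Index" pattern from MapReduce in a Nutshell, where K1 is a document id, V1 its text, K2 a word, V2 a document id, and (K3, V3) a word with its sorted list of documents; all names and sample documents are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2")
V2 = TypeVar("V2")
K3 = TypeVar("K3")
V3 = TypeVar("V3")


def run_mapreduce(
    inputs: Iterable[Tuple[K1, V1]],
    mapper: Callable[[K1, V1], Iterable[Tuple[K2, V2]]],
    reducer: Callable[[K2, List[V2]], Iterable[Tuple[K3, V3]]],
) -> List[Tuple[K3, V3]]:
    """Single-process sketch of the MapReduce contract (keys K2 must be sortable)."""
    # Map: (K1, V1) -> list(K2, V2), then group the intermediate values by key.
    groups: Dict[K2, List[V2]] = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    # Reduce: (K2, list(V2)) -> list(K3, V3), visiting keys in sorted order.
    output: List[Tuple[K3, V3]] = []
    for k2 in sorted(groups):
        output.extend(reducer(k2, groups[k2]))
    return output


def index_mapper(doc_id: str, text: str) -> Iterable[Tuple[str, str]]:
    """Inverted index map: emit (word, doc_id) once for each distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reducer(word: str, doc_ids: List[str]) -> Iterable[Tuple[str, List[str]]]:
    """Inverted index reduce: the posting list of a word is its sorted document ids."""
    yield word, sorted(doc_ids)


if __name__ == "__main__":
    docs = [("d1", "big data big ideas"), ("d2", "data locality"), ("d3", "big clusters")]
    for word, postings in run_mapreduce(docs, index_mapper, index_reducer):
        print(word, postings)
```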

Let us take an example to understand the MapReduce way.

MapReduce Way – Word Count Process
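
The word count process can be traced step by step in code. The Python sketch below runs the usual phases one after another on a small sample text: splitting the input, mapping each word to (word, 1), shuffling (sorting and grouping the pairs by word) and reducing (summing each group). It is a single-process illustration with hypothetical function names; in particular, treating each line as one input split is a simplification, since Hadoop input splits are typically sized to match HDFS blocks.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Tuple


def split_input(text: str) -> List[str]:
    """Splitting: treat each line as one input split (a simplification)."""
    return text.splitlines()


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Mapping: emit (word, 1) for every word in the split."""
    return [(word, 1) for word in split.split()]


def shuffle_phase(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffling: sort all (word, 1) pairs by word and group the 1s per word."""
    pairs = sorted((pair for part in mapped for pair in part), key=itemgetter(0))
    return {word: [count for _, count in group] for word, group in groupby(pairs, key=itemgetter(0))}


def reduce_phase(shuffled: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducing: add up the counts for each word."""
    return {word: sum(counts) for word, counts in shuffled.items()}


def word_count(text: str) -> Dict[str, int]:
    splits = split_input(text)
    mapped = [map_phase(split) for split in splits]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    sample = "Deer Bear River\nCar Car River\nDeer Car Bear"
    print(word_count(sample))  # {'Bear': 2, 'Car': 3, 'Deer': 2, 'River': 2}
```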

Executing a MapReduce Program

MapReduce Using YARN

YARN – Moving beyond MapReduce
YARN lets many kinds of processing run on the same Hadoop cluster, not only MapReduce:
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• ONLINE (HBase)
• STREAMING (Storm, S4, …)
• GRAPH (Giraph)
• IN-MEMORY (Spark)
• HPC MPI (OpenMPI)
• OTHER (Search) (Weave…)

Hadoop 2.x Daemons

Hadoop 2.x MapReduce YARN Components
• ApplicationMaster
  » One per application
  » Short-lived
  » Coordinates and manages MapReduce jobs
  » Negotiates with the ResourceManager to schedule tasks
  » The tasks are started by the NodeManager(s)
• Job History Server
  » Maintains information about submitted MapReduce jobs after their ApplicationMaster terminates
• Client
  » Submits a MapReduce job
• ResourceManager
  » Cluster-level resource manager
  » Long-lived; runs on high-quality hardware
• NodeManager
  » One per DataNode
  » Monitors resources on its DataNode
• Container
  » Created by the NodeManager when requested
  » Allocates a certain amount of resources (memory, CPU, etc.) on a slave node
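
The division of labour between the ResourceManager, NodeManagers and containers can be illustrated with a toy allocator. In this Python sketch, each NodeManager advertises free memory and vcores, and the ResourceManager grants a container on the first node with enough room, or nothing if the cluster is full. The class names echo the slide, but the fields, the first-fit policy and the container id format are assumptions for illustration; real YARN uses pluggable schedulers such as the CapacityScheduler and FairScheduler, and it is the NodeManager, not the ResourceManager, that actually launches a container.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeManager:
    name: str
    free_memory_mb: int  # resources this node can still hand out to containers
    free_vcores: int
    containers: List[str] = field(default_factory=list)


@dataclass
class ResourceManager:
    nodes: List[NodeManager]
    next_id: int = 1

    def allocate(self, app_id: str, memory_mb: int, vcores: int) -> Optional[str]:
        """Grant one container on the first NodeManager with enough free resources."""
        for nm in self.nodes:
            if nm.free_memory_mb >= memory_mb and nm.free_vcores >= vcores:
                nm.free_memory_mb -= memory_mb
                nm.free_vcores -= vcores
                container_id = f"{app_id}_container_{self.next_id}"  # hypothetical id format
                self.next_id += 1
                nm.containers.append(container_id)  # this NodeManager now hosts the container
                return container_id
        return None  # no node has room; the request has to wait


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("nm1", 4096, 4), NodeManager("nm2", 2048, 2)])
    am = rm.allocate("app1", 1024, 1)  # first container runs the ApplicationMaster
    tasks = [rm.allocate("app1", 2048, 2) for _ in range(3)]  # AM then asks for task containers
    print(am, tasks)
```

In this run, the ApplicationMaster's container and the first task container fit on nm1, the second task container goes to nm2, and the third request gets `None` because neither node has 2048 MB and 2 vcores left.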

YARN Application Workflow in MapReduce

[Figure: YARN workflow. The ResourceManager, made up of the Scheduler and the Applications Manager (AsM), oversees many NodeManagers. App Master 1 runs with Containers 1.1 and 1.2, and App Master 2 with Containers 2.1, 2.2 and 2.3, each on a NodeManager.]
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING Application Workflow Execution Sequence : 1. Client submits an application Client RM NM AM 1
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING Application Workflow Execution Sequence : 1. Client submits an application 2. RM allocates a container to start AM Client RM NM AM 1 2
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING Application Workflow Execution Sequence : 1. Client submits an application 2. RM allocates a container to start AM 3. AM registers with RM Client RM NM AM 1 2 3
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING Application Workflow Execution Sequence : 1. Client submits an application 2. RM allocates a container to start AM 3. AM registers with RM 4. AM asks containers from RM Client RM NM AM 1 2 3 4
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING Application Workflow Execution Sequence : 1. Client submits an application 2. RM allocates a container to start AM 3. AM registers with RM 4. AM asks containers from RM 5. AM notifies NM to launch containers Client RM NM AM 1 2 3 4 5
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING Application Workflow Execution Sequence : 1. Client submits an application 2. RM allocates a container to start AM 3. AM registers with RM 4. AM asks containers from RM 5. AM notifies NM to launch containers 6. Application code is executed in container Client RM NM AM 1 2 3 4 5 6
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING Application Workflow Execution Sequence : 1. Client submits an application 2. RM allocates a container to start AM 3. AM registers with RM 4. AM asks containers from RM 5. AM notifies NM to launch containers 6. Application code is executed in container 7. Client contacts RM/AM to monitor application’s status Client RM NM AM 1 2 3 4 5 7 6
www.edureka.co/big-data-and-hadoopEDUREKA HADOOP CERTIFICATION TRAINING Application Workflow Execution Sequence : 1. Client submits an application 2. RM allocates a container to start AM 3. AM registers with RM 4. AM asks containers from RM 5. AM notifies NM to launch containers 6. Application code is executed in container 7. Client contacts RM/AM to monitor application’s status 8. AM unregisters with RM Client RM NM AM 1 2 3 4 5 7 8 6

Learning Resources
• Hadoop Tutorial: www.edureka.co/blog/hadoop-tutorial
• MapReduce Tutorial: www.edureka.co/blog/mapreduce-tutorial
• MapReduce Interview Questions: www.edureka.co/blog/interview-questions/hadoop-interview-questions-mapreduce

Thank You! Questions/Queries/Feedback

MapReduce Tutorial | What is MapReduce | Hadoop MapReduce Tutorial | Edureka