MapReduce: Heart of Hadoop
What is MapReduce  MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.  MapReduce programs are written in a particular style influenced by functional programming constructs, specifically idioms for processing lists of data.
Map and Reduce  Conceptually, MapReduce programs transform lists of input data elements into lists of output data elements. A MapReduce program does this twice, using two different list-processing idioms: map and reduce.
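The two idioms can be sketched in plain Python (no Hadoop involved) — a hypothetical word-count mapper and reducer, with the grouping step that Hadoop's shuffle would normally perform done by hand:

```python
from collections import defaultdict

def map_fn(line):
    # Hypothetical mapper: emit one (word, 1) pair per word in the line.
    for word in line.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Hypothetical reducer: sum all counts seen for one word.
    return (key, sum(values))

lines = ["cat sat on the mat", "the cat"]

# Map idiom: transform every input element into (key, value) pairs.
mapped = [pair for line in lines for pair in map_fn(line)]

# Group pairs by key (in Hadoop, the shuffle phase does this).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce idiom: aggregate the list of values under each key.
result = dict(reduce_fn(k, v) for k, v in groups.items())
print(result)  # {'cat': 2, 'sat': 1, 'on': 1, 'the': 2, 'mat': 1}
```

The same two functions are all a real MapReduce program supplies; the framework handles splitting, grouping, and running them in parallel.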
Dataflow  A MapReduce job is a unit of work that the client wants to be performed: it consists of the input data, the MapReduce program, and configuration information.  Hadoop runs the job by dividing it into tasks, of which there are two types: map tasks and reduce tasks.
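A toy simulation (assumed, not real Hadoop code) of that division: the input is cut into splits, one map task runs per split, and each map task's output is partitioned among the reduce tasks by key — mirroring Hadoop's default hash partitioner:

```python
from collections import defaultdict

def map_task(split):
    # One map task processes one input split.
    return [(word, 1) for line in split for word in line.split()]

def partition(key, num_reducers):
    # Assumed partitioner: hash the key to choose a reduce task.
    return hash(key) % num_reducers

splits = [["cat sat"], ["on the mat", "the cat"]]  # two input splits
num_reducers = 2

# One map task per input split.
map_outputs = [map_task(s) for s in splits]

# Shuffle: route each pair to its reduce task, grouping by key there.
reduce_inputs = [defaultdict(list) for _ in range(num_reducers)]
for output in map_outputs:
    for key, value in output:
        reduce_inputs[partition(key, num_reducers)][key].append(value)

# Each reduce task sums the values for its own keys.
results = {}
for groups in reduce_inputs:
    for key, values in groups.items():
        results[key] = sum(values)
print(results)  # word counts; key order varies with hashing
```

Which reducer a given word lands on varies run to run here (Python randomizes string hashing), but every occurrence of a word always reaches the same reducer, which is the property the partitioner guarantees.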
MapReduce  How much data is processed at a time?
MapReduce
Map Only Job
Combiner  The Combiner is a "mini-reduce" process which operates only on data generated by one machine.  Word count is a prime example of where a Combiner is useful. The Word Count program in listings 1--3 emits a (word, 1) pair for every instance of every word it sees. So if the same document contains the word "cat" 3 times, the pair ("cat", 1) is emitted three times; all of these are then sent to the Reducer.
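A minimal sketch (plain Python) of what the Combiner buys you: running the reduce logic on one machine's local map output turns three ("cat", 1) pairs into a single ("cat", 3) pair before anything is sent over the network to the Reducer:

```python
from collections import Counter

def map_words(document):
    # Map output on one machine: a (word, 1) pair per occurrence.
    return [(word, 1) for word in document.split()]

def combine(pairs):
    # Mini-reduce over this machine's map output only.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

doc = "cat dog cat bird cat"
mapped = map_words(doc)     # five pairs, including ("cat", 1) three times
combined = combine(mapped)  # three pairs, including ("cat", 3) once
print(len(mapped), len(combined))  # 5 3
```

The Reducer's final answer is unchanged; the Combiner only shrinks the intermediate data that must be shuffled across the cluster.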
FAULT TOLERANCE  One of the primary reasons to use Hadoop to run your jobs is its high degree of fault tolerance.  The primary way that Hadoop achieves fault tolerance is by restarting tasks. Individual task nodes (TaskTrackers) are in constant communication with the head node of the system, called the JobTracker. If a TaskTracker fails to communicate with the JobTracker for a period of time (by default, 1 minute), the JobTracker will assume that the TaskTracker in question has crashed. The JobTracker knows which map and reduce tasks were assigned to each TaskTracker, so it can reschedule those tasks on other nodes.
Hadoop Streaming  Hadoop provides an API to MapReduce that allows you to write your map and reduce functions in languages other than Java. Hadoop Streaming uses Unix standard streams as the interface between Hadoop and your program, so you can use any language that can read standard input and write to standard output to write your MapReduce program.