Confidential, Copyright © Quanticate
Introduction to Map-Reduce
  Muralidharan Deenathayalan, Technical Lead
  Muralidharan.deenathayalan@quanticate.com
  The Apache logo is a trademark of The Apache Software Foundation. All other marks mentioned may be trademarks or registered trademarks of their respective owners.
Agenda
  • What is Map-Reduce?
  • Map-Reduce architecture
  • Advantages of Map-Reduce
  • Frameworks available for writing Map-Reduce?
  • WordCount – Map-Reduce Program explained
  • How to compile Map-Reduce program using Eclipse?
  • How to deploy Map-Reduce program?
  • How to run Map-Reduce program?
  • Q & A
Who Am I?
  • 7+ years of experience in Microsoft technologies such as ASP.NET, C#, SQL Server and SharePoint
  • 2+ years of experience in open source technologies such as Java, Alfresco and Apache Cassandra
  • Author of Apache Cassandra Cookbook (in progress)
  • Csharpcorner MVP
  • Frequent blogger
Confidential, Copyright © Quanticate What is Map-Reduce?  Generally called as Map-R program  MapReduce Map() + Reduce()  MapReduce is a programming approach to process large datasets in parallel, distributed on a cluster ( Divide and conquer). Map
What is Map-Reduce?
  • Map:
    – Receives an input key/value pair
    – Outputs intermediate key/value pairs
  • Reduce:
    – Receives an intermediate key together with all values emitted for that key
    – Outputs key/value pairs
  [Diagram: Input Data feeding Map tasks, whose output feeds Reduce tasks]
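The data flow described on this slide can be sketched in plain Java without Hadoop. This is only an illustration under stated assumptions: everything runs in a single JVM, the class and method names (MapReduceFlowSketch, map, reduce, run) are invented for the example, and the grouping step stands in for the shuffle that a real framework performs between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process illustration of map -> group -> reduce.
// It does not use Hadoop; all names are invented for this sketch.
final class MapReduceFlowSketch {

    // Map: one input record (offset, line) in, intermediate (word, 1) pairs out.
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Reduce: one key plus all of its intermediate values in, one total out.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Group intermediate values by key (stands in for the shuffle step).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(offset, line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
            offset += line.length() + 1;
        }
        // Call reduce once per distinct key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(run(List.of("the quick brown fox", "the lazy dog", "the fox")));
    }
}
```

In a real cluster the map calls and the reduce calls would run on different machines; the sketch only shows which data each function sees.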
Map-Reduce Architecture overview
  [Diagram: user; master node running the Job tracker; slave nodes 1 to N, each running a Task tracker with its workers]
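As a loose analogy for the master/worker split in this diagram, the sketch below uses a thread pool inside one JVM. It is an assumption-laden illustration, not Hadoop code: the pool threads stand in for the task trackers' workers, the countWords method plays the job tracker's role of handing out tasks and collecting results, and all names (MasterWorkerSketch, countWords, countTokens) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical analogy only: threads in one JVM stand in for cluster nodes.
final class MasterWorkerSketch {

    // Work done by one task: count the tokens in one input split.
    static int countTokens(String split) {
        return new StringTokenizer(split).countTokens();
    }

    // "Job tracker" role: create one task per split, hand the tasks to a
    // pool of workers, then collect and add up the partial results.
    static int countWords(List<String> splits, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String split : splits) {
                Callable<Integer> task = () -> countTokens(split);
                pending.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : pending) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Three splits, three workers; prints 6
        System.out.println(countWords(List.of("a b c", "d e", "f"), 3));
    }
}
```

A real job tracker also handles scheduling across machines, data locality and task retries, none of which this analogy attempts to model.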
Advantages of Map-Reduce
  • Distributed pattern-based searching
  • Distributed sorting
  • Web access logs
  • Machine Learning
Frameworks available for writing Map-Reduce
  Courtesy & ©: http://blog.matthewrathbone.com/2013/01/05/a-quick-guide-to-hadoop-map-reduce-frameworks.html
  JAVA: Cascading, Crunch
  CLOJURE: Cascalog
  SCALA: Scrunch, Scalding, Scoobi
  R: RHadoop
  MICROSOFT: .NET (C# / VB.NET)
  SPECIAL (HIGH-LEVEL): Apache Hive, Apache Pig
  RUBY: Wukong, Cascading JRuby
  PYTHON: mrjob, Dumbo, Hadoopy, Pydoop, Luigi
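WordCount – Map-Reduce Program
  // Nested inside the WordCount class; written against the older org.apache.hadoop.mapred API.
  // Needs: java.io.IOException, java.util.StringTokenizer, org.apache.hadoop.io.*, org.apache.hadoop.mapred.*
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    // Reused for every token to avoid allocating new objects per record.
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // key: byte offset of the line in the input file; value: the line itself.
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);  // emit (word, 1)
      }
    }
  }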
WordCount – Map-Reduce Program
  // Nested inside the WordCount class.
  // Needs: java.io.IOException, java.util.Iterator, org.apache.hadoop.io.*, org.apache.hadoop.mapred.*
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {

    // key: one word; values: every count emitted for that word.
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));  // emit (word, total)
    }
  }
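WordCount – Map-Reduce Program
  // Driver method of the WordCount class: args[0] is the input path, args[1] the output path.
  // Needs: org.apache.hadoop.fs.Path, org.apache.hadoop.io.*, org.apache.hadoop.mapred.*
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    // Types of the final (word, count) output pairs.
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);  // the reducer doubles as a local combiner
    conf.setReducerClass(Reduce.class);

    // Read and write plain text files.
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);  // submit the job and wait for it to finish
  }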
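The driver registers the same Reduce class as both combiner and reducer. That is safe here because addition is associative and commutative, so summing partial sums gives the same totals as summing the raw counts. The sketch below illustrates this in plain Java with no Hadoop involved; the class and method names (CombinerSketch, sumByKey, withCombiner, withoutCombiner) and the two pretend map-task outputs are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not Hadoop code: shows why a summing reducer
// can also serve as a combiner. All names are invented for this sketch.
final class CombinerSketch {

    // Same logic as the Reduce class, applied per key: add up the counts.
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // No combiner: every (word, 1) pair from every task reaches the reducer.
    static Map<String, Integer> withoutCombiner(List<List<Map.Entry<String, Integer>>> tasks) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> task : tasks) {
            all.addAll(task);
        }
        return sumByKey(all);
    }

    // Combiner: each task pre-sums its own output, then the reducer sums
    // the partial sums. Fewer pairs need to be transferred.
    static Map<String, Integer> withCombiner(List<List<Map.Entry<String, Integer>>> tasks) {
        List<Map.Entry<String, Integer>> partials = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> task : tasks) {
            partials.addAll(sumByKey(task).entrySet());
        }
        return sumByKey(partials);
    }

    public static void main(String[] args) {
        // Intermediate output of two pretend map tasks.
        List<List<Map.Entry<String, Integer>>> tasks = List.of(
            List.of(Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1)),
            List.of(Map.entry("a", 1), Map.entry("c", 1)));

        System.out.println(withoutCombiner(tasks));  // prints {a=3, b=1, c=1}
        System.out.println(withCombiner(tasks));     // prints {a=3, b=1, c=1}
    }
}
```

A reducer whose result depends on seeing all values at once (an average computed as sum divided by count, for instance) would not give correct results if reused as a combiner in this way.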
How to compile Map-Reduce program using Eclipse?
  • Reference the Hadoop JAR file from your local disk
  • Alternatively, Maven is a simple way to pull in the Hadoop dependency
  • In Eclipse: Project → Build Project
  • The build has succeeded when the Eclipse console shows no errors
How to deploy Map-Reduce program?
How to run Map-Reduce program?
Summary
  • What is Map-Reduce?
  • Architecture of Map-Reduce?
  • Advantages of Map-Reduce
  • Frameworks available for Map-Reduce?
  • WordCount – Map-Reduce Program explained
  • Compiling WordCount Map-Reduce program using Eclipse
  • Deploying Map-Reduce program
  • Executing a Map-Reduce program
Q & A
Coding-Freaks.Net: www.codingfreaks.net
Quanticate OPDev Twitter: https://twitter.com/quanticateopdev
Twitter: www.Twitter.com/muralidharand