When Not to Use Hadoop
View the Big Data and Hadoop course at: http://www.edureka.co/big-data-and-hadoop
Objectives
At the end of this module, you will be able to:
• Understand when not to use Hadoop
  » Real Time Analytics
  » Not a Replacement
  » Dataset Size
  » Complexity
  » Security
• Understand when to use Hadoop
  » Huge Unstructured Datasets
  » Response Time is Not an Issue
  » Future Planning
  » Multiple Frameworks for Big Data
  » Lifetime Data Availability
Hadoop Mania
When Not To Use Hadoop
Real Time Analytics
• If you need real-time analytics, where results are expected quickly, Hadoop should not be used directly.
• Hadoop works on batch processing, so its response time is high.
[Figure: input data arrives every day (Day 1 to Day n), but the results of processing it with MapReduce (MR) are available only after a time lag.]
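The time lag of batch processing can be illustrated with a minimal sketch in plain Python. It does not use Hadoop at all; the function names and the click events are hypothetical and exist only to contrast the two styles. The batch function produces an answer once the whole input is available, while the streaming function exposes an updated answer after every event.

```python
from collections import Counter

def batch_counts(events):
    """Batch style: wait for the full input, then count it in one pass."""
    return Counter(events)

def streaming_counts(events):
    """Streaming style: yield an updated snapshot after every event."""
    running = Counter()
    for event in events:
        running[event] += 1
        yield dict(running)

# Hypothetical click events arriving over time (illustrative data only).
events = ["home", "cart", "home", "pay"]

snapshots = list(streaming_counts(events))  # one partial answer per event
final = batch_counts(events)                # one answer, after all events
```

Both styles end with the same totals; the difference is only when an answer becomes available, which is the point the slide makes about response time.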
Real Time Analytics – Accepted Way
[Figure: streaming data and storing]
Real Time Analytics – Accepted Way
[Figure: response times of 14 sec and 0.6 sec]
Not a Replacement for Existing Infrastructure
• Hadoop is not a replacement for your existing data processing infrastructure.
• After processing data in Hadoop, you still need to send the output to relational database technologies for BI, decision support, reporting, etc.
• It is not going to replace your database, but your database is not likely to replace Hadoop either.
• Different tools for different jobs.
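The hand-off from Hadoop to a relational database can be sketched as follows. This is a minimal illustration under stated assumptions: the job output is assumed to be tab-separated key/value text (the default layout of Hadoop's text output), SQLite stands in for the reporting database, and the table name, function name, and sample rows are hypothetical. In practice a dedicated transfer tool such as Apache Sqoop is commonly used for this step.

```python
import csv
import io
import sqlite3

def load_job_output(lines, conn, table="word_counts"):
    """Load tab-separated key/value lines into a relational table."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(word TEXT PRIMARY KEY, total INTEGER)"
    )
    # Each output line is assumed to look like: key<TAB>value
    rows = [(key, int(value)) for key, value in csv.reader(lines, delimiter="\t")]
    conn.executemany(f"INSERT OR REPLACE INTO {table} VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

# Hypothetical contents of a job output file (illustrative data only).
sample = io.StringIO("hadoop\t3\nhive\t1\npig\t2\n")

conn = sqlite3.connect(":memory:")  # stand-in for the BI/reporting database
loaded = load_job_output(sample, conn)
top = conn.execute(
    "SELECT word, total FROM word_counts ORDER BY total DESC LIMIT 1"
).fetchone()
```

Once the results are in a relational table, existing BI and reporting tools can query them with ordinary SQL, which is why the two systems complement rather than replace each other.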
Multiple Smaller Datasets – Accepted Way
• The Hadoop framework is not recommended for small structured datasets, because other tools on the market, such as MS Excel or an RDBMS, can do this work more easily and faster than Hadoop.
• For small-data analytics, Hadoop can be costlier than other tools.
• Accepted way: merge all the small files into one.
Multiple Smaller Datasets – Accepted Way
• Each file of x MB: slow execution, 10400 ms
• All the above files merged into one file (9x MB): fast execution, 6140 ms
[Figure: both runs take the same input and produce the same output; the figure shows 4225284 for each.]
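One reason merging helps is that a MapReduce job over file-based input launches at least one map task per file, so many small files mean many short-lived tasks. The sketch below is a simplified estimate, not Hadoop's actual split logic: it assumes a 128 MB block size, one task per block, and nine hypothetical files of 10 MB each (the slide only says "x MB").

```python
import math

BLOCK_SIZE_MB = 128  # assumption: a common HDFS default block size

def estimated_map_tasks(file_sizes_mb, block_size_mb=BLOCK_SIZE_MB):
    """Simplified estimate: one task per block, at least one per file."""
    return sum(max(1, math.ceil(size / block_size_mb)) for size in file_sizes_mb)

small_files = [10] * 9            # nine hypothetical 10 MB files
merged_file = [sum(small_files)]  # the same data as a single 90 MB file

tasks_before = estimated_map_tasks(small_files)
tasks_after = estimated_map_tasks(merged_file)
```

Under these assumptions the nine small files need nine map tasks, while the merged file fits in one block and needs a single task, so the per-task startup overhead is paid once instead of nine times.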
Novice Hadoopers
• Unless you have a good understanding of the Hadoop framework, using Hadoop in production is not advised.
• Learning Hadoop and its ecosystem tools, and deciding which technology suits your needs, adds another level of complexity.
Where Security is the Primary Concern?
• Many enterprises, especially those in highly regulated industries dealing with sensitive data, are not able to move as quickly as they would like towards implementing Big Data projects and Hadoop.
• Example: health-care data used by insurance companies to calculate premiums.
• They do not have to hesitate, though: many of the security and compliance challenges are being continuously worked on and can be surmounted (for example, by using Apache Accumulo on top of Hadoop).
Where Security is the Primary Concern – Accepted Way
[Figure: Healthcare Data, Hadoop, Analytic Integration]
When To Use Hadoop
Data Size and Data Diversity
• You have different types of data: structured, semi-structured and unstructured.
• The data set is huge, i.e. several terabytes or petabytes.
• You are not in a hurry for answers.
Future Planning
• To implement Hadoop on your data, you should first understand the level of complexity of the data and the rate at which it is going to grow.
• This calls for cluster planning: you may begin by building a small or medium cluster sized for the data available at present (in GBs or a few TBs) and scale the cluster up in the future as your data grows.
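The planning step can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: the function name, the 50% yearly growth rate, and the 10 TB of usable disk per node are hypothetical inputs, and the replication factor of 3 is HDFS's default. A real sizing exercise would also account for temporary job output, compression, and headroom.

```python
import math

def nodes_needed(current_tb, yearly_growth, years, replication=3,
                 usable_tb_per_node=10.0):
    """Estimate the DataNodes required after compound data growth."""
    future_tb = current_tb * (1 + yearly_growth) ** years  # compound growth
    raw_tb = future_tb * replication                       # every block stored 3 times
    return math.ceil(raw_tb / usable_tb_per_node)

# Hypothetical starting point: 2 TB today, growing 50% per year.
today = nodes_needed(current_tb=2, yearly_growth=0.5, years=0)
in_three_years = nodes_needed(current_tb=2, yearly_growth=0.5, years=3)
```

With these inputs a single node covers today's data, while three years of growth (2 TB becoming 6.75 TB, or 20.25 TB after replication) requires three nodes, which matches the advice to start small and scale up.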
Multiple Frameworks for Big Data
• Hadoop can be integrated with multiple analytic tools to get the best out of it, such as machine learning, R, Python, Spark, MongoDB, etc.
Lifetime Data Availability
• When you want your data to stay live and available indefinitely, this can be achieved using Hadoop's scalability.
How it Works?
• LIVE Online Class
• Class Recording in LMS
• 24/7 Post Class Support
• Module Wise Quiz
• Project Work
• Verifiable Certificate
Course Topics
• Module 1 » Understanding Big Data and Hadoop
• Module 2 » Hadoop Architecture and HDFS
• Module 3 » Hadoop MapReduce Framework - I
• Module 4 » Hadoop MapReduce Framework - II
• Module 5 » Advance MapReduce
• Module 6 » PIG
• Module 7 » HIVE
• Module 8 » Advance HIVE and HBase
• Module 9 » Advance HBase
• Module 10 » Oozie and Hadoop Project
Questions
Twitter @edurekaIN, Facebook /edurekaIN; use #askEdureka for questions.
5 Scenarios: When To Use & When Not to Use Hadoop
