Data Integration Alternatives
Paul Moxon, Senior Director, Product Management

Agenda
1. Three Key Trends Affecting IT
2. The Logical Data Warehouse
3. Data Integration Layer Alternatives
4. The Logical Data Warehouse Revisited

Three Key Trends Affecting IT
Three Key Trends
1. Reduce corporate data silos to gain efficiency and productivity
2. Towards a common data backbone for operational and informational use
3. Enterprises going with bimodal IT in their modernization efforts

Trend I - Consolidation
• Organizational structures create specialized data and application silos
• The proliferation of silos has inhibited access to, and the sharing of, data across the organization
• Consolidating and opening up these silos (while retaining ownership and control) will promote efficiency and productivity

Trend II – Common Data Backbone
• Access to data via a logical layer gives a common and consistent view of data assets
• Example: customer data
  - All analytics, reports, processes, and applications (web, mobile, desktop) should see the same customer data
• Is this a Data Lake?
  - In reality there will be more than one data lake (separate or refined)

Trend III – Bimodal IT
• Bimodal IT has two IT ‘flavors’
  - Type 1 – focused on stability and efficiency (traditional IT)
  - Type 2 – experimental and agile, focused on time to market (TTM) and rapid app evolution; aligned with the business
• Some have compared this to the ‘SoR’ (Systems of Record) and ‘SoE’ (Systems of Engagement) differentiation
• The two need to live side-by-side and interact
• New apps still need data from the ‘SoR’

What Does This Mean?
• A data access layer is needed to ‘open up’ data silos
  - But retaining local ownership and control of the data
• The access layer must provide access to all data sources and support different modes of access
  - Reporting/analytics, real-time application access (mobile/web and ‘traditional’), etc.
• New technologies will be an important part of the information infrastructure
  - Hadoop ecosystem, NoSQL, streaming data, “Data Lakes”
• The traditional IT infrastructure is not going away soon
  - ‘Systems of Record’ are still needed
• The new and the old need to work together
  - Newer systems still need to interact with ‘Systems of Record’
How does this affect the ‘Information Architecture’?

Logical Data Warehouse
Logical Data Warehouse
Definition:
“The Logical Data Warehouse (LDW) is a new data management architecture for analytics combining the strengths of traditional repository warehouses with alternative data management and access strategy.”
“The LDW is an evolution and augmentation of DW practices, not a replacement”
“A repository-only style DW contains a single ontology/taxonomy, whereas in the LDW a semantic layer can contain many combination of use cases, many business definitions of the same information”
“The LDW permits an IT organization to make a large number of datasets available … via query tools and applications”
Source: Gartner Hype Cycle for Enterprise Information Management, 2012.

Architecture of the Logical Data Warehouse
[Figure: architecture diagram. Labels, by group:
 Sources: Big Data (sensor data, machine data (logs), social data, clickstream data, internet data, image and video, enterprise content (unstructured)); Traditional Enterprise Data (enterprise applications); Cloud (cloud applications).
 Data movement: Batch (ETL, CDC, Sqoop); Real-Time Data Access, on-demand/streaming (Flume, Kafka, …).
 Stores: Data Warehouse (EDW, ODS, In-Memory (SAP Hana, …), Analytical Appliances, Cloud DW (Redshift, …)); NoSQL; Hadoop (HDFS, YARN/Workload Management, MapReduce, Tez, Hive, Spark, Drill, Impala, Storm, HBase, Solr, Hunk; DW, Streams, NoSQL, Search, SQL).
 Cross-cutting: Metadata Management, Data Governance, Data Security.
 Data Integration/Semantic Layer.
 Consumers: Real-Time Decision Management, Alerts, Scorecards, Dashboards, Reporting, Data Discovery, Self-Service Search, Predictive Analytics, Statistical Analytics (R), Text Analytics, Data Mining.]

Autodesk Data Architecture
[Figure: Autodesk data architecture, with a Data Integration/Semantic Layer.]

Data Integration/Semantic Layer Alternatives
Three Integration/Semantic Layer Alternatives
1. Application/BI Tool as Data Integration/Semantic Layer
2. EDW as Data Integration/Semantic Layer
3. Data Virtualization as Data Integration/Semantic Layer
[Figure: one diagram per alternative, with boxes labeled Application/BI Tool, Data Virtualization, EDW, and ODS.]

Application/BI Tool as the Data Integration Layer
[Figure: Application/BI Tool as Data Integration/Semantic Layer; boxes labeled Application/BI Tool, EDW, ODS.]
• Integration is delegated to end user tools and applications
  - e.g. BI tools with ‘data blending’
• Results in duplication of effort – integration defined many times in different tools
• Impact of change in data schema?
• End user tools are not intended to be integration middleware
  - Not their primary purpose or expertise

EDW as the Data Integration Layer
[Figure: EDW as Data Integration/Semantic Layer; boxes labeled EDW, ODS.]
• Access to ‘other’ data (query federation) via EDW
  - Teradata QueryGrid, IBM FluidQuery, SAP Smart Data Access, etc.
• Often coupled with traditional ETL replication of data into EDW
• EDW is the ‘center of the data universe’
  - Provides data integration and semantic layer
• Appears attractive to organizations heavily invested in EDW
• More than one EDW? EDW costs?

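As a rough illustration of the query-federation idea above, the sketch below uses Python's built-in sqlite3 module as a stand-in. It does not use or imitate the actual interfaces of Teradata QueryGrid, IBM FluidQuery, or SAP Smart Data Access. The table names (dim_cust, open_orders), the columns, and the sample rows are all hypothetical. The point is only that the consumer connects to the "EDW" and reaches the "ODS" data through it.

```python
import sqlite3

# The "EDW" is the only connection the consumer ever sees (hypothetical setup).
edw = sqlite3.connect(":memory:")

# Stand-in for federation: a second, separate database is attached to the
# EDW connection under the schema name "ods".
edw.execute("ATTACH DATABASE ':memory:' AS ods")

# Hypothetical physical tables, one in each store.
edw.execute("CREATE TABLE dim_cust (cust_id INTEGER, cust_nm TEXT)")
edw.execute("CREATE TABLE ods.open_orders (customer INTEGER, amount REAL)")
edw.executemany("INSERT INTO dim_cust VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])
edw.executemany("INSERT INTO ods.open_orders VALUES (?, ?)",
                [(1, 250.0), (1, 100.0), (2, 75.0)])
edw.commit()


def open_amount_by_customer(conn):
    """Join warehouse data with 'other' (ODS) data in one query via the EDW."""
    sql = """
        SELECT c.cust_nm, SUM(o.amount)
        FROM dim_cust AS c
        JOIN ods.open_orders AS o ON o.customer = c.cust_id
        GROUP BY c.cust_id, c.cust_nm
        ORDER BY c.cust_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for name, amount in open_amount_by_customer(edw):
        print(name, amount)
```

In this arrangement every cross-source query passes through the EDW engine, which is one way to read the slide's questions about multiple EDWs and EDW costs.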
Data Virtualization as the Data Integration Layer
[Figure: Data Virtualization as Data Integration/Semantic Layer; boxes labeled Data Virtualization, EDW, ODS.]
• Move data integration and semantic layer to an independent Data Virtualization platform
• Purpose built for supporting data access across multiple heterogeneous data sources
• Separate layer provides semantic models for underlying data
  - Physical to logical mapping
• Enforces common and consistent security and governance policies
• Gartner’s recommended approach

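The physical-to-logical mapping described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular data virtualization product works. Two independent sqlite3 databases stand in for the EDW and the ODS, and every table name, column name, and the LOGICAL_VIEWS mapping is hypothetical. Consumers ask for logical views by name and never see the physical schemas.

```python
import sqlite3

# Two independent physical sources with different schemas (both hypothetical).
edw = sqlite3.connect(":memory:")
edw.execute("CREATE TABLE dim_cust (cust_id INTEGER, cust_nm TEXT)")
edw.executemany("INSERT INTO dim_cust VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])

ods = sqlite3.connect(":memory:")
ods.execute("CREATE TABLE open_orders (customer INTEGER, amount REAL)")
ods.executemany("INSERT INTO open_orders VALUES (?, ?)",
                [(1, 250.0), (1, 100.0), (2, 75.0)])

# Physical-to-logical mapping: logical view name -> (source, SQL that renames
# physical columns to the shared logical vocabulary).
LOGICAL_VIEWS = {
    "customer": (edw, "SELECT cust_id AS customer_id, cust_nm AS name "
                      "FROM dim_cust ORDER BY cust_id"),
    "order": (ods, "SELECT customer AS customer_id, amount FROM open_orders"),
}


def read_view(name):
    """Return the rows of a logical view as dicts keyed by logical column."""
    conn, sql = LOGICAL_VIEWS[name]
    cur = conn.execute(sql)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur]


def customer_open_amounts():
    """Federated view: combine customers (EDW) and orders (ODS) at query time."""
    totals = {}
    for o in read_view("order"):
        totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + o["amount"]
    return [
        {"customer_id": c["customer_id"], "name": c["name"],
         "open_amount": totals.get(c["customer_id"], 0.0)}
        for c in read_view("customer")
    ]


if __name__ == "__main__":
    for row in customer_open_amounts():
        print(row)
```

If a physical column were renamed in one source, only its entry in LOGICAL_VIEWS would need to change. Callers of customer_open_amounts would be unaffected, which is the contrast with defining the integration separately in each end user tool.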
Logical Data Warehouse Revisited
Architecture of the Logical Data Warehouse
[Figure: the same architecture diagram as before, with the Data Integration/Semantic Layer now shown as Data Virtualization. Labels, by group:
 Sources: Big Data (sensor data, machine data (logs), social data, clickstream data, internet data, image and video, enterprise content (unstructured)); Traditional Enterprise Data (enterprise applications); Cloud (cloud applications).
 Data movement: Batch (ETL, CDC, Sqoop); Real-Time Data Access, on-demand/streaming (Flume, Kafka, …).
 Stores: Data Warehouse (EDW, ODS, In-Memory (SAP Hana, …), Analytical Appliances, Cloud DW (Redshift, …)); NoSQL; Hadoop (HDFS, YARN/Workload Management, MapReduce, Tez, Hive, Spark, Drill, Impala, Storm, HBase, Solr, Hunk; DW, Streams, NoSQL, Search, SQL).
 Data Virtualization: Data Abstraction, Data Transformation, Data Federation, Data Caching, Data Services, Data Search & Discovery, Governance, Security, Optimization.
 Consumers: Real-Time Decision Management, Alerts, Scorecards, Dashboards, Reporting, Data Discovery, Self-Service Search, Predictive Analytics, Statistical Analytics (R), Text Analytics, Data Mining.]

Autodesk Data Architecture
[Figure: Autodesk data architecture.]

Summary
1. The 3 trends will change your ‘information architecture’
2. Logical Data Warehouse (LDW) is a key architectural pattern to address many of the challenges of the new information architecture
3. LDW requires a data integration/semantic layer
4. Data Virtualization is the recommended approach for this critical layer

Thanks!
www.denodo.com
info@denodo.com
© Copyright Denodo Technologies. All rights reserved.
Unless otherwise specified, no part of this PDF file may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without the prior written authorization from Denodo Technologies.

Data Integration Alternatives: When to use Data Virtualization, ETL, and ESB
