© 2016 IBM Corporation Introducing Big SQL Federation Createdby C. M. Saracco,IBM Silicon Valley Lab June 2016
© 2016 IBM Corporation2 Executive summary § What’s Big SQL federation? − Integration technology for Hadoop and remote data sources − Transparently query Big SQL (Hadoop) and RDBMS tables with standard SQL − Query optimization, security mapping, other critical features built in § Why federate? − Not always practical to move / replicate data from one source to another − Hadoop programmers need access to corporate RDBMS data to enhance analytics, integrate public and proprietary data, etc. § What’s supported? − Big SQL tables (and views) in DFS, HBase, or Hive warehouse − RDBMS tables (and views) in Oracle, Teradata, MS SQL Server, DB2, Informix, Netezza, . . . − Query data across all sources (project, restrict, join, union, wide range of sub-queries, wide range of built-in functions ) − INSERT INTO … SELECT FROM … − Issue data-source specific SQL − Collect statistics and inspect detailed data access plan − . . . .
© 2016 IBM Corporation3 Agenda §Overview − Key features − When to federate §Technology − Architecture − Set up, usage examples − Supported data sources §Summary
© 2016 IBM Corporation4 Big SQL query federation = virtualized data access Transparent § Appears to be one source § Programmers don’t need to know how / where data is stored Heterogeneous § Accesses data from diverse sources High Function § Full query support against all data § Capabilities of sources as well Autonomous § Non-disruptive to data sources, existing applications, systems. High Performance § Optimization of distributed queries SQL tools, applications Data sources Virtualized data
© 2016 IBM Corporation5 When to federate…. § Budget § Resources § Time § Ownership § Too ad hoc, temporary § Too proprietary § Too recent § Too big Physical integration not always a requirement/option Barriers
© 2016 IBM Corporation6 Agenda §Overview − Key features − When to federate §Technology − Architecture − Set up, usage examples − Supported data sources §Summary
© 2016 IBM Corporation7 Federation architecture and components Wrapper ServerServer Nickname Nickname Nickname Federated server: BigSQL database enabled for federation. Wrapper: library allowing access to a particular class of data sources or protocols (Net8, DRDA, etc). Contains information about data source characteristics Server: represents a specific data source Nickname: a local alias to data on a remote server (e.g, a specific table or view) Federation catalog 4Stores information about 4Wrappers,servers, nicknames 4Server attributes 4Nickname attributes 4Remote functions Federation server (Big SQL)
© 2016 IBM Corporation8 Federation in practice § Admin enables federation § Apps connect to Big SQL database § Nicknames look like tables to the app § Big SQL optimizer creates global data access plan with cost analysis, query push down § Query fragments executed remotely Nickname Nickname Table Cost-based optimizer Wrapper Client library Wrapper Client library Local + Remote Execution Plans Remote sources Federation server (Big SQL) Native dialect Connect to bigsql
© 2016 IBM Corporation9 Creating and using federated objects (example) -- Create wrapper to identify client library (Oracle Net8) CREATE WRAPPER ORA LIBRARY 'libdb2net8.so' -- Create server for Oracle data source CREATE SERVER ORASERV TYPE ORACLE VERSION 11 WRAPPER ORA AUTHORIZATION ”orauser” PASSWORD ”orauser” OPTIONS (NODE 'TNSNODENAME', PUSHDOWN 'Y', COLLATING_SEQUENCE 'N'); -- Map the local user 'orauser' to the Oracle user 'orauser' / password 'orauser' CREATE USER MAPPING FOR orauser SERVER ORASERV OPTIONS ( REMOTE_PASSWORD 'orauser'); -- Create nickname for Oracle table / view CREATE NICKNAME NICK1 FOR ORASERV.ORAUSER.TABLE1; -- Query the nickname SELECT * FROM NICK1 WHERE COL1 < 10;
© 2016 IBM Corporation10 Joining data across sources
© 2016 IBM Corporation11 Data sources supported by Big SQL Federation Server § Current list of supported data sources available at https://www-304.ibm.com/support/entdocview.wss?uid=swg27044495 Data Source Supported Versions Notes DB2® DB2 for Linux, UNIX, and Windows 9.7, 9.8, 10.1, 10.5 DB2 for z/OS 8.x, 9.x, and 10.x Oracle 11g, 11gR1, 11g R2, 12c Teradata 12, 13, 14 Not supported on POWER systems. Netezza 4.6, 5.0, 6.0, 7.2 Not supported on POWER systems. Informix 11.5 Microsoft SQL Server 2012, 2014
© 2016 IBM Corporation12 Agenda §Overview − Key features − When to federate §Technology − Architecture − Set up, usage examples − Supported data sources §Summary
© 2016 IBM Corporation13 Big SQL federation – Easily access information on demand – Combine Big Data in Hadoop with RDBMS data – Quickly extend your data warehouse Benefits – Cost-effective – Quick to provide fast time to value – Agile and flexible – Versatile – Low risk, seamless, and transparent

Big Data: SQL query federation for Hadoop and RDBMS data

  • 1.
    © 2016 IBMCorporation Introducing Big SQL Federation Createdby C. M. Saracco,IBM Silicon Valley Lab June 2016
  • 2.
    © 2016 IBMCorporation2 Executive summary § What’s Big SQL federation? − Integration technology for Hadoop and remote data sources − Transparently query Big SQL (Hadoop) and RDBMS tables with standard SQL − Query optimization, security mapping, other critical features built in § Why federate? − Not always practical to move / replicate data from one source to another − Hadoop programmers need access to corporate RDBMS data to enhance analytics, integrate public and proprietary data, etc. § What’s supported? − Big SQL tables (and views) in DFS, HBase, or Hive warehouse − RDBMS tables (and views) in Oracle, Teradata, MS SQL Server, DB2, Informix, Netezza, . . . − Query data across all sources (project, restrict, join, union, wide range of sub-queries, wide range of built-in functions ) − INSERT INTO … SELECT FROM … − Issue data-source specific SQL − Collect statistics and inspect detailed data access plan − . . . .
  • 3.
    © 2016 IBMCorporation3 Agenda §Overview − Key features − When to federate §Technology − Architecture − Set up, usage examples − Supported data sources §Summary
  • 4.
    © 2016 IBMCorporation4 Big SQL query federation = virtualized data access Transparent § Appears to be one source § Programmers don’t need to know how / where data is stored Heterogeneous § Accesses data from diverse sources High Function § Full query support against all data § Capabilities of sources as well Autonomous § Non-disruptive to data sources, existing applications, systems. High Performance § Optimization of distributed queries SQL tools, applications Data sources Virtualized data
  • 5.
    © 2016 IBMCorporation5 When to federate…. § Budget § Resources § Time § Ownership § Too ad hoc, temporary § Too proprietary § Too recent § Too big Physical integration not always a requirement/option Barriers
  • 6.
    © 2016 IBMCorporation6 Agenda §Overview − Key features − When to federate §Technology − Architecture − Set up, usage examples − Supported data sources §Summary
  • 7.
    © 2016 IBMCorporation7 Federation architecture and components Wrapper ServerServer Nickname Nickname Nickname Federated server: BigSQL database enabled for federation. Wrapper: library allowing access to a particular class of data sources or protocols (Net8, DRDA, etc). Contains information about data source characteristics Server: represents a specific data source Nickname: a local alias to data on a remote server (e.g, a specific table or view) Federation catalog 4Stores information about 4Wrappers,servers, nicknames 4Server attributes 4Nickname attributes 4Remote functions Federation server (Big SQL)
  • 8.
    © 2016 IBMCorporation8 Federation in practice § Admin enables federation § Apps connect to Big SQL database § Nicknames look like tables to the app § Big SQL optimizer creates global data access plan with cost analysis, query push down § Query fragments executed remotely Nickname Nickname Table Cost-based optimizer Wrapper Client library Wrapper Client library Local + Remote Execution Plans Remote sources Federation server (Big SQL) Native dialect Connect to bigsql
  • 9.
    © 2016 IBMCorporation9 Creating and using federated objects (example) -- Create wrapper to identify client library (Oracle Net8) CREATE WRAPPER ORA LIBRARY 'libdb2net8.so' -- Create server for Oracle data source CREATE SERVER ORASERV TYPE ORACLE VERSION 11 WRAPPER ORA AUTHORIZATION ”orauser” PASSWORD ”orauser” OPTIONS (NODE 'TNSNODENAME', PUSHDOWN 'Y', COLLATING_SEQUENCE 'N'); -- Map the local user 'orauser' to the Oracle user 'orauser' / password 'orauser' CREATE USER MAPPING FOR orauser SERVER ORASERV OPTIONS ( REMOTE_PASSWORD 'orauser'); -- Create nickname for Oracle table / view CREATE NICKNAME NICK1 FOR ORASERV.ORAUSER.TABLE1; -- Query the nickname SELECT * FROM NICK1 WHERE COL1 < 10;
  • 10.
    © 2016 IBMCorporation10 Joining data across sources
  • 11.
    © 2016 IBMCorporation11 Data sources supported by Big SQL Federation Server § Current list of supported data sources available at https://www-304.ibm.com/support/entdocview.wss?uid=swg27044495 Data Source Supported Versions Notes DB2® DB2 for Linux, UNIX, and Windows 9.7, 9.8, 10.1, 10.5 DB2 for z/OS 8.x, 9.x, and 10.x Oracle 11g, 11gR1, 11g R2, 12c Teradata 12, 13, 14 Not supported on POWER systems. Netezza 4.6, 5.0, 6.0, 7.2 Not supported on POWER systems. Informix 11.5 Microsoft SQL Server 2012, 2014
  • 12.
    © 2016 IBMCorporation12 Agenda §Overview − Key features − When to federate §Technology − Architecture − Set up, usage examples − Supported data sources §Summary
  • 13.
    © 2016 IBMCorporation13 Big SQL federation – Easily access information on demand – Combine Big Data in Hadoop with RDBMS data – Quickly extend your data warehouse Benefits – Cost-effective – Quick to provide fast time to value – Agile and flexible – Versatile – Low risk, seamless, and transparent