De-Mystifying the Apache Phoenix QueryServer Josh Elser MTS 2016-04-13
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved About me • (Recent) Apache Phoenix Committer • Apache Calcite Committer and PMC • Long-time NoSQL developer, re-learning SQL Apache Calcite and Apache Phoenix are projects at the Apache Software Foundation. These names are trademarks of the Foundation.
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda What? Why? How? Apache Phoenix QueryServer
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “What” is Apache Phoenix?  Been called many things [1] – “We put the SQL back in NoSQL!” – “A SQL skin on HBase” – “A relational layer on HBase” – “Online transaction processing and operational analytics for Hadoop”  Built on HDFS and HBase – Clients use a JDBC driver – Lots of server-side “magic” through HBase Coprocessors  A query system capable of both OLAP and OLTP workloads – More or less [1] https://medium.com/salesforce-open-source/apache-phoenix-a-conversation-with-pmc-chair-james-taylor-cc0dd8c7c3e5
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “What” is the Apache Phoenix QueryServer?  An HTTP abstraction of a JDBC Driver – Built on Apache Calcite’s Avatica sub-project  A standalone-service to be run on each node in a cluster – An HTTP server – Configurable serialization mechanism  A new JDBC Driver to use with the QueryServer – A glorified HTTP client – A new sqlline script
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “What” is Apache Calcite?  SQL Parser – One SQL implementation usable by everyone  Cost-Based Optimizer – “Optimizations are easy”  Pluggable Data Sources – Implement your own SQL engine  Avatica – Calcite sub-project – Implements the JDBC-over-HTTP abstraction – Written to the JDBC spec, not database-specific The coolest project approximately one person can explain
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda What? Why? How? Apache Phoenix QueryServer
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “Why” should I care?  A true “thin” client – No required connection to HBase/ZooKeeper/HDFS – Greatly simplifies definition of “Phoenix client”  Offload computational resources to cluster – QueryServers run on the cluster – Not your laptop or some “edge” node  Enables non-Java clients – The big one Because it’s friggin’ cool!
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “Why” are non-Java clients important?  ”Native” bindings in any language – HTTP clients are easily implemented – Serialization approaches (often) have cross-language support  Access to data in HBase is suddenly easily accessible – Standardized table format through Phoenix – Well-defined APIs: Python Database API, Ruby ActiveRecord, etc  ODBC and BI Tools – The moonshot. – The hopes and dreams of services people everywhere. Not everyone wants to use Java.
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “Why” not <insert rpc framework here> instead of HTTP?  HTTP is simple – “You have multiple versions of Thrift on the classpath” – “You have to use Protobuf 2.4”  Designed to be stateless – JDBC doesn’t make this easy – Can work around it via Avatica’s wire API  Statelessness makes scaling easier – Pull down any HTTP load balancer – Deploy more Avatica servers to scale up Because portability sucks
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda What? Why? How? Apache Phoenix QueryServer
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “How” does it work?  HTTP Server – Jetty – Phoenix “thick” Driver  Serialization mechanism – Protocol Buffers – JSON  Metrics system – Dropwizard Metrics – Apache Hadoop Metrics2  Authentication – Kerberos via SPNEGO – HTTP Basic or Digest The QueryServer itself
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “How” does the serialization work?  Google Protocol Buffers (v3) – “think XML, but smaller, faster, and simpler” [1] – 110% supported WRT compatibility – Native bindings in most every popular language – Clients can use any version of protobuf3  JSON – Nice for testing – 110% unsupported WRT compatibility – You will run into issue with mismatched client/server versions Please, please, please use Protocol Buffers [1] https://developers.google.com/protocol-buffers/
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “How” do I make a client?  Choose a language – Find an HTTP client supported with that language – Install Protobuf bindings for that language  Read the Avatica docs [1] – Tell us when docs are incorrect/lacking/wrong/boring/lame  Write tests  Publish the client – And tell us! Sit down and write code [1] http://calcite.apache.org/avatica/docs/protobuf_reference.html
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved “How” do I get involved?  Provide servers for databases – A simple project for a specific database  Write some tests  Proofread the docs  Contribute a client  Answer questions on Stackoverflow/mailing lists Carpe diem
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Thanks! Email: elserj@apache.org Twitter: @josh_elser Mailing lists: Phoenix: dev@phoenix.apache.org, user@phoenix.apache.org, Calcite: dev@calcite.apache.org Project info: https://phoenix.apache.org/server.html https://calcite.apache.org/avatica/

De-Mystifying the Apache Phoenix QueryServer

  • 1.
  • 2.
    2 © HortonworksInc. 2011 – 2016. All Rights Reserved About me • (Recent) Apache Phoenix Committer • Apache Calcite Committer and PMC • Long-time NoSQL developer, re-learning SQL Apache Calcite and Apache Phoenix are projects at the Apache Software Foundation. These names are trademarks of the Foundation.
  • 3.
    3 © HortonworksInc. 2011 – 2016. All Rights Reserved Agenda What? Why? How? Apache Phoenix QueryServer
  • 4.
    4 © HortonworksInc. 2011 – 2016. All Rights Reserved “What” is Apache Phoenix?  Been called many things [1] – “We put the SQL back in NoSQL!” – “A SQL skin on HBase” – “A relational layer on HBase” – “Online transaction processing and operational analytics for Hadoop”  Built on HDFS and HBase – Clients use a JDBC driver – Lots of server-side “magic” through HBase Coprocessors  A query system capable of both OLAP and OLTP workloads – More or less [1] https://medium.com/salesforce-open-source/apache-phoenix-a-conversation-with-pmc-chair-james-taylor-cc0dd8c7c3e5
  • 5.
    5 © HortonworksInc. 2011 – 2016. All Rights Reserved “What” is the Apache Phoenix QueryServer?  An HTTP abstraction of a JDBC Driver – Built on Apache Calcite’s Avatica sub-project  A standalone-service to be run on each node in a cluster – An HTTP server – Configurable serialization mechanism  A new JDBC Driver to use with the QueryServer – A glorified HTTP client – A new sqlline script
  • 6.
    6 © HortonworksInc. 2011 – 2016. All Rights Reserved “What” is Apache Calcite?  SQL Parser – One SQL implementation usable by everyone  Cost-Based Optimizer – “Optimizations are easy”  Pluggable Data Sources – Implement your own SQL engine  Avatica – Calcite sub-project – Implements the JDBC-over-HTTP abstraction – Written to the JDBC spec, not database-specific The coolest project approximately one person can explain
  • 7.
    7 © HortonworksInc. 2011 – 2016. All Rights Reserved Agenda What? Why? How? Apache Phoenix QueryServer
  • 8.
    8 © HortonworksInc. 2011 – 2016. All Rights Reserved “Why” should I care?  A true “thin” client – No required connection to HBase/ZooKeeper/HDFS – Greatly simplifies definition of “Phoenix client”  Offload computational resources to cluster – QueryServers run on the cluster – Not your laptop or some “edge” node  Enables non-Java clients – The big one Because it’s friggin’ cool!
  • 9.
    9 © HortonworksInc. 2011 – 2016. All Rights Reserved “Why” are non-Java clients important?  ”Native” bindings in any language – HTTP clients are easily implemented – Serialization approaches (often) have cross-language support  Access to data in HBase is suddenly easily accessible – Standardized table format through Phoenix – Well-defined APIs: Python Database API, Ruby ActiveRecord, etc  ODBC and BI Tools – The moonshot. – The hopes and dreams of services people everywhere. Not everyone wants to use Java.
  • 10.
    10 © HortonworksInc. 2011 – 2016. All Rights Reserved “Why” not <insert rpc framework here> instead of HTTP?  HTTP is simple – “You have multiple versions of Thrift on the classpath” – “You have to use Protobuf 2.4”  Designed to be stateless – JDBC doesn’t make this easy – Can work around it via Avatica’s wire API  Statelessness makes scaling easier – Pull down any HTTP load balancer – Deploy more Avatica servers to scale up Because portability sucks
  • 11.
    11 © HortonworksInc. 2011 – 2016. All Rights Reserved Agenda What? Why? How? Apache Phoenix QueryServer
  • 12.
    12 © HortonworksInc. 2011 – 2016. All Rights Reserved “How” does it work?  HTTP Server – Jetty – Phoenix “thick” Driver  Serialization mechanism – Protocol Buffers – JSON  Metrics system – Dropwizard Metrics – Apache Hadoop Metrics2  Authentication – Kerberos via SPNEGO – HTTP Basic or Digest The QueryServer itself
  • 13.
    13 © HortonworksInc. 2011 – 2016. All Rights Reserved “How” does the serialization work?  Google Protocol Buffers (v3) – “think XML, but smaller, faster, and simpler” [1] – 110% supported WRT compatibility – Native bindings in most every popular language – Clients can use any version of protobuf3  JSON – Nice for testing – 110% unsupported WRT compatibility – You will run into issue with mismatched client/server versions Please, please, please use Protocol Buffers [1] https://developers.google.com/protocol-buffers/
  • 14.
    14 © HortonworksInc. 2011 – 2016. All Rights Reserved “How” do I make a client?  Choose a language – Find an HTTP client supported with that language – Install Protobuf bindings for that language  Read the Avatica docs [1] – Tell us when docs are incorrect/lacking/wrong/boring/lame  Write tests  Publish the client – And tell us! Sit down and write code [1] http://calcite.apache.org/avatica/docs/protobuf_reference.html
  • 15.
    15 © HortonworksInc. 2011 – 2016. All Rights Reserved “How” do I get involved?  Provide servers for databases – A simple project for a specific database  Write some tests  Proofread the docs  Contribute a client  Answer questions on Stackoverflow/mailing lists Carpe diem
  • 16.
    16 © HortonworksInc. 2011 – 2016. All Rights Reserved Thanks! Email: elserj@apache.org Twitter: @josh_elser Mailing lists: Phoenix: dev@phoenix.apache.org, user@phoenix.apache.org, Calcite: dev@calcite.apache.org Project info: https://phoenix.apache.org/server.html https://calcite.apache.org/avatica/