One Click Hadoop Clusters - Anywhere (Using Docker)

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Goals and motivations • Full Hadoop stack provisioning – everywhere • Automate and unify the process • Zero-configuration approach • Same process through a cluster lifecycle (Dev, QA, UAT, Prod) • Provide tooling - UI, REST API and CLI/shell • Secure and multi-tenant • SLA policy based autoscaling

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Docker • Container based virtualization • Lightweight and portable • Build once, run anywhere • Ease of packaging applications • Automated and scripted • Isolated

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Consul – How it works • Consul servers/agents • Consistency through a quorum (RAFT) • Scalability due to gossip based protocol (SWIM) • Decentralized and fault tolerant • Highly available • Consistency over availability (CP) • Multiple interfaces - HTTP and DNS • Support for watches

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Ambari • Easy Hadoop cluster provisioning • Management and monitoring • Key feature - Blueprints • REST API, CLI shell • Extensible • Stacks • Services • Views

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Ambari – How it works • Ambari server/agents • Define a blueprint (blueprint.json) • Define a host mapping (hostmapping.json) • Post the cluster create

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Cloudbreak Cloudbreak is a cloud-agnostic Hadoop as a Service API. Abstracts the provisioning and ease management and monitoring of on-demand clusters. Cloudbreak is a powerful left surf that breaks over a coral reef, a mile off southwest the island of Tavarua, Fiji.

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Cloudbreak • Benefits • Zero configuration • Elastic • Secure • Infrastructure agnostic • Heterogenous clusters • Auto-scaling • Main REST resources • /template – specify an instance group infrastructure • /stack – creates an infrastructure based on a template • /blueprint – describes a Hadoop cluster • /cluster – creates a Hadoop cluster

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Cloudbreak – How it works • Start VMs - with a running Docker daemon • Cloudbreak Bootstrap • Start Consul Cluster • Start Swarm Cluster (Consul for discovery) • Start Ambari servers/agents - Swarm API • Ambari services registered in Consul (Registrator) • Post Blueprint

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Cloudbreak - Features • Extensible – easy to implement Service Provider Interface • Cloudbreak “recipes” • Automate host configuration • Pre/post Ambari lifecycle hooks • Services reconfiguration • Automate/execute custom actions • Side – effects • Ambari CLI/shell and Groovy based client • Cloud Foundry’s UAA Dockerized • Munchausen – bootstrap Swarm with Consul • Dockerized full Hadoop stack (Apache Hadoop 60K+, Ambari 12K+, Spark 10K+ downloads)

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Cloudbreak - Hadoop as a Service API • Public tech preview • Microsoft Azure • Amazon AWS • Google Cloud Platform • OpenStack • Private tech preview – R&D • Bare metal • Rackspace Managed Cloud • HP Helion Public Cloud *integration SPI is available

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Cloudbreak – SPI • Cloud providers have very different API, though model is very similar • Non – invasive implementation • One interface to implement - CloudPlatformConnector Network Security Group Image SubnetSubnet RulesRules Instance VolumeVolumes VolumeIP Address UserData Instance VolumeVolumes VolumeIP Address Instance VolumeVolumes VolumeIP Address

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Periscope Periscope is a heuristic Hadoop scheduler associated with a QoS profile. Built on YARN schedulers, cloud and VM resource management API's it allows to associate SLA's to applications and customers. Periscope is a powerful, fast, thick and top- to-bottom right-hander, eastward from Sumbawa's famous west-coast.

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Periscope • Benefits • Zero configuration • Metric and time based alarms • SLA policy based autoscaling • Secure • Hostgroup specific • Main REST resources • /clusters – specify a cluster to be monitored • /alerts– time and metric based • /policies – specify an SLA policy for a cluster based on an alarm • /applications – specify an SLA policy for an application (under development)

© Hortonworks Inc. 2011 – 2015. All Rights Reserved Periscope – How it works • Configures/monitors alarms in Ambari • Setup alarm, cooldown periods • Manages cluster sizes • Allow to associate SLA scaling policies to alarms • Orchestrates Cloudbreak to up/downscale the cluster

One Click Hadoop Clusters - Anywhere (Using Docker)

More Related Content

What's hot

Viewers also liked

Similar to One Click Hadoop Clusters - Anywhere (Using Docker)

More from DataWorks Summit

Recently uploaded

One Click Hadoop Clusters - Anywhere (Using Docker)

Editor's Notes