Selective Data Replication with Geographically Distributed Hadoop
Brett Rudenstein
April 16, 2015, Brussels, Belgium
Dimensions of Scalability
For distributed storage systems
- Ability to support ever-increasing requirements for
  - Space: more data
  - Objects: more files
  - Load: more clients
- RAM is limiting HDFS scale
- Other dimensions of scalability
  - Geographic scalability: scaling across multiple data centers
- Scalability as a universal measure of a distributed system
Geographic Scalability
Scaling the file system across multiple data centers
- Running Hadoop in multiple data centers
  - Distributed across the world
  - As a single cluster
Main Steps
Four main stages to the goal
- STAGE I: The role of the Coordination Engine
- STAGE II: Replicated Virtual Namespace
  - Active-active
- STAGE III: Geographically distributed Hadoop
  - File system running in multiple data centers
  - Disaster recovery, load balancing, self-healing, simultaneous data ingest
- STAGE IV: Selective data replication
  - Heterogeneous storage zones
The Requirements
Requirements
File system with hardware components distributed over the WAN
- Operated and perceived by users as a single system
  - Unified file system view independent of where the data is physically stored
- Strict consistency
  - Everybody sees the same data
  - Seamless file-level replication
- Continuous availability
  - All components are active
  - Disaster recovery
- Geographic scalability: support for multiple data centers
Architecture Principles
Strict consistency of metadata with fast data ingest
1. Synchronous replication of metadata between data centers
   - Using the Coordination Engine
   - Provides strict consistency of the namespace
2. Asynchronous replication of data over the WAN
   - Data replicated in the background
   - Allows fast, LAN-speed data creation
Coordination Engine
For Replicating Consistent State
Coordination Engine
Determines the order of operations in the system
- The Coordination Engine ensures the order of events submitted to the engine by multiple proposers (see the sketch below)
  - Anybody can propose
  - The engine chooses a single agreement every time and guarantees:
    • Learners observe the agreements in the same order they were chosen
    • An agreement triggers a corresponding application action
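The proposer/agreement/learner contract described on this slide can be written down as a few small interfaces. The following is a minimal, hypothetical Java sketch; the deck does not show the real Coordination Engine API, and all names here (Proposal, Agreement, Learner, CoordinationEngine) are illustrative only.

```java
/** Hypothetical sketch of the Coordination Engine contract; all names are illustrative. */
public final class CoordinationSketch {

    /** Any event a node wants to have coordinated. */
    public interface Proposal {
        byte[] payload();
    }

    /** A proposal the engine has placed into the global sequence of agreements. */
    public interface Agreement {
        long gsn();           // Global Sequence Number: position in the agreed order
        Proposal proposal();
    }

    /** Learners observe agreements in the same order they were chosen. */
    public interface Learner {
        void deliver(Agreement agreement); // triggers the corresponding application action
    }

    /** The engine: anybody can propose, every registered learner sees the same order. */
    public interface CoordinationEngine {
        void submit(Proposal proposal);
        void register(Learner learner);
    }

    private CoordinationSketch() {}
}
```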
Central Coordination
Simple coordination without fault tolerance
- Easy to coordinate
  - A single NameNode as an example of a central Coordination Engine (no HA)
  - Performance and availability bottleneck
  - Single point of failure
Distributed Coordination Engine
Fault-tolerant coordination using multiple acceptors
- A distributed Coordination Engine operates on participating nodes
  - Roles: Proposer, Learner, and Acceptor
  - Each node can combine multiple roles
- Distributed coordination
  - Proposing nodes submit events as proposals to a quorum of acceptors
  - Acceptors agree on the order of each event in the global sequence of events
  - Learners learn agreements in the same deterministic order
Consensus Algorithms
Consensus is the process of agreeing on one result among a group of participants
- The Coordination Engine guarantees the same state of the learners at a given GSN (see the sketch below)
  - Each agreement is assigned a unique Global Sequence Number (GSN)
  - GSNs form a monotonically increasing number series, which defines the order of agreements
  - Learners have the same initial state and apply the same deterministic agreements in the same deterministic order
  - The GSN represents "logical" time in coordinated systems
- Paxos is a consensus algorithm proven to tolerate a variety of failures
  - Quorum-based consensus
  - Deterministic state machine
  - Leslie Lamport: The Part-Time Parliament (1990)
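As a concrete illustration of the GSN mechanics, the sketch below shows a learner that applies agreements to a deterministic state machine in strict GSN order, buffering anything that arrives early. This is a simplified, hypothetical Java sketch, not WANdisco's implementation; the StateMachine interface and the class name are invented for illustration. The deterministic apply step is what makes two replicas with the same initial state and the same GSN equivalent.

```java
import java.util.TreeMap;

/** Hypothetical sketch: apply agreements to a deterministic state machine in strict GSN order. */
public class GsnOrderedLearner {

    /** The replicated state; apply() must be deterministic for replicas to stay equivalent. */
    public interface StateMachine {
        void apply(byte[] operation);
    }

    private final StateMachine stateMachine;
    private final TreeMap<Long, byte[]> pending = new TreeMap<>(); // agreements that arrived early
    private long nextGsn = 1;                                      // next expected Global Sequence Number

    public GsnOrderedLearner(StateMachine stateMachine) {
        this.stateMachine = stateMachine;
    }

    /** Called by the coordination layer; agreements may arrive out of order. */
    public synchronized void onAgreement(long gsn, byte[] operation) {
        pending.put(gsn, operation);
        // Apply every agreement whose turn has come, in monotonically increasing GSN order.
        while (pending.containsKey(nextGsn)) {
            stateMachine.apply(pending.remove(nextGsn));
            nextGsn++;
        }
    }

    /** The GSN acts as logical time: two replicas at the same GSN have the same state. */
    public synchronized long currentGsn() {
        return nextGsn - 1;
    }
}
```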
Coordinated Replication of HCFS Namespace
Replicated Virtual Namespace
The Coordination Engine provides equivalence of multiple namespace replicas
- Coordinated virtual namespace controlled by the Fusion Node (see the sketch below)
  - A client that acts as a proxy for other clients' interactions
  - Reads are not coordinated
  - Writes (open, close, append, etc.) are coordinated
- The namespace events are consistent with each other
  - Each Fusion server maintains a log of changes that would occur in the namespace
  - Any Fusion Node can initiate an update, which is propagated to all other Fusion Nodes
- The Coordination Engine establishes the global order of namespace updates
  - Fusion servers apply deterministic updates in the same deterministic order to the underlying file system
  - Systems that start from the same state and apply the same updates are equivalent
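A rough way to picture the Fusion node's proxy role: namespace-mutating calls are routed through the coordination layer before they touch the underlying file system, while reads go straight through at local speed. The sketch below uses the standard Hadoop FileSystem API for the underlying storage; the Coordinator interface is an invented stand-in for the Coordination Engine client, not Fusion's actual API.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch of the proxy role of a Fusion node: namespace writes are routed
 * through the coordination layer, reads go straight to the underlying HCFS.
 */
public class CoordinatedNamespaceProxy {

    /** Illustrative coordination hook: blocks until the proposal becomes an agreement. */
    public interface Coordinator {
        void coordinate(String operation, Path path) throws IOException;
    }

    private final FileSystem underlying;   // any HCFS implementation (HDFS, S3, MapR, ...)
    private final Coordinator coordinator;

    public CoordinatedNamespaceProxy(FileSystem underlying, Coordinator coordinator) {
        this.underlying = underlying;
        this.coordinator = coordinator;
    }

    /** Reads are not coordinated: served locally at LAN speed. */
    public FSDataInputStream open(Path path) throws IOException {
        return underlying.open(path);
    }

    /** Writes are coordinated: the open is agreed globally before the local create happens. */
    public FSDataOutputStream create(Path path) throws IOException {
        coordinator.coordinate("CREATE", path);
        return underlying.create(path);
    }

    /** Directory creation mutates the namespace, so it is coordinated as well. */
    public boolean mkdirs(Path path) throws IOException {
        coordinator.coordinate("MKDIRS", path);
        return underlying.mkdirs(path);
    }
}
```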
Strict Consistency Model
One-copy equivalence as known in replicated databases
- The Coordination Engine sequences file open and close proposals into the global sequence of agreements
  - Applied to each replicated folder namespace in the order of its Global Sequence Numbers
- Fusion replicated folders have identical states when they reach the same GSN
- One-copy equivalence
  - Folders may have different states at a given moment of "clock" time, as the rate of consuming agreements may vary
  - Provides the same state in logical time
Fusion
Geographically Distributed HCFS
Scaling Hadoop Across Data Centers
Continuous availability and disaster recovery over the WAN
- The system should appear, act, and be operated as a single cluster
  - Instant and automatic replication of data and metadata
- Parts of the cluster in different data centers should have equal roles
  - Data can be ingested or accessed through any of the centers
- Data creation and access should typically be at LAN speed
  - Running time of a job executed in one data center should be as if there were no other centers
- Failure scenarios: the system should continue to provide service and remain consistent
  - Any Fusion node can fail and replication still proceeds
  - Fusion nodes can fail simultaneously in two or more data centers and replication still proceeds
  - WAN partitioning does not cause a data center outage
  - RPO is as low as possible because replication is continuous rather than periodic
Foreign File Replication
A file is created in the client's data center and replicated to the others asynchronously
- Fusion workflow (see the sketch below)
  1. Client makes a request to create a file
  2. Fusion coordinates the file open with the other clusters involved (membership)
  3. The file is added to the underlying storage
  4. The IHC server pulls data from the local cluster and pushes it to the remote clusters
  5. Fusion coordinates the file close with the other clusters involved (membership)
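The numbered workflow can be lined up with code. The following hypothetical Java sketch maps each step to a call; Coordinator and IhcTransferService are invented stand-ins, and step 4 only schedules the background transfer, which is what keeps file creation at LAN speed.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Hypothetical walk-through of the five-step Fusion workflow above. */
public class ForeignFileReplicationSketch {

    /** Illustrative stand-in for the coordination layer. */
    public interface Coordinator {
        void coordinate(String operation, Path path) throws IOException;
    }

    /** Illustrative stand-in for the IHC transfer service. */
    public interface IhcTransferService {
        void scheduleTransfer(Path path); // pull from the local cluster, push to remote clusters
    }

    private final FileSystem localCluster;
    private final Coordinator coordinator;
    private final IhcTransferService ihc;

    public ForeignFileReplicationSketch(FileSystem localCluster, Coordinator coordinator,
                                        IhcTransferService ihc) {
        this.localCluster = localCluster;
        this.coordinator = coordinator;
        this.ihc = ihc;
    }

    public void createReplicatedFile(Path path, byte[] contents) throws IOException {
        // 1-2. Client requests a file; Fusion coordinates the file open with the membership.
        coordinator.coordinate("OPEN", path);

        // 3. The file is added to the underlying storage of the local cluster.
        try (FSDataOutputStream out = localCluster.create(path)) {
            out.write(contents);
        }

        // 4. The IHC server pulls the data from the local cluster and pushes it to the
        //    remote clusters asynchronously, in the background.
        ihc.scheduleTransfer(path);

        // 5. Fusion coordinates the file close with the membership.
        coordinator.coordinate("CLOSE", path);
    }
}
```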
Inter-Hadoop Communication Service
- Uses the HCFS API and communicates directly with the underlying storage systems (see the example below)
  - Isilon
  - MapR
  - HDFS
  - S3
- NameNode and DataNode operations are unchanged
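Because the IHC service speaks only the HCFS API, the storage backend is selected by the URI scheme and the transfer logic itself does not change. Below is a small example using the stock Hadoop FileSystem API; the endpoint URIs and the file path are placeholders.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/** Copies one file between two HCFS endpoints; the URI scheme picks the implementation. */
public class HcfsCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Placeholder URIs: any HCFS scheme works (hdfs://, s3a://, maprfs://, ...).
        FileSystem source = FileSystem.get(new URI("hdfs://dc1-namenode:8020"), conf);
        FileSystem target = FileSystem.get(new URI("s3a://dc2-bucket"), conf);

        Path file = new Path("/user/brett/cs-2015-01.log");
        try (FSDataInputStream in = source.open(file);
             FSDataOutputStream out = target.create(file)) {
            // Stream the bytes through the generic HCFS interfaces; no NameNode or
            // DataNode changes are required on either side.
            IOUtils.copyBytes(in, out, 4096, false);
        }
    }
}
```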
Multi-Data Center Installation
Do I need so many replicas?
Features
Active/Active Selective Data Replication
Selective Data Replication
Three main use cases for restricting data replication
- The "Saudi Arabia" case: data must never leave a specific data center
  - Protects data from being replicated outside of a specific geo-location, country, or facility; e.g., customer data from the Saudi Arabian branch of a global bank must never leave the country due to local regulations
  - Virtual namespace: only metadata whose supporting data is replicated gets replicated
- The "/tmp" case: data created in a directory by a native client should remain native
  - Transient data of a job running in one DC does not need to be replicated elsewhere, as it is deleted upon job completion and nobody else needs it
- The "Ingest Only" case: data is ingested directly into the cluster at the data's origin
  - Data replicates to all other data centers
  - A temporarily network-partitioned cluster can still ingest data
SDR Implementation Example
[Diagram: a single virtually replicated namespace (/ with user/, tmp/, and public/, containing cs-2015-01.log, cs-2015-02.log, shv-2015-03.txt, and job-2015-04.xml); the data of each file is selectively replicated to a subset of the data centers dc1, dc2, and dc3]
Heterogeneous Storage Zones
Virtual data centers representing different types of block storage
- Storage types: hard drive, SSD, RAM
- A virtual data center is a zone of similarly configured DataNodes
- Example:
  - Z1, archival zone: DataNodes with dense hard-drive storage
  - Z2, active-data zone: DataNodes with high-performance SSDs
  - Z3, real-time access zone: lots of RAM and cores, short-lived hot data
- The SDR policy defines three directories (see the sketch below):
  - /archive: replicated only on Z1
  - /active-data: replicated on Z2 and Z1
  - /real-time: replicated everywhere
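One way to express such a policy is a table mapping directory prefixes to the zones that should hold replicas. The sketch below is an invented Java illustration of that idea, not an actual Fusion configuration format; the entries mirror the example above, and the default behavior for unlisted paths is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an SDR policy mapping directories to storage zones. */
public class ZoneReplicationPolicy {

    // Longest matching prefix wins; entries mirror the example policy on the slide above.
    private final Map<String, Set<String>> zonesByDirectory = new LinkedHashMap<>();

    public ZoneReplicationPolicy() {
        zonesByDirectory.put("/archive",     Set.of("Z1"));             // dense hard drives only
        zonesByDirectory.put("/active-data", Set.of("Z1", "Z2"));       // SSDs plus archival copy
        zonesByDirectory.put("/real-time",   Set.of("Z1", "Z2", "Z3")); // replicated everywhere
    }

    /** Returns the zones that should hold replicas of the given path. */
    public Set<String> zonesFor(String path) {
        String bestPrefix = "";
        for (String prefix : zonesByDirectory.keySet()) {
            if (path.startsWith(prefix) && prefix.length() > bestPrefix.length()) {
                bestPrefix = prefix;
            }
        }
        // Illustrative default for unlisted paths: no cross-zone replication.
        return bestPrefix.isEmpty() ? Set.of() : zonesByDirectory.get(bestPrefix);
    }
}
```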
Simplified WAN Configuration
Reduced operational complexity
- Fast network protocols can keep up with demanding replication traffic
- Hadoop clusters do not require direct communication with each other
  - No n x m communication among DataNodes across data centers
  - Reduced firewall / SOCKS complexity
- Reduced attack surface
Thank You. Questions?
Come visit WANdisco at Booth 11
Selective Data Replication with Geographically Distributed Hadoop
Brett Rudenstein

Editor's Notes

  • #3: It is no secret that RAM is the limiting factor for NameNode scalability and, as a result, for the entire HDFS.
  • #4: The goal has either been achieved or good progress is being made toward it.
  • #13: Consensus algorithms are the core of a distributed Coordination Engine.
  • #15: Double determinism (the same deterministic updates applied in the same deterministic order) is important for the equivalent evolution of the systems.
  • #18: Unlike a multi-cluster architecture, where independent clusters run in each data center and mirror data between them.
  • #19:
    1. DC1 DataNodes report replicas to native GeoNodes.
    2. A DC1 GeoNode submits a Foreign Replica Report (FRR) proposal.
    3. The FRR agreement is executed by all foreign GeoNodes, which learn about the foreign replica locations.
    4. The DC1 GeoNode schedules a replica transfer from a native DataNode to a foreign DC2 DataNode.
    5. The DC2 DataNode reports the new replica to DC2 GeoNodes.
    6. A DC2 GeoNode schedules replication of the new replica within DC2.