Scaling Open Source Big Data Cloud Applications is Easy/Hard Paul Brebner Instaclustr—Technology Evangelist ©Instaclustr Pty Limited, 2022 DeveloperWeek 10 May 2022
Who am I? • Previously • R&D in distributed systems and performance engineering. • Last 5 years • Technology Evangelist for Instaclustr (soon NetApp) • 100+ Blogs, demo applications, talks • Open Source technologies including • Apache Cassandra, Spark, Kafka, Zookeeper • and Redis, OpenSearch, PostgreSQL, Kubernetes, Prometheus, OpenTracing, etc
Cloud Platform for Big Data Open Source Technologies. Latest addition is Workflows with Uber’s Cadence. Instaclustr Managed Platform ©Instaclustr Pty Limited, 2021
Cloud Platform for Big Data Open Source Technologies. Latest addition is Workflows with Uber’s Cadence. This talk focuses on Cassandra and Kafka. ©Instaclustr Pty Limited, 2021
Scaling is Easy! Cassandra and Kafka: homogeneous distributed clusters → horizontally scalable. www.cassandra.apache.org/_/cassandra-basics.html
But actually lots of moving parts (source: http://trumpetb.net/loco/rodsf.html)
Complications – DCs, Racks, Nodes, Partitions, Replication Factor, Time (for auto-scaling). Rows have a partition key and are stored in different partitions.
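As a rough illustration of how some of these concepts appear in application code, here is a minimal sketch assuming the DataStax Java driver 4.x: a keyspace with replication factor 3 per data centre, and a table whose partition key decides which partition (and therefore which replica nodes) each row lands on. The keyspace, table, and DC names are hypothetical, not from the talk's demo.

```java
import com.datastax.oss.driver.api.core.CqlSession;

// Minimal sketch (DataStax Java driver 4.x). Keyspace, table and DC names are hypothetical.
public class CassandraSchemaSketch {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) { // defaults to localhost:9042
            // Replication factor 3 per data centre: each row is stored on 3 replica nodes,
            // ideally spread across 3 racks.
            session.execute(
                "CREATE KEYSPACE IF NOT EXISTS demo WITH replication = "
              + "{'class': 'NetworkTopologyStrategy', 'DC1': 3}");
            // 'id' is the partition key: it determines which partition (and hence which
            // replica nodes) each row is stored on.
            session.execute(
                "CREATE TABLE IF NOT EXISTS demo.events ("
              + "id text, ts timestamp, value double, "
              + "PRIMARY KEY ((id), ts))");
        }
    }
}
```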
Example 1 – Cassandra Auto-Scaling ©Instaclustr Pty Limited, 2021
Two Ways of Resizing Clusters 1 - Horizontal Scaling • Add nodes, no interruption • But scale up only (not down) • Takes time, puts extra load on cluster as data streams to extra nodes 2 - Vertical Scaling • Replace nodes with bigger (or smaller) node types (more/less cores) • Scale up and down • Takes time, temporary reduction in capacity • Choice of how many nodes are replaced concurrently – by “node” (1 node at a time), by “rack” (all nodes in a rack), or in-between
Cluster resizing time – by node vs. by rack – by rack is faster but …? Cluster = 6 nodes, 3 racks, 2 nodes per rack By node (concurrency 1) By rack (concurrency 2)
Resizing by node – capacity reduced by 1/6 (one of six nodes) during each resize operation (simplified model)
Resizing by rack – capacity reduced by 2/6 (one rack of two nodes) during each resize operation
Comparison – resize by rack faster but has bigger capacity hit during resize
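The slides above use a simplified capacity model: with N equal-sized nodes and c of them being replaced at once, roughly (N - c)/N of the cluster's capacity remains available. A minimal sketch of that arithmetic for the 6-node example:

```java
// Simplified capacity model from the slides: while c of N equal-sized nodes are
// being replaced, roughly (N - c)/N of cluster capacity remains available.
public class ResizeCapacityModel {
    static double capacityDuringResize(int totalNodes, int concurrentlyResizing) {
        return (double) (totalNodes - concurrentlyResizing) / totalNodes;
    }

    public static void main(String[] args) {
        int nodes = 6;  // 3 racks x 2 nodes per rack (example cluster)
        System.out.printf("By node: %.0f%% capacity%n", 100 * capacityDuringResize(nodes, 1)); // ~83%
        System.out.printf("By rack: %.0f%% capacity%n", 100 * capacityDuringResize(nodes, 2)); // ~67%
    }
}
```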
Observations • If the capacity available during a resize is exceeded, latencies will increase • Made worse by Cassandra load balancing, which assumes equal-sized nodes • By node, more nodes in the cluster reduces the impact of reduced cluster capacity during resizing (some clusters have 100s of nodes) – but it will take longer • Many of our clusters have <= 6 nodes
Auto-scaling model – increasing load → linear regression over 1 hour, extrapolated into the future. We predict the cluster will reach 100% capacity around the 280-minute mark (220 minutes in the future). [Chart: measured vs. extrapolated load]
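A minimal sketch of the idea on this slide, assuming a simple least-squares fit: fit a line to the last hour of load measurements and extrapolate to estimate when the cluster reaches 100% capacity. The sample data and class/method names are illustrative, not the production auto-scaler.

```java
// Hedged sketch: least-squares fit over recent (minute, load%) samples, extrapolated
// to estimate when load reaches 100%. Sample data is illustrative only.
public class LoadExtrapolation {
    // Returns {slope, intercept} for y = slope*x + intercept.
    static double[] fitLine(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) { sx += x[i]; sy += y[i]; sxx += x[i]*x[i]; sxy += x[i]*y[i]; }
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double intercept = (sy - slope * sx) / n;
        return new double[] {slope, intercept};
    }

    public static void main(String[] args) {
        double[] minutes = {0, 10, 20, 30, 40, 50, 60};   // last hour of samples
        double[] loadPct = {40, 43, 47, 50, 54, 57, 60};  // measured cluster load (%)
        double[] fit = fitLine(minutes, loadPct);
        double minutesTo100 = (100 - fit[1]) / fit[0];    // solve slope*x + intercept = 100
        System.out.printf("Predicted 100%% capacity at ~%.0f minutes%n", minutesTo100);
    }
}
```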
Resize by Rack vs. Node – initiated in time to prevent overloading during the resize operation. Resize by rack must be initiated sooner than resize by node, even though it’s faster to resize, because it has less capacity during the resize (67% vs. 83% of initial capacity). [Chart: by rack vs. by node]
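The trigger rule this slide implies can be sketched as: start the resize early enough that it completes before the load reaches the capacity that remains available during the resize, i.e. triggerTime = timeLoadReaches(capacityDuringResize) - resizeDuration. The slope, intercept, and resize durations below are illustrative values, not measurements from the talk.

```java
// Sketch of the trigger rule: the resize should finish before load reaches the
// capacity remaining during the resize. All numbers below are illustrative.
public class ResizeTriggerSketch {
    // When does extrapolated load (slope*x + intercept, in %) reach a threshold?
    static double minutesUntilLoadReaches(double thresholdPct, double slopePctPerMin, double interceptPct) {
        return (thresholdPct - interceptPct) / slopePctPerMin;
    }

    public static void main(String[] args) {
        double slope = 0.33, intercept = 40;  // from the regression sketch above

        // By rack: ~67% capacity during resize, but faster (e.g. ~30 min for the 6-node example).
        double byRackTrigger = minutesUntilLoadReaches(67, slope, intercept) - 30;
        // By node: ~83% capacity during resize, but slower (e.g. ~60 min).
        double byNodeTrigger = minutesUntilLoadReaches(83, slope, intercept) - 60;

        System.out.printf("Initiate by-rack resize at ~%.0f min, by-node at ~%.0f min%n",
                byRackTrigger, byNodeTrigger);  // by rack must be initiated sooner here
    }
}
```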
Auto-scaling POC – worked! Monitoring API → Linear Regression + Rules → Provisioning API. Rules generalized to allow for • scaling up and down • resizing by any number of nodes concurrently, up to rack size
Example 2 – Anomaly Detection ©Instaclustr Pty Limited, 2021 [Photo: JoAnn Morgan, Apollo 11 Mission Control]
Multiple technologies: Kafka, Cassandra, Kubernetes
Massively Scalable Anomaly Detection – Tuning knobs (orange = h/w, yellow = s/w). Scaling is (too) Easy! Initially we just increased h/w resources
But scalability not great. [Chart: Total Cores vs. Billions of checks/day (pre-tuning)]
Tuning required! Scalability post-tuning. [Chart: Total Cores vs. Billions of checks/day, pre-tuning vs. post-tuning]
Tuning – Optimize s/w resources (red arrows): 1. Minimize Kafka Consumers (thread pool 1) 2. Minimize Cassandra Connections 3. Maximize Cassandra client concurrency (thread pool 2)
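A hedged sketch of how those three knobs show up in client code: a small number of Kafka consumers, a Cassandra session with few connections per node but many in-flight requests each, and a separate application thread pool driving asynchronous Cassandra requests. The specific values, topic, and table names are hypothetical, not the tuned settings from the anomaly detection application.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Illustrative only: knob values, topic and table names are hypothetical.
public class AnomalyPipelineTuningSketch {
    public static void main(String[] args) {
        // Knob 1: keep the number of Kafka consumers (thread pool 1) small - just enough to keep up.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "anomaly-detectors");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(List.of("events"));

        // Knob 2: minimize Cassandra connections - few connections per node,
        // each allowed many in-flight requests.
        DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
                .withInt(DefaultDriverOption.CONNECTION_POOL_LOCAL_SIZE, 1)
                .withInt(DefaultDriverOption.CONNECTION_MAX_REQUESTS, 2048)
                .build();
        CqlSession session = CqlSession.builder().withConfigLoader(loader).build();
        PreparedStatement insert =
                session.prepare("INSERT INTO demo.events (id, value) VALUES (?, ?)");

        // Knob 3: maximize Cassandra client concurrency (thread pool 2) with asynchronous writes.
        ExecutorService cassandraPool = Executors.newFixedThreadPool(100);

        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        records.forEach(r -> cassandraPool.submit(() ->
                session.executeAsync(insert.bind(r.key(), r.value()))));
    }
}
```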
Example 3 – What’s really going on behind the Kafka partitions? ©Instaclustr Pty Limited 2019, 2021, 2022
Kafka topic partitions enable consumer concurrency (partitions >= consumers). [Diagram: Producer → Topic “Parties” (Partitions 1..n) → Consumer Group of Consumers; consumers share work within groups]
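A minimal sketch of the partitions >= consumers rule, assuming the kafka-clients Java API: consumers that share a group.id split the topic's partitions between them, so any consumers beyond the partition count sit idle. Topic and group names are illustrative.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hedged sketch: consumers with the same group.id share the topic's partitions,
// so at most 'partitions' consumers in the group do useful work. Names are illustrative.
public class ConsumerGroupSketch {
    static KafkaConsumer<String, String> newConsumer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "parties-consumers");  // same group => work is shared
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return new KafkaConsumer<>(props);
    }

    public static void main(String[] args) {
        // With e.g. 3 partitions on topic "parties", run at most 3 of these consumers;
        // a 4th would sit idle because no partition would be assigned to it.
        for (int i = 0; i < 3; i++) {
            new Thread(() -> {
                KafkaConsumer<String, String> c = newConsumer();  // one consumer per thread
                c.subscribe(List.of("parties"));
                while (true) {
                    c.poll(Duration.ofMillis(500)).forEach(r ->
                            System.out.println(Thread.currentThread().getName() + " got " + r.value()));
                }
            }, "consumer-" + i).start();
        }
    }
}
```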
High consumer/partition fan-out. Can be caused by: 1. Design – many topics and/or many consumers 2. Slow consumers → need more consumers to increase throughput
Kafka write architecture – partition replication
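As a sketch of where these two knobs are set, assuming the Kafka AdminClient: partitions and replication factor are fixed per topic at creation time, and every write to a partition is copied to its follower replicas, which is the cost the benchmark on the next slide exposes. The topic name and counts are illustrative.

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

// Hedged sketch: partitions and replication factor are set per topic at creation time.
// Each write to a partition is replicated to (replication factor - 1) follower brokers.
public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("parties", 100, (short) 3); // 100 partitions, RF=3
            admin.createTopics(List.of(topic)).all().get();           // wait for creation
        }
    }
}
```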
Benchmarking revealed that partitions and replication factor are the culprits. [Chart: Kafka Partitions (1 to 10,000) vs. Throughput (TPS), Replication Factor 3 vs. Replication Factor 1; cluster: 3 nodes x 4 cores = 12 cores total]
Implications? • Bigger cluster (more nodes, bigger nodes) • Design to minimize topics and consumers • Optimize consumers for minimum time • Always benchmark with many partitions • Blame Apache ZooKeeper? • Responsible for Kafka control • From Kafka 3.0 it’s being replaced by the native KRaft protocol • Not yet production ready • May enable more partitions (but may not impact throughput)
Scaling is Mostly Easy! • Using Scalable Open Source Big Data Technologies • Hosted by suitable Cloud providers • With suitable monitoring, understanding of auto-scaling and how different software “knobs” interact, and by scaling incrementally © Instaclustr Pty Limited, 2022
www.instaclustr.com info@instaclustr.com @instaclustr THANK YOU! For further Information see blogs www.instaclustr.com/paul-brebner/
