Enterprise Grade Spark Processing at Totango 2015-11-10 Oren Raboy, VP Eng. @ Totango
AGENDA (PART 1) • Background about Totango and our data architecture • Spark in the Totango Architecture • Quality: Testing Spark code in production
About us: Founded: 2010 Offices: TLV, SF Team: ~60 Customers: ~200
We help online businesses make their customers more successful through the use of data.
Totango: ~500M accounts, ~$5B revenue under management, ~100M events per day
Our Customers: The World's Leading Cloud Services
ANALYTICS • Usage Metrics • Trends over time • Trends across customers • Health score
AUTOMATION • Alerts • Triggered Workflows • Email Campaigns
Totango Data Architecture
[Diagram: Pixel, 3rd Party (SFDC) and CSV sources → Collection (ELB, Kinesis) → Real-time processing and Batch processing (Kinesis, S3) → Serving Layer]
• ‘Lambda Architecture’
• Hosted on AWS
• AWS and open-source technologies
• Java with a dash of Python
Batch Processing
• Executed once a day (at midnight in the customer’s local time)
• Each task calculates a set of account metrics (e.g. Health, Change)
• One Spark cluster runs all tasks for all customers
• Pipeline executed by Pipeline Runner, using Spotify’s Luigi
[Pipeline diagram: Raw Events → metric-calculation tasks → dependent computations → merge results into final document → Account Documents]
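The deck names Spotify’s Luigi as the pipeline runner but does not show the task code. As a rough illustration of the dependency structure above, here is a plain-Python stand-in (task names are hypothetical, not Totango’s actual tasks; real Luigi would express this with `Task.requires()`):

```python
# Minimal sketch of the batch pipeline's dependency graph. This is a
# plain-Python stand-in for Spotify's Luigi; task names are illustrative.

# Each task maps to the upstream tasks whose outputs it consumes.
DEPENDENCIES = {
    "raw_events": [],
    "calc_health_metric": ["raw_events"],
    "calc_change_metric": ["raw_events"],
    "dependent_computation": ["calc_health_metric"],
    "merge_final_document": ["calc_change_metric", "dependent_computation"],
}

def run_order(deps):
    """Return tasks in an order where every dependency runs first."""
    done, order = set(), []

    def visit(task):
        if task in done:
            return
        for upstream in deps[task]:
            visit(upstream)          # schedule dependencies before the task
        done.add(task)
        order.append(task)

    for task in deps:
        visit(task)
    return order

print(run_order(DEPENDENCIES))
```

Luigi adds what this sketch omits: persisted targets so completed tasks are skipped on re-runs, and retries on failure.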
Environment
• Multi-tenant: shared infrastructure for all Totango customers (‘Services’)
• Daily, hourly and on-demand schedules
• Standalone Spark cluster on AWS EC2 instances
• Input and output on S3; final results also indexed in Elasticsearch
[Diagram: the same pipeline (Raw Events → metric tasks → merge results into final document → Account Documents), instantiated once per service: Service A … Service XYZ]
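With a shared cluster and S3 for input and output, each service’s data presumably lives under tenant- and date-scoped S3 prefixes. A hypothetical layout (bucket and key names are invented for illustration, not Totango’s actual paths):

```python
from datetime import date

def s3_paths(service_id, run_date):
    """Build hypothetical tenant-scoped S3 prefixes for one daily run."""
    day = run_date.strftime("%Y-%m-%d")
    return {
        # one daily partition of raw events per service (tenant)
        "raw_events": f"s3://totango-raw-events/{service_id}/{day}/",
        # the merged final documents for that service and day
        "account_documents": f"s3://totango-account-documents/{service_id}/{day}/",
    }

paths = s3_paths("service_a", date(2015, 11, 10))
print(paths["raw_events"])  # s3://totango-raw-events/service_a/2015-11-10/
```

Keying every path by service and day keeps tenants isolated on shared infrastructure and makes a single day cheap to reprocess.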
Challenge: Quality
Requirements from infrastructure:
• Reliability: calculate metrics accurately at all times
• Velocity: frequent releases of new data-processing code
Challenge: high-quality and highly automated regression testing
[Diagram: the pipeline with a NEW VERSION of one metric task. How do we make sure the new version didn’t break anything?]
Testing In Production: How
• Before deployment, run the release candidate ‘side by side’ with the older version
• The new version runs in shadow mode and does not propagate results
• Compare old and new version results; output unexpected diffs
• Deploy to production only if there are no diffs across all customer data sets
[Diagram: OLD VERSION and NEW VERSION (SHADOW) pipelines run on the same Raw Events; a ‘compare csv’ step checks their outputs]
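The compare step can be sketched as a generic document diff. The data shapes here are assumptions (the deck only says old and new results are compared and unexpected diffs are output):

```python
def diff_results(old, new):
    """Compare per-account metric documents from the old and new (shadow) runs.

    old/new: dict mapping account_id -> dict of metric name -> value.
    Returns (account_id, metric, old_value, new_value) tuples for every
    mismatch; an account or metric missing on one side shows up as None.
    """
    diffs = []
    for account in sorted(set(old) | set(new)):
        old_doc, new_doc = old.get(account, {}), new.get(account, {})
        for metric in sorted(set(old_doc) | set(new_doc)):
            a, b = old_doc.get(metric), new_doc.get(metric)
            if a != b:
                diffs.append((account, metric, a, b))
    return diffs

old = {"acct1": {"health": "good", "change": 3}}
new = {"acct1": {"health": "good", "change": 4}}
print(diff_results(old, new))  # [('acct1', 'change', 3, 4)]
```

An empty diff list across all customer data sets is the gate for promoting the release candidate to production.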
Deployment Flow
1. Unit testing
2. Test environment: integration testing
3. Side-by-side testing in production of the new code
4. New code rolled out, old version kept side by side as backup
5. Rollout complete!
• We know the new version works correctly
• We do not need to think of all the corner test cases
• We do not need to write lots of regression tests
QUESTIONS? • labs.totango.com <-- engineering team blog • oren@totango.com <-- me! • Yes, we are hiring!