Ray: Enterprise-Grade, Distributed Python
Dean Wampler, Anyscale
@deanwampler Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
Agenda
▪ Why Ray?
▪ Demo
▪ When to use Spark? When to use Ray?
▪ How to get started with Ray
Why Ray?
▪ Model sizes, and therefore compute requirements, are outstripping Moore’s Law: 35x every 18 months, versus 2x every 18 months for Moore’s Law
▪ Python growth is driven by ML/AI and other data science workloads
▪ Hence, there is a pressing need for robust, easy-to-use solutions for distributed Python
[Chart: usage % by year (2012, 2014, 2016)]
Reinforcement Learning: Motivation for Ray
Agent → environment: decisions (actions)
Environment → agent: consequences (observations, rewards)
Beating Lee Sedol…
AlphaGo (Silver et al. 2016)
▪ Observations
  ▪ Board state
▪ Actions
  ▪ Where to place stones
▪ Rewards
  ▪ 1 if win
  ▪ 0 otherwise
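The observation/action/reward loop can be sketched in a few lines of plain Python. This is a hypothetical toy, not AlphaGo and not code from the talk: `LineWorld`, `GreedyAgent`, and `run_episode` are names invented for illustration, and the agent follows a fixed rule rather than learning. It only shows the shape of the loop, with a reward of 1 on success and 0 otherwise.

```python
class LineWorld:
    """Hypothetical toy environment: walk along a line to reach a goal cell."""

    def __init__(self, start=0, goal=3):
        self.position = start
        self.goal = goal

    def step(self, action):
        """Apply an action (-1 or +1) and return (observation, reward, done)."""
        self.position += action
        done = self.position == self.goal
        reward = 1 if done else 0  # 1 if the goal is reached, 0 otherwise
        return self.position, reward, done


class GreedyAgent:
    """Hypothetical agent with a fixed rule: always step toward a known goal."""

    def __init__(self, goal):
        self.goal = goal

    def act(self, observation):
        return 1 if observation < self.goal else -1


def run_episode(env, agent, max_steps=100):
    """Run the agent/environment loop once; return (total reward, last observation)."""
    observation, total_reward = env.position, 0
    for _ in range(max_steps):
        action = agent.act(observation)               # decision (action)
        observation, reward, done = env.step(action)  # consequence (observation, reward)
        total_reward += reward
        if done:
            break
    return total_reward, observation
```

Training means repeating `run_episode` many times while adjusting the agent, which is where the compute demand comes from.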
Diverse Compute Requirements Motivated Creation of Ray!
▪ Simulator (game engine, robot sim, factory floor sim…)
▪ Neural network “stuff”
▪ Complex agent?
▪ And repeated play, over and over again, to train for achieving the best reward
The Ray Ecosystem
▪ Hyperparameter Tuning
▪ Training
▪ Simulation
▪ Model Serving
Microservices (simulators, too)
[Diagram: REST API Gateway in front of µ-service 1, µ-service 2, µ-service 3]
Nice! (In theory…)
Microservices (simulators, too): production is a pain
[Diagram: the logical layout (REST API Gateway, µ-service 1, µ-service 2, µ-service 3) deployed as 2 API Gateway instances, 1 instance of µ-service 1, 5 of µ-service 2, and 3 of µ-service 3]
▪ Each microservice has a different number of instances for scalability & resiliency
▪ But they have to be managed explicitly
Microservices (simulators, too): back to simplicity
[Diagram: REST API Gateway, µ-service 1, µ-service 2, µ-service 3, each backed by many tasks/actors in a Ray cluster]
▪ Back to one “logical” instance
▪ Ray handles scaling transparently
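As a rough illustration of running work as Ray tasks, here is a minimal sketch, assuming Ray is installed (`pip install ray`). The `simulate` function is a hypothetical stand-in for a simulator, and `run_on_ray` is a name invented for this example. It uses only the core calls `ray.init`, `ray.remote`, `ray.get`, and `ray.shutdown`.

```python
import random


def simulate(seed, steps=1000):
    """Hypothetical stand-in for a simulator: a seeded random walk."""
    rng = random.Random(seed)
    position = 0
    for _ in range(steps):
        position += rng.choice((-1, 1))
    return position


def run_on_ray(seeds):
    """Run one simulation per seed as a Ray task and collect the results."""
    import ray  # imported lazily so simulate() stays usable without Ray installed

    ray.init(ignore_reinit_error=True)
    remote_simulate = ray.remote(simulate)  # same effect as decorating with @ray.remote
    refs = [remote_simulate.remote(seed) for seed in seeds]  # returns immediately
    results = ray.get(refs)  # blocks until every task has finished
    ray.shutdown()
    return results
```

Calling `run_on_ray(range(8))` returns one final position per seed, in seed order. The caller never says how many processes or nodes to use; Ray schedules the tasks on whatever the cluster offers.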
Demo!
When to use Spark? When to use Ray?
Where Spark Excels
▪ Massive-scale data sets
▪ Uniform records with a schema
▪ Efficient, parallelized transformations
▪ SQL
▪ Batch analytics
▪ Stream processing
▪ Intuitive, high-level abstractions for data science & engineering tasks
Where Ray Excels
▪ Highly non-uniform data graphs
  ▪ Think typical “in-memory object models”, but distributed
  ▪ Handles distributed state intuitively
▪ Highly non-uniform task graphs
  ▪ Small to large scale tasks
▪ Intuitive API for the “90%” of cases
▪ Supports compute problems ranging from general services, games, and simulators to stochastic gradient descent, HPO, …
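One way to picture "in-memory object models, but distributed" is a Ray actor: an ordinary class whose instance lives in its own process and keeps state between calls. The sketch below assumes Ray is installed. `RewardTracker` and `track_with_ray` are hypothetical names for illustration, not part of Ray or of the talk.

```python
class RewardTracker:
    """Hypothetical piece of shared state: a running tally of episode rewards."""

    def __init__(self):
        self.total = 0
        self.episodes = 0

    def record(self, reward):
        """Add one episode's reward and return the number of episodes seen."""
        self.total += reward
        self.episodes += 1
        return self.episodes

    def win_rate(self):
        return self.total / self.episodes if self.episodes else 0.0


def track_with_ray(rewards):
    """Keep the tally inside a Ray actor and read the win rate back."""
    import ray  # imported lazily so RewardTracker stays usable without Ray installed

    ray.init(ignore_reinit_error=True)
    RemoteTracker = ray.remote(RewardTracker)  # same effect as @ray.remote on the class
    tracker = RemoteTracker.remote()           # starts one actor holding the state
    ray.get([tracker.record.remote(r) for r in rewards])  # method calls become tasks
    rate = ray.get(tracker.win_rate.remote())
    ray.shutdown()
    return rate
```

The class itself is unchanged Python; only the call sites gain `.remote(...)` and `ray.get(...)`.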
Getting started with Ray
If you’re already using these…
▪ asyncio
▪ joblib
▪ multiprocessing.Pool
Use Ray’s implementations
▪ Drop-in replacements
▪ Change import statements
▪ Break the one-node limitation!
For example, from this:
    from multiprocessing.pool import Pool
To this:
    from ray.util.multiprocessing.pool import Pool
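To make the import swap concrete, here is a small sketch written against the standard-library `Pool`. The names `square`, `sum_of_squares`, and `pool_factory` are hypothetical and exist only for this example. Under the assumption that Ray is installed, changing the import line as the comment describes should be the only edit needed to run the same code on a Ray cluster.

```python
# To run on Ray instead, replace the import below with:
#   from ray.util.multiprocessing.pool import Pool
from multiprocessing.pool import Pool


def square(x):
    """Toy CPU-bound task; a stand-in for real work."""
    return x * x


def sum_of_squares(numbers, pool_factory=Pool):
    """Map square() over the numbers with a worker pool and sum the results."""
    # pool.map splits the input across workers and preserves input order.
    with pool_factory() as pool:
        return sum(pool.map(square, numbers))
```

In a script, call `sum_of_squares(range(10))` from inside an `if __name__ == "__main__":` block, as the standard library requires for process pools.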
Ray Community and Resources
▪ ray.io - entry point for all things Ray
▪ Tutorials: anyscale.com/academy
▪ github.com/ray-project/ray
▪ anyscale.com/events