This document discusses Ray, a distributed Python framework designed to handle complex computing requirements driven by machine learning and AI workloads. It covers Ray's advantages over Spark in handling non-uniform data and tasks, and provides guidance on how to get started using Ray alongside existing Python libraries. Additionally, it emphasizes the importance of user feedback and offers resources for community engagement and learning about Ray.
@deanwampler
Python growth driven by ML/AI and other data science workloads
[Chart: usage % by year, 2012 to 2016]
Model sizes, and therefore compute requirements, are outstripping Moore’s Law
▪ Moore’s Law: 2x every 18 months
▪ Model compute requirements: 35x every 18 months!
Hence, there is a pressing need for robust, easy-to-use solutions for distributed Python
Diverse Compute Requirements Motivated Creation of Ray!
Simulator (game engine, robot sim, factory floor sim…)
Neural network “stuff”
And repeated play, over and over again, to train for achieving the best reward
Complex agent?
Microservices (simulators, too)
[Diagram: REST requests pass through an API Gateway to µ-service 1, µ-service 2, and µ-service 3. In production the diagram shows 2 API Gateway instances, 1 instance of µ-service 1, 5 of µ-service 2, and 3 of µ-service 3]
Production is a pain
▪ Each microservice has a different number of instances for scalability & resiliency
▪ But they have to be managed explicitly
Where Spark Excels
▪ Massive-scale data sets
▪ Uniform records with a schema
▪ Efficient, parallelized transformations
▪ SQL
▪ Batch analytics
▪ Stream processing
▪ Intuitive, high-level abstractions for data science & engineering tasks
Where Ray Excels
▪ Highly non-uniform data graphs
▪ Think typical “in-memory object models”, but distributed
▪ Handles distributed state intuitively
▪ Highly non-uniform task graphs
▪ Small to large scale tasks
▪ Intuitive API for the “90%” of cases
▪ Supports compute problems ranging from general services, games, and simulators to stochastic gradient descent, HPO, …
If you’re already using these…
▪ asyncio
▪ joblib
▪ multiprocessing.Pool
▪ Use Ray’s implementations
▪ Drop-in replacements
▪ Change import statements
▪ Break the one-node limitation!
For example, from this:
from multiprocessing.pool import Pool
To this:
from ray.util.multiprocessing.pool import Pool
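To make the import swap concrete, here is a minimal sketch of code written against the standard Pool interface. The names square and sum_of_squares are hypothetical and exist only for illustration. The sketch runs with the standard library as written; the Ray import is left as a comment because it assumes Ray is installed, and it has not been verified here against a multi-node cluster.

```python
from multiprocessing.pool import Pool  # standard library: limited to one node

# The slide's suggested change is to replace the import above with:
# from ray.util.multiprocessing.pool import Pool
# (assumption: Ray is installed; the rest of this file stays the same)


def square(x):
    """Toy task (hypothetical). Defined at module level so workers can import it."""
    return x * x


def sum_of_squares(values, pool_cls=Pool, processes=2):
    """Map `square` over `values` with any Pool-compatible class and sum the results."""
    with pool_cls(processes=processes) as pool:
        return sum(pool.map(square, values))
```

Because sum_of_squares only relies on the shared Pool interface (the constructor, map, and use as a context manager), the only line that needs to change is the import. When using process-based pools, call it from under an if __name__ == "__main__": guard, for example print(sum_of_squares(range(10))).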
Ray Community and Resources
▪ ray.io - entry point for all things Ray
▪ Tutorials: anyscale.com/academy
▪ github.com/ray-project/ray
▪ anyscale.com/events