The Future of Computing is Distributed
Ion Stoica, December 4, 2020
Distributed systems are not new: ARPA Network (1970s), HPC (1980s), Web (1990s), Big Data (2000s)…
Distributed computing is still the exception…
● Inaccessible to most developers
● Few universities teach distributed computing
This will change: distributed computing will be the norm, rather than the exception.
What is different this time?
● The rise of deep learning (DL)
● The end of Moore’s Law
● Apps becoming AI centric
DL compute demands are growing faster than ever: 35x every 18 months. (https://openai.com/blog/ai-and-compute//)
Not only esoteric apps: ResNet-50, ResNeXt-101, image processing, translation. (https://openai.com/blog/ai-and-compute//)
Not only processing, but memory. [Chart: model sizes from ResNet-50, Transformer, and BERT through GPT-1 (1.5B), GPT-2 (8.3B), Turing Proj. (17B), and GPT-3 (175B); growth of 20x, then 200x, every 18 months.]
What is different?
● The rise of deep learning (DL)
● The end of Moore’s Law
● Apps becoming AI centric
The end of Moore’s Law: from 2x every 18 months to 1.05x every 18 months.
Hardware cannot keep up: DL demand grows 35x every 18 months, while CPUs follow Moore’s Law at 2x every 18 months. (https://openai.com/blog/ai-and-compute//)
What about specialized hardware? It trades generality for performance, but it is not enough: it just extends Moore’s Law.
Specialized hardware is not enough. [Chart: DL demand at 35x every 18 months vs. Moore’s Law (2x every 18 months); CPU, GPU, TPU.] (https://openai.com/blog/ai-and-compute//)
Memory is dwarfed by demand: GPU memory increased by just 1.45x every 18 months, while model memory requirements (ResNet-50, Transformer, BERT, GPT-1 (1.5B), GPT-2 (8.3B), Turing Proj. (17B)) grew 20x every 18 months. (https://devblogs.nvidia.com/training-bert-with-gpus/) No way out but distributed!
What is different?
● The rise of deep learning (DL)
● The end of Moore’s Law
● Apps becoming AI centric
Apps are becoming AI centric… and integrating with other distributed workloads.
Before: isolated distributed workloads (HPC, Big Data, Microservices).
Now, AI integrates with the other distributed workloads:
● HPC: training and simulations (e.g., distributed training using MPI, leveraging industrial simulators for RL)
● Big Data: log processing and featurization (e.g., online learning, RL)
● Microservices: serving and business logic (e.g., inference, online learning, web backends)
Distributed apps are becoming the norm: there is no way to scale AI but to go distributed. Apps are becoming AI centric and integrating with other distributed workloads: HPC (training, simulations), Big Data (log processing, featurization), and Microservices (serving, business logic). (https://devblogs.nvidia.com/training-bert-with-gpus/)
No general solution exists for e2e AI apps spanning DL, Big Data, Microservices, and HPC.
The natural solution: stitch together existing frameworks, with a separate distributed system for each workload (model serving, training, hyperparameter tuning, data processing, simulation, business logic).
RAY Universal framework for distributed computing
Ray: universal framework for distributed computing (Python and Java), with libraries for model serving, training, hyperparameter tuning, data processing, simulation, and business logic.
Ray spans DL, Big Data, Microservices, and HPC.
Three key ideas
● Execute functions remotely as tasks, and instantiate classes remotely as actors
  ○ Support both stateful and stateless computations
● Asynchronous execution using futures
  ○ Enable parallelism
● Distributed (immutable) object store
  ○ Efficient communication (send arguments by reference)
Function:

import numpy as np

def read_array(file):
    # read ndarray "a" from "file"
    return a

def add(a, b):
    return np.add(a, b)

a = read_array(file1)
b = read_array(file2)
sum = add(a, b)

Class:

class Counter(object):
    def __init__(self):
        self.value = 0
    def inc(self):
        self.value += 1
        return self.value

c = Counter()
c.inc()
c.inc()
Function → Task:

import numpy as np
import ray

@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

a = read_array(file1)
b = read_array(file2)
sum = add(a, b)

Class → Actor:

@ray.remote
class Counter(object):
    def __init__(self):
        self.value = 0
    def inc(self):
        self.value += 1
        return self.value

c = Counter()
c.inc()
c.inc()

(This step only adds the @ray.remote decorators; the call sites are converted to .remote() calls in the next step.)
Function → Task:

@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote(file1)
id2 = read_array.remote(file2)
id = add.remote(id1, id2)
sum = ray.get(id)

Class → Actor:

@ray.remote(num_gpus=1)
class Counter(object):
    def __init__(self):
        self.value = 0
    def inc(self):
        self.value += 1
        return self.value

c = Counter.remote()
id4 = c.inc.remote()
id5 = c.inc.remote()
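The actor calls above return object IDs just like tasks do; a small follow-up (not on the original slide) fetches their results:

# Each inc() call returned an object ID (future); fetch both results.
print(ray.get([id4, id5]))  # -> [1, 2]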
Task API
(The id variables are Object IDs, similar to futures. In the example, file1 lives on Node 1 and file2 on Node 2.)

@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote(file1)
id2 = read_array.remote(file2)
id = add.remote(id1, id2)
sum = ray.get(id)

How execution unfolds:
● read_array.remote(file1) returns id1 (a future) immediately, before read_array() finishes
● The dynamic task graph is built at runtime, as read_array runs on Node 1 and Node 2
● add is scheduled on a third node; every task is scheduled, but not necessarily finished yet
● ray.get(id) blocks until the result is available
● The task graph is executed to compute sum
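To make the three key ideas concrete, here is a minimal sketch (not from the slides; normalize and the array are illustrative) combining futures for parallelism with the object store for pass-by-reference arguments:

import numpy as np
import ray

ray.init()

@ray.remote
def normalize(x):
    # Each task runs in parallel across the cluster.
    return (x - x.mean()) / x.std()

# Put a large array into the distributed object store once;
# tasks receive it by reference rather than by copy.
big = ray.put(np.random.rand(1_000_000))

# Each .remote() call returns a future immediately.
futures = [normalize.remote(big) for _ in range(4)]

# Block until all four results are available.
results = ray.get(futures)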
Library Ecosystem
Best distributed library ecosystem. Native Libraries: RLlib, the most popular scalable RL library ● PyTorch and TF support ● Largest number of algorithms
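A rough sketch (not from the slides) of what training with RLlib looked like around Ray 1.x; the environment and config values are illustrative:

import ray
from ray.rllib.agents.ppo import PPOTrainer

ray.init()

# Train PPO on CartPole with two parallel rollout workers.
trainer = PPOTrainer(env="CartPole-v0", config={"num_workers": 2})
for _ in range(3):
    result = trainer.train()
    print(result["episode_reward_mean"])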
Native Libraries: Tune, a popular hyperparameter tuning library. “For me, and I say this as a Hyperopt maintainer, Tune is the clear winner down the road. Tune is fairly well architected and it integrates with everything else, and it’s built on top of Ray so it has all the benefits stemming from that as well. … In 2020, I would certainly bet on Tune.” -- Max Pumperla, Hyperopt maintainer
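A minimal Tune sketch (not from the slides), assuming the Ray 1.x API; the trainable and metric are illustrative:

from ray import tune

def trainable(config):
    # Illustrative objective: report one score per sampled learning rate.
    tune.report(score=1.0 / config["lr"])

analysis = tune.run(
    trainable,
    config={"lr": tune.grid_search([0.001, 0.01, 0.1])},
)
print(analysis.get_best_config(metric="score", mode="max"))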
Native Libraries: another library just launched; a promising start, but a long way to go.
3rd Party Libraries:
● Horovod: the most popular distributed training library
● Optuna and Hyperopt: popular hyperparameter search libraries (see the Tune integration sketch below)
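As an illustration of the Hyperopt integration (not from the slides, and assuming the Ray 1.x ray.tune.suggest.hyperopt module), Tune can use Hyperopt as its search algorithm:

from hyperopt import hp
from ray import tune
from ray.tune.suggest.hyperopt import HyperOptSearch

def trainable(config):
    # Illustrative objective to minimize.
    tune.report(loss=(config["lr"] - 0.05) ** 2)

space = {"lr": hp.loguniform("lr", -7, -2)}
tune.run(
    trainable,
    search_alg=HyperOptSearch(space, metric="loss", mode="min"),
    num_samples=20,
)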
The two most popular NLP libraries use Ray to scale up.
Major ML cloud platforms (e.g., ModelArts) are embedding Ray/RLlib/Tune.
Dask running on Ray ● Faster, more resilient, and more scalable (see the sketch below)
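A minimal Dask-on-Ray sketch (not from the slides), assuming the ray.util.dask scheduler entry point from Ray 1.x; the array sizes are arbitrary:

import dask.array as da
import ray
from ray.util.dask import ray_dask_get

ray.init()

# Build an ordinary Dask computation...
x = da.random.random((10_000, 10_000), chunks=(1_000, 1_000))
result = (x + x.T).mean()

# ...and execute its task graph on Ray instead of Dask's own scheduler.
print(result.compute(scheduler=ray_dask_get))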
A popular experiment tracking platform offers a one-line integration with Ray Tune.
Intel’s unified Data Analytics and AI platform ● Integrated Ray together with Spark, TF, etc. ● Use cases include streaming ML with customers such as Burger King and BMW
Your apps here!
Adoption
Significant community contributions:
● Refactored actor management
● Placement groups (see the sketch below)
● Java API
● Direct calls between workers
● Azure support in the cluster launcher
● Original dashboard
● Refactored worker to C++
● Initial port to Bazel
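A rough sketch (not from the slides) of the placement group API as of Ray 1.x; the resource shapes and the work() task are illustrative:

import ray
from ray.util.placement_group import placement_group

ray.init()

# Reserve two 1-CPU bundles, packed onto as few nodes as possible.
pg = placement_group([{"CPU": 1}, {"CPU": 1}], strategy="PACK")
ray.get(pg.ready())  # block until the reservation is granted

@ray.remote(num_cpus=1)
def work(i):
    return i * i

# Schedule tasks into the reserved bundles.
futures = [work.options(placement_group=pg).remote(i) for i in range(2)]
print(ray.get(futures))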
Summary
● Ray: universal framework for distributed computing
● Comprehensive ecosystem of scalable libraries (native and 3rd party)
https://github.com/ray-project/ray
Thank you!

The Future of Computing is Distributed