Paul Krill
Editor at Large

PyTorch team unveils framework for programming clusters

news
Oct 22, 2025 | 2 mins

Monarch framework, currently experimental, allows Python programmers to program distributed systems as if they were just one machine.


The PyTorch team at Meta, stewards of the PyTorch open source machine learning framework, has unveiled Monarch, a distributed programming framework intended to bring the simplicity of PyTorch to entire clusters. Monarch pairs a Python-based front end, which supports integration with existing code and libraries such as PyTorch, with a Rust-based back end that provides performance, scalability, and robustness, the team said.

Announced October 22, Monarch is a framework based on scalable actor messaging that lets users program distributed systems the way they would program a single machine, hiding the complexity of distributed computing, the PyTorch team said. Monarch is currently experimental; installation instructions can be found at meta-pytorch.org.
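The announcement describes this actor model only in prose, but a minimal hypothetical sketch of the style might look like the following. The module path and names used here (monarch.actor, Actor, endpoint, this_host, spawn_procs) are assumptions for illustration and may not match the experimental API.

# Hypothetical sketch of Monarch-style actor messaging; the imports and
# method names below are assumptions, not confirmed API. The idea: define an
# actor once, spawn it across a mesh of processes, and message the whole mesh
# as if it were a single object.
from monarch.actor import Actor, endpoint, this_host  # assumed module path

class Counter(Actor):
    def __init__(self) -> None:
        self.value = 0

    @endpoint  # assumed decorator marking a remotely callable method
    def increment(self) -> None:
        self.value += 1

    @endpoint
    def read(self) -> int:
        return self.value

# Spawn one process per GPU on this host (assumed helper), then one Counter
# actor per process; from then on, "counters" is addressed like one object.
procs = this_host().spawn_procs(per_host={"gpus": 8})
counters = procs.spawn("counters", Counter)

counters.increment.call().get()    # broadcast the call to every actor
print(counters.read.call().get())  # collect one result per actor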

Monarch organizes processes, actors, and hosts into a scalable multidimensional array, or mesh, that can be manipulated directly. Users can operate on entire meshes, or slices of them, through simple APIs, with Monarch handling distribution and vectorization automatically. Developers can write code as if nothing fails, according to the PyTorch team. When something does fail, Monarch fails fast by stopping the whole program; users can later add fine-grained fault handling where needed, catching and recovering from failures.
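Continuing the hypothetical sketch above, slicing and fault handling might be expressed along these lines; the slice() method and the broad exception handler are illustrative assumptions rather than documented Monarch behavior.

# Hypothetical continuation of the earlier sketch: operate on a slice of the
# actor mesh, and opt into fine-grained fault handling only where needed.
front_half = counters.slice(gpus=slice(0, 4))  # assumed slicing API
front_half.increment.call().get()              # only four actors receive this

# Default behavior described by the team: an unhandled failure stops the whole
# program ("fail fast"). Where recovery matters, catch the failure explicitly.
try:
    counters.read.call().get()
except Exception as err:  # a real program would catch Monarch's specific actor-failure error
    print(f"actor call failed, recovering: {err}")
    # e.g., respawn the failed actors or retry on a healthy slice of the mesh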

Monarch splits control plane messaging from data plane transfers, enabling direct GPU-to-GPU memory transfers across a cluster: commands are sent through one path while data moves through another. Monarch also integrates with PyTorch to provide tensors that are sharded across clusters of GPUs. Tensor operations look local but are executed across large distributed clusters, with Monarch handling the complexity of coordination across thousands of GPUs, the PyTorch team said.
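The tensor integration could be pictured roughly as follows; this is a hypothetical sketch that reuses the procs mesh from the earlier example, and the activate() context manager is an assumption for illustration, not a documented Monarch API.

# Hypothetical sketch: tensor code that reads like ordinary local PyTorch but
# is executed across a mesh of GPUs, with Monarch coordinating the shards and
# moving data GPU-to-GPU over the data plane. activate() is an assumed idiom.
import torch

with procs.activate():                                 # run the ops on the mesh
    weights = torch.rand(8192, 8192, device="cuda")    # sharded across GPUs
    grads = torch.rand(8192, 8192, device="cuda")
    weights -= 0.01 * grads                            # looks local; runs per shard

    # Fetching a result back to the controller would travel over the data
    # plane, separate from the control messages that scheduled the ops above.
    total = weights.norm()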

The PyTorch team warned that in Monarch’s current stage of development, users should expect bugs, incomplete features, and APIs that may change in future versions.

Paul Krill

Paul Krill is editor at large at InfoWorld. Paul has been covering computer technology as a news and feature reporter for more than 35 years, including 30 years at InfoWorld. He has specialized in coverage of software development tools and technologies since the 1990s, and he continues to lead InfoWorld’s news coverage of software development platforms including Java and .NET and programming languages including JavaScript, TypeScript, PHP, Python, Ruby, Rust, and Go. Long trusted as a reporter who prioritizes accuracy, integrity, and the best interests of readers, Paul is sought out by technology companies and industry organizations who want to reach InfoWorld’s audience of software developers and other information technology professionals. Paul has won a “Best Technology News Coverage” award from IDG.
