@specture724
Collaborator

No description provided.

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a CUDA out-of-memory (OOM) issue related to unnecessary process group creation when using store-based barriers for full broadcast operations.

Key Changes:

  • Avoids creating a redundant process group when ranks is None or empty: dist.new_group() is now called only when a subset of ranks is specified
  • Updates type hints in the _detect_bucket_size and _update_per_bucket methods to reflect that ranks_group can be None
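
The conditional group creation described above might look roughly like the following. This is a minimal sketch, not the code from this PR: the helper name make_ranks_group and its injectable new_group parameter are hypothetical, added so the logic can be exercised without PyTorch. It assumes callers pass the result to collectives as group=, where None selects the default process group.

```python
from typing import Callable, Optional, Sequence


def make_ranks_group(
    ranks: Optional[Sequence[int]],
    new_group: Optional[Callable[..., object]] = None,
) -> Optional[object]:
    """Return a sub-group for `ranks`, or None to mean "use the default group".

    Hypothetical helper for illustration; the PR's actual code is not shown here.
    """
    if not ranks:
        # ranks is None or empty: the operation spans every rank, so no
        # additional process group is created.
        return None
    if new_group is None:
        # Imported lazily so the sketch can be tested without PyTorch installed.
        import torch.distributed as dist

        new_group = dist.new_group
    # Only a strict subset of ranks needs its own process group.
    return new_group(ranks=list(ranks))
```

Returning None instead of a freshly created group means downstream signatures have to accept Optional, which matches the type-hint change noted for _detect_bucket_size and _update_per_bucket.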

@blahgeek merged commit 88370e2 into MoonshotAI:main on Dec 23, 2025
2 checks passed
2 participants