Pull requests: deepspeedai/DeepSpeed
fix(issue-7701): un-ignore .cuh under deepspeed/ops so multi_tensor_a…
#7739 opened Dec 20, 2025 by leejianwoo-collab
Fix OnebitLamb NaN propagation with empty parameters
#7736 opened Dec 18, 2025 by Rakshit-gen
Introduce Megatron-style parallel state management
#7726 opened Dec 15, 2025 by eternalNight • Draft
let allgather and alltoall execute in parallel when both attention and MOE used TP
#7723 opened Dec 11, 2025 by taozhiwei
Add single parameter allgather optimization for zero3
#7661 opened Oct 31, 2025 by aeeeeeep
HF2UCP: Converting a pytorch_model.bin or .safetensors checkpoint to UCP
#7212 opened Apr 10, 2025 by Schwidola0607
[bugfix] update results of state_dict loading, embedding resizing to secondary partitions (hpz)
#7130 opened Mar 11, 2025 by cyr0930
Fix, pipeline model with moe cause error when send grad
#7055 opened Feb 19, 2025 by wukong1992
Add pyproject.toml with legacy build backend to keep most logic in setup.py
#7033 opened Feb 13, 2025 by loadams
Enabled high-performance Automatic Tensor Parallelism (auto TP) for the MoE models on multiple GPUs/HPUs
#6964 opened Jan 21, 2025 by gyou2021
Update sharded_moe.py to support top2 gate with Tutel
#6948 opened Jan 14, 2025 by xenshinu
Fix: forbid repeated deepspeed.initialize on training objects
#6874 opened Dec 16, 2024 by traincheck-team
Training ops kernels: Speeding up the Llama-based MoE architectures
#6734 opened Nov 8, 2024 by RezaYazdaniAminabadi • Draft