heter for collective #37613
Thanks for your contribution!
sandyhouse left a comment
This all appears to target dygraph (dynamic graph) mode; static graph can't be supported, right?
Broadcast? "Broadcast" is a single word. Also, this interface is not implemented, so why add it?
- Same as above.
- Why does the gloo interface need `ring_id` passed in?
BroadCast -> Broadcast?
BroadCast -> Broadcast?
Same as above.
Same as above.
paddle/fluid/imperative/reducer.cc Outdated
Abort?
sandyhouse left a comment
LGTM
zhiqiu left a comment
LGTM for const_cast
PR types
New features
PR changes
Others
Describe
Heterogeneous mixed training means training a model across heterogeneous hardware. Only dygraph mode is supported for now. GPU, NPU, and XPU are the target devices for this prototype.
The basic idea is very similar to a hierarchical communication topology: the lower layer reduces the data within each node, while the upper layer reduces across all nodes globally.
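To make the two-level idea concrete, here is a minimal sketch in plain Python. It does not use any Paddle, NCCL, or gloo calls and is not the code in this PR; the function name `hierarchical_allreduce` and the nested-list layout are assumptions chosen for illustration only. It sums gradients within each node first, then sums the per-node partial results across nodes, and finally hands the same result back to every device.

```python
from typing import List

Vector = List[float]


def hierarchical_allreduce(nodes: List[List[Vector]]) -> List[List[Vector]]:
    """Simulate a two-level sum all-reduce (illustrative only).

    nodes[i][j] is the gradient vector held by device j on node i.
    All vectors are assumed to have the same length.
    """
    # Lower layer: reduce within each node, over that node's own devices.
    node_sums = [
        [sum(vals) for vals in zip(*devices)]
        for devices in nodes
    ]
    # Upper layer: reduce the per-node partial sums across all nodes.
    global_sum = [sum(vals) for vals in zip(*node_sums)]
    # Broadcast: every device ends up holding the same reduced vector.
    return [[list(global_sum) for _ in devices] for devices in nodes]


# Hypothetical layout: node 0 has two devices, node 1 has one.
nodes = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0]],
]
result = hierarchical_allreduce(nodes)
```

The point of the split is that only one partial result per node has to cross the inter-node link, which matters when devices on different nodes cannot share a single collective backend.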