[Fleet Executor] Construct runtime graph #37158
Thanks for your contribution!
d3910f3 to c7b7dca Compare 91ed45e to 26c9f7b Compare
Please add some VLOG(3) output at the key points for debugging, for example where the dependencies are derived and where interceptor_id is mapped to task_id, rank, and so on.
python/paddle/fluid/tests/unittests/test_fleet_executor_multi_devices.py (outdated, resolved)
26c9f7b to b4e48fd Compare b4e48fd to 1bdb86e Compare 1bdb86e to 451a1ac Compare
Wouldn't it be better to reuse distributed_strategy later on? There may also be a sharding_degree.
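distributed_strategy.proto lives in the framework directory, which is not the folder this proto is in. Calling the proto_library function defined in generic.cmake from the CMakeList in the current folder sets the protobuf search path to the current folder, and protobuf's import does not support relative paths. So for now I have not found a way to reference the definitions in distributed_strategy.proto directly.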
The order of dp, pp, and mp may change in the future.
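I suggest wrapping dp_degree, pp_degree, and mp_degree in a struct and treating it as a Cartesian coordinate system, then adding conversions between a process rank and its Cartesian coordinates. That may be a little more concise. The ordering problem is then also easy to solve by adding a mapping.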
{x, y, z} = rank2coord(pid);
left_x = (x - 1 + xranks) % xranks;
left_rank = coord2rank({left_x, y, z});
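The suggestion above can be sketched roughly as follows. This is only an illustration in Python, not the code in this PR: the `ParallelGrid` class, its `order` field, and the `neighbor` helper are hypothetical names, and the row-major layout (last axis in `order` varies fastest) is an assumption. Only `rank2coord` and `coord2rank` come from the comment itself.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ParallelGrid:
    """Hypothetical helper: treats the parallel degrees as a Cartesian grid."""

    degrees: Dict[str, int]   # e.g. {"dp": 2, "pp": 2, "mp": 2}
    order: Tuple[str, ...]    # axis order, slowest-varying axis first (assumed layout)

    def world_size(self) -> int:
        size = 1
        for axis in self.order:
            size *= self.degrees[axis]
        return size

    def rank2coord(self, rank: int) -> Dict[str, int]:
        """Convert a flat process rank into one coordinate per axis."""
        if not 0 <= rank < self.world_size():
            raise ValueError(f"rank {rank} is out of range")
        coord = {}
        # Peel off the fastest-varying axis first.
        for axis in reversed(self.order):
            rank, coord[axis] = divmod(rank, self.degrees[axis])
        return coord

    def coord2rank(self, coord: Dict[str, int]) -> int:
        """Convert per-axis coordinates back into a flat process rank."""
        rank = 0
        for axis in self.order:
            rank = rank * self.degrees[axis] + coord[axis]
        return rank

    def neighbor(self, rank: int, axis: str, step: int) -> int:
        """Rank of the neighbor `step` positions away along `axis`, wrapping around."""
        coord = self.rank2coord(rank)
        coord[axis] = (coord[axis] + step) % self.degrees[axis]
        return self.coord2rank(coord)


grid = ParallelGrid({"dp": 2, "pp": 2, "mp": 2}, ("dp", "pp", "mp"))
left_rank = grid.neighbor(5, "pp", -1)  # same idea as left_x in the comment above
```

Because the axis order is plain data, a later change in the dp/pp/mp order would only mean passing a different `order` tuple; the conversion code stays the same.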
2a44f1f to 849eb85 Compare
wangxicoding left a comment
LGTM
PR types
New Features
PR changes
Others
Describe
Construct the runtime graph.