@zhiqiu zhiqiu commented Mar 30, 2022

PR types

Bug fixes

PR changes

Others

Describe

add depend when doing fuse_all_optimizer on program

If fuse_all_optimizer is applied to a program, it rewrites the program with fused tensors as below:

```
fused_grads = coalesce_tensor(grad0, grad1, ...)
fused_params = coalesce_tensor(param0, param1, ...)
...
grad0 = grad_op0(...)
grad1 = grad_op1(...)
...
fused_params = adam(fused_params, fused_grads, ...)
```

If the operators in the program are executed op-by-op, this is fine.
However, if the program is executed by an Executor that may run ops in parallel, adam and grad_op0, grad_op1 may run in any order, since there is no explicit dependency between them. This produces wrong results.

This PR adds an explicit dependency in that case by inserting an empty depend op:

```
fused_grads = coalesce_tensor(grad0, grad1, ...)
fused_params = coalesce_tensor(param0, param1, ...)
...
grad0 = grad_op0(...)
grad1 = grad_op1(...)
...
fused_params = depend(fused_params, grad0, grad1, ...)
fused_params = adam(fused_params, fused_grads, ...)
```
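To see why the depend op matters, here is a minimal, self-contained sketch (not Paddle's actual executor or pass code; all names are hypothetical) of a scheduler that derives op ordering purely from declared inputs/outputs, the way a parallel executor builds its dependency graph. Without the depend op, adam only reads fused_grads, whose last writer is coalesce_tensor, so no edge connects the grad ops to adam:

```python
# Hypothetical sketch of dependency-graph construction, NOT Paddle code.
# An op depends on the last writer of each of its input tensors.

def build_deps(ops):
    """ops: list of (op_name, input_tensors, output_tensors).
    Returns {op_name: set of op names that must run before it}."""
    last_writer = {}
    deps = {name: set() for name, _, _ in ops}
    for name, inputs, outputs in ops:
        for t in inputs:
            if t in last_writer:
                deps[name].add(last_writer[t])
        for t in outputs:
            last_writer[t] = name
    return deps

# Program after fuse_all_optimizer, WITHOUT the depend op.
ops_without = [
    ("coalesce_grads",  ["grad0", "grad1"],   ["fused_grads"]),
    ("coalesce_params", ["param0", "param1"], ["fused_params"]),
    ("grad_op0", ["x"], ["grad0"]),
    ("grad_op1", ["x"], ["grad1"]),
    ("adam", ["fused_params", "fused_grads"], ["fused_params"]),
]
deps = build_deps(ops_without)
# adam has no edge to the grad ops, so a parallel executor
# may schedule it before the gradients are computed.
assert "grad_op0" not in deps["adam"]
assert "grad_op1" not in deps["adam"]

# Program WITH the empty depend op: adam now transitively
# depends on every grad op through the depend edge.
ops_with = ops_without[:-1] + [
    ("depend", ["fused_params", "grad0", "grad1"], ["fused_params"]),
    ("adam", ["fused_params", "fused_grads"], ["fused_params"]),
]
deps_fixed = build_deps(ops_with)
assert {"grad_op0", "grad_op1"} <= deps_fixed["depend"]
assert "depend" in deps_fixed["adam"]
```

The depend op performs no computation; it only declares the grad tensors as inputs and fused_params as output, so the executor's last-writer analysis inserts the missing ordering edges.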
@zhiqiu zhiqiu requested a review from sneaxiy March 30, 2022 15:03
@zhiqiu zhiqiu merged commit 3b00dc9 into PaddlePaddle:develop Mar 31, 2022