@typhoonzero
Contributor

As discussed in #7947 (comment), we can delete the fan_in attribute of the listen_and_serv op so that RPC calls can be retried when the network is unreliable.
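
To illustrate why a counter-based barrier gets in the way of retries, here is a toy model. It is not the listen_and_serv implementation: the class names, the `on_send` method, and the `.trainer_N` suffix are all made up for this sketch. The idea is that a server which only counts incoming messages cannot tell a retry from a new sender, while a server that waits for a set of per-trainer variable names simply overwrites the slot when a retry arrives.

```python
class CountingServer:
    """Hypothetical server that waits until fan_in messages have arrived."""

    def __init__(self, fan_in):
        self.fan_in = fan_in
        self.received = 0

    def on_send(self, var_name, value):
        # Every message bumps the counter, including retried ones.
        self.received += 1

    def ready(self):
        return self.received >= self.fan_in


class NamedSlotServer:
    """Hypothetical server that waits for one variable per trainer."""

    def __init__(self, expected_names):
        self.expected = set(expected_names)
        self.slots = {}

    def on_send(self, var_name, value):
        # A retry writes to the same slot, so it is idempotent.
        self.slots[var_name] = value

    def ready(self):
        return self.expected <= set(self.slots)


# Trainer 0 sends its gradient twice (a retry); trainer 1 has not sent yet.
counting = CountingServer(fan_in=2)
counting.on_send("w@GRAD", 1.0)
counting.on_send("w@GRAD", 1.0)

named = NamedSlotServer(["w@GRAD.trainer_0", "w@GRAD.trainer_1"])
named.on_send("w@GRAD.trainer_0", 1.0)
named.on_send("w@GRAD.trainer_0", 1.0)
```

In this sketch the counting server reports ready after the retry even though trainer 1 never sent anything, while the named-slot server keeps waiting until `w@GRAD.trainer_1` arrives.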

@CLAassistant

CLAassistant commented Feb 11, 2018

CLA assistant check
All committers have signed the CLA.

@typhoonzero typhoonzero changed the title from "[WIP] Enhancement/remove fan_in in RPC server side" to "Enhancement/remove fan_in in RPC server side" on Feb 13, 2018
@typhoonzero typhoonzero changed the title from "Enhancement/remove fan_in in RPC server side" to "Enhancement/transpiler rename grad vars to add trainer id, so RPC call can be retried." on Feb 13, 2018
Contributor

@gongweibao gongweibao left a comment

LGTM

optimize_ops, params_grads, pservers=pserver_endpoints, trainers=trainers)
optimize_ops,
params_grads,
0,
Contributor

Need to get trainer_id from ENV or command-line arguments.
I will fix it in a new PR.
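
A minimal sketch of what that follow-up could look like, assuming a `--trainer_id` flag and a `TRAINER_ID` environment variable. Both names and the `resolve_trainer_id` helper are hypothetical and not taken from this PR. The command-line value wins, then the environment, then a default of 0 to match the hard-coded value above.

```python
import argparse
import os


def resolve_trainer_id(argv=None, env=None):
    """Return the trainer id from --trainer_id, else from TRAINER_ID, else 0."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--trainer_id", type=int, default=None)
    # parse_known_args ignores flags that belong to the rest of the script.
    args, _unknown = parser.parse_known_args(argv)
    if args.trainer_id is not None:
        return args.trainer_id
    return int(env.get("TRAINER_ID", "0"))
```

The returned value would then replace the literal 0 passed to the transpiler call.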

# step3
optimize_block = pserver_program.create_block(0)
# step 4
# Create a union-find data struct from optimize ops,
Contributor

Thanks for the detailed documentation!
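
For readers unfamiliar with the structure named in the step 4 comment, here is a generic union-find sketch. It is not the code from this PR: the `UnionFind` class, the dict-based op representation, and the rule that two ops are connected when one's outputs feed the other's inputs are assumptions made for illustration.

```python
class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self, items):
        self._parent = {item: item for item in items}

    def find(self, item):
        root = item
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[item] != root:
            self._parent[item], item = root, self._parent[item]
        return root

    def union(self, a, b):
        self._parent[self.find(a)] = self.find(b)

    def is_connected(self, a, b):
        return self.find(a) == self.find(b)


def group_ops(ops):
    """Group op types whose inputs and outputs overlap (assumed criterion)."""
    uf = UnionFind(range(len(ops)))
    for i, a in enumerate(ops):
        for j in range(i + 1, len(ops)):
            b = ops[j]
            feeds = set(a["outputs"]) & set(b["inputs"])
            fed_by = set(b["outputs"]) & set(a["inputs"])
            if feeds or fed_by:
                uf.union(i, j)
    groups = {}
    for i, op in enumerate(ops):
        groups.setdefault(uf.find(i), []).append(op["type"])
    return list(groups.values())


# Hypothetical optimize ops: a scale op feeding one sgd op, plus an unrelated sgd op.
example_ops = [
    {"type": "scale", "inputs": ["w1@GRAD"], "outputs": ["w1@GRAD.scaled"]},
    {"type": "sgd", "inputs": ["w1", "w1@GRAD.scaled"], "outputs": ["w1"]},
    {"type": "sgd", "inputs": ["w2", "w2@GRAD"], "outputs": ["w2"]},
]
example_groups = group_ops(example_ops)
```

With these example ops, the scale op and the first sgd op land in one group and the second sgd op stays on its own.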

Contributor

@helinwang helinwang left a comment

LGTM!

@typhoonzero typhoonzero merged commit c490f1b into PaddlePaddle:develop Feb 22, 2018
@typhoonzero typhoonzero deleted the no_counter_on_pserver branch February 24, 2018 03:08