Cluster training fails with "Check failed: numPorts > 0 (0 vs. 0)" after enabling sparse update #8875

@SearchVera

With sparse update disabled, training runs to completion. To enable sparse update I changed two things:

  1. Set sparse_update=True in the network configuration

    hd1 = paddle.layer.fc(
        input=feature,
        size=hidden_layer_size,
        act=paddle.activation.Tanh(),
        layer_attr=paddle.attr.Extra(drop_rate=0.5),
        #param_attr=paddle.attr.Param(initial_std=1.0 / hidden_layer_size))
        param_attr=paddle.attr.Param(initial_std=1.0 / hidden_layer_size, sparse_update=True))

  2. Set use_remote_sparse to 1 in the submit command

    paddle cluster_train \
      --config ${model_config_file} \
      --use_remote_sparse 1 \

With both changes in place, training fails with this error:
image

The optimizer is AdaGrad:

    adagrad_optimizer = paddle.optimizer.AdaGrad(
        learning_rate=1e-3,
        regularization=paddle.optimizer.L2Regularization(rate=1e-3))
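
For context, here is a conceptual sketch in plain Python of what a row-sparse AdaGrad step with L2 regularization looks like, using the learning rate and L2 rate from the configuration above. It is not Paddle's implementation. The helper name `sparse_adagrad_step`, the `eps` value, and the choice to apply the L2 term only to touched rows are assumptions made for illustration.

```python
import math


def sparse_adagrad_step(weights, accum, row_grads, lr=1e-3, l2_rate=1e-3, eps=1e-6):
    """Hypothetical helper: apply one AdaGrad step to the touched rows only.

    weights, accum: lists of rows (lists of floats) with the same shape.
    row_grads: dict mapping a row index to that row's gradient.
    """
    for row, grad in row_grads.items():
        for j, g in enumerate(grad):
            g = g + l2_rate * weights[row][j]  # add the L2 regularization term
            accum[row][j] += g * g             # accumulate the squared gradient
            weights[row][j] -= lr * g / (math.sqrt(accum[row][j]) + eps)
    return weights, accum


# Three parameter rows; only row 1 received a gradient in this batch.
weights = [[0.5, -0.5], [1.0, 1.0], [0.0, 2.0]]
accum = [[0.0, 0.0] for _ in weights]
sparse_adagrad_step(weights, accum, {1: [0.1, -0.2]})
```

Rows 0 and 2 are left untouched, so in a sparse update only the rows that appear in the batch would need to be updated and exchanged.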
