

@putcn commented Aug 24, 2017

The interaction between trainer and master has changed. Previously, the master actively found trainers and delivered training jobs to them. Now, the master passively listens for trainers' task requests and delivers training jobs in response. This PR updates the design doc to match the updated implementation.

@putcn requested review from helinwang and typhoonzero August 24, 2017 18:50

@helinwang left a comment

Thank you! Much appreciated!

@helinwang

CI unit test failed:

    [13:16:19] .E
    [13:16:19] ======================================================================
    [13:16:19] ERROR: test_uniform_random_gpu (__main__.UniformRandomTest)
    [13:16:19] ----------------------------------------------------------------------
    [13:16:19] Traceback (most recent call last):
    [13:16:19]   File "test_uniform_random_op.py", line 13, in test_uniform_random_gpu
    [13:16:19]     self.uniform_random_test(place=core.GPUPlace(0))
    [13:16:19]   File "test_uniform_random_op.py", line 28, in uniform_random_test
    [13:16:19]     ctx = core.DeviceContext.create(place)
    [13:16:19] RuntimeError: basic_string::_M_construct null not valid
    [13:16:19]
    [13:16:19] ----------------------------------------------------------------------
    [13:16:19] Ran 2 tests in 0.510s

Could you try rebasing onto (or pulling) the latest develop branch? It may already contain a fix for this problem.

@putcn merged commit 81b3bfb into PaddlePaddle:develop Aug 25, 2017
@putcn deleted the cluster-train-doc-update branch August 31, 2017 03:59