Skip to content

Conversation

@LiYuRio
Copy link
Contributor

@LiYuRio LiYuRio commented Dec 14, 2021

PR types

Bug fixes

PR changes

Others

Describe

修复之前多卡下FleetExecutor停不下来的问题,修改对象的持有关系,将MessageBus从单例变为FleetExecutor的成员,由FleetExecutor管理它的生命周期,向Carrier传递MessageBus指针,Interceptor持有拥有者Carrier的指针,一个FleetExecutor未来会拥有多个Carrier,最终在FleetExecutor的析构函数中向Carrier上的源Interceptor发送停止的控制消息。

c7de75c79446a8876311169db385c7fe

@paddle-bot-old
Copy link

Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@LiYuRio LiYuRio force-pushed the dev_fix_fleet_executor_stop branch from 9411ab4 to 8088ffe Compare December 14, 2021 09:31
Copy link
Contributor

@FeixLiu FeixLiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

用GPT对一下pp=dp=mp=2的精度再合吧,以后的pr都跑一下精度再合?

@LiYuRio LiYuRio force-pushed the dev_fix_fleet_executor_stop branch from 8088ffe to 06b5458 Compare December 16, 2021 06:34
@LiYuRio LiYuRio force-pushed the dev_fix_fleet_executor_stop branch from 06b5458 to 1c03784 Compare December 16, 2021 06:37
<< request->src_id() << " to interceptor " << request->dst_id()
<< ", with the message: " << request->message_type();
FleetExecutor::GetCarrier().EnqueueInterceptorMessage(*request);
response->set_rst(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

突然想到,这其实可以set_rst为EnqueueInterceptorMessage的返回值,下个pr可以改一下。

Copy link
Contributor

@wangxicoding wangxicoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@FeixLiu
Copy link
Contributor

FeixLiu commented Dec 17, 2021

rebase一下最新develop分支跑一下精度吧 🧐

Copy link
Contributor

@FeixLiu FeixLiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@chenwhql chenwhql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for PADDLE_ENFORCE

@FeixLiu FeixLiu merged commit 843435f into PaddlePaddle:develop Dec 17, 2021
@LiYuRio LiYuRio deleted the dev_fix_fleet_executor_stop branch December 31, 2021 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

4 participants