memory optimization for dynamic RNN #8041

QiJune · 2018-02-01T09:52:37Z

Tested on the machine translation demo.

When training the first batch, memory usage dropped from 5728446208 to 1115982080, a saving of 80.5%.
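As a quick check of the figure above, using only the two numbers reported in this comment:

$$
1 - \frac{1115982080}{5728446208} \approx 1 - 0.1948 = 0.8052 \approx 80.5\%
$$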

It seems that the delete_var operator does not contribute to this result, so I will remove it.

dzhwinter

I agree with you. This PR saves a large amount of memory for while_grad, so we will merge it early.
Please fix the remaining issues later.

dzhwinter · 2018-02-01T13:19:12Z

paddle/operators/while_op.cc

 cur_scope.Rename(new_inside_name, inside_grad_name);
 }
+ dev_ctx.Wait();
+ const_cast<framework::Scope &>(scope).DeleteScope(&cur_scope);


Nice catch!
This fix will solve the troublesome OOM problem.
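To illustrate why deleting the step scope matters, here is a minimal sketch, not the actual Paddle code. `ToyScope` and `PeakBytes` are hypothetical names made up for this example, and plain host vectors stand in for device buffers. It compares the peak memory held when every step scope is kept alive with the peak when each step scope is deleted as soon as its step finishes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a framework scope; NOT Paddle's framework::Scope.
class ToyScope {
 public:
  // Create a child scope owned by this scope.
  ToyScope& NewScope() {
    kids_.push_back(std::make_unique<ToyScope>());
    return *kids_.back();
  }

  // Destroy one child scope together with every buffer it owns.
  void DeleteScope(ToyScope* kid) {
    auto it = std::find_if(
        kids_.begin(), kids_.end(),
        [kid](const std::unique_ptr<ToyScope>& p) { return p.get() == kid; });
    assert(it != kids_.end() && "scope is not a child of this scope");
    kids_.erase(it);
  }

  // Allocate (or fetch and resize) a named buffer of n floats in this scope.
  std::vector<float>& Var(const std::string& name, std::size_t n) {
    auto& buf = vars_[name];
    buf.resize(n);
    return buf;
  }

  // Bytes held by this scope and all of its descendants.
  std::size_t Bytes() const {
    std::size_t total = 0;
    for (const auto& kv : vars_) total += kv.second.size() * sizeof(float);
    for (const auto& kid : kids_) total += kid->Bytes();
    return total;
  }

 private:
  std::unordered_map<std::string, std::vector<float>> vars_;
  std::list<std::unique_ptr<ToyScope>> kids_;
};

// Run `steps` iterations with one child scope per step and return the
// largest number of bytes held at any point.
std::size_t PeakBytes(int steps, std::size_t floats_per_step,
                      bool delete_each_step) {
  ToyScope root;
  std::size_t peak = 0;
  for (int i = 0; i < steps; ++i) {
    ToyScope& step = root.NewScope();
    step.Var("grad", floats_per_step);
    peak = std::max(peak, root.Bytes());
    // On a real device, kernels may still be reading these buffers, so a
    // framework would have to synchronize before freeing them. This toy
    // runs on the host only and needs no such wait.
    if (delete_each_step) root.DeleteScope(&step);
  }
  return peak;
}
```

Under these assumptions, `PeakBytes(10, 1000, false)` grows with the number of steps, while `PeakBytes(10, 1000, true)` stays at the size of a single step.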

dzhwinter · 2018-02-01T13:21:15Z

paddle/framework/scope.h

 /// Drop all kids scopes belonged to this scope.
 void DropKids();

+ void EraseVars(std::vector<std::string>& var_names);


Since every variable is released when the scope is destructed, in my humble view this interface should never be needed.
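The point above can be sketched with a toy example. `Tensor`, `MiniScope`, and both helper functions are hypothetical and are not Paddle's classes. The `EraseVars` here takes a `const` reference for convenience, unlike the declaration in the diff. Destroying the scope releases every variable it owns, whereas erasing by name only releases selected variables early.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Counts live objects so that releases can be observed.
struct Tensor {
  static int live;
  Tensor() { ++live; }
  ~Tensor() { --live; }
  Tensor(const Tensor&) = delete;
  Tensor& operator=(const Tensor&) = delete;
};
int Tensor::live = 0;

// Hypothetical scope that owns its variables; NOT Paddle's framework::Scope.
class MiniScope {
 public:
  // Create the variable on first use and return a non-owning pointer.
  Tensor* Var(const std::string& name) {
    auto& slot = vars_[name];
    if (!slot) slot = std::make_unique<Tensor>();
    return slot.get();
  }

  // Toy analogue of the EraseVars declaration in the diff: release the
  // named variables while the scope itself stays alive.
  void EraseVars(const std::vector<std::string>& var_names) {
    for (const auto& name : var_names) vars_.erase(name);
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<Tensor>> vars_;
};

// Number of live tensors after a scope with two variables is destroyed.
int LiveAfterScopeDestruction() {
  {
    MiniScope scope;
    scope.Var("x");
    scope.Var("y");
  }  // scope destructor releases both variables here
  return Tensor::live;
}

// Number of live tensors after erasing one of two variables, sampled
// while the scope still exists.
int LiveAfterErasingOne() {
  MiniScope scope;
  scope.Var("x");
  scope.Var("y");
  scope.EraseVars({"x"});
  return Tensor::live;
}
```

In this toy, erasing by name only matters when some variables must go before the scope does. If the whole scope is deleted right after use, the destructor already covers it.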

dzhwinter

LGTM!

tonyyang-svail · 2018-02-01T18:26:49Z

paddle/operators/while_op.cc

 sum_op->Run(cur_scope, dev_place);
 cur_scope.Rename(new_inside_name, inside_grad_name);
 }
+ dev_ctx.Wait();


What is the cost of adding this dev_ctx.Wait()?

I have not run a detailed test yet. But if we do not delete the step scope, we cannot train a larger RNN model because we run out of memory.
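One way the cost could be measured is to time how long the host stays blocked in the wait. The sketch below is only an assumption about how such a test might look. `ToyStream`, `MeasureWaitMs`, and `FinishedAfterWait` are hypothetical, and a sleeping background task stands in for asynchronous device work. It says nothing about the real overhead of dev_ctx.Wait().

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for an asynchronous device stream.
class ToyStream {
 public:
  // Start background work that takes roughly `work` to finish.
  void Launch(std::chrono::milliseconds work) {
    done_ = false;
    pending_ = std::async(std::launch::async, [this, work] {
      std::this_thread::sleep_for(work);
      done_ = true;
    });
  }

  // Block the calling thread until the launched work has finished.
  void Wait() {
    if (pending_.valid()) pending_.get();
  }

  bool Done() const { return done_; }

 private:
  std::atomic<bool> done_{true};
  std::future<void> pending_;  // destroyed before done_, joining the task
};

// Milliseconds the host spends blocked in Wait() after launching `work`.
double MeasureWaitMs(std::chrono::milliseconds work) {
  ToyStream stream;
  stream.Launch(work);
  const auto start = std::chrono::steady_clock::now();
  stream.Wait();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// True if the work is complete once Wait() returns, which is the property
// that makes it safe to free the buffers afterwards.
bool FinishedAfterWait(std::chrono::milliseconds work) {
  ToyStream stream;
  stream.Launch(work);
  stream.Wait();
  return stream.Done();
}
```

Comparing this blocked time per iteration against the total step time would give a rough estimate of the cost. The actual figures depend on the model and the device.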

QiJune added 7 commits February 1, 2018 12:19

init

6de62d5

add delete operator

bc801ad

debug

8ec6ebb

add wait

52d5f50

clean code

f123cc0

fix bug

e8d9098

fix bug

48ec945

QiJune requested review from dzhwinter, reyoung, tonyyang-svail and wangkuiyi February 1, 2018 09:53

refine code

85f812a

dzhwinter previously approved these changes Feb 1, 2018

remove unused code

49b768f

QiJune dismissed dzhwinter’s stale review via 49b768f February 1, 2018 13:52

dzhwinter approved these changes Feb 1, 2018

tonyyang-svail reviewed Feb 1, 2018

QiJune merged commit c1ac5b6 into PaddlePaddle:develop Feb 2, 2018

QiJune mentioned this pull request Feb 2, 2018

[WIP]Debug RNN memory optimization #8022

Closed
