paddlenlp/trainer/training_args.py (+7 -2: 7 additions, 2 deletions)
@@ -293,6 +293,7 @@ class TrainingArguments:
enable_stage1_allgather_overlap, overlap the stage1 V2 allgather with the forward computation of the next step. The overlap comes with constraints: the logging_step should be greater than 1 so the allgather can overlap the forward compute, and no other synchronization may be called during training.
disable_stage1_reduce_avg, replace reduce_avg with the original reduce_sum + scale in stage1, which can be used for accuracy verification.
enable_release_grads, reduce peak memory usage by releasing gradients after each iteration. The creation of gradients will be postponed until backward propagation of the next iteration.
+ enable_fuse_optimizer_states, fuse the optimizer states into a single storage.
recompute (`bool`, *optional*, defaults to `False`):
Recompute the forward pass to calculate gradients. Used for saving memory.
Only supported for networks with transformer blocks.
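For context, the options documented above are space-separated values of the `sharding_parallel_config` string on `TrainingArguments`. Below is a minimal sketch of how they might be combined; the `output_dir` value and the particular option mix are illustrative assumptions, the new `enable_fuse_optimizer_states` option is only accepted once this patch is applied, and some sharding options are validated or disabled outside a distributed launch.

```python
# Illustrative sketch, not taken from the PR: combining stage1 sharding
# options via sharding_parallel_config in PaddleNLP.
from paddlenlp.trainer import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",  # hypothetical output path
    sharding="stage1",           # the options below apply to stage1 sharding
    logging_steps=10,            # allgather overlap requires logging_step > 1
    sharding_parallel_config=(
        "enable_stage1_allgather_overlap "
        "enable_release_grads "
        "enable_fuse_optimizer_states"  # the option this diff documents
    ),
)
```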