【Hackathon 6th No.35】support kwargs for recompute when use_reentrant == True -part #63337
PR Category
Auto Parallel
PR Types
Improvements
Description
Currently, when use_reentrant == True, recompute is implemented with PyLayer. However, PyLayer does not yet support passing Tensor arguments in a dict: Tensors passed this way do not get backward nodes or backward edges created for them.
To improve the usability of distributed training, this PR enables recompute to accept Tensor arguments in dict form when use_reentrant == True. The main idea is to rearrange the positional args plus keyword args into positional args only. Performance test data are as follows:
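The rearrangement described above can be sketched as follows. This is a hypothetical, framework-free illustration (the helper names `pack_args` / `unpack_args` are invented for this sketch, not part of the PR): kwargs are flattened onto the end of the positional argument tuple, and the kwarg keys are kept separately so the original call can be rebuilt inside the PyLayer-style forward, which only ever sees positional args.

```python
def pack_args(args, kwargs):
    """Flatten (args, kwargs) into one positional tuple plus the kwarg
    keys, preserving order, so every Tensor travels positionally."""
    keys = tuple(kwargs.keys())
    flat = tuple(args) + tuple(kwargs[k] for k in keys)
    return flat, keys


def unpack_args(flat, keys):
    """Rebuild the original (args, kwargs) from the flattened form."""
    n_pos = len(flat) - len(keys)
    return flat[:n_pos], dict(zip(keys, flat[n_pos:]))


def call_with_packed(fn, flat, keys):
    """Invoke fn as if it had been called with the original signature."""
    args, kwargs = unpack_args(flat, keys)
    return fn(*args, **kwargs)


if __name__ == "__main__":
    def f(x, y, scale=1, bias=0):
        return (x + y) * scale + bias

    flat, keys = pack_args((2, 3), {"scale": 10, "bias": 1})
    print(call_with_packed(f, flat, keys))  # (2 + 3) * 10 + 1 = 51
```

In the real implementation the flattened tuple is what gets handed to the PyLayer forward, so every Tensor is visible to autograd and backward edges are created as usual; the keys are plain Python data and need no gradient tracking.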
Test environment: 4x 3090 GPUs, Llama2 model with num_hidden_layers hacked to 4.
Performance data collected at step 30:
The Llama2 test script is as follows:
Modifications
All places in paddlenlp/transformers/llama/modeling_auto.py where recompute is enabled (3 in total). The GPT run script is as follows:
paddlenlp/transformers/gpt/modeling_auto.py is modified as follows: