
Conversation

Collaborator
@yuanlehome yuanlehome commented Oct 15, 2024

PR types

New features

PR changes

Models

Description

  • chatglm_v2 supports block_attn mode, but its accuracy still needs to be aligned
  • Re-enable and fix many previously disabled unit tests
  • Slightly clean up the model-building code
  • Add a USE_FASTER_TOP_P_SAMPLING environment variable for using the better-performing top_p_sampling operator
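The environment-variable toggle in the last bullet can be sketched roughly as follows. This is a minimal NumPy sketch of nucleus (top-p) filtering plus an env-var switch, not PaddleNLP's actual fused operator; the commented-out sampler names are hypothetical:

```python
import os
import numpy as np

def top_p_filter(probs, top_p):
    """Zero out tokens outside the top-p nucleus and renormalize.

    probs: 1-D array of token probabilities summing to 1.
    """
    order = np.argsort(probs)[::-1]       # most probable first
    cumulative = np.cumsum(probs[order])
    # Keep the smallest prefix whose cumulative mass reaches top_p.
    cutoff = np.searchsorted(cumulative, top_p) + 1
    mask = np.zeros_like(probs)
    mask[order[:cutoff]] = 1.0
    filtered = probs * mask
    return filtered / filtered.sum()

# A PR like this one typically switches between a reference kernel and a
# faster fused kernel based on an environment variable:
use_faster = os.getenv("USE_FASTER_TOP_P_SAMPLING", "0") == "1"
# sampler = top_p_sample_fast if use_faster else top_p_sample_reference
```

The filtering math is the same either way; the env var only selects which kernel computes it.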

paddle-bot bot commented Oct 15, 2024

Thanks for your contribution!

@yuanlehome yuanlehome marked this pull request as draft October 15, 2024 07:39

codecov bot commented Oct 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 104 lines in your changes missing coverage. Please review.

Project coverage is 52.89%. Comparing base (7551730) to head (d19ed92).
Report is 263 commits behind head on develop.

Files with missing lines Patch % Lines
...p/experimental/transformers/chatglm_v2/modeling.py 0.00% 84 Missing ⚠️
...erimental/transformers/fused_transformer_layers.py 0.00% 15 Missing ⚠️
...enlp/experimental/transformers/generation_utils.py 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #9271      +/-   ##
===========================================
+ Coverage    52.80%   52.89%    +0.08%
===========================================
  Files          660      660
  Lines       106869   106929       +60
===========================================
+ Hits         56434    56561      +127
+ Misses       50435    50368       -67
===========================================


@yuanlehome yuanlehome reopened this Oct 24, 2024
@yuanlehome yuanlehome marked this pull request as ready for review October 24, 2024 06:55
@yuanlehome yuanlehome changed the title [LLM INFER] chatglm_v2 support block_attn [LLM INFER] Fix some bugs and chatglm_v2 support block_attn Oct 24, 2024
else:
    return 8192  # Maximum sequence length.

total_max_length: int = field(
    default=4096, metadata={"help": "Super parameter. Maximum sequence length (encoder+decoder)."}
)
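For context, `total_max_length` above is a dataclass `field` whose help string lives in `metadata`. A self-contained sketch of this pattern (the argument-class name here is illustrative, not the actual PaddleNLP class):

```python
from dataclasses import dataclass, field, fields

@dataclass
class PredictorArguments:
    # Upper bound on encoder + decoder tokens combined.
    total_max_length: int = field(
        default=4096,
        metadata={"help": "Super parameter. Maximum sequence length (encoder+decoder)."},
    )

args = PredictorArguments()
# Argument parsers typically harvest the help text from field metadata:
help_text = {f.name: f.metadata.get("help") for f in fields(PredictorArguments)}
```

The metadata dict is how dataclass-driven argument parsers surface `--help` descriptions without a separate registry.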
Contributor

Has this been confirmed with the NPU team?

Collaborator Author

Confirmed, no problem.

  arange_tensor_encoder = paddle.arange(self.config.total_max_length, dtype=self.config.dtype)
- alibi = alibi_slopes[None, :, None, None] * arange_tensor_encoder
+ alibi = (alibi_slopes[None, :, None, None] * arange_tensor_encoder).astype(self.config.dtype)
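The change under discussion builds the ALiBi bias by multiplying per-head slopes against a position ramp, then casts the result to the model dtype. A NumPy sketch of the same computation, assuming the standard ALiBi slope formula for a power-of-two head count (this illustrates the idea, not the file's exact code):

```python
import numpy as np

def alibi_bias(num_heads, max_length, dtype="float16"):
    """Per-head linear position bias: bias[h, j] = slope[h] * j."""
    # Standard ALiBi slopes for a power-of-two head count: 2^(-8(i+1)/n).
    slopes = np.array(
        [2 ** (-8.0 * (i + 1) / num_heads) for i in range(num_heads)]
    )
    positions = np.arange(max_length)
    # Broadcast [heads, 1] * [length] -> [heads, length], then cast so the
    # bias dtype matches the model dtype (the point of the fix above).
    return (slopes[:, None] * positions[None, :]).astype(dtype)

bias = alibi_bias(num_heads=8, max_length=4, dtype="float16")
```

Without the final `astype`, the multiply would promote to float32 even when the model runs in float16, which is exactly the mismatch the review comment worries about.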

Contributor

Hmm, is relying on the config dtype safe here? Users can change that value. How about using the dtype of one of the tensors inside instead?

Collaborator Author

This dtype does need to stay consistent with config.dtype.


model = Model.from_pretrained(
predictor_args.total_max_length = config.seq_length
if predictor_args.block_attn:
Contributor

Hmm, I'd suggest putting block_attn into the config's attributes and letting ChatGLMv2InferenceModel control it itself.
If we change it here, too many models will need the same kind of change later on.

Collaborator Author

Strictly speaking, though, this doesn't belong in each model's Config; if it were added to something like LlamaConfig, every model's Config would need it too. Let's keep it as is for now, and during the later refactor we'll see whether there's a better approach.
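The trade-off discussed above can be sketched as a simple fallback lookup, where a model-specific config attribute, when present, overrides the predictor-level flag (all names here are illustrative stand-ins, not PaddleNLP's actual classes):

```python
class ModelConfig:
    """Stand-in for a model config; block_attn may or may not be set on it."""
    pass

class PredictorArgs:
    block_attn = True  # predictor-level default shared by all models

def resolve_block_attn(config, predictor_args):
    # Prefer a model-specific config attribute when the model defines one;
    # otherwise fall back to the predictor-level flag. This is one way to
    # reconcile the reviewer's suggestion with the current per-model checks.
    return getattr(config, "block_attn", predictor_args.block_attn)

cfg = ModelConfig()
uses_block_attn = resolve_block_attn(cfg, PredictorArgs())
```

With this shape, models that need special handling set the attribute on their own config, and the predictor code never grows a new `if` branch per model.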

@qingqing01 qingqing01 merged commit 2e8b220 into PaddlePaddle:develop Oct 25, 2024
2 of 4 checks passed

Labels: None yet

3 participants