@dynamicheart
Contributor

PR types

Bug Fixes

PR changes

Models

Description

Problem: calling `_decode` on `LlamaTokenizerFast` raises an `AttributeError`.

Reproduction code:

from paddlenlp.transformers import LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained('meta-llama/Llama-2-13b')
output = tokenizer._decode([0, 3, 2, 1])

Result:

Traceback (most recent call last):
  File "test_tokenizer.py", line 6, in <module>
    output = tokenizer._decode([0, 3, 2, 1])
  File "/workspace/mnt/PaddleNLP/paddlenlp/transformers/tokenizer_utils_fast.py", line 653, in _decode
    else self.clean_up_tokenization_spaces
AttributeError: 'LlamaTokenizerFast' object has no attribute 'clean_up_tokenization_spaces'
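The traceback ends on the `else` branch of a conditional expression in `_decode`, which suggests the method falls back to an instance attribute when the caller passes no value. A minimal sketch of that failure mode follows. `BrokenFastTokenizer` is a hypothetical stand-in, not PaddleNLP code, and the `is not None` condition is an assumption, since only the `else` line appears in the traceback.

```python
class BrokenFastTokenizer:
    """Hypothetical stand-in whose __init__ never sets clean_up_tokenization_spaces."""

    def __init__(self, **kwargs):
        self.init_kwargs = kwargs

    def _decode(self, token_ids, clean_up_tokenization_spaces=None):
        # Assumed shape of the expression at the failing line: use the
        # argument if given, otherwise read the instance attribute.
        clean_up = (
            clean_up_tokenization_spaces
            if clean_up_tokenization_spaces is not None
            else self.clean_up_tokenization_spaces
        )
        # Placeholder "decoding": join the ids so the sketch stays self-contained.
        return " ".join(str(i) for i in token_ids), clean_up


def reproduce():
    """Return the AttributeError message from the fallback, or None if no error."""
    try:
        BrokenFastTokenizer()._decode([0, 3, 2, 1])
    except AttributeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())
```

In this sketch the error only appears when the argument is omitted; passing `clean_up_tokenization_spaces=False` explicitly never reaches the attribute lookup.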

Reference for the fix: https://github.com/huggingface/transformers/blob/e5d14f39ad82475f238cad41279a5d61ac5db287/src/transformers/tokenization_utils_base.py/#L1616
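One plausible shape for such a fix, sketched under the assumption that it follows the linked transformers code and sets the attribute once in the shared base initializer from keyword arguments. The class names and the default of `True` are hypothetical and are not taken from the PR diff.

```python
class TokenizerBaseSketch:
    """Hypothetical base class; stands in for the shared tokenizer base."""

    def __init__(self, **kwargs):
        # Assumed default of True; the real default in PaddleNLP or
        # transformers may differ.
        self.clean_up_tokenization_spaces = kwargs.pop(
            "clean_up_tokenization_spaces", True
        )
        self.init_kwargs = kwargs


class FastTokenizerSketch(TokenizerBaseSketch):
    """Hypothetical fast tokenizer that inherits the attribute from the base."""

    def _decode(self, token_ids, clean_up_tokenization_spaces=None):
        # The fallback now always finds the attribute set by the base __init__.
        clean_up = (
            clean_up_tokenization_spaces
            if clean_up_tokenization_spaces is not None
            else self.clean_up_tokenization_spaces
        )
        # Placeholder "decoding": join the ids so the sketch stays self-contained.
        return " ".join(str(i) for i in token_ids), clean_up


if __name__ == "__main__":
    print(FastTokenizerSketch()._decode([0, 3, 2, 1]))
```

Setting the attribute in the base initializer means every subclass gets it, while a per-call argument still takes precedence over the stored value.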

@paddle-bot

paddle-bot bot commented Oct 23, 2024

Thanks for your contribution!

@CLAassistant

CLAassistant commented Oct 23, 2024

CLA assistant check
All committers have signed the CLA.

@paddle-bot paddle-bot bot added the XPU label Oct 23, 2024
@codecov

codecov bot commented Oct 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 52.74%. Comparing base (3443b9f) to head (ee6d354).
Report is 262 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #9304      +/-   ##
===========================================
+ Coverage    52.62%   52.74%   +0.12%
===========================================
  Files          661      661
  Lines       107365   107366       +1
===========================================
+ Hits         56499    56631     +132
+ Misses       50866    50735     -131


@DrownFish19 DrownFish19 changed the title from "fix tokenizerfast missing attr" to "[Tokenizer] Fix TokenizerFast missing clean_up_tokenization_spaces" Oct 23, 2024
@DrownFish19 DrownFish19 merged commit 0102f31 into PaddlePaddle:develop Oct 23, 2024
3 of 4 checks passed