
@DrownFish19
Collaborator

PR types

Bug fixes

PR changes

Others

Description

Fix decoded output that contains spaces in decode_token.
For example, if new_text = " 123" (a space followed by 123) and prefix_text = " " (a single space), clean_up_tokenization_spaces strips the leading space and turns new_text into "123".
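The failure mode can be reproduced in a minimal sketch, assuming decode_token follows the common incremental-decoding pattern: decode a prefix window and a full window, then return the part of the full text beyond len(prefix_text). The vocabulary, toy_decode, decode_token_sketch and the clean-up rule below are all hypothetical; they are not PaddleNLP's implementation or the change merged here.

```python
import re

# Hypothetical two-token vocabulary: id 0 is a lone space, id 1 is "123".
TOY_VOCAB = {0: " ", 1: "123"}


def toy_decode(ids, clean_up_tokenization_spaces):
    """Join token strings and optionally apply a toy clean-up step.

    Assumption: the clean-up removes a leading space only when a non-space
    character follows it. This reproduces the behaviour described in the PR
    (" 123" -> "123" while " " stays " "). Real tokenizers use other rules.
    """
    text = "".join(TOY_VOCAB[i] for i in ids)
    if clean_up_tokenization_spaces:
        text = re.sub(r"^ (?=\S)", "", text)
    return text


def decode_token_sketch(all_ids, prefix_offset, read_offset, clean_up_tokenization_spaces):
    """Hypothetical incremental decoder: return only the text added by new tokens."""
    prefix_text = toy_decode(all_ids[prefix_offset:read_offset], clean_up_tokenization_spaces)
    new_text = toy_decode(all_ids[prefix_offset:], clean_up_tokenization_spaces)
    # Emit the suffix only if decoding produced more text and the result does
    # not end in U+FFFD (an incomplete UTF-8 sequence).
    if len(new_text) > len(prefix_text) and not new_text.endswith("\ufffd"):
        return new_text[len(prefix_text):]
    return ""


# With clean-up: prefix_text == " " but new_text == "123", so slicing by
# len(prefix_text) drops the first character of the new token.
print(repr(decode_token_sketch([0, 1], 0, 1, True)))   # '23'
# Without clean-up: new_text == " 123", and the slice keeps the new token intact.
print(repr(decode_token_sketch([0, 1], 0, 1, False)))  # '123'
```

Under these assumptions, the cleaned-up new_text loses a character relative to the uncleaned prefix_text, so the slice cuts into the new token. Decoding both windows without the clean-up step keeps their lengths consistent; that is one way to avoid the problem, not necessarily how this PR fixes it.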

@paddle-bot

paddle-bot bot commented Aug 26, 2024

Thanks for your contribution!

@codecov

codecov bot commented Aug 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.96%. Comparing base (56d293d) to head (0cd1c52).
Report is 316 commits behind head on develop.

Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #9010      +/-   ##
===========================================
- Coverage    54.08%   53.96%    -0.13%
===========================================
  Files          650      652        +2
  Lines       103915   104929     +1014
===========================================
+ Hits         56200    56621      +421
- Misses       47715    48308      +593


@DrownFish19 changed the title from "[Tokenizer] Fix decode output with space in decode_token." to "[Tokenizer] Fix decode output with space in decode_token" on Aug 27, 2024
Contributor

@ZHUI left a comment


LGTM

@DrownFish19 merged commit c93bada into PaddlePaddle:develop on Sep 19, 2024
@DrownFish19 deleted the dev_20240826_fix_decode_token branch on September 19, 2024 08:31
