- Notifications
You must be signed in to change notification settings - Fork 1.9k
Add maxTokenCount parameter to CountTokens method #7392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Updated the CountTokens method in Tokenizer.cs to include a maxTokenCount parameter for limiting token counts. Added tests in TiktokenTests.cs and TokenizerTests.cs to verify the new functionality and ensure correct behavior with the maximum token count.
@dotnet-policy-service agree company="Microsoft" |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@ ## main #7392 +/- ## ======================================= Coverage 68.88% 68.88% ======================================= Files 1473 1473 Lines 270814 270827 +13 Branches 27884 27884 ======================================= + Hits 186553 186572 +19 + Misses 76985 76983 -2 + Partials 7276 7272 -4
Flags with carried forward coverage won't be shown. Click here to find out more.
|
| Adding a parameter like this, even an optional one, is a binary breaking change. |
To make this change compatibly it must an overload. |
| We need to be careful though when adding the new overloads because some tokenizers implementing such functionality already
|
| @pgroene just curious what tokenizer you are using? and is this a blocking issue for you or can you wait? I am asking to ensure we'll get it right and ensure this is going to work with all tokenizers without any breaking change. |
| We are using the tiktokentokenizer. |
| I talked offline to @pgroene and I suggested to him to use the following API which should give him the same result:
I am closing this PR for now and we already have issue to track adding the other API but it is not a priority now as there is workaround for it. |
Updated the CountTokens method in Tokenizer.cs to include a maxTokenCount parameter for limiting token counts. Added tests in TiktokenTests.cs and TokenizerTests.cs to verify the new functionality and ensure correct behavior with the maximum token count.
We are excited to review your PR.
So we can do the best job, please check:
Fixes #nnnnin your description to cause GitHub to automatically close the issue(s) when your PR is merged.fixes #7391