Pull requests: huggingface/tokenizers
Use unicode-normalization instead of unicode-normalization-alignments
#1912 opened Dec 14, 2025 by IvanIsCoding
Providing byte level offsets for effective alignment in Cross-Tokenizer On-Policy Distillation Feature Request
#1880 opened Oct 30, 2025 by JqzChandler
feat: allow BPETrainer to be seeded with a set of initial tokens
#1862 opened Sep 6, 2025 by henrycharlesworth
Fix unsigned integer underflow issue with truncation
#1859 opened Sep 1, 2025 by maxdebayser
Adding multiprocessing for sentencepiece_extractor
#1804 opened Jun 19, 2025 by AamodThakur
Expose Encoding attributes via the buffer protocol interface
#1789 opened Jun 4, 2025 by mariosasko
Pre-tokenizers that support multi-word/non-whitespace BPE in single pass
#1753 opened Mar 22, 2025 by mjbommar