@itazap
Contributor

@itazap itazap commented Jan 27, 2026

Following transformers v5, we no longer have "slow" tokenizers that use a Trie; fast tokenizers are now the default. This script assumed the tokenizer was always slow, so it is updated to work with fast tokenizers!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@itazap
Contributor Author

itazap commented Jan 27, 2026

@sayakpaul here is the fix for the _update_trie() issue you were facing. In transformers v5 we only kept "fast" tokenizers (which don't have _update_trie()), so we can pin transformers to v5 to address the tests! 😊

@yiyixuxu yiyixuxu requested a review from sayakpaul January 27, 2026 18:12
tokenizer._update_trie()
# set correct total vocab size after removing tokens
tokenizer._update_total_vocab_size()
# Fast tokenizers: serialize, filter tokens, reload
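
The "serialize, filter tokens, reload" step named in the comment above can be illustrated on the serialized JSON alone. This is a minimal sketch, not the script's actual code: `drop_added_tokens` is a hypothetical helper, the token names are made up, and it assumes the serialized tokenizer carries a top-level `added_tokens` list of entries with a `content` field, as `tokenizer.json` files do. Obtaining the JSON from a fast tokenizer and reloading it (for example with `backend_tokenizer.to_str()` and `tokenizers.Tokenizer.from_str`) is left out so the block runs with the standard library only.

```python
import json


def drop_added_tokens(tokenizer_json: str, tokens_to_remove: set) -> str:
    """Return serialized tokenizer JSON without the given added tokens."""
    state = json.loads(tokenizer_json)
    # Assumption: added tokens live in a top-level "added_tokens" list of
    # {"id": ..., "content": ...} entries, as in tokenizer.json files.
    state["added_tokens"] = [
        entry
        for entry in state.get("added_tokens", [])
        if entry["content"] not in tokens_to_remove
    ]
    return json.dumps(state)


# Toy stand-in for a serialized fast tokenizer (not a complete tokenizer.json).
serialized = json.dumps(
    {
        "added_tokens": [
            {"id": 0, "content": "<keep>"},
            {"id": 1, "content": "<drop_a>"},
            {"id": 2, "content": "<drop_b>"},
        ],
        "model": {"type": "Unigram"},
    }
)

filtered = json.loads(drop_added_tokens(serialized, {"<drop_a>", "<drop_b>"}))
remaining = [entry["content"] for entry in filtered["added_tokens"]]
print(remaining)
```

Tokens that also appear in the model vocabulary would need model-specific handling, which this sketch does not attempt.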
Member

Would it work with transformers (< v5) as well?

If not, maybe we could maintain two code paths, one for v5 and another for < v5? That way, in the next release cycle, we can pin the transformers version to >=5.0.0.

WDYT?

Contributor Author

Sounds good! I added it back.
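
The two code paths discussed in this thread could be selected roughly as follows. This is only a sketch under stated assumptions: `trim_tokenizer`, `trim_slow`, and `trim_fast` are hypothetical names, and the choice keys off the tokenizer's `is_fast` attribute instead of the installed transformers version. The dummy classes stand in for real tokenizers so the block runs without transformers installed.

```python
def trim_slow(tokenizer):
    # Slow path from the diff: rebuild the trie, then fix the vocab size.
    tokenizer._update_trie()
    tokenizer._update_total_vocab_size()
    return "slow"


def trim_fast(tokenizer):
    # Fast path placeholder: serialize, filter tokens, reload (not shown here).
    return "fast"


def trim_tokenizer(tokenizer):
    # Hypothetical dispatcher: fast tokenizers have no _update_trie().
    if getattr(tokenizer, "is_fast", False):
        return trim_fast(tokenizer)
    return trim_slow(tokenizer)


class DummySlow:
    """Stand-in for a slow tokenizer; records which hooks were called."""

    is_fast = False

    def __init__(self):
        self.calls = []

    def _update_trie(self):
        self.calls.append("_update_trie")

    def _update_total_vocab_size(self):
        self.calls.append("_update_total_vocab_size")


class DummyFast:
    """Stand-in for a fast tokenizer; deliberately lacks _update_trie()."""

    is_fast = True


slow = DummySlow()
slow_path = trim_tokenizer(slow)
fast_path = trim_tokenizer(DummyFast())
print(slow_path, slow.calls, fast_path)
```

Checking a capability of the object keeps the script working on either side of the v5 boundary without parsing version strings, though an explicit version check would serve the same purpose.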

Member

@sayakpaul sayakpaul left a comment

Thanks for this work! I left one comment regarding versioning. LMK what you think.

Member

@sayakpaul sayakpaul left a comment

Thanks a lot!

@sayakpaul
Member

@bot /style

@github-actions
Contributor

github-actions bot commented Jan 28, 2026

Style bot fixed some files and pushed the changes.

@sayakpaul sayakpaul merged commit 2ac39ba into huggingface:main Jan 28, 2026
9 of 11 checks passed
4 participants