
Add Tokenizer custom token mapper support #2184

Merged
iffyio merged 1 commit into apache:main from askalt:askalt/tokenizer-iterator
Feb 6, 2026

Conversation

Contributor

@askalt askalt commented Jan 24, 2026

This patch adds a method that applies a caller-provided mapper to each
token during tokenization. This way, tokens can be replaced without an
additional pass over the token stream.
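
To illustrate the idea, here is a minimal, self-contained sketch. It does not use the sqlparser API: `ToyTokenizer`, its `Token` enum, `tokenize_with_mapper`, and `replace_numbers` are hypothetical names made up for this example, and the method actually added by the patch may have a different name and signature. The sketch only shows the general technique of applying a mapper as each token is produced, so no second pass over the token vector is needed.

```rust
/// Toy token type for illustration only (not sqlparser's `Token`).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Word(String),
    Number(String),
    Placeholder,
    Whitespace,
}

/// Hypothetical tokenizer that splits input into whitespace runs,
/// all-digit numbers, and words.
struct ToyTokenizer<'a> {
    input: &'a str,
}

impl<'a> ToyTokenizer<'a> {
    fn new(input: &'a str) -> Self {
        Self { input }
    }

    /// Tokenizes the input and passes every token through `mapper`
    /// before it is pushed to the output buffer.
    fn tokenize_with_mapper<F>(&self, mut mapper: F) -> Vec<Token>
    where
        F: FnMut(Token) -> Token,
    {
        let mut out = Vec::new();
        let mut chars = self.input.chars().peekable();
        while let Some(&c) = chars.peek() {
            let token = if c.is_whitespace() {
                // Collapse a run of whitespace into a single token.
                while chars.peek().map_or(false, |c| c.is_whitespace()) {
                    chars.next();
                }
                Token::Whitespace
            } else {
                // Collect characters up to the next whitespace.
                let mut s = String::new();
                while let Some(&c) = chars.peek() {
                    if c.is_whitespace() {
                        break;
                    }
                    s.push(c);
                    chars.next();
                }
                if s.chars().all(|c| c.is_ascii_digit()) {
                    Token::Number(s)
                } else {
                    Token::Word(s)
                }
            };
            // The mapper runs here, inside the tokenization loop.
            out.push(mapper(token));
        }
        out
    }
}

/// Example use: replace every numeric literal with a placeholder
/// while tokenizing.
fn replace_numbers(input: &str) -> Vec<Token> {
    ToyTokenizer::new(input).tokenize_with_mapper(|t| match t {
        Token::Number(_) => Token::Placeholder,
        other => other,
    })
}
```

Taking the mapper as `FnMut(Token) -> Token` is one possible design: it lets the caller keep state across tokens (for example, counting replacements) while the tokenizer still owns the loop.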

@askalt askalt force-pushed the askalt/tokenizer-iterator branch from 1af5611 to 2bd5c9c January 26, 2026 07:16

@novartole novartole left a comment


Thank you for the patch! Please have a look at my suggestions.

@askalt askalt force-pushed the askalt/tokenizer-iterator branch 2 times, most recently from b4445d5 to 8324527 January 28, 2026 06:56

@novartole novartole left a comment


LGTM

@askalt askalt force-pushed the askalt/tokenizer-iterator branch from 8324527 to 284f512 February 2, 2026 07:13
Contributor Author

askalt commented Feb 2, 2026

@iffyio Could you review please?

@askalt askalt force-pushed the askalt/tokenizer-iterator branch from 284f512 to 141363b February 4, 2026 07:50
@askalt askalt force-pushed the askalt/tokenizer-iterator branch 3 times, most recently from 589f832 to e83a0d4 February 6, 2026 12:53
Contributor

@iffyio iffyio left a comment


LGTM! Thanks @askalt!

@iffyio iffyio changed the title from "add iterator over tokens in Tokenizer" to "Add Tokenizer support to provide custom token mapper" Feb 6, 2026
@iffyio iffyio changed the title from "Add Tokenizer support to provide custom token mapper" to "Add Tokenizer custom token mapper support" Feb 6, 2026
Contributor

iffyio commented Feb 6, 2026

@askalt could you take a look at the conflicts on the branch when you get some time?

@askalt askalt force-pushed the askalt/tokenizer-iterator branch from e83a0d4 to cc288fe February 6, 2026 16:12
Contributor Author

askalt commented Feb 6, 2026

> @askalt could you take a look at the conflicts on the branch when you get some time?

Yep, done. Thank you for the review!

@iffyio iffyio added this pull request to the merge queue Feb 6, 2026
Merged via the queue into apache:main with commit 60abfec Feb 6, 2026
10 checks passed

3 participants