Add gpt-5.3 model support to TiktokenTokenizer #7579
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Pull request overview
This pull request adds support for the GPT-5.3 model family to the TiktokenTokenizer, following the established pattern used for previous GPT model versions (gpt-5.2, gpt-5.1, etc.). The changes enable tokenization for both the base "gpt-5.3" model and its variants (e.g., "gpt-5.3-mini") using the O200kBase encoding.
Changes:
- Added gpt-5.3 model mappings to TiktokenTokenizer for both prefix and exact matching
- Added GPT5_3 static tokenizer property and comprehensive test coverage for the new model
- Included test cases for both base and mini variants in the test suite
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs | Added "gpt-5.3-" prefix mapping and "gpt-5.3" exact name mapping to model encoding arrays, both using O200kBase encoding |
| test/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs | Added GPT5_3 static property and test data entries for gpt-5.3 and gpt-5.3-mini variants in TestAllSupportedModelNames and TestCreationUsingModel methods |
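
The exact-name and prefix mappings described above can be illustrated with a small, self-contained sketch. This is not the code from `TiktokenTokenizer.cs`: the class name `ModelEncodingLookup`, the method `Resolve`, the use of plain strings for encodings, the case-insensitive comparison, and the exact-before-prefix order are all assumptions made for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the tokenizer's model-name lookup.
// All names here are illustrative and are not taken from the PR.
public static class ModelEncodingLookup
{
    // Exact model names, checked first (assumed order).
    private static readonly Dictionary<string, string> ExactNames =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["gpt-5.3"] = "O200kBase",
        };

    // Prefixes, checked when no exact name matches.
    // The trailing dash lets "gpt-5.3-mini" and similar variants match.
    private static readonly (string Prefix, string Encoding)[] Prefixes =
    {
        ("gpt-5.3-", "O200kBase"),
    };

    // Returns the encoding name, or null when the model name is unknown.
    public static string? Resolve(string modelName)
    {
        if (ExactNames.TryGetValue(modelName, out string? encoding))
        {
            return encoding;
        }

        foreach ((string prefix, string prefixEncoding) in Prefixes)
        {
            if (modelName.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
            {
                return prefixEncoding;
            }
        }

        return null;
    }
}
```

In this sketch, `ModelEncodingLookup.Resolve("gpt-5.3")` is answered by the exact-name table, while `ModelEncodingLookup.Resolve("gpt-5.3-mini")` falls through to the prefix entry. Keeping the two tables separate means the bare name `gpt-5.3` does not need to match a dash-terminated prefix.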
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff:

| Metric | main | #7579 | +/- |
|---|---|---|---|
| Coverage | 69.05% | 69.06% | |
| Files | 1483 | 1483 | |
| Lines | 274362 | 274365 | +3 |
| Branches | 28270 | 28270 | |
| Hits | 189466 | 189482 | +16 |
| Misses | 77510 | 77498 | -12 |
| Partials | 7386 | 7385 | -1 |
Adds support for the gpt-5.3 model family to TiktokenTokenizer using O200kBase encoding.
Changes
- `gpt-5.3` and `gpt-5.3-` mappings to model encoding dictionaries
- Test cases for the `gpt-5.3` base and mini variants

Usage
Follows the same pattern established for gpt-5.2 support.
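
As a minimal sketch of the usage this enables, assuming a build of `Microsoft.ML.Tokenizers` that includes this change and a project reference to the `Microsoft.ML.Tokenizers.Data.O200kBase` data package, creating a tokenizer by model name might look like the following. The sample text and variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Assumption: the referenced Microsoft.ML.Tokenizers build knows "gpt-5.3",
// and the O200kBase data package is available to supply the vocabulary.
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-5.3");

// Variant names such as "gpt-5.3-mini" are expected to resolve via the "gpt-5.3-" prefix.
Tokenizer miniTokenizer = TiktokenTokenizer.CreateForModel("gpt-5.3-mini");

string text = "Hello, World!";
IReadOnlyList<int> ids = tokenizer.EncodeToIds(text);
IReadOnlyList<int> miniIds = miniTokenizer.EncodeToIds(text);

Console.WriteLine($"Token count: {tokenizer.CountTokens(text)}");
Console.WriteLine($"Round trip: {tokenizer.Decode(ids)}");
```

Because both names map to the same encoding, the base and mini tokenizers should produce the same token IDs for the same input.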
Warning
Firewall rules blocked me from connecting to one or more addresses (expand for details)
I tried to connect to the following addresses, but was blocked by firewall rules:
- `dnceng.pkgs.visualstudio.com`
  - `/home/REDACTED/work/machinelearning/machinelearning/.dotnet/dotnet dotnet test test/Microsoft.ML.Tokenizers.Tests/Microsoft.ML.Tokenizers.Tests.csproj --filter FullyQualifiedName~TiktokenTests --no-build` (dns block)
- `openaipublic.blob.core.windows.net`
  - `/home/REDACTED/work/machinelearning/machinelearning/.dotnet/dotnet /home/REDACTED/work/machinelearning/machinelearning/.dotnet/dotnet exec --runtimeconfig /home/REDACTED/work/machinelearning/machinelearning/artifacts/bin/Microsoft.ML.Tokenizers.Tests/Debug/net8.0/Microsoft.ML.Tokenizers.Tests.runtimeconfig.json --depsfile /home/REDACTED/work/machinelearning/machinelearning/artifacts/bin/Microsoft.ML.Tokenizers.Tests/Debug/net8.0/Microsoft.ML.Tokenizers.Tests.deps.json /home/REDACTED/work/machinelearning/machinelearning/artifacts/bin/Microsoft.ML.Tokenizers.Tests/Debug/net8.0/testhost.dll --port 45083 --endpoint 127.0.0.1:045083 --role client --parentprocessid 6859 --telemetryoptedin false` (dns block)

If you need me to access, download, or install something from one of these locations, you can either: