
Add gpt-5.3 model support to TiktokenTokenizer #7579

Merged
tarekgh merged 2 commits into main from copilot/add-gpt-5-3-support on Feb 13, 2026

Conversation

Contributor

Copilot AI commented Feb 12, 2026

Adds support for the gpt-5.3 model family to TiktokenTokenizer using O200kBase encoding.

Changes

  • TiktokenTokenizer.cs: Added "gpt-5.3" and "gpt-5.3-" mappings to the model encoding dictionaries (a hedged sketch follows this list)
  • TiktokenTests.cs: Added test coverage for gpt-5.3 base and mini variants
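
For orientation, here is a minimal C# sketch of the kind of mapping tables the first bullet refers to. The enum ModelEncoding, the class ModelEncodingMap, and the field names are illustrative assumptions rather than the actual identifiers in TiktokenTokenizer.cs; only the two new entries (the "gpt-5.3-" prefix and the "gpt-5.3" exact name, both mapped to O200kBase) come from this PR's description.

using System;
using System.Collections.Generic;

// ModelEncoding is a stand-in here; the real tokenizer uses its own internal encoding identifier.
internal enum ModelEncoding { O200kBase, Cl100kBase }

internal static class ModelEncodingMap
{
    // Prefix table: any model id starting with "gpt-5.3-" resolves to O200kBase.
    internal static readonly (string Prefix, ModelEncoding Encoding)[] PrefixToEncoding =
    {
        ("gpt-5.3-", ModelEncoding.O200kBase), // added by this PR (covers gpt-5.3-mini, etc.)
        ("gpt-5.2-", ModelEncoding.O200kBase),
        // ... other existing prefixes
    };

    // Exact-name table: the bare "gpt-5.3" id also resolves to O200kBase.
    internal static readonly Dictionary<string, ModelEncoding> ModelToEncoding =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["gpt-5.3"] = ModelEncoding.O200kBase, // added by this PR
            ["gpt-5.2"] = ModelEncoding.O200kBase,
            // ... other existing model names
        };
}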

Usage

var tokenizer = TiktokenTokenizer.CreateForModel("gpt-5.3");
var tokens = tokenizer.EncodeToIds("Hello, world!");

// Also supports variants
var miniTokenizer = TiktokenTokenizer.CreateForModel("gpt-5.3-mini");

Follows the same pattern established for gpt-5.2 support.
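
Because "gpt-5.3" resolves to the same O200kBase encoding as "gpt-5.2", the rest of the tokenizer surface behaves unchanged. A minimal round-trip sketch, assuming the standard Microsoft.ML.Tokenizers base-class APIs (CountTokens, EncodeToIds, Decode), none of which are touched by this diff:

using System;
using Microsoft.ML.Tokenizers;

var tokenizer = TiktokenTokenizer.CreateForModel("gpt-5.3");

// Count tokens without materializing the id list.
int count = tokenizer.CountTokens("Hello, world!");

// Encode to ids and decode back to text.
var ids = tokenizer.EncodeToIds("Hello, world!");
string? roundTrip = tokenizer.Decode(ids);

Console.WriteLine($"{count} tokens; decoded: {roundTrip}");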

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • dnceng.pkgs.visualstudio.com
    • Triggering command: /home/REDACTED/work/machinelearning/machinelearning/.dotnet/dotnet dotnet test test/Microsoft.ML.Tokenizers.Tests/Microsoft.ML.Tokenizers.Tests.csproj --filter FullyQualifiedName~TiktokenTests --no-build (dns block)
  • openaipublic.blob.core.windows.net
    • Triggering command: /home/REDACTED/work/machinelearning/machinelearning/.dotnet/dotnet /home/REDACTED/work/machinelearning/machinelearning/.dotnet/dotnet exec --runtimeconfig /home/REDACTED/work/machinelearning/machinelearning/artifacts/bin/Microsoft.ML.Tokenizers.Tests/Debug/net8.0/Microsoft.ML.Tokenizers.Tests.runtimeconfig.json --depsfile /home/REDACTED/work/machinelearning/machinelearning/artifacts/bin/Microsoft.ML.Tokenizers.Tests/Debug/net8.0/Microsoft.ML.Tokenizers.Tests.deps.json /home/REDACTED/work/machinelearning/machinelearning/artifacts/bin/Microsoft.ML.Tokenizers.Tests/Debug/net8.0/testhost.dll --port 45083 --endpoint 127.0.0.1:045083 --role client --parentprocessid 6859 --telemetryoptedin false (dns block)

If you need me to access, download, or install something from one of these locations, you can either:



Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add gpt-5.3 support similar to gpt-5.2" to "Add gpt-5.3 model support to TiktokenTokenizer" on Feb 12, 2026
Copilot AI requested a review from stephentoub on February 12, 2026 at 22:12
@stephentoub marked this pull request as ready for review on February 12, 2026 at 22:13
Contributor

Copilot AI left a comment

Pull request overview

This pull request adds support for the GPT-5.3 model family to the TiktokenTokenizer, following the established pattern used for previous GPT model versions (gpt-5.2, gpt-5.1, etc.). The changes enable tokenization for both the base "gpt-5.3" model and its variants (e.g., "gpt-5.3-mini") using the O200kBase encoding.

Changes:

  • Added gpt-5.3 model mappings to TiktokenTokenizer for both prefix and exact matching
  • Added a GPT5_3 static tokenizer property and comprehensive test coverage for the new model (a hedged test sketch follows this list)
  • Included test cases for both base and mini variants in the test suite
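
As a rough illustration of that coverage, the sketch below shows one plausible shape for such a test. The class name, test name, and the GPT5_3 field as written here are assumptions for illustration; the actual members in TiktokenTests.cs may differ.

using Microsoft.ML.Tokenizers;
using Xunit;

public class Gpt53MappingTests
{
    // Assumed shape of the cached GPT5_3 tokenizer described above;
    // the real property in TiktokenTests.cs may be defined differently.
    private static readonly Tokenizer GPT5_3 = TiktokenTokenizer.CreateForModel("gpt-5.3");

    [Theory]
    [InlineData("gpt-5.3")]
    [InlineData("gpt-5.3-mini")]
    public void CreateForModel_Gpt53Variants_ProduceO200kBaseIds(string modelName)
    {
        Tokenizer tokenizer = TiktokenTokenizer.CreateForModel(modelName);

        // Both the exact name and the prefixed variant should tokenize identically
        // to the cached gpt-5.3 tokenizer, i.e. via the O200kBase encoding.
        Assert.Equal(GPT5_3.EncodeToIds("Hello, world!"), tokenizer.EncodeToIds("Hello, world!"));
    }
}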

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • src/Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs: Added the "gpt-5.3-" prefix mapping and the "gpt-5.3" exact-name mapping to the model encoding arrays, both using the O200kBase encoding
  • test/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs: Added the GPT5_3 static property and test data entries for the gpt-5.3 and gpt-5.3-mini variants in the TestAllSupportedModelNames and TestCreationUsingModel methods


Member

@tarekgh left a comment

LGTM!


codecov bot commented Feb 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.06%. Comparing base (3604580) to head (cff6f1e).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7579   +/-   ##
=======================================
  Coverage   69.05%   69.06%           
=======================================
  Files        1483     1483           
  Lines      274362   274365    +3     
  Branches    28270    28270           
=======================================
+ Hits       189466   189482   +16     
+ Misses      77510    77498   -12     
+ Partials     7386     7385    -1     
Flag         Coverage            Δ
Debug        69.06% <100.00%>    (+<0.01%) ⬆️
production   63.32% <100.00%>    (+<0.01%) ⬆️
test         89.51% <100.00%>    (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs 80.00% <100.00%> (+0.04%) ⬆️
...est/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs 99.09% <100.00%> (+<0.01%) ⬆️

... and 3 files with indirect coverage changes



Labels: none. Projects: none. 3 participants.