
Feat: Speculative Decoding export with quantization support #913

Open
h-guo18 wants to merge 3 commits into main from haoguo/eagle-export

Conversation

@h-guo18
Contributor

@h-guo18 h-guo18 commented Feb 21, 2026

What does this PR do?

Type of change: ?

Overview:

Main changes:

  • Refactored the speculative decoding export logic into an EagleExporter class to improve cohesion (a rough sketch of the resulting export surface follows this list);

  • Separated the speculative decoding export entry point from the quantization export (export_hf_checkpoint()) due to their fundamental differences:

    • Quantization export saves the base model's state_dict and config, while speculative decoding export only saves the drafter's.
    • Most of the model-specific logic in quantization export (e.g. diffusers, VLMs) is not needed for speculative decoding export.
    • Quantization export produces a different format than the speculative decoding checkpoint (the former produces a tokenizer config, generation config, etc., which the latter does not need).
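
For illustration, here is a minimal sketch of what the new export surface might look like, assuming the names used elsewhere in this PR (EagleExporter, extract_state_dict, export_config, export_speculative_decoding, get_exporter, _draft_model_config); the signatures, the drafter-weight filter, and the file handling are assumptions rather than the exact implementation:

import json
import os

from safetensors.torch import save_file


class EagleExporter:
    """Collects the drafter-only state dict and config for an EAGLE model."""

    def __init__(self, model):
        self.model = model

    def extract_state_dict(self) -> dict:
        # Assumption: drafter weights are identified by an "eagle" substring in
        # their keys; the real key filtering in this PR may differ.
        return {k: v for k, v in self.model.state_dict().items() if "eagle" in k}

    def export_config(self) -> dict:
        # The PR adds a _draft_model_config property on HFEagleModel; here it is
        # assumed to be dict-convertible so it can be written to config.json.
        return dict(self.model._draft_model_config)


def export_speculative_decoding(model, export_dir: str) -> None:
    """Write a drafter-only checkpoint: model.safetensors plus config.json."""
    exporter = model.get_exporter()  # e.g. an EagleExporter instance
    os.makedirs(export_dir, exist_ok=True)
    save_file(exporter.extract_state_dict(), os.path.join(export_dir, "model.safetensors"))
    with open(os.path.join(export_dir, "config.json"), "w") as f:
        json.dump(exporter.export_config(), f, indent=2)

With this shape, the example script only needs a call like export_speculative_decoding(model, export_dir=args.export_path) instead of the quantization-oriented export_hf_checkpoint.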

Usage

To export a regular bf16 eagle checkpoint without quantization, the command is the same as before:

python scripts/export_hf_checkpoint.py --model_path <x> --export_path <x>

To run PTQ on online-trained eagle checkpoint and export it:

python hf_ptq.py --pyt_ckpt_path <x> --qformat fp8 --export_path <x>

Both commands produce a drafter checkpoint for deployment in the same format.

Testing

Tested setting:

  • Base model: llama3.1-8b
  • Algorithms: eagle
  • Export paths tested:
    • (Unquantized online ckpt) python scripts/export_hf_checkpoint.py --model_path <x> --export_path <x>
    • (PTQ export) python hf_ptq.py --pyt_ckpt_path <x> --qformat fp8 --export_path <x>
  • Tested deployment on vLLM; observed normal acceptance rate (AR).

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@copy-pr-bot

copy-pr-bot bot commented Feb 21, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 21, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This pull request introduces a class-based refactoring of the speculative decoding export pipeline, replacing legacy procedural functions with EagleExporter and EagleMedusaExporter classes. A new public export_speculative_decoding API exports spec-optimized models independently, with updated key naming schemes, configuration templates, and early-exit integration points in the HF export flow.

Changes

  • Speculative Decoding Export Infrastructure (modelopt/torch/export/plugins/hf_spec_configs.py, modelopt/torch/export/plugins/hf_spec_export.py, modelopt/torch/export/unified_export_hf.py): Introduces EagleExporter and EagleMedusaExporter classes replacing legacy functions. Adds config templates (llama_eagle_template_config, kimik2_eagle_template_config), a new public export_speculative_decoding API, and helper functions has_spec_opt and has_quant_opt. Refactors key naming to a layer-based format and removes the older monolithic export logic.
  • Integration Points (modelopt/torch/speculative/plugins/transformers.py, examples/llm_ptq/hf_ptq.py): Adds a get_exporter() method and a _draft_model_config property to HFEagleModel. Integrates early-exit logic in export_quantized to detect and route spec-optimized models to export_speculative_decoding.
  • Example Update (examples/speculative_decoding/scripts/export_hf_checkpoint.py): Updates API usage from export_hf_checkpoint to export_speculative_decoding.

Sequence Diagram

sequenceDiagram
    participant Export as Export Flow
    participant Check as has_spec_opt()
    participant Exporter as EagleExporter/<br/>EagleMedusaExporter
    participant SaveState as Save State Dict
    participant SaveConfig as Save Config

    Export->>Check: Check if spec-optimized
    alt Spec-Optimized Model
        Check-->>Export: True
        Export->>Exporter: Create exporter instance
        Exporter->>Exporter: extract_state_dict()
        Exporter-->>SaveState: Filtered state dict
        SaveState->>SaveState: model.safetensors
        Exporter->>Exporter: export_config()
        Exporter-->>SaveConfig: Resolved config
        SaveConfig->>SaveConfig: config.json
    else Standard Model
        Check-->>Export: False
        Export->>Export: Continue standard export
    end
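
Roughly, the early-exit routing shown in the diagram could look like the sketch below; has_spec_opt, export_speculative_decoding, and export_hf_checkpoint are named in this PR, but the import paths and the surrounding export flow are assumptions:

from modelopt.torch.export.plugins.hf_spec_export import has_spec_opt  # assumed path
from modelopt.torch.export.unified_export_hf import (  # assumed path
    export_hf_checkpoint,
    export_speculative_decoding,
)


def export_quantized(full_model, export_path: str) -> None:
    if has_spec_opt(full_model):
        # Spec-optimized model: write only the drafter checkpoint and return early.
        export_speculative_decoding(full_model, export_dir=export_path)
        return
    # Standard model: continue with the regular quantized HF export.
    export_hf_checkpoint(full_model, export_dir=export_path)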

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed

  • Title check: ✅ Passed. The title is partially related to the changeset. It mentions "Speculative Decoding export with quantization support," which is indeed a primary feature added. However, there is a typo: "Speculatice" should be "Speculative." Despite this minor typo, the title accurately describes the main objective.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 84.21%, which is sufficient; the required threshold is 80.00%.
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.


Comment @coderabbitai help to get the list of available commands and usage tips.

@h-guo18 h-guo18 changed the title from "Feat: quantized eagle export" to "Feat: Eagle export with quantization support" on Feb 21, 2026
@codecov

codecov bot commented Feb 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.10%. Comparing base (9e23c6c) to head (bf1c486).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #913   +/-   ##
=======================================
  Coverage   73.10%   73.10%           
=======================================
  Files         205      205           
  Lines       22281    22281           
=======================================
  Hits        16288    16288           
  Misses       5993     5993           

☔ View full report in Codecov by Sentry.

@h-guo18 h-guo18 changed the title from "Feat: Eagle export with quantization support" to "Feat: Speculatice Decoding export with quantization support" on Feb 21, 2026
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@h-guo18 h-guo18 force-pushed the haoguo/eagle-export branch from d9926e9 to 1b73de3 on February 21, 2026 18:36
@h-guo18 h-guo18 marked this pull request as ready for review February 21, 2026 18:37
@h-guo18 h-guo18 requested review from a team as code owners February 21, 2026 18:37
@h-guo18 h-guo18 marked this pull request as draft February 21, 2026 18:37
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
modelopt/torch/export/plugins/hf_spec_export.py (1)

185-214: Validation bypass is documented as temporary.

The _check_valid_sd = lambda *args, **kwargs: None on line 194 effectively disables state dict validation for parallel draft exports. The NOTE: tmp: comment indicates this is intentional but temporary.

Consider tracking this with a TODO or issue reference to ensure validation is properly implemented for parallel draft exports before the feature is considered stable.
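
One possible shape for the tracked-TODO approach suggested here (illustrative only; the placeholder issue reference and warning text are not from this PR, and EagleExporter is assumed to be importable from the same module):

import warnings


class EagleMedusaExporter(EagleExporter):
    def _check_valid_sd(self, state_dict):
        # TODO(<tracking-issue>): validate parallel-draft state dicts, then
        # delegate back to EagleExporter._check_valid_sd instead of skipping.
        warnings.warn(
            "State-dict validation is skipped for parallel draft exports.",
            stacklevel=2,
        )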

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/export/plugins/hf_spec_export.py` around lines 185 - 214, The
code currently disables state-dict validation by setting self._check_valid_sd =
lambda *args, **kwargs: None in the EagleMedusaExporter __init__, which is
marked only as a temporary NOTE; replace this silent bypass with a tracked TODO
and a visible reminder: restore validation by implementing proper checks for
parallel_draft_step in extract_state_dict and call the original
EagleExporter._check_valid_sd (or raise/log a clear warning/error) until full
validation is implemented; specifically update the EagleMedusaExporter class to
remove the no-op lambda, add a TODO/issue-ID comment referencing the missing
validation work, and ensure any call sites (e.g., extract_state_dict) invoke the
proper _check_valid_sd behavior so state-dict validation is not permanently
skipped.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 571-574: The early return after calling has_spec_opt(full_model)
and export_speculative_decoding(full_model, export_dir=export_path) skips the
subsequent tokenizer save and timing/export message; update the
speculative-decoding branch so it either (a) calls the same tokenizer save
routine (e.g., tokenizer.save_pretrained or the existing tokenizer save logic)
and prints the export/timing confirmation before returning, or (b) moves the
return to after those steps, and if skipping is intentional add a concise
comment explaining why; reference has_spec_opt, export_speculative_decoding,
full_model and export_path so the change is applied to the correct branch.

In `@modelopt/torch/export/plugins/hf_spec_export.py`:
- Around line 180-182: Fix the typo in the docstring of export_quant_config:
change "hf_quant_coinfig.json" to "hf_quant_config.json" in the docstring for
the function export_quant_config which returns copy(self.hf_quant_config).
- Around line 144-178: In export_config, using copy(template_config) creates
only a shallow copy so nested dicts (e.g., eagle config data) are mutated on
assignment; replace the shallow copy with a deep copy (use copy.deepcopy) when
copying the selected template (referencing template_config,
llama_eagle_template_config, kimik2_eagle_template_config in the export_config
method) so modifications to nested keys do not alter the original imported
templates across multiple calls; ensure the copy module's deepcopy is
imported/used accordingly.

In `@modelopt/torch/export/unified_export_hf.py`:
- Around line 994-996: The comment above the state-dict export is incorrect:
change the misleading "Export config.json" comment that precedes the lines using
exporter.extract_state_dict(), drafter_sd, and save_file(...,
"model.safetensors") to accurately describe exporting the model state dict
(e.g., "Export model state dict to model.safetensors"), leaving the actual
config.json export block (using save_file for config.json) unchanged.
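
To illustrate the shallow- versus deep-copy issue flagged in the export_config comment above (the nested key and value here are made up for the example):

from copy import copy, deepcopy

template_config = {"eagle_config": {"num_hidden_layers": 1}}

shallow = copy(template_config)
shallow["eagle_config"]["num_hidden_layers"] = 4
# The nested dict is shared, so the imported template itself is now mutated:
assert template_config["eagle_config"]["num_hidden_layers"] == 4

template_config = {"eagle_config": {"num_hidden_layers": 1}}
deep = deepcopy(template_config)
deep["eagle_config"]["num_hidden_layers"] = 4
# deepcopy leaves the template untouched across repeated export_config calls:
assert template_config["eagle_config"]["num_hidden_layers"] == 1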


Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@h-guo18 h-guo18 marked this pull request as ready for review February 21, 2026 19:55
