
Feat: Speculative Decoding export with quantization support #913

Open
h-guo18 wants to merge 3 commits into main from haoguo/eagle-export

Conversation

@h-guo18
Contributor

@h-guo18 h-guo18 commented Feb 21, 2026

What does this PR do?

Type of change: ?

Overview:

Main changes:

  • Refactored the speculative decoding export logic into an EagleExporter class to improve cohesion (a rough sketch of the resulting export surface follows this list);

  • Separated the speculative decoding export entry point from the quantization export (export_hf_checkpoint()) due to their fundamental differences:

    • Quantization export saves the base model's state_dict and config, while speculative decoding export only saves the drafter's.
    • Most of the model-specific logic in quantization export (e.g. diffusers, VLMs) is not needed for speculative decoding export.
    • Quantization export produces a different format than the speculative decoding checkpoint (the former produces a tokenizer config, generation config, etc., which the latter does not need).
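
For illustration, here is a minimal sketch of what the new export surface might look like, assuming the names used elsewhere in this PR (EagleExporter, extract_state_dict, export_config, export_speculative_decoding, get_exporter, _draft_model_config); the signatures, the drafter-weight filter, and the file handling are assumptions rather than the exact implementation:

import json
import os

from safetensors.torch import save_file


class EagleExporter:
    """Collects the drafter-only state dict and config for an EAGLE model."""

    def __init__(self, model):
        self.model = model

    def extract_state_dict(self) -> dict:
        # Assumption: drafter weights are identified by an "eagle" substring in
        # their keys; the real key filtering in this PR may differ.
        return {k: v for k, v in self.model.state_dict().items() if "eagle" in k}

    def export_config(self) -> dict:
        # The PR adds a _draft_model_config property on HFEagleModel; here it is
        # assumed to be dict-convertible so it can be written to config.json.
        return dict(self.model._draft_model_config)


def export_speculative_decoding(model, export_dir: str) -> None:
    """Write a drafter-only checkpoint: model.safetensors plus config.json."""
    exporter = model.get_exporter()  # e.g. an EagleExporter instance
    os.makedirs(export_dir, exist_ok=True)
    save_file(exporter.extract_state_dict(), os.path.join(export_dir, "model.safetensors"))
    with open(os.path.join(export_dir, "config.json"), "w") as f:
        json.dump(exporter.export_config(), f, indent=2)

With this shape, the example script only needs a call like export_speculative_decoding(model, export_dir=args.export_path) instead of the quantization-oriented export_hf_checkpoint.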

Usage

To export a regular bf16 eagle checkpoint without quantization, the command is the same as before:

python scripts/export_hf_checkpoint.py --model_path <x> --export_path <x>

To run PTQ on online-trained eagle checkpoint and export it:

python hf_ptq.py --pyt_ckpt_path <x> --qformat fp8 --export_path <x>

Both commands produce a drafter checkpoint for deployment in the same format.

Testing

Tested setting:

  • Base model: llama3.1-8b
  • Algorithms: eagle
  • Export paths tested:
    • (Unquantized online ckpt) python scripts/export_hf_checkpoint.py --model_path <x> --export_path <x>
    • (PTQ export) python hf_ptq.py --pyt_ckpt_path <x> --qformat fp8 --export_path <x>
  • Tested deployment on vLLM; observed normal acceptance rate (AR).

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@copy-pr-bot

copy-pr-bot bot commented Feb 21, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 21, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This pull request introduces a class-based refactoring of the speculative decoding export pipeline, replacing legacy procedural functions with EagleExporter and EagleMedusaExporter classes. A new public export_speculative_decoding API exports spec-optimized models independently, with updated key naming schemes, configuration templates, and early-exit integration points in the HF export flow.

Changes

  • Speculative Decoding Export Infrastructure (modelopt/torch/export/plugins/hf_spec_configs.py, modelopt/torch/export/plugins/hf_spec_export.py, modelopt/torch/export/unified_export_hf.py): Introduces EagleExporter and EagleMedusaExporter classes replacing legacy functions. Adds config templates (llama_eagle_template_config, kimik2_eagle_template_config), a new public export_speculative_decoding API, and helper functions has_spec_opt and has_quant_opt. Refactors key naming to a layer-based format and removes the older monolithic export logic.
  • Integration Points (modelopt/torch/speculative/plugins/transformers.py, examples/llm_ptq/hf_ptq.py): Adds a get_exporter() method and a _draft_model_config property to HFEagleModel. Integrates early-exit logic in export_quantized to detect and route spec-optimized models to export_speculative_decoding.
  • Example Update (examples/speculative_decoding/scripts/export_hf_checkpoint.py): Updates API usage from export_hf_checkpoint to export_speculative_decoding.

Sequence Diagram

sequenceDiagram
    participant Export as Export Flow
    participant Check as has_spec_opt()
    participant Exporter as EagleExporter/<br/>EagleMedusaExporter
    participant SaveState as Save State Dict
    participant SaveConfig as Save Config

    Export->>Check: Check if spec-optimized
    alt Spec-Optimized Model
        Check-->>Export: True
        Export->>Exporter: Create exporter instance
        Exporter->>Exporter: extract_state_dict()
        Exporter-->>SaveState: Filtered state dict
        SaveState->>SaveState: model.safetensors
        Exporter->>Exporter: export_config()
        Exporter-->>SaveConfig: Resolved config
        SaveConfig->>SaveConfig: config.json
    else Standard Model
        Check-->>Export: False
        Export->>Export: Continue standard export
    end
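
Roughly, the early-exit routing shown in the diagram could look like the sketch below; has_spec_opt, export_speculative_decoding, and export_hf_checkpoint are named in this PR, but the import paths and the surrounding export flow are assumptions:

from modelopt.torch.export.plugins.hf_spec_export import has_spec_opt  # assumed path
from modelopt.torch.export.unified_export_hf import (  # assumed path
    export_hf_checkpoint,
    export_speculative_decoding,
)


def export_quantized(full_model, export_path: str) -> None:
    if has_spec_opt(full_model):
        # Spec-optimized model: write only the drafter checkpoint and return early.
        export_speculative_decoding(full_model, export_dir=export_path)
        return
    # Standard model: continue with the regular quantized HF export.
    export_hf_checkpoint(full_model, export_dir=export_path)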

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 3 passed

  • Title check: ✅ Passed. The title is partially related to the changeset. It mentions "Speculative Decoding export with quantization support," which is indeed a primary feature added. However, there is a typo: "Speculatice" should be "Speculative." Despite this minor typo, the title accurately describes the main objective.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 84.21%, which is sufficient; the required threshold is 80.00%.
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.


Comment @coderabbitai help to get the list of available commands and usage tips.

@h-guo18 h-guo18 changed the title from "Feat: quantized eagle export" to "Feat: Eagle export with quantization support" on Feb 21, 2026
@codecov

codecov bot commented Feb 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.10%. Comparing base (9e23c6c) to head (bf1c486).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #913   +/-   ##
=======================================
  Coverage   73.10%   73.10%           
=======================================
  Files         205      205           
  Lines       22281    22281           
=======================================
  Hits        16288    16288           
  Misses       5993     5993           

☔ View full report in Codecov by Sentry.

@h-guo18 h-guo18 changed the title from "Feat: Eagle export with quantization support" to "Feat: Speculatice Decoding export with quantization support" on Feb 21, 2026
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@h-guo18 h-guo18 force-pushed the haoguo/eagle-export branch from d9926e9 to 1b73de3 on February 21, 2026 18:36
@h-guo18 h-guo18 marked this pull request as ready for review February 21, 2026 18:37
@h-guo18 h-guo18 requested review from a team as code owners February 21, 2026 18:37
@h-guo18 h-guo18 marked this pull request as draft February 21, 2026 18:37
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
modelopt/torch/export/plugins/hf_spec_export.py (1)

185-214: Validation bypass is documented as temporary.

The _check_valid_sd = lambda *args, **kwargs: None on line 194 effectively disables state dict validation for parallel draft exports. The NOTE: tmp: comment indicates this is intentional but temporary.

Consider tracking this with a TODO or issue reference to ensure validation is properly implemented for parallel draft exports before the feature is considered stable.
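
One possible shape for the tracked-TODO approach suggested here (illustrative only; the placeholder issue reference and warning text are not from this PR, and EagleExporter is assumed to be importable from the same module):

import warnings


class EagleMedusaExporter(EagleExporter):
    def _check_valid_sd(self, state_dict):
        # TODO(<tracking-issue>): validate parallel-draft state dicts, then
        # delegate back to EagleExporter._check_valid_sd instead of skipping.
        warnings.warn(
            "State-dict validation is skipped for parallel draft exports.",
            stacklevel=2,
        )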

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/export/plugins/hf_spec_export.py` around lines 185 - 214, The
code currently disables state-dict validation by setting self._check_valid_sd =
lambda *args, **kwargs: None in the EagleMedusaExporter __init__, which is
marked only as a temporary NOTE; replace this silent bypass with a tracked TODO
and a visible reminder: restore validation by implementing proper checks for
parallel_draft_step in extract_state_dict and call the original
EagleExporter._check_valid_sd (or raise/log a clear warning/error) until full
validation is implemented; specifically update the EagleMedusaExporter class to
remove the no-op lambda, add a TODO/issue-ID comment referencing the missing
validation work, and ensure any call sites (e.g., extract_state_dict) invoke the
proper _check_valid_sd behavior so state-dict validation is not permanently
skipped.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_ptq/hf_ptq.py`:
- Around line 571-574: The early return after calling has_spec_opt(full_model)
and export_speculative_decoding(full_model, export_dir=export_path) skips the
subsequent tokenizer save and timing/export message; update the
speculative-decoding branch so it either (a) calls the same tokenizer save
routine (e.g., tokenizer.save_pretrained or the existing tokenizer save logic)
and prints the export/timing confirmation before returning, or (b) moves the
return to after those steps, and if skipping is intentional add a concise
comment explaining why; reference has_spec_opt, export_speculative_decoding,
full_model and export_path so the change is applied to the correct branch.

In `@modelopt/torch/export/plugins/hf_spec_export.py`:
- Around line 180-182: Fix the typo in the docstring of export_quant_config:
change "hf_quant_coinfig.json" to "hf_quant_config.json" in the docstring for
the function export_quant_config which returns copy(self.hf_quant_config).
- Around line 144-178: In export_config, using copy(template_config) creates
only a shallow copy so nested dicts (e.g., eagle config data) are mutated on
assignment; replace the shallow copy with a deep copy (use copy.deepcopy) when
copying the selected template (referencing template_config,
llama_eagle_template_config, kimik2_eagle_template_config in the export_config
method) so modifications to nested keys do not alter the original imported
templates across multiple calls; ensure the copy module's deepcopy is
imported/used accordingly.

In `@modelopt/torch/export/unified_export_hf.py`:
- Around line 994-996: The comment above the state-dict export is incorrect:
change the misleading "Export config.json" comment that precedes the lines using
exporter.extract_state_dict(), drafter_sd, and save_file(...,
"model.safetensors") to accurately describe exporting the model state dict
(e.g., "Export model state dict to model.safetensors"), leaving the actual
config.json export block (using save_file for config.json) unchanged.
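
To illustrate the shallow- versus deep-copy issue flagged in the export_config comment above (the nested key and value here are made up for the example):

from copy import copy, deepcopy

template_config = {"eagle_config": {"num_hidden_layers": 1}}

shallow = copy(template_config)
shallow["eagle_config"]["num_hidden_layers"] = 4
# The nested dict is shared, so the imported template itself is now mutated:
assert template_config["eagle_config"]["num_hidden_layers"] == 4

template_config = {"eagle_config": {"num_hidden_layers": 1}}
deep = deepcopy(template_config)
deep["eagle_config"]["num_hidden_layers"] = 4
# deepcopy leaves the template untouched across repeated export_config calls:
assert template_config["eagle_config"]["num_hidden_layers"] == 1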


Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
@h-guo18 h-guo18 marked this pull request as ready for review February 21, 2026 19:55
