Conversation
…ipe state
Signed-off-by: Evgeny <etsykunov@nvidia.com>
for more information, see https://pre-commit.ci
Greptile Summary
This PR introduces an experimental API for custom, fine-grained quantization recipes. Key changes:
The implementation allows users to create custom quantization strategies like "use NVFP4 for all linear layers except attention projection layers, which should use MXFP8" by inspecting role fields in the factory function. The API is marked as experimental with appropriate warnings.
Confidence Score: 5/5
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User creates CustomRecipe with qfactory] --> B[autocast context with recipe]
    B --> C[Module forward pass begins]
    C --> D[Module emits QuantizerRole objects]
    D --> E{CustomRecipe?}
    E -->|Yes| F[Call qfactory for each role]
    E -->|No| G[Use built-in recipe state]
    F --> H[QuantizerRole inspection]
    H --> I{Dispatch logic}
    I -->|module_type='linear'| J[Return NVFP4Quantizer]
    I -->|module_type='grouped_linear'| K[Return MXFP8Quantizer]
    I -->|tensor_type='grad_output'| L[Return E5M2 quantizer]
    I -->|Other roles| M[Return default quantizer]
    J --> N[Quantizer used for tensor operations]
    K --> N
    L --> N
    M --> N
    G --> N
    N --> O[Forward/backward computation]
    style A fill:#e1f5ff
    style F fill:#fff4e1
    style H fill:#ffe1f5
    style N fill:#e1ffe1
```
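Sketched in Python, the dispatch in the flowchart could look roughly like the following. This is a hedged illustration, not code from the PR: `QuantizerRole` here is a minimal local stand-in for Transformer Engine's class (same `module_type`/`tensor_type`/`name` fields described in this PR), and plain strings stand in for the actual NVFP4/MXFP8/E5M2 quantizer objects. The gradient check is placed first on the assumption that `grad_output` roles take precedence over the module-type branches.

```python
from dataclasses import dataclass


@dataclass
class QuantizerRole:
    """Local stand-in for TE's QuantizerRole (illustration only)."""

    module_type: str
    tensor_type: str
    name: str = ""


def qfactory(role: QuantizerRole) -> str:
    """Dispatch roles to quantizers, mirroring the flowchart.

    Returned strings stand in for real quantizer objects.
    """
    # Assumption: gradient tensors are checked first so they always get
    # the wider-dynamic-range format regardless of module type.
    if role.tensor_type == "grad_output":
        return "e5m2"
    if role.module_type == "linear":
        return "nvfp4"
    if role.module_type == "grouped_linear":
        return "mxfp8"
    return "default"


print(qfactory(QuantizerRole("linear", "weight")))       # nvfp4
print(qfactory(QuantizerRole("linear", "grad_output")))  # e5m2
```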
Last reviewed commit: 41656ab
Signed-off-by: Evgeny <etsykunov@nvidia.com>
timmoon10 left a comment:
Overall this design is quite clean and generalizable.
transformer_engine/pytorch/custom_recipes/quantization_nvfp4.py
```python
    base = [
        QuantizerRole(module_type="linear", tensor_type="input", name=name),
        QuantizerRole(module_type="linear", tensor_type="weight", name=name),
        QuantizerRole(module_type="linear", tensor_type="output", name=name),
    ]
else:
    base = [
        QuantizerRole(module_type="linear", tensor_type="grad_output", name=name),
        QuantizerRole(module_type="linear", tensor_type="grad_input", name=name),
    ]
```
"output" and "grad_input" roles don't make sense. In reality, we are implicitly assuming that the tensor will be consumed by another linear-like layer.
Suggested change:

```diff
     base = [
         QuantizerRole(module_type="linear", tensor_type="input", name=name),
         QuantizerRole(module_type="linear", tensor_type="weight", name=name),
-        QuantizerRole(module_type="linear", tensor_type="output", name=name),
+        QuantizerRole(module_type="linear", tensor_type="input", name=name),
     ]
 else:
     base = [
         QuantizerRole(module_type="linear", tensor_type="grad_output", name=name),
-        QuantizerRole(module_type="linear", tensor_type="grad_input", name=name),
+        QuantizerRole(module_type="linear", tensor_type="grad_output", name=name),
     ]
```
Alternatively, if we want to use the output in FP8 DPA, the right role would be module_type="dpa" and tensor_type="input". We should probably make this configurable. I kind of like that this design is exposing the hidden assumptions we've been making.
I agree about the "output" and "grad_input" roles. I am setting the roles for those slots to None (the safest option) and enabling configuration. Also configured it in MHA.
tests/pytorch/test_custom_recipe.py
```python
assert counts["input"] == 1
assert counts["weight"] == 1
assert counts["output"] == 1
assert counts["grad_output"] == 1
assert counts["grad_input"] == 1
```
Suggested change:

```diff
-assert counts["input"] == 1
-assert counts["weight"] == 1
-assert counts["output"] == 1
-assert counts["grad_output"] == 1
-assert counts["grad_input"] == 1
+assert counts["input"] == 2
+assert counts["weight"] == 1
+assert counts["output"] == 0
+assert counts["grad_output"] == 2
+assert counts["grad_input"] == 0
```
Signed-off-by: Evgeny Tsykunov <etsykunov@nvidia.com>
Signed-off-by: Evgeny <etsykunov@nvidia.com>
```python
def is_gemm(self) -> bool:
    """Whether this role belongs to a GEMM-based module."""
    return self.module_type in self.GEMM_MODULE_TYPES
```
I think this is baking in assumptions about what formats are similar (our recent experiences with grouped tensors make me wonder if the requirements for "linear" and "grouped_linear" will diverge in the future), and it's also not giving us that much convenience.
Suggested change:

```diff
-def is_gemm(self) -> bool:
-    """Whether this role belongs to a GEMM-based module."""
-    return self.module_type in self.GEMM_MODULE_TYPES
```
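If the helper is dropped as suggested, one way to keep call sites readable without baking the grouping into `QuantizerRole` itself is an explicit membership check where the grouping is actually needed. A sketch under stated assumptions: `QuantizerRole` is a local stand-in, and `pick_quantizer` is a hypothetical call site, not code from the PR.

```python
from dataclasses import dataclass


@dataclass
class QuantizerRole:
    """Local stand-in for TE's QuantizerRole (illustration only)."""

    module_type: str
    tensor_type: str


def pick_quantizer(role: QuantizerRole) -> str:
    # The caller decides which module types it treats alike, instead of
    # QuantizerRole hard-coding a GEMM_MODULE_TYPES set. If "linear" and
    # "grouped_linear" requirements diverge later, only this site changes.
    if role.module_type in ("linear", "grouped_linear"):
        return "gemm-path"
    return "other-path"


print(pick_quantizer(QuantizerRole("grouped_linear", "weight")))  # gemm-path
```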
Signed-off-by: Evgeny <etsykunov@nvidia.com>
Signed-off-by: Evgeny <etsykunov@gmail.com>
Description
Introducing `QuantizerRole`

This is an API that allows going down to "set this `LayerNormLinear` in this transformer layer to be less aggressively quantized" (a fine-grained, per-module/per-tensor quantization control mechanism). The quantizer factory uses roles to dispatch according to its needs.

- Each TE module/op emits a list of `QuantizerRole`:
  - `Linear`, `LayerNormLinear`, and `LayerNormMLP` emit `module_type="linear"` with `tensor_type` in `{"input", "weight", "grad_output"}`.
  - `GroupedLinear` emits `module_type="grouped_linear"`.
- `CustomRecipe` accepts a `qfactory` callable that receives a `QuantizerRole` and returns a quantizer.
- Factories can be composed, e.g., dispatch to different sub-factories based on `module_type` (`dpa` vs `linear`) and then refine based on `tensor_type`.

Type of change
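The composed, role-based dispatch described in this PR's description could be sketched as below. This is a hedged illustration, not the PR's code: `QuantizerRole` is a local stand-in for Transformer Engine's class, the sub-factory names are hypothetical, and strings stand in for real quantizer objects.

```python
from dataclasses import dataclass


@dataclass
class QuantizerRole:
    """Local stand-in for TE's QuantizerRole (illustration only)."""

    module_type: str
    tensor_type: str
    name: str = ""


def linear_factory(role: QuantizerRole) -> str:
    # Refine on tensor_type within linear-like modules.
    return "e5m2" if role.tensor_type == "grad_output" else "nvfp4"


def dpa_factory(role: QuantizerRole) -> str:
    return "mxfp8"


# Top-level dispatch on module_type, then each sub-factory refines
# on tensor_type.
SUB_FACTORIES = {"linear": linear_factory, "dpa": dpa_factory}


def qfactory(role: QuantizerRole):
    sub = SUB_FACTORIES.get(role.module_type)
    # Assumption: returning None falls back to default (unquantized) behavior.
    return sub(role) if sub is not None else None


print(qfactory(QuantizerRole("linear", "grad_output", "fc1")))  # e5m2
```

With the real API, a factory like this would presumably be passed as `CustomRecipe(qfactory=...)` and used under the autocast context, as in the flowchart above.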
Changes
Please list the changes introduced in this PR:
Checklist: