feat: support qwen3 next with muon #2875
base: main
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
RissyRan left a comment:
LGTM! A few minor comments. Thank you for the change!
model_name_arg = sys.argv[1]
scan_layers_arg = sys.argv[2].lower() == "true"
get_model_mdn(model_name_arg, scan_layers_arg, verbose=True)
get_model_mdn(model_name_arg, scan_layers_arg, verbose=True)
nit: extra duplicated line; similar in other files.
An instance of `MuonDimensionNumbers` if a specific mapping is found,
`None` for excluded parameters, or a default `mdn` for standard weights.
1. Exclusions: Skip vectors/biases/embeddings (AdamW).
2. MoE: Handle both DeepSeek style (MoeBlock_0) and Qwen3-Next style (routed_experts).
Nit: quote the checkpoint names: MoE: Handle both DeepSeek style ("MoeBlock_0") and Qwen3-Next style ("routed_experts").
return any(x in path for x in tuples)

def transform_logic(path: Tuple[str, ...]) -> Optional[mdn]:
@shuningjin Could we have a follow-up refactor PR that passes in the model name? I have seen checkpoint-name divergence, so it would be better to transform weights based on the model rather than the checkpoint path.
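The path-based routing being discussed can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the names `transform_logic`, `MoeBlock_0`, and `routed_experts` come from the diff above, but the `mdn` payload here is a hypothetical stand-in for `MuonDimensionNumbers`, and the exclusion list and axis numbers are assumptions for the sketch.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for MuonDimensionNumbers:
# (contraction axes, output axes) of a weight matrix.
mdn = Tuple[Tuple[int, ...], Tuple[int, ...]]
DEFAULT_MDN: mdn = ((0,), (1,))  # plain 2-D weight: contract dim 0, output dim 1

# Assumed exclusion markers: these parameters fall back to AdamW.
EXCLUDED = ("bias", "scale", "embedding")
# Checkpoint-name divergence mentioned in the review:
# DeepSeek-style vs Qwen3-Next-style MoE blocks.
MOE_MARKERS = ("MoeBlock_0", "routed_experts")


def _path_contains(path: Tuple[str, ...], names: Tuple[str, ...]) -> bool:
  """True if any marker appears as a component of the parameter path."""
  return any(x in path for x in names)


def transform_logic(path: Tuple[str, ...]) -> Optional[mdn]:
  """Map a parameter path to Muon dimension numbers, or None for AdamW."""
  # 1. Exclusions: vectors/biases/embeddings are optimized with AdamW.
  if _path_contains(path, EXCLUDED):
    return None
  # 2. MoE: expert weights carry a leading expert axis, so the assumed
  #    contraction/output dims shift by one.
  if _path_contains(path, MOE_MARKERS):
    return ((1,), (2,))
  # 3. Default: treat as a standard 2-D weight.
  return DEFAULT_MDN


print(transform_logic(("decoder", "layers_0", "mlp", "wi", "kernel")))
print(transform_logic(("decoder", "layers_0", "routed_experts", "wi")))
print(transform_logic(("token_embedder", "embedding")))
```

Because the routing keys off substrings of the checkpoint path, any divergence in checkpoint naming (as with the two MoE styles) forces new markers; keying off the model name instead, as suggested above, would avoid that.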
RissyRan left a comment:
Could you attach a screenshot or log showing Qwen3 training with the Muon optimizer end-to-end?
Description
Support Qwen3-Next with the Muon optimizer.
Tests
tests/muon_test.py

Checklist
Before submitting this PR, please make sure (put X in square brackets):
gemini-review label.