Conversation
Move batching and shuffling logic from SFTConfig into iterator functions. train_sft now accepts Iterable[List[Trajectory]] instead of individual trajectories, simplifying the API and making batch management more explicit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
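For illustration, here is a minimal sketch of what an iterator-based batching helper could look like under this change; the name `shuffled_batches`, its signature, and the invocation shape are assumptions, not the actual functions added in this PR:

```python
import random
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def shuffled_batches(
    items: List[T], batch_size: int, epochs: int = 1, seed: int = 0
) -> Iterator[List[T]]:
    """Yield shuffled batches of `items`, one epoch at a time."""
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(items)
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            yield order[i : i + batch_size]


# train_sft then consumes Iterable[List[Trajectory]] directly, e.g.
# (the call site below is illustrative only):
# await train_sft(model, shuffled_batches(trajectories, batch_size=8, epochs=2))
```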
| optimizer = self._state.trainer.optimizer
|
| # Set SFT-specific hyperparameters
| sft_weight_decay = 0.01
Should we set this to 0 to be consistent with existing behavior in OpenPipe Platform?
Yeah, good point! Changed.
|
| # Optimizer step at the end of each batch
| optimizer.step()
| optimizer.zero_grad()
Remove optimizer.zero_grad()
We already clear the gradients before the training loop.
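A minimal sketch of the batch-loop shape implied by this thread (`batches`, `optimizer`, and `compute_loss` are placeholders): gradients are cleared before accumulation and the optimizer steps once per batch, which would make a trailing zero_grad() redundant.

```python
for batch in batches:
    optimizer.zero_grad()                 # clear grads before accumulating
    for micro_batch in batch:
        loss = compute_loss(micro_batch)  # placeholder forward pass + loss
        loss.backward()                   # accumulate gradients
    optimizer.step()                      # single optimizer step at the end of the batch
```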
| # Set SFT-specific hyperparameters
| sft_weight_decay = 0.01
| for param_group in optimizer.param_groups:
|     param_group["weight_decay"] = sft_weight_decay
Should we set adam_beta1 and adam_beta2 as well?
We change weight_decay because it differs between SFT and RL, but the Adam betas are the same, so there is no need to override them.
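For reference, a sketch of the resulting override after this thread: only weight_decay is SFT-specific, and the betas inherited from the existing optimizer are left untouched.

```python
sft_weight_decay = 0.0  # set to 0 per the earlier comment, matching OpenPipe Platform behavior
for param_group in optimizer.param_groups:
    param_group["weight_decay"] = sft_weight_decay
    # no override of param_group["betas"]: the Adam betas are shared between SFT and RL
```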
| total_trajectories = row_count * epochs
| skip_trajectories = initial_step * batch_size
|
| if skip_trajectories >= total_trajectories:
Remove skip_trajectories?
I don't think we support continuous training with our current design
I just added final_step support, so this should work with continuous training.
A user can break training on a large dataset into several calls and benchmark a checkpoint after each step.
These are just two arguments of the train_sft_from_file util function that help split the dataset correctly and compute the learning rate schedule.
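A hedged sketch of that pattern; the keyword names initial_step and final_step come from this thread, while the import path, dataset path, and helper name are assumptions:

```python
from art.utils import train_sft_from_file  # import path is an assumption


async def staged_sft(model, total_steps: int, steps_per_call: int = 50) -> None:
    """Split one large SFT run into several train_sft_from_file calls (sketch)."""
    for start in range(0, total_steps, steps_per_call):
        await train_sft_from_file(
            model,
            "dataset.jsonl",  # placeholder dataset path
            initial_step=start,
            final_step=min(start + steps_per_call, total_steps),
        )
        # benchmark the checkpoint here before launching the next call
```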
| """Model-specific configuration for chat templates and training defaults.""" | ||
|
|
||
|
|
||
| def detect_chat_template_parts( |
Let's also support
- Qwen 2.5
- Qwen 3 Instruct family?
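For context on the templates mentioned above: Qwen 2.5 and Qwen 3 instruct models use ChatML-style chat templates, so the assistant turn splits into a prefix and suffix roughly as below. This is illustrative only and not the actual return shape of detect_chat_template_parts.

```python
# ChatML-style delimiters used by the Qwen 2.5 / Qwen 3 instruct chat templates
QWEN_CHATML_PARTS = {
    "assistant_prefix": "<|im_start|>assistant\n",
    "assistant_suffix": "<|im_end|>\n",
}
```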
docs/fundamentals/sft-training.mdx (outdated)
| from art.local import LocalBackend
| # from art.serverless.backend import ServerlessBackend
|
| TEACHER_MODEL = "qwen/qwen3-235b-a22b-2507"
Maybe GLM-5 here instead?
docs/fundamentals/sft-training.mdx (outdated)
| {"role": "user", "content": "Explain recursion with a simple example."}, | ||
| {"role": "assistant", "content": teacher_response}, | ||
| ], | ||
| reward=0.0, |
Hmm, perhaps we should make reward an optional field in Trajectory if we want to use it for SFT as well. Doesn't make sense to ask developers to set it to 0 for no reason.
I can set the default to 0.0. That keeps it required, but removes the need to specify it for SFT.
The main downside is that people might forget to set a reward for RL, so we should make sure it’s surfaced clearly.
When using RULER, you don't want to set the reward manually either. As long as our docs are clear, should be fine to set a default.
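A sketch of the defaulted field being agreed on here; the surrounding model definition is illustrative, not the exact art.Trajectory class:

```python
from pydantic import BaseModel


class Trajectory(BaseModel):
    messages_and_choices: list  # field name assumed for illustration
    reward: float = 0.0  # defaulted so SFT/RULER callers can omit it; RL rollouts still set it explicitly
```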
| await model.register(backend)
|
| # Phase 1: SFT warmup from a dataset
| await train_sft_from_file(
What will model step be after train_sft_from_file completes?
I think we decided to increment it by a single checkpoint step, even though it includes multiple optimizer steps. It’s not very intuitive, but it will be more consistent with how RL works.
Could you document that here, specifying that the model will train for 49 steps in the following loop?
| train_groups = await art.gather_trajectory_groups(
|     [
|         art.TrajectoryGroup(rollout(model, scenario) for _ in range(8))
|         for scenario in scenarios
For completeness, could you also import or declare the scenarios variable somewhere?
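Putting both comments together, a sketch of how the docs loop could read with scenarios declared and the step accounting documented, continuing inside the async main() of the example. The loader name and method calls are assumptions based on the public ART examples; the "49 steps" figure comes from the comment above.

```python
scenarios = load_scenarios("train_scenarios.jsonl")  # hypothetical helper returning a list of scenarios

# train_sft_from_file advances the model by a single checkpoint step, so the
# model resumes at step 1 here and this loop performs 49 more RL training steps.
for _ in range(await model.get_step(), 50):
    train_groups = await art.gather_trajectory_groups(
        [
            art.TrajectoryGroup(rollout(model, scenario) for _ in range(8))
            for scenario in scenarios
        ]
    )
    await model.train(train_groups)
```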
| chunks_per_epoch = math.ceil(dataset_size / items_per_chunk)
|
| # Convert initial_step (batch-based) to initial_chunk for skipping
| initial_chunk = initial_step // chunk_size
Isn't this supposed to be initial_step // (chunk_size * batch_size)?
Both initial_step and chunk_size are already in batch units:
- initial_step is "Global batch step to resume from"
- chunk_size is "Number of batches to process per train_sft call"
So initial_step // chunk_size = batches / batches-per-chunk = chunk index.
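A quick worked example of that arithmetic (the numbers are illustrative):

```python
batch_size = 4      # trajectories per batch
chunk_size = 10     # batches processed per train_sft call
initial_step = 25   # global batch step to resume from

initial_chunk = initial_step // chunk_size       # 25 // 10 = 2 -> skip the first two chunks
skip_trajectories = initial_step * batch_size    # 25 * 4 = 100 trajectories already consumed
```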
src/art/utils/sft.py (outdated)
| return learning_rates
|
|
| def prepare_sft(
Nit: Do we intentionally want to keep this?