
WIP: SFT (local backend) #530

Merged
Kovbo merged 80 commits into main from sft-local-backend on Feb 18, 2026
Conversation

@Kovbo (Collaborator) commented Jan 22, 2026

[Screenshot: SFT Training docs page at localhost:3002/fundamentals/sft-training, captured 2026-02-16]

optimizer = self._state.trainer.optimizer

# Set SFT-specific hyperparameters
sft_weight_decay = 0.01
Collaborator:

Should we set this to 0 to be consistent with existing behavior in OpenPipe Platform?

Collaborator (Author):

Yeah, good point! Changed.
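
Presumably the change reduces to flipping the constant; shown here only as an illustration of the agreed-upon fix:

# SFT-specific hyperparameters; 0.0 matches existing OpenPipe Platform behavior
sft_weight_decay = 0.0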


# Optimizer step at the end of each batch
optimizer.step()
optimizer.zero_grad()
Collaborator:

Remove optimizer.zero_grad()?
We already clear the grads before the training loop.

# Set SFT-specific hyperparameters
sft_weight_decay = 0.01
for param_group in optimizer.param_groups:
param_group["weight_decay"] = sft_weight_decay
Collaborator:

Should we set adam_beta1 and adam_beta2 as well?

Collaborator (Author):

We change weight_decay because it differs between SFT and RL. But the Adam betas are the same, so there is no need to override them.
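
If the betas ever did need to differ per phase, the same param_groups mechanism would work; a minimal sketch, assuming a torch.optim.Adam/AdamW optimizer (the values shown are just the PyTorch defaults):

# Not needed per the discussion above; illustrative only.
for param_group in optimizer.param_groups:
    param_group["betas"] = (0.9, 0.999)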

total_trajectories = row_count * epochs
skip_trajectories = initial_step * batch_size

if skip_trajectories >= total_trajectories:
Collaborator:

Remove skip_trajectories?
I don't think we support continuous training with our current design

Collaborator (Author):

I just added final_step support, and with it this should work with continuous training.
A user can break training on a large dataset into several calls, and benchmark a checkpoint after each step.
These are just two arguments of the train_sft_from_file util function that help split the dataset correctly and calculate the learning rates.
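
A rough sketch of that usage, assuming initial_step and final_step are keyword arguments of train_sft_from_file; the other argument names and values are illustrative, not the real signature:

# Hypothetical: split one large SFT dataset across several calls,
# benchmarking the checkpoint produced after each slice.
steps_per_call = 16  # illustrative number of global batch steps per call
for start in range(0, 48, steps_per_call):
    await train_sft_from_file(
        model,
        "sft_dataset.jsonl",             # hypothetical dataset path
        initial_step=start,              # global batch step to resume from
        final_step=start + steps_per_call,
    )
    # ...benchmark the checkpoint produced by this slice here...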

"""Model-specific configuration for chat templates and training defaults."""


def detect_chat_template_parts(
Collaborator:

Let's also support

  • Qwen 2.5
  • Qwen 3 Instruct family?
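
For reference, both families use ChatML-style delimiters, so detection would presumably key off markers like these (only the marker strings are standard; the config shape here is assumed and may not match what detect_chat_template_parts returns):

# Hypothetical sketch of the markers a Qwen config would need to recognize.
QWEN_CHATML_MARKERS = {
    "assistant_prefix": "<|im_start|>assistant\n",
    "turn_suffix": "<|im_end|>\n",
}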

from art.local import LocalBackend
# from art.serverless.backend import ServerlessBackend

TEACHER_MODEL = "qwen/qwen3-235b-a22b-2507"
Collaborator:

Maybe GLM-5 here instead?

{"role": "user", "content": "Explain recursion with a simple example."},
{"role": "assistant", "content": teacher_response},
],
reward=0.0,
Collaborator:

Hmm, perhaps we should make reward an optional field in Trajectory if we want to use it for SFT as well. Doesn't make sense to ask developers to set it to 0 for no reason.

cc @corbt @bradhilton

Collaborator (Author):

I can set the default to 0.0. That keeps the field itself (rather than making it Optional), but removes the need to specify it for SFT.
The main downside is that people might forget to set a reward for RL, so we should make sure it's surfaced clearly.
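
A minimal sketch of what that default might look like, assuming Trajectory is a pydantic-style model (the field set shown here is illustrative, not the real class):

from pydantic import BaseModel

class Trajectory(BaseModel):
    messages_and_choices: list = []   # illustrative stand-in for the real fields
    reward: float = 0.0               # new default: SFT callers can omit it, RL callers still set it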

Collaborator:

When using RULER, you don't want to set the reward manually either. As long as our docs are clear, should be fine to set a default.

await model.register(backend)

# Phase 1: SFT warmup from a dataset
await train_sft_from_file(
Collaborator:

What will model step be after train_sft_from_file completes?

Collaborator (Author):

I think we decided to increment it by a single checkpoint step, even though it includes multiple optimizer steps. It’s not very intuitive, but it will be more consistent with how RL works.

@arcticfly (Collaborator) commented Feb 17, 2026:

Could you document that here, specifying that the model will train for 49 steps in the following loop?
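
Something like the following inline note could address this; the 49 comes from this review thread, and the exact numbers depend on the example's dataset and loop bounds:

# After train_sft_from_file completes, the model has advanced by a single
# checkpoint step (even though it ran many optimizer steps). The RL loop
# below then trains for 49 more steps.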

train_groups = await art.gather_trajectory_groups(
[
art.TrajectoryGroup(rollout(model, scenario) for _ in range(8))
for scenario in scenarios
Collaborator:

For completeness, could you also import or declare the scenarios variable somewhere?
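
For illustration, the missing declaration might look something like this (the actual scenarios and their type depend on the example's task):

# Hypothetical placeholder; replace with the example's real scenarios.
scenarios = [
    "Explain recursion with a simple example.",
    "Summarize the difference between SFT and RL fine-tuning.",
]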

chunks_per_epoch = math.ceil(dataset_size / items_per_chunk)

# Convert initial_step (batch-based) to initial_chunk for skipping
initial_chunk = initial_step // chunk_size
Collaborator:

Isn't this supposed to be initial_step // (chunk_size * batch_size)?

Collaborator (Author):

Both initial_step and chunk_size are already in batch units:

  • initial_step: "Global batch step to resume from"
  • chunk_size: "Number of batches to process per train_sft call"

So initial_step // chunk_size = batches / batches-per-chunk = chunk index.
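
A quick worked example of that unit analysis (numbers are illustrative):

initial_step = 12   # global batch step to resume from
chunk_size = 4      # batches processed per train_sft call
initial_chunk = initial_step // chunk_size   # 12 batches // 4 batches-per-chunk = chunk 3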

return learning_rates


def prepare_sft(
Collaborator:

Nit: Do we intentionally want to keep this?

Collaborator (Author):

Oh, no, removing!

@Kovbo merged commit d9e7603 into main on Feb 18, 2026. 2 checks passed.