
WIP: SFT (local backend) #530

Merged
Kovbo merged 80 commits into main from sft-local-backend on Feb 18, 2026
Conversation

@Kovbo (Collaborator) commented Jan 22, 2026

[Screenshot: SFT Training docs page at localhost:3002/fundamentals/sft-training, captured 2026-02-16]

optimizer = self._state.trainer.optimizer

# Set SFT-specific hyperparameters
sft_weight_decay = 0.01
Collaborator:

Should we set this to 0 to be consistent with existing behavior in OpenPipe Platform?

Collaborator (Author):

Yeah, good point! Changed.
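
Presumably the change reduces to flipping the constant; shown here only as an illustration of the agreed-upon fix:

# SFT-specific hyperparameters; 0.0 matches existing OpenPipe Platform behavior
sft_weight_decay = 0.0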


# Optimizer step at the end of each batch
optimizer.step()
optimizer.zero_grad()
Collaborator:

Remove optimizer.zero_grad()?
We already clear the grads before the training loop.

# Set SFT-specific hyperparameters
sft_weight_decay = 0.01
for param_group in optimizer.param_groups:
param_group["weight_decay"] = sft_weight_decay
Collaborator:

Should we set adam_beta1 and adam_beta2 as well?

Collaborator (Author):

We change weight_decay because it differs between SFT and RL. But the Adam betas are the same, so there is no need to override them.
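
If the betas ever did need to differ per phase, the same param_groups mechanism would work; a minimal sketch, assuming a torch.optim.Adam/AdamW optimizer (the values shown are just the PyTorch defaults):

# Not needed per the discussion above; illustrative only.
for param_group in optimizer.param_groups:
    param_group["betas"] = (0.9, 0.999)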

total_trajectories = row_count * epochs
skip_trajectories = initial_step * batch_size

if skip_trajectories >= total_trajectories:
Collaborator:

Remove skip_trajectories?
I don't think we support continuous training with our current design

Collaborator (Author):

I just added final_step support, and with it this should work with continuous training.
A user can break training on a large dataset into several calls, and benchmark a checkpoint after each step.
These are just two arguments of the train_sft_from_file util function that help split the dataset correctly and calculate the learning rates.
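
A rough sketch of that usage, assuming initial_step and final_step are keyword arguments of train_sft_from_file; the other argument names and values are illustrative, not the real signature:

# Hypothetical: split one large SFT dataset across several calls,
# benchmarking the checkpoint produced after each slice.
steps_per_call = 16  # illustrative number of global batch steps per call
for start in range(0, 48, steps_per_call):
    await train_sft_from_file(
        model,
        "sft_dataset.jsonl",             # hypothetical dataset path
        initial_step=start,              # global batch step to resume from
        final_step=start + steps_per_call,
    )
    # ...benchmark the checkpoint produced by this slice here...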

"""Model-specific configuration for chat templates and training defaults."""


def detect_chat_template_parts(
Collaborator:

Let's also support

  • Qwen 2.5
  • Qwen 3 Instruct family?
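
For reference, both families use ChatML-style delimiters, so detection would presumably key off markers like these (only the marker strings are standard; the config shape here is assumed and may not match what detect_chat_template_parts returns):

# Hypothetical sketch of the markers a Qwen config would need to recognize.
QWEN_CHATML_MARKERS = {
    "assistant_prefix": "<|im_start|>assistant\n",
    "turn_suffix": "<|im_end|>\n",
}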

from art.local import LocalBackend
# from art.serverless.backend import ServerlessBackend

TEACHER_MODEL = "qwen/qwen3-235b-a22b-2507"
Collaborator:

Maybe GLM-5 here instead?

{"role": "user", "content": "Explain recursion with a simple example."},
{"role": "assistant", "content": teacher_response},
],
reward=0.0,
Collaborator:

Hmm, perhaps we should make reward an optional field in Trajectory if we want to use it for SFT as well. Doesn't make sense to ask developers to set it to 0 for no reason.

cc @corbt @bradhilton

Collaborator (Author):

I can set the default to 0.0. That keeps the field itself (rather than making it Optional), but removes the need to specify it for SFT.
The main downside is that people might forget to set a reward for RL, so we should make sure it's surfaced clearly.
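
A minimal sketch of what that default might look like, assuming Trajectory is a pydantic-style model (the field set shown here is illustrative, not the real class):

from pydantic import BaseModel

class Trajectory(BaseModel):
    messages_and_choices: list = []   # illustrative stand-in for the real fields
    reward: float = 0.0               # new default: SFT callers can omit it, RL callers still set it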

Collaborator:

When using RULER, you don't want to set the reward manually either. As long as our docs are clear, should be fine to set a default.

await model.register(backend)

# Phase 1: SFT warmup from a dataset
await train_sft_from_file(
Collaborator:

What will model step be after train_sft_from_file completes?

Collaborator (Author):

I think we decided to increment it by a single checkpoint step, even though it includes multiple optimizer steps. It’s not very intuitive, but it will be more consistent with how RL works.

@arcticfly (Collaborator) commented Feb 17, 2026:

Could you document that here, specifying that the model will train for 49 steps in the following loop?
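
Something like the following inline note could address this; the 49 comes from this review thread, and the exact numbers depend on the example's dataset and loop bounds:

# After train_sft_from_file completes, the model has advanced by a single
# checkpoint step (even though it ran many optimizer steps). The RL loop
# below then trains for 49 more steps.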

train_groups = await art.gather_trajectory_groups(
[
art.TrajectoryGroup(rollout(model, scenario) for _ in range(8))
for scenario in scenarios
Collaborator:

For completeness, could you also import or declare the scenarios variable somewhere?
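
For illustration, the missing declaration might look something like this (the actual scenarios and their type depend on the example's task):

# Hypothetical placeholder; replace with the example's real scenarios.
scenarios = [
    "Explain recursion with a simple example.",
    "Summarize the difference between SFT and RL fine-tuning.",
]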

chunks_per_epoch = math.ceil(dataset_size / items_per_chunk)

# Convert initial_step (batch-based) to initial_chunk for skipping
initial_chunk = initial_step // chunk_size
Collaborator:

Isn't this supposed to be initial_step // (chunk_size * batch_size)?

Collaborator (Author):

Both initial_step and chunk_size are already in batch units:

  • initial_step: "Global batch step to resume from"
  • chunk_size: "Number of batches to process per train_sft call"

So initial_step // chunk_size = batches / batches-per-chunk = chunk index.
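
A quick worked example of that unit analysis (numbers are illustrative):

initial_step = 12   # global batch step to resume from
chunk_size = 4      # batches processed per train_sft call
initial_chunk = initial_step // chunk_size   # 12 batches // 4 batches-per-chunk = chunk 3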

return learning_rates


def prepare_sft(
Collaborator:

Nit: Do we intentionally want to keep this?

Collaborator (Author):

Oh, no, removing!

@Kovbo merged commit d9e7603 into main on Feb 18, 2026. 2 checks passed.