feat: add top-k entropy approximation for memory-efficient GRPO training by saurabhbikram · Pull Request #555 · OpenPipe/ART

saurabhbikram · 2026-02-12T08:31:34Z

Summary

When training models with large vocabularies (128k+ tokens, e.g. Qwen3), computing entropy over the full vocabulary during GRPO is a major memory bottleneck. This PR adds a top_k_entropy config parameter:

top_k_entropy=0 (default): computes exact full-vocabulary entropy, identical to the existing behavior, so there is no regression.
top_k_entropy=N (e.g. 256): approximates entropy from only the top-N logits at each position instead of materializing a [B, chunk_size, V] log-probs tensor. The extra memory for the entropy term then scales with N rather than with the vocabulary size V, which substantially reduces peak GPU memory.
The reference model always passes top_k_entropy=0, since entropy is not used in the KL-divergence term.
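For a sense of scale, the arithmetic below compares the size of a dense fp32 [B, chunk_size, V] log-probs tensor with a [B, chunk_size, k] top-k buffer. The batch size, chunk size, vocabulary size, and dtype are illustrative assumptions, not values taken from this PR, and `buffer_bytes` is a hypothetical helper.

```python
# Rough size of one dense buffer used for the entropy term, per chunk.
# Assumptions (hypothetical, for illustration only): fp32 elements (4 bytes),
# B=4 sequences, chunk_size=1024 tokens, V=131072 (a "128k" vocabulary), k=256.


def buffer_bytes(batch: int, chunk_size: int, width: int, bytes_per_elem: int = 4) -> int:
    """Bytes needed for a dense [batch, chunk_size, width] tensor."""
    return batch * chunk_size * width * bytes_per_elem


B, CHUNK, V, K = 4, 1024, 131_072, 256

full = buffer_bytes(B, CHUNK, V)   # log-probs over the whole vocabulary
top_k = buffer_bytes(B, CHUNK, K)  # values kept by a top-k selection

print(f"full-vocab log-probs: {full / 2**20:.0f} MiB")
print(f"top-{K} values:       {top_k / 2**20:.0f} MiB")
print(f"ratio: {full // top_k}x")
```

This counts only that single tensor. The logits themselves are still materialized, and a top-k selection such as `torch.topk` also returns an index tensor of the same [B, chunk_size, k] shape, so the real saving depends on the rest of the computation.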

Files changed

src/art/unsloth/train.py: threads the new top_k_entropy parameter through calculate_logprobs → _calculate_logprobs and adds a top-k branch alongside the preserved full-vocab entropy default.
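A minimal pure-Python sketch of the two entropy branches and of the parameter being threaded from an outer to an inner function. The names `calculate_logprobs_sketch`, `_calculate_logprobs_sketch`, and `token_entropy` are hypothetical stand-ins, not the signatures in `src/art/unsloth/train.py`. The top-k branch assumes the kept logits are renormalized with a softmax over those k values; the PR description does not say which normalization it uses.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def full_entropy(logits: Sequence[float]) -> float:
    """Exact entropy H = -sum_i p_i log p_i over the whole vocabulary."""
    return -sum(math.exp(lp) * lp for lp in log_softmax(logits))


def top_k_entropy_approx(logits: Sequence[float], k: int) -> float:
    """Entropy over only the k largest logits, renormalized among themselves (assumption)."""
    top = sorted(logits, reverse=True)[:k]  # torch.topk would play this role on GPU
    return full_entropy(top)


def token_entropy(logits: Sequence[float], top_k_entropy: int = 0) -> float:
    """0 selects exact full-vocab entropy (the default); N > 0 selects the top-N approximation."""
    if top_k_entropy < 0:
        raise ValueError("top_k_entropy must be >= 0")
    if top_k_entropy == 0:
        return full_entropy(logits)
    return top_k_entropy_approx(logits, top_k_entropy)


def _calculate_logprobs_sketch(
    logits_per_pos: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    top_k_entropy: int = 0,
) -> Tuple[List[float], List[float]]:
    """Inner function: log-prob of each sampled token plus a per-position entropy."""
    logprobs, entropies = [], []
    for logits, tok in zip(logits_per_pos, token_ids):
        logprobs.append(log_softmax(logits)[tok])  # always exact; top-k only affects entropy
        entropies.append(token_entropy(logits, top_k_entropy))
    return logprobs, entropies


def calculate_logprobs_sketch(
    logits_per_pos: Sequence[Sequence[float]],
    token_ids: Sequence[int],
    top_k_entropy: int = 0,
) -> Tuple[List[float], List[float]]:
    """Outer function: forwards top_k_entropy unchanged, so the default keeps old behavior."""
    return _calculate_logprobs_sketch(logits_per_pos, token_ids, top_k_entropy=top_k_entropy)


logits = [[2.0, 1.0, 0.0, -1.0], [0.0, 0.0, 0.0, 0.0]]
lp_full, h_full = calculate_logprobs_sketch(logits, [0, 3])
lp_topk, h_topk = calculate_logprobs_sketch(logits, [0, 3], top_k_entropy=2)
```

Because the default is 0 at every level, existing callers that never pass the parameter get exact entropy, while the token log-probs used by the policy loss are identical in both branches.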

When training models with large vocabularies (128k+ tokens), computing entropy over the full vocabulary is a major memory bottleneck. This adds a `top_k_entropy` config parameter (default 0 = disabled) that computes entropy over only the top-k logits, dramatically reducing memory usage. Also skips entropy computation entirely for reference model logprobs since entropy is unused in the KL divergence calculation. https://claude.ai/code/session_017Y9KNNQX2RyVWnqpj3A4hh

Preserve full-vocab entropy as default (top_k_entropy=0), only use top-k approximation when explicitly configured. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

saurabhbikram force-pushed the top-k-entropy-grpo branch from 524672e to 480300f February 12, 2026 08:43

feat: add top-k entropy approximation for memory-efficient GRPO training

14b8711


saurabhbikram force-pushed the top-k-entropy-grpo branch from 480300f to 14b8711 February 12, 2026 08:44

saurabhbikram marked this pull request as ready for review February 12, 2026 08:44

saurabhbikram wants to merge 2 commits into OpenPipe:main from nansen-ai:top-k-entropy-grpo
