Enable configurable context condensation in all benchmarks #429

juanmichelini wants to merge 4 commits into main from
Conversation
This change enables context condensation in all benchmarks and makes it configurable via `config.py` files and command-line arguments. The default condenser from software-agent-sdk is now used by default with `max_size=80` and `keep_first=4`.

Changes:
- Add condenser configuration fields to EvalMetadata
- Add CONDENSER_DEFAULTS to `config.py` files in swebench, swtbench, and swebenchmultimodal
- Add command-line arguments for controlling the condenser (`--enable-condenser`, `--disable-condenser`, `--condenser-max-size`, `--condenser-keep-first`)
- Update agent creation in all benchmarks to use LLMSummarizingCondenser when enabled
- Add comprehensive tests for condenser configuration

Fixes #407

Co-authored-by: openhands <openhands@all-hands.dev>
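The CLI flags described above, including the documented precedence of `--disable-condenser` over `--enable-condenser`, could be wired roughly as in this sketch (a hypothetical stand-alone `argparse` version; the actual parsing lives in each benchmark's run script and may differ):

```python
import argparse

def parse_condenser_args(argv: list[str]) -> argparse.Namespace:
    """Hypothetical sketch of the condenser CLI flags added by this PR."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable-condenser", action="store_true")
    parser.add_argument("--disable-condenser", action="store_true")
    parser.add_argument("--condenser-max-size", type=int, default=80)
    parser.add_argument("--condenser-keep-first", type=int, default=4)
    args = parser.parse_args(argv)
    # The condenser defaults to enabled; --disable-condenser takes
    # precedence over --enable-condenser so it can always be turned off.
    args.condenser_enabled = not args.disable_condenser
    return args
```

For example, `parse_condenser_args(["--disable-condenser", "--enable-condenser"])` yields `condenser_enabled=False`, while an empty argument list keeps the condenser on with the defaults.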
@OpenHands `condenser_max_size` should be 240 and `condenser_keep_first` should be 2. Change that for all benchmarks.
I'm on it! juanmichelini can track my progress at all-hands.dev
…and Commit0 support

- Update `condenser_max_size` from 80 to 240 and `condenser_keep_first` from 4 to 2 across all benchmarks
- Update EvalMetadata defaults in models.py to use the new values (240/2)
- Add condenser configuration to the GAIA and Commit0 benchmarks:
  - Add CONDENSER_DEFAULTS to `config.py` files
  - Add the LLMSummarizingCondenser import and agent-creation logic to `run_infer.py` files
- Update tests to reflect the new condenser defaults and add coverage for GAIA and Commit0
- All tests pass and pre-commit checks pass

Co-authored-by: openhands <openhands@all-hands.dev>
Summary of Changes

I've updated the condenser configuration across all benchmarks and added condenser support to GAIA and Commit0 as requested. Here's what was completed:

✅ Checklist - All Requirements Met
Verification
The PR is now updated with all requested changes. All benchmarks (SWE-bench, SWT-bench, SWE-bench Multimodal, GAIA, and Commit0) now use the updated condenser defaults (`condenser_max_size=240`, `condenser_keep_first=2`).
@OpenHands address enyst's concerns please
I'm on it! juanmichelini can track my progress at all-hands.dev
Address review feedback from enyst: The parameter name `service_id` is deprecated and has been replaced with `usage_id` in the current SDK version. Updated all 6 benchmark files to use the correct parameter name:

- benchmarks/gaia/run_infer.py
- benchmarks/swebenchmultimodal/run_infer.py
- benchmarks/swebench/run_infer.py
- benchmarks/swtbench/run_infer.py
- benchmarks/commit0/run_infer.py
- benchmarks/multiswebench/run_infer.py

Co-authored-by: openhands <openhands@all-hands.dev>
Summary of Work

I successfully addressed enyst's review concerns on PR #429 "Enable configurable context condensation in all benchmarks".

Issue Addressed

Enyst identified that the parameter name `service_id` is deprecated in the current SDK version and has been replaced with `usage_id`.

Changes Made

✅ Updated all 6 benchmark files to use the correct parameter name
Each file had one line changed:

```python
# Before:
llm=self.metadata.llm.model_copy(update={"service_id": "condenser"})
# After:
llm=self.metadata.llm.model_copy(update={"usage_id": "condenser"})
```

Verification

✅ All pre-commit checks passed (Ruff format, Ruff lint, pycodestyle, Pyright)

The PR has been updated and enyst's concerns have been fully addressed. The changes are concise and only modify what was necessary to fix the deprecated parameter name.
@enyst @csmith49 I'm running an integration test here https://github.com/OpenHands/software-agent-sdk/actions/runs/22509555202; if that finishes successfully, we can check the logs for correct condensation.
Summary
This PR enables context condensation in all benchmarks and makes it configurable via `config.py` files and command-line arguments. The default condenser from software-agent-sdk (LLMSummarizingCondenser) is now used by default with `max_size=80` and `keep_first=4`.

Fixes #407
Changes
Configuration
EvalMetadata: Added three new fields to support condenser configuration:
- `enable_condenser` (bool, default: True): Enable/disable the context condenser
- `condenser_max_size` (int, default: 80): Maximum number of events before condensing
- `condenser_keep_first` (int, default: 4): Number of initial events to always keep

Benchmark configs: Added `CONDENSER_DEFAULTS` to:

- benchmarks/swebench/config.py
- benchmarks/swtbench/config.py
- benchmarks/swebenchmultimodal/config.py

Command-Line Arguments
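The three EvalMetadata fields might look like the following sketch (a plain-dataclass stand-in with the field names and defaults from this PR description; the real EvalMetadata model in the repo may be a pydantic model with more fields):

```python
from dataclasses import dataclass

@dataclass
class EvalMetadataCondenserFields:
    """Sketch of the condenser fields added to EvalMetadata.

    Defaults match this PR description; later commits in the
    conversation change them to 240/2.
    """
    enable_condenser: bool = True     # enable/disable the context condenser
    condenser_max_size: int = 80      # max number of events before condensing
    condenser_keep_first: int = 4     # initial events that are always kept
```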
Added new CLI arguments to control condenser behavior:
- `--enable-condenser`: Explicitly enable the condenser
- `--disable-condenser`: Disable the condenser (takes precedence over enable)
- `--condenser-max-size N`: Set the maximum number of events before condensing
- `--condenser-keep-first N`: Set the number of initial events to always keep

Agent Creation
Updated agent creation in all benchmark evaluation classes to use LLMSummarizingCondenser when enabled:

- benchmarks/swebench/run_infer.py
- benchmarks/swtbench/run_infer.py
- benchmarks/swebenchmultimodal/run_infer.py
- benchmarks/multiswebench/run_infer.py

Testing
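The agent-creation logic might be sketched as below. The `usage_id` copy and the `LLMSummarizingCondenser(llm=…, max_size=…, keep_first=…)` shape come from this PR; the `LLMConfig` stub and `build_condenser` helper are stand-ins so the sketch runs without the SDK:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class LLMConfig:
    """Stand-in for the SDK's LLM config (hypothetical)."""
    model: str
    usage_id: str = "agent"

    def model_copy(self, update: dict) -> "LLMConfig":
        # Mimics pydantic's model_copy(update=...): a new object, original untouched.
        return replace(self, **update)

@dataclass
class LLMSummarizingCondenser:
    """Stand-in for the SDK's condenser class."""
    llm: LLMConfig
    max_size: int
    keep_first: int

def build_condenser(llm: LLMConfig, enable: bool,
                    max_size: int, keep_first: int):
    """Sketch of the per-benchmark agent-creation logic described above."""
    if not enable:
        return None
    return LLMSummarizingCondenser(
        # A copy of the agent LLM with a separate usage ID, so condenser
        # token usage is tracked apart from the main agent.
        llm=llm.model_copy(update={"usage_id": "condenser"}),
        max_size=max_size,
        keep_first=keep_first,
    )
```

The `model_copy(update={"usage_id": "condenser"})` step leaves the agent's own LLM config unchanged while giving the condenser its own usage ID.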
Added comprehensive test coverage in `tests/test_condenser_config.py`.

All tests pass and pre-commit checks (ruff, pycodestyle, pyright) pass.
Usage
Default behavior (condenser enabled)
Disable condenser
Custom condenser settings
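The three usage modes listed above might look like the following invocations (hypothetical command lines: the flags are from this PR, but the exact entry point and the other required arguments of each benchmark's `run_infer.py` are not shown here):

```shell
# Default behavior: condenser enabled with the configured defaults
python -m benchmarks.swebench.run_infer

# Disable the condenser (takes precedence over --enable-condenser)
python -m benchmarks.swebench.run_infer --disable-condenser

# Custom condenser settings
python -m benchmarks.swebench.run_infer \
    --condenser-max-size 240 --condenser-keep-first 2
```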
Notes
- The `--disable-condenser` flag takes precedence over `--enable-condenser` to allow explicit disabling
- The condenser's LLM is given a separate usage ID (`"condenser"`) to track token usage separately from the main agent