
refactor(llm): use litellm params for reasoning support #1990

Draft
enyst wants to merge 14 commits into main from cleanup-reasoning-models

Conversation

enyst (Collaborator) commented Feb 10, 2026

Summary

  • derive reasoning_effort support from get_supported_openai_params and remove the hardcoded list (see the sketch below)
  • normalize litellm_proxy-prefixed models before checking supported params
  • update Nova 2 Lite reasoning expectations in tests
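
A minimal sketch of the derivation described above (illustrative only; the helper name and the exact handling in model_features.py may differ):

import litellm

def _supports_reasoning_effort(model: str) -> bool:
    # Strip our own litellm_proxy/ prefix so LiteLLM sees the underlying model.
    model = model.removeprefix("litellm_proxy/")
    # Ask LiteLLM which OpenAI-style params this model accepts.
    params = litellm.get_supported_openai_params(model=model) or []
    return "reasoning_effort" in params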

Testing

  • uv run pre-commit run --files openhands-sdk/openhands/sdk/llm/utils/model_features.py
  • uv run pre-commit run --files tests/sdk/llm/test_model_features.py

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                    Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                        Link
python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22    Link
golang    amd64, arm64    golang:1.21-bookworm                          Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:52bc1a0-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-52bc1a0-python \
  ghcr.io/openhands/agent-server:52bc1a0-python

All tags pushed for this build

ghcr.io/openhands/agent-server:52bc1a0-golang-amd64
ghcr.io/openhands/agent-server:52bc1a0-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:52bc1a0-golang-arm64
ghcr.io/openhands/agent-server:52bc1a0-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:52bc1a0-java-amd64
ghcr.io/openhands/agent-server:52bc1a0-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:52bc1a0-java-arm64
ghcr.io/openhands/agent-server:52bc1a0-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:52bc1a0-python-amd64
ghcr.io/openhands/agent-server:52bc1a0-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:52bc1a0-python-arm64
ghcr.io/openhands/agent-server:52bc1a0-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:52bc1a0-golang
ghcr.io/openhands/agent-server:52bc1a0-java
ghcr.io/openhands/agent-server:52bc1a0-python

About Multi-Architecture Support

  • Each variant tag (e.g., 52bc1a0-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 52bc1a0-python-amd64) are also available if needed

Derive reasoning_effort support from get_supported_openai_params and
update Nova 2 Lite expectations in tests.

Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot (Collaborator) left a comment

🟡 Acceptable - Works but introduces risks without adequate safeguards

Core Trade-off: You replaced a simple, explicit list with a runtime dependency on LiteLLM. This is DRY and reduces maintenance, but sacrifices control and introduces failure modes. The old approach was "worse is better" - ugly but bulletproof. The new approach is elegant but brittle.

Verdict: Needs rework - The bare except Exception: is unacceptable, cache is unbounded, and error paths are untested.

Key Insight: Dynamic lookups are only better than static lists when the external source is more reliable than you are—prove that LiteLLM is.
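
For illustration only, and not what the PR ultimately did (the cache was later removed entirely, see below): one way the reviewer's specific concerns could be addressed is a bounded cache plus a logged fallback. The maxsize of 256 is an arbitrary choice, and the broad except is kept only because the exact exceptions LiteLLM raises for unknown models would need verification.

import logging
from functools import lru_cache

import litellm

logger = logging.getLogger(__name__)

@lru_cache(maxsize=256)  # bounded, so unknown model strings cannot grow the cache forever
def _supports_reasoning_effort(model: str) -> bool:
    model = model.removeprefix("litellm_proxy/")
    try:
        params = litellm.get_supported_openai_params(model=model) or []
    except Exception:
        # LiteLLM can raise for unmapped models/providers; log instead of silently swallowing.
        logger.warning("Could not look up supported params for %r", model, exc_info=True)
        return False
    return "reasoning_effort" in params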

all-hands-bot (Collaborator) left a comment

Test comment for line 83

all-hands-bot (Collaborator) left a comment

Test comment for line 4

all-hands-bot (Collaborator) left a comment

Test comment for test file line 35

all-hands-bot (Collaborator) left a comment

Additional Critical Issues

Following up on my initial review with remaining inline comments:



all-hands-bot (Collaborator) left a comment

🟠 Testing Gaps - Critical Missing Coverage

You added complex logic (prefix stripping, provider splitting, exception handling, LRU cache) but only test the happy path through get_features().

Missing test coverage for _supports_reasoning_effort():

  1. Error handling: What happens when get_supported_openai_params raises an exception? Your bare except Exception: hides this.

  2. LRU cache behavior: Does it actually work? Test: Call same model twice, verify only one LiteLLM call made.

  3. Edge cases:

    • Empty string: _supports_reasoning_effort("")
    • None: _supports_reasoning_effort(None)
    • Whitespace-only: _supports_reasoning_effort(" ")
    • Malformed provider: _supports_reasoning_effort("a/b/c/d")
  4. Prefix handling: Does litellm_proxy/ prefix stripping work correctly?

    • _supports_reasoning_effort("litellm_proxy/o1")
    • _supports_reasoning_effort("litellm_proxy/openai/o1")
  5. Custom provider splitting: Does it handle multiple slashes correctly?

Tests should prove the code works, not just assert it was called. Add unit tests for _supports_reasoning_effort directly (yes, it's private, but it has complex logic that deserves dedicated tests).

Bottom line: Without these tests, you're flying blind. When LiteLLM changes or edge cases hit production, you'll have no safety net.
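
A rough pytest sketch of the kind of direct coverage being asked for. The patch target assumes model_features does `import litellm`; the module path and the fall-back-to-False behavior on errors are also assumptions about the implementation, so adjust to the real code:

import pytest
from unittest import mock

from openhands.sdk.llm.utils import model_features as mf


def test_supports_reasoning_effort_swallows_litellm_errors():
    # Assumed behavior: lookup failures fall back to False instead of raising.
    # If the helper is lru_cached, call _supports_reasoning_effort.cache_clear() first.
    with mock.patch.object(mf.litellm, "get_supported_openai_params", side_effect=RuntimeError("boom")):
        assert mf._supports_reasoning_effort("totally/unknown-model") is False


@pytest.mark.parametrize("model", ["litellm_proxy/o1", "litellm_proxy/openai/o1"])
def test_supports_reasoning_effort_strips_proxy_prefix(model):
    seen = []

    def fake_params(model=None, **kwargs):
        seen.append(model)
        return ["reasoning_effort"]

    with mock.patch.object(mf.litellm, "get_supported_openai_params", side_effect=fake_params):
        assert mf._supports_reasoning_effort(model) is True

    # The model passed to LiteLLM should have our proxy prefix stripped.
    assert seen and not seen[0].startswith("litellm_proxy/")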

Co-authored-by: OpenHands Bot <contact@all-hands.dev>
enyst (Collaborator, Author) commented Feb 11, 2026

@OpenHands The goal of this PR is good, but the implementation is bad. Remove the cached thing.

LiteLLM has a supports_reasoning method or something named like that, right? Make the project with uv, find litellm codebase in .venv, investigate.

Let's use that for detecting which models support reasoning. Commit, run pre-commit, push.

openhands-ai bot commented Feb 11, 2026

Final summary (new changes since the last PR state)

Checklist against the request

  • Removed the cached thing: deleted the @lru_cache-based reasoning-support detection.
  • Investigated LiteLLM in the uv-managed .venv: confirmed litellm.utils.supports_reasoning exists (and inspected its signature/source).
  • Used LiteLLM’s built-in method for detection: _supports_reasoning_effort() now delegates to supports_reasoning(...) after normalizing the model string (including litellm_proxy/ prefix handling).
  • Committed the change: refactor(llm): use litellm supports_reasoning (commit c6847155).
  • Ran pre-commit: uv run pre-commit run --all-files passes.
  • Pushed to origin/cleanup-reasoning-models (PR updated).

Conciseness / extraneous changes check

  • The diff is limited to one file: openhands-sdk/openhands/sdk/llm/utils/model_features.py.
  • No unrelated refactors/tests/docs were changed.

Branch is clean and fully pushed (no pending local changes).


@enyst enyst marked this pull request as draft February 11, 2026 21:42
enyst (Collaborator, Author) commented Feb 12, 2026

HUMAN:

So my problem here with your code is this: I think litellm runs more complex logic to find out what the real provider is. We always send litellm the "full model name", which includes the provider, so litellm figures it out. We never extracted the provider ourselves.

Why it matters: apart from the obvious nice cases, there are cases like "openrouter/anthropic/sonnet-4-5", or "bedrock/anthropic/haiku-3-5:v1", or "openai/llama-4" (which is NOT openai, but an openai-compatible provider, including custom, local providers). Litellm knows how to deal with them and extracts the provider correctly.

Investigate its code. Find out whether litellm has some method we can call so that it parses the provider from our model, instead of us duplicating that logic ourselves.

ANSWER:

I investigated LiteLLM 1.80.10 (the version in this repo's uv .venv). There is a built-in provider+model parser, and supports_reasoning() already uses it.

What LiteLLM uses internally

litellm.utils.supports_reasoning() is implemented as:

  • supports_reasoning() calls _supports_factory(model, custom_llm_provider, key="supports_reasoning")
  • _supports_factory() calls litellm.get_llm_provider(model=..., custom_llm_provider=...) first.

Source (from .venv/site-packages/litellm/utils.py):

model, custom_llm_provider, _, _ = litellm.get_llm_provider(
    model=model, custom_llm_provider=custom_llm_provider
)

So the intended public entrypoint for parsing a "full model name" is:

  • litellm.get_llm_provider(...)

It returns a tuple:

  • (model, custom_llm_provider, dynamic_api_key, api_base)

and the implementation contains a lot of special-case logic beyond a naive split("/", 1):

  • supports provider prefixes in the model string
  • handles OpenAI-compatible endpoints via api_base
  • has provider-specific routing rules (azure non-openai models, anthropic_text vs anthropic, cohere_chat vs cohere, etc.)

Implication for our code

We should not pre-split the provider ourselves just to call supports_reasoning().

Instead we can pass the full model string (after stripping our own litellm_proxy/ prefix if needed), and call:

  • supports_reasoning(model=full_model_string, custom_llm_provider=None)

LiteLLM will internally call get_llm_provider() and do the complex parsing.
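
Put together, the delegation might look roughly like this (a sketch; the actual model_features.py code and its error handling may differ):

from litellm.utils import supports_reasoning

def _supports_reasoning_effort(model: str) -> bool:
    # Strip only our own litellm_proxy/ prefix; everything else (provider
    # detection, openrouter/bedrock/openai-compatible routing) is left to
    # LiteLLM, which calls get_llm_provider() inside supports_reasoning().
    model = model.removeprefix("litellm_proxy/")
    return supports_reasoning(model=model, custom_llm_provider=None)

# The parser can also be called directly; per the tuple shape described above:
# model, provider, dynamic_api_key, api_base = litellm.get_llm_provider(model="openrouter/anthropic/sonnet-4-5")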

About the examples you mentioned

Your examples like:

  • openrouter/anthropic/sonnet-4-5
  • bedrock/anthropic/haiku-3-5:v1
  • openai/llama-4

are exactly the kind of thing get_llm_provider() is meant to normalize (and this is the same function supports_reasoning() relies on). So the safest approach is to delegate parsing to LiteLLM by passing the full model string.

(As a side note: when I ran supports_reasoning() on those exact strings locally, it returned False for them in 1.80.10, but that's about capability metadata, not parsing — parsing is still happening via get_llm_provider().)

@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 12, 2026
@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 12, 2026
@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 12, 2026
enyst (Collaborator, Author) commented Feb 14, 2026

@OpenHands CI is failing please fix. Make sure you read your AGENTS.md to use uv correctly!

openhands-ai bot commented Feb 14, 2026

I'm on it! enyst can track my progress at all-hands.dev

openhands-ai bot commented Feb 14, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1990 at branch `cleanup-reasoning-models`

Feel free to include any additional details that might help me get this PR into a better state.


openhands-ai bot commented Feb 14, 2026

Final summary (new changes since my last summary)

Double-check: request + instructions

  • Fixed a CI failure: addressed the failing unit test test_gemini_2_5_pro_defaults_reasoning_effort_low_when_none.
  • Followed repo guidance: used uv run ... for pre-commit + pytest as per AGENTS.md.
  • Changes are concise: only touched one production file plus a small repo hygiene fix to ignore agent temp output.
  • Commit hygiene: commit message includes Co-authored-by: openhands <openhands@all-hands.dev>.
  • Pushed to the existing PR branch (cleanup-reasoning-models) (already pushed).

What changed (per git diff)

  1. openhands-sdk/openhands/sdk/llm/options/chat_options.py

    • Ensured Gemini 2.5 Pro models ("gemini-2.5-pro" substring) default to reasoning_effort="low" when LLM.reasoning_effort is None or "none" (a sketch of this defaulting follows the list).
    • This defaulting now happens even if get_features(model).supports_reasoning_effort is false (which was causing the test to fail).
    • Kept the “strip temperature/top_p” behavior only when the model is actually flagged as supporting reasoning effort.
  2. .gitignore

    • Added .agent_tmp/ to prevent agent-generated temporary artifacts from showing up as untracked files.
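
A sketch of the defaulting described in item 1 above. Names and structure are illustrative, not the actual chat_options.py code; get_features is assumed to be the feature lookup from model_features.py.

from openhands.sdk.llm.utils.model_features import get_features

def _apply_reasoning_defaults(model: str, reasoning_effort: str | None, kwargs: dict) -> dict:
    # Gemini 2.5 Pro defaults to "low" even when the feature flags say
    # reasoning_effort is unsupported (the behavior the failing test expected).
    if "gemini-2.5-pro" in model and reasoning_effort in (None, "none"):
        reasoning_effort = "low"
    if reasoning_effort and reasoning_effort != "none":
        kwargs["reasoning_effort"] = reasoning_effort
    # temperature/top_p are stripped only for models actually flagged as
    # supporting reasoning_effort.
    if get_features(model).supports_reasoning_effort:
        kwargs.pop("temperature", None)
        kwargs.pop("top_p", None)
    return kwargs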

Validation performed

  • uv run pre-commit run --files openhands-sdk/openhands/sdk/llm/options/chat_options.py
  • uv run pytest -q tests/sdk/llm/test_chat_options.py::test_gemini_2_5_pro_defaults_reasoning_effort_low_when_none

Commit / push


github-actions bot (Contributor) commented Feb 14, 2026

Coverage

Coverage Report

File                                                     Stmts   Miss   Cover   Missing
openhands-sdk/openhands/sdk/llm/options
   chat_options.py                                          44      3     93%   80–82
openhands-sdk/openhands/sdk/llm/utils
   model_features.py                                        57      3     94%   34, 76–77
TOTAL                                                    17871   5371     69%

@enyst enyst added and then removed the behavior-initiative label (This is related to the system prompt sections and LLM steering.) Feb 14, 2026
