
refactor(llm): use litellm params for reasoning support #1990

Draft
enyst wants to merge 14 commits into main from cleanup-reasoning-models

Conversation

enyst (Collaborator) commented Feb 10, 2026

Summary

  • derive reasoning_effort support from get_supported_openai_params and remove the hardcoded list (see the sketch below)
  • normalize litellm_proxy-prefixed models before checking supported params
  • update Nova 2 Lite reasoning expectations in tests
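
A minimal sketch of the derivation described above (illustrative only; the helper name and the exact handling in model_features.py may differ):

import litellm

def _supports_reasoning_effort(model: str) -> bool:
    # Strip our own litellm_proxy/ prefix so LiteLLM sees the underlying model.
    model = model.removeprefix("litellm_proxy/")
    # Ask LiteLLM which OpenAI-style params this model accepts.
    params = litellm.get_supported_openai_params(model=model) or []
    return "reasoning_effort" in params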

Testing

  • uv run pre-commit run --files openhands-sdk/openhands/sdk/llm/utils/model_features.py
  • uv run pre-commit run --files tests/sdk/llm/test_model_features.py

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                    Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                        Link
python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22    Link
golang    amd64, arm64    golang:1.21-bookworm                          Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:52bc1a0-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-52bc1a0-python \
  ghcr.io/openhands/agent-server:52bc1a0-python

All tags pushed for this build

ghcr.io/openhands/agent-server:52bc1a0-golang-amd64
ghcr.io/openhands/agent-server:52bc1a0-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:52bc1a0-golang-arm64
ghcr.io/openhands/agent-server:52bc1a0-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:52bc1a0-java-amd64
ghcr.io/openhands/agent-server:52bc1a0-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:52bc1a0-java-arm64
ghcr.io/openhands/agent-server:52bc1a0-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:52bc1a0-python-amd64
ghcr.io/openhands/agent-server:52bc1a0-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:52bc1a0-python-arm64
ghcr.io/openhands/agent-server:52bc1a0-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:52bc1a0-golang
ghcr.io/openhands/agent-server:52bc1a0-java
ghcr.io/openhands/agent-server:52bc1a0-python

About Multi-Architecture Support

  • Each variant tag (e.g., 52bc1a0-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 52bc1a0-python-amd64) are also available if needed

Derive reasoning_effort support from get_supported_openai_params and
update Nova 2 Lite expectations in tests.

Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot (Collaborator) left a comment

🟡 Acceptable - Works but introduces risks without adequate safeguards

Core Trade-off: You replaced a simple, explicit list with a runtime dependency on LiteLLM. This is DRY and reduces maintenance, but sacrifices control and introduces failure modes. The old approach was "worse is better" - ugly but bulletproof. The new approach is elegant but brittle.

Verdict: Needs rework - The bare except Exception: is unacceptable, cache is unbounded, and error paths are untested.

Key Insight: Dynamic lookups are only better than static lists when the external source is more reliable than you are—prove that LiteLLM is.
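
For illustration only, and not what the PR ultimately did (the cache was later removed entirely, see below): one way the reviewer's specific concerns could be addressed is a bounded cache plus a logged fallback. The maxsize of 256 is an arbitrary choice, and the broad except is kept only because the exact exceptions LiteLLM raises for unknown models would need verification.

import logging
from functools import lru_cache

import litellm

logger = logging.getLogger(__name__)

@lru_cache(maxsize=256)  # bounded, so unknown model strings cannot grow the cache forever
def _supports_reasoning_effort(model: str) -> bool:
    model = model.removeprefix("litellm_proxy/")
    try:
        params = litellm.get_supported_openai_params(model=model) or []
    except Exception:
        # LiteLLM can raise for unmapped models/providers; log instead of silently swallowing.
        logger.warning("Could not look up supported params for %r", model, exc_info=True)
        return False
    return "reasoning_effort" in params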

all-hands-bot (Collaborator) left a comment

Test comment for line 83

all-hands-bot (Collaborator) left a comment

Test comment for line 4

all-hands-bot (Collaborator) left a comment

Test comment for test file line 35

all-hands-bot (Collaborator) left a comment

Additional Critical Issues

Following up on my initial review with remaining inline comments:



all-hands-bot (Collaborator) left a comment

🟠 Testing Gaps - Critical Missing Coverage

You added complex logic (prefix stripping, provider splitting, exception handling, LRU cache) but only test the happy path through get_features().

Missing test coverage for _supports_reasoning_effort():

  1. Error handling: What happens when get_supported_openai_params raises an exception? Your bare except Exception: hides this.

  2. LRU cache behavior: Does it actually work? Test: Call same model twice, verify only one LiteLLM call made.

  3. Edge cases:

    • Empty string: _supports_reasoning_effort("")
    • None: _supports_reasoning_effort(None)
    • Whitespace-only: _supports_reasoning_effort(" ")
    • Malformed provider: _supports_reasoning_effort("a/b/c/d")
  4. Prefix handling: Does litellm_proxy/ prefix stripping work correctly?

    • _supports_reasoning_effort("litellm_proxy/o1")
    • _supports_reasoning_effort("litellm_proxy/openai/o1")
  5. Custom provider splitting: Does it handle multiple slashes correctly?

Tests should prove the code works, not just assert it was called. Add unit tests for _supports_reasoning_effort directly (yes, it's private, but it has complex logic that deserves dedicated tests).

Bottom line: Without these tests, you're flying blind. When LiteLLM changes or edge cases hit production, you'll have no safety net.
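
A rough pytest sketch of the kind of direct coverage being asked for. The patch target assumes model_features does `import litellm`; the module path and the fall-back-to-False behavior on errors are also assumptions about the implementation, so adjust to the real code:

import pytest
from unittest import mock

from openhands.sdk.llm.utils import model_features as mf


def test_supports_reasoning_effort_swallows_litellm_errors():
    # Assumed behavior: lookup failures fall back to False instead of raising.
    # If the helper is lru_cached, call _supports_reasoning_effort.cache_clear() first.
    with mock.patch.object(mf.litellm, "get_supported_openai_params", side_effect=RuntimeError("boom")):
        assert mf._supports_reasoning_effort("totally/unknown-model") is False


@pytest.mark.parametrize("model", ["litellm_proxy/o1", "litellm_proxy/openai/o1"])
def test_supports_reasoning_effort_strips_proxy_prefix(model):
    seen = []

    def fake_params(model=None, **kwargs):
        seen.append(model)
        return ["reasoning_effort"]

    with mock.patch.object(mf.litellm, "get_supported_openai_params", side_effect=fake_params):
        assert mf._supports_reasoning_effort(model) is True

    # The model passed to LiteLLM should have our proxy prefix stripped.
    assert seen and not seen[0].startswith("litellm_proxy/")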

Co-authored-by: OpenHands Bot <contact@all-hands.dev>
enyst (Collaborator, Author) commented Feb 11, 2026

@OpenHands The goal of this PR is good, but the implementation is bad. Remove the cached thing.

LiteLLM has a supports_reasoning method or something named like that, right? Make the project with uv, find litellm codebase in .venv, investigate.

Let's use that for detecting which models support reasoning. Commit, run pre-commit, push.

openhands-ai bot commented Feb 11, 2026

Final summary (new changes since the last PR state)

Checklist against the request

  • Removed the cached thing: deleted the @lru_cache-based reasoning-support detection.
  • Investigated LiteLLM in the uv-managed .venv: confirmed litellm.utils.supports_reasoning exists (and inspected its signature/source).
  • Used LiteLLM’s built-in method for detection: _supports_reasoning_effort() now delegates to supports_reasoning(...) after normalizing the model string (including litellm_proxy/ prefix handling).
  • Committed the change: refactor(llm): use litellm supports_reasoning (commit c6847155).
  • Ran pre-commit: uv run pre-commit run --all-files passes.
  • Pushed to origin/cleanup-reasoning-models (PR updated).

Conciseness / extraneous changes check

  • The diff is limited to one file: openhands-sdk/openhands/sdk/llm/utils/model_features.py.
  • No unrelated refactors/tests/docs were changed.

Branch is clean and fully pushed (no pending local changes).


@enyst enyst marked this pull request as draft February 11, 2026 21:42
enyst (Collaborator, Author) commented Feb 12, 2026

HUMAN:

So my problem here with your code is this: I think litellm runs more complex logic to find out what the real provider is. We always send litellm the "full model name", which includes the provider, so litellm figures it out. We never extracted the provider ourselves.

Why it matters: apart from the obvious nice cases, there are cases like "openrouter/anthropic/sonnet-4-5", or "bedrock/anthropic/haiku-3-5:v1", or "openai/llama-4" (which is NOT openai, but an openai-compatible provider, including custom, local providers). Litellm knows how to deal with them and extracts the provider correctly.

Investigate its code. Find out whether litellm has some method we can call so that it parses the provider from our model, instead of us duplicating that logic ourselves.

ANSWER:

I investigated LiteLLM 1.80.10 (the version in this repo's uv .venv). There is a built-in provider+model parser, and supports_reasoning() already uses it.

What LiteLLM uses internally

litellm.utils.supports_reasoning() is implemented as:

  • supports_reasoning() calls _supports_factory(model, custom_llm_provider, key="supports_reasoning")
  • _supports_factory() calls litellm.get_llm_provider(model=..., custom_llm_provider=...) first.

Source (from .venv/site-packages/litellm/utils.py):

model, custom_llm_provider, _, _ = litellm.get_llm_provider(
    model=model, custom_llm_provider=custom_llm_provider
)

So the intended public entrypoint for parsing a "full model name" is:

  • litellm.get_llm_provider(...)

It returns a tuple:

  • (model, custom_llm_provider, dynamic_api_key, api_base)

and the implementation contains a lot of special-case logic beyond a naive split("/", 1):

  • supports provider prefixes in the model string
  • handles OpenAI-compatible endpoints via api_base
  • has provider-specific routing rules (azure non-openai models, anthropic_text vs anthropic, cohere_chat vs cohere, etc.)

Implication for our code

We should not pre-split the provider ourselves just to call supports_reasoning().

Instead we can pass the full model string (after stripping our own litellm_proxy/ prefix if needed), and call:

  • supports_reasoning(model=full_model_string, custom_llm_provider=None)

LiteLLM will internally call get_llm_provider() and do the complex parsing.
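
Put together, the delegation might look roughly like this (a sketch; the actual model_features.py code and its error handling may differ):

from litellm.utils import supports_reasoning

def _supports_reasoning_effort(model: str) -> bool:
    # Strip only our own litellm_proxy/ prefix; everything else (provider
    # detection, openrouter/bedrock/openai-compatible routing) is left to
    # LiteLLM, which calls get_llm_provider() inside supports_reasoning().
    model = model.removeprefix("litellm_proxy/")
    return supports_reasoning(model=model, custom_llm_provider=None)

# The parser can also be called directly; per the tuple shape described above:
# model, provider, dynamic_api_key, api_base = litellm.get_llm_provider(model="openrouter/anthropic/sonnet-4-5")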

About the examples you mentioned

Your examples like:

  • openrouter/anthropic/sonnet-4-5
  • bedrock/anthropic/haiku-3-5:v1
  • openai/llama-4

are exactly the kind of thing get_llm_provider() is meant to normalize (and this is the same function supports_reasoning() relies on). So the safest approach is to delegate parsing to LiteLLM by passing the full model string.

(As a side note: when I ran supports_reasoning() on those exact strings locally, it returned False for them in 1.80.10, but that's about capability metadata, not parsing — parsing is still happening via get_llm_provider().)

@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 12, 2026
@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 12, 2026
@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 12, 2026
enyst (Collaborator, Author) commented Feb 14, 2026

@OpenHands CI is failing please fix. Make sure you read your AGENTS.md to use uv correctly!

openhands-ai bot commented Feb 14, 2026

I'm on it! enyst can track my progress at all-hands.dev

openhands-ai bot commented Feb 14, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1990 at branch `cleanup-reasoning-models`

Feel free to include any additional details that might help me get this PR into a better state.


openhands-ai bot commented Feb 14, 2026

Final summary (new changes since my last summary)

Double-check: request + instructions

  • Fixed a CI failure: addressed the failing unit test test_gemini_2_5_pro_defaults_reasoning_effort_low_when_none.
  • Followed repo guidance: used uv run ... for pre-commit + pytest as per AGENTS.md.
  • Changes are concise: only touched one production file plus a small repo hygiene fix to ignore agent temp output.
  • Commit hygiene: commit message includes Co-authored-by: openhands <openhands@all-hands.dev>.
  • Pushed to the existing PR branch (cleanup-reasoning-models) (already pushed).

What changed (per git diff)

  1. openhands-sdk/openhands/sdk/llm/options/chat_options.py

    • Ensured Gemini 2.5 Pro models ("gemini-2.5-pro" substring) default to reasoning_effort="low" when LLM.reasoning_effort is None or "none" (a sketch of this defaulting follows the list).
    • This defaulting now happens even if get_features(model).supports_reasoning_effort is false (which was causing the test to fail).
    • Kept the “strip temperature/top_p” behavior only when the model is actually flagged as supporting reasoning effort.
  2. .gitignore

    • Added .agent_tmp/ to prevent agent-generated temporary artifacts from showing up as untracked files.
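
A sketch of the defaulting described in item 1 above. Names and structure are illustrative, not the actual chat_options.py code; get_features is assumed to be the feature lookup from model_features.py.

from openhands.sdk.llm.utils.model_features import get_features

def _apply_reasoning_defaults(model: str, reasoning_effort: str | None, kwargs: dict) -> dict:
    # Gemini 2.5 Pro defaults to "low" even when the feature flags say
    # reasoning_effort is unsupported (the behavior the failing test expected).
    if "gemini-2.5-pro" in model and reasoning_effort in (None, "none"):
        reasoning_effort = "low"
    if reasoning_effort and reasoning_effort != "none":
        kwargs["reasoning_effort"] = reasoning_effort
    # temperature/top_p are stripped only for models actually flagged as
    # supporting reasoning_effort.
    if get_features(model).supports_reasoning_effort:
        kwargs.pop("temperature", None)
        kwargs.pop("top_p", None)
    return kwargs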

Validation performed

  • uv run pre-commit run --files openhands-sdk/openhands/sdk/llm/options/chat_options.py
  • uv run pytest -q tests/sdk/llm/test_chat_options.py::test_gemini_2_5_pro_defaults_reasoning_effort_low_when_none

Commit / push


github-actions bot (Contributor) commented Feb 14, 2026

Coverage

Coverage Report

File                                                     Stmts   Miss   Cover   Missing
openhands-sdk/openhands/sdk/llm/options
   chat_options.py                                          44      3     93%   80–82
openhands-sdk/openhands/sdk/llm/utils
   model_features.py                                        57      3     94%   34, 76–77
TOTAL                                                    17871   5371     69%

@enyst enyst added and then removed the behavior-initiative label (This is related to the system prompt sections and LLM steering.) Feb 14, 2026
