Description
🔴 Required Information
Please ensure all items in this section are completed to allow for efficient
triaging. Requests without complete information may be rejected / deprioritized.
If an item is not applicable to you - please mark it as N/A
Describe the Bug:
When using an LLM as judge, the resolve function in google/adk/models/registry fails when "judge_model" is "azure/gpt-4o".
The cause is that r"azure/.*" is missing from the list returned by LiteLlm.supported_models() in adk/models/lite_llm. The bug does not occur once that pattern is included.
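To illustrate the failure mode, here is a minimal, self-contained sketch of a regex-based model registry. It is not ADK's actual code: `register`, `resolve`, the `_registry` dict, and the example patterns are hypothetical stand-ins, and only the r"azure/.*" pattern and the error text come from this report.

```python
import re

# Hypothetical stand-in for the registry's pattern table (pattern -> backend name).
_registry: dict[str, str] = {}


def register(backend: str, patterns: list[str]) -> None:
    """Map each supported-model regex to the backend that handles it."""
    for pattern in patterns:
        _registry[pattern] = backend


def resolve(model: str) -> str:
    """Return the backend whose regex fully matches the model name."""
    for pattern, backend in _registry.items():
        if re.fullmatch(pattern, model):
            return backend
    raise ValueError(f"Model {model} not found.")


# Assumed pattern list that lacks an azure entry (illustrative only).
register("LiteLlm", [r"openai/.*", r"anthropic/.*"])

try:
    resolve("azure/gpt-4o")
    resolved_before = True
except ValueError as err:
    resolved_before = False
    error_message = str(err)

# Adding the missing pattern lets the same model name resolve.
register("LiteLlm", [r"azure/.*"])
resolved_after = resolve("azure/gpt-4o")
```

In this sketch the first lookup raises because no registered pattern matches the "azure/" prefix, and the second succeeds after the pattern is added, which mirrors the behavior described above.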
Steps to Reproduce:
Run eval with any LLM-as-a-judge criteria on any evalset on an agent with the judge_model set to azure/gpt-4o.
Expected Behavior:
A clear and concise description of what you expected to happen.
The judge model 'azure/gpt-4o' should resolve to a registered model class. Instead, the program is unable to find a suitable model because 'azure/gpt-4o' does not match any regex in the _llm_registry_dict.
Observed Behavior:
What actually happened? Include error messages or crash stack traces here.
"2026-01-30 16:50:50,906 - ERROR - local_eval_service.py:357 - Metric evaluation failed for metric final_response_match_v2 for eval case id '[eval case id]' with following error `Model azure/gpt-4o not found."
In the table, all criteria that use an LLM-as-a-judge will have the Status "NOT_EVALUATED".
Environment Details:
- ADK Library Version (pip show google-adk):
Name: google-adk
Version: 1.23.0
Summary: Agent Development Kit
Home-page: https://google.github.io/adk-docs/
Author:
Author-email: Google LLC googleapis-packages@google.com
License:
Location: C:\Projects\IsIT.Agents.Search.venv\Lib\site-packages
Requires: aiosqlite, anyio, authlib, click, fastapi, google-api-python-client, google-auth, google-cloud-aiplatform, google-cloud-bigquery, google-cloud-bigquery-storage, google-cloud-bigtable, google-cloud-discoveryengine, google-cloud-pubsub, google-cloud-secret-manager, google-cloud-spanner, google-cloud-speech, google-cloud-storage, google-genai, graphviz, jsonschema, mcp, opentelemetry-api, opentelemetry-exporter-gcp-logging, opentelemetry-exporter-gcp-monitoring, opentelemetry-exporter-gcp-trace, opentelemetry-exporter-otlp-proto-http, opentelemetry-resourcedetector-gcp, opentelemetry-sdk, pyarrow, pydantic, python-dateutil, python-dotenv, PyYAML, requests, sqlalchemy, sqlalchemy-spanner, starlette, tenacity, typing-extensions, tzlocal, uvicorn, watchdog, websockets
Required-by: Search, toolbox-adk
- Desktop OS: Windows
- Python Version (python -V): Python 3.12.10
Model Information:
- Are you using LiteLLM: Yes
- Which model is being used: azure/gpt-4o
🟡 Optional Information
Providing this information greatly speeds up the resolution process.
Regression:
Did this work in a previous version of ADK? If so, which one?
I don't think so.
How often has this issue occurred?:
- Always (100%)