feat: Add MultiProviderClient for multi-LLM routing #472
Conversation
@microsoft-github-policy-service agree
Pull request overview
Adds an async MultiProviderClient that mimics the OpenAI Python SDK surface (`client.chat.completions.create`) while routing requests to different LLM backends based on a `provider-<model>` prefix.
Changes:
- Introduces `MultiProviderClient` with lazy initialization of provider-specific `AsyncOpenAI` clients (Google, Groq, OpenAI, Azure, OpenRouter).
- Implements model-string parsing (`provider-...`) to select the correct backend and strip the provider prefix from the model.
- Adds an OpenAI-compatible proxy surface (`chat.completions.create`) for drop-in usage.
```python
if provider not in self.clients:
    for name in self.clients:
        if model.startswith(name + "-"):
            provider = name
            actual_model = model[len(name) + 1:]
            break
    else:
        raise ValueError(f"Unknown provider: {provider}. Supported: {list(self.clients.keys())}")
```
Copilot AI (Feb 1, 2026):
When a supported provider is not configured (e.g., the model starts with `google-` but `GOOGLE_API_KEY` is not set), this raises "Unknown provider", which is misleading. Consider distinguishing between "unsupported provider" and "provider supported but not configured", and include which env var(s) are required in the error message.
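A minimal sketch of that distinction, assuming a helper outside the PR; the `SUPPORTED_ENV_VARS` mapping and `_resolve_provider` name are hypothetical, not part of this change:

```python
# Hypothetical mapping; provider names follow the PR, env-var grouping is assumed.
SUPPORTED_ENV_VARS = {
    "google": "GOOGLE_API_KEY",
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",
    "azure": "AZURE_OPENAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def _resolve_provider(provider: str, configured: dict) -> None:
    """Raise distinct errors for 'unsupported' vs. 'supported but not configured'."""
    if provider in configured:
        return
    if provider in SUPPORTED_ENV_VARS:
        raise ValueError(
            f"Provider '{provider}' is supported but not configured. "
            f"Set {SUPPORTED_ENV_VARS[provider]} to enable it."
        )
    raise ValueError(f"Unknown provider: {provider}. Supported: {sorted(SUPPORTED_ENV_VARS)}")
```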
```python
print("--- Multi Provider Client ---")
print(f"{provider.upper()}: {actual_model}")
```
Copilot AI (Feb 1, 2026):
`print()` calls in `create()` will spam stdout in library usage and can leak provider/model choices in production logs. Use the project's logging facilities (or remove the output), ideally at debug level so callers can control verbosity; see the sketch below.
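For example, a sketch replacing the two `print()` calls inside `create()` (the logger name is arbitrary):

```python
import logging

logger = logging.getLogger(__name__)

# Inside create(), in place of the print() calls:
logger.debug("MultiProviderClient routing: %s -> %s", provider.upper(), actual_model)
```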
```python
def _parse_model(self, model: str) -> tuple[str, str]:
    """Parse model name into provider and actual model name.

    Args:
        model: String in "provider-model_name" format

    Returns:
        (provider, actual_model_name) tuple
    """
    if "-" not in model:
        raise ValueError(f"Model format must be 'provider-model_name': {model}")

    idx = model.find("-")
    provider = model[:idx]
    actual_model = model[idx + 1:]

    if provider not in self.clients:
        for name in self.clients:
            if model.startswith(name + "-"):
                provider = name
                actual_model = model[len(name) + 1:]
                break
        else:
            raise ValueError(f"Unknown provider: {provider}. Supported: {list(self.clients.keys())}")

    return provider, actual_model
```
Copilot AI (Feb 1, 2026):
This introduces new routing/parsing behavior (`_parse_model`) and a new OpenAI-compatible proxy surface (`chat.completions.create`), but there are no tests covering provider selection, error cases (missing dash, unknown provider), or custom providers. Add unit tests (e.g., under `tests/utils/`) with a mocked `AsyncOpenAI` to verify routing and error messages without requiring real API keys; a possible starting point is sketched below.
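A sketch of such tests; the import path, fixture setup, and model strings are assumptions to be adjusted to wherever `MultiProviderClient` actually lives:

```python
import pytest

# Hypothetical import path; adjust to the actual module location.
from agentlightning.multi_provider import MultiProviderClient


@pytest.fixture
def client(monkeypatch):
    # Fake key so the groq backend is registered without a real credential.
    monkeypatch.setenv("GROQ_API_KEY", "test-key")
    return MultiProviderClient()


def test_parse_model_routes_to_known_provider(client):
    provider, model = client._parse_model("groq-llama-3-8b")
    assert provider == "groq"
    assert model == "llama-3-8b"


def test_parse_model_rejects_missing_dash(client):
    with pytest.raises(ValueError, match="provider-model_name"):
        client._parse_model("gpt4o")


def test_parse_model_rejects_unknown_provider(client):
    with pytest.raises(ValueError, match="Unknown provider"):
        client._parse_model("nonexistent-model")
```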
```python
if custom_providers:
    for name, config in custom_providers.items():
        api_key = config.get("api_key") or os.getenv(config.get("api_key_env", ""))
        base_url = config.get("base_url")
        self.clients[name] = AsyncOpenAI(api_key=api_key, base_url=base_url)
```
Copilot AI (Feb 1, 2026):
`custom_providers` entries are registered even when `api_key`/`api_key_env` resolves to an empty value and/or `base_url` is missing. This makes misconfiguration hard to diagnose (it will fail later, during the first request). Validate required fields during `__init__` and raise a clear `ValueError`, or explicitly document/support providers that do not require an API key; one possible shape is sketched below.
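A sketch of eager validation inside `__init__`, mirroring the fragment above; the `allow_no_api_key` flag is an invented opt-in for keyless local endpoints, not part of the PR:

```python
if custom_providers:
    for name, config in custom_providers.items():
        api_key = config.get("api_key") or os.getenv(config.get("api_key_env", ""))
        base_url = config.get("base_url")
        if not base_url:
            raise ValueError(f"custom_providers['{name}'] is missing required 'base_url'")
        # allow_no_api_key is hypothetical: explicit opt-in for keyless endpoints (Ollama/vLLM).
        if not api_key and not config.get("allow_no_api_key", False):
            raise ValueError(
                f"custom_providers['{name}'] resolved an empty API key; set 'api_key' "
                f"or a valid 'api_key_env', or set 'allow_no_api_key': True for keyless endpoints."
            )
        self.clients[name] = AsyncOpenAI(api_key=api_key, base_url=base_url)
```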
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull Request: Multi-Provider Async Client for Hybrid Model Pipelines
💡 Description
This PR introduces a robust `MultiProviderClient` that enables seamless switching between different LLM providers within a single application flow. By parsing the `provider-model_name` prefix, the client automatically routes requests to the appropriate backend.

The primary motivation for this implementation is to support Hybrid Model Pipelines. In agentic workflows (specifically prompt-optimization tasks like Agent Lightning's APO), it is often more efficient to use a high-reasoning model for the "Gradient" step while offloading the "Apply Edit" step to a smaller, faster, and more cost-effective model (e.g., Llama-3-8B via Groq).
✨ Technical Features
- OpenAI-compatible proxy surface (`client.chat.completions.create`) to ensure drop-in compatibility with existing OpenAI-based codebases.
- Prefix-based model routing (e.g., `google-gemini-2.0-flash`) to select the correct client.
- Lazy initialization: backends are registered only when the required environment variables (e.g., `GROQ_API_KEY`, `GOOGLE_API_KEY`) are present, preventing runtime errors.
- Extensible `custom_providers` for proprietary or local (Ollama/vLLM) endpoints.

🛠 Supported Backends
| Provider | Required Env Var | Example Model |
|---|---|---|
| Google | `GOOGLE_API_KEY` | `google-gemini-2.0-flash` |
| Groq | `GROQ_API_KEY` | `meta-llama/llama-4-maverick-17b-128e-instruct` |
| OpenAI | `OPENAI_API_KEY` | `openai-gpt-4o` |
| Azure | `AZURE_OPENAI_API_KEY` | `azure-gpt-4` |
| OpenRouter | `OPENROUTER_API_KEY` | `openrouter-anthropic/claude-3` |

🚀 Usage Example (Optimizing with Smaller Models)
This setup allows using a powerful model for reasoning and a smaller/faster model for edits to save resources:
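A sketch of what that could look like; the import path and model strings are illustrative assumptions, following the PR's `provider-model_name` convention:

```python
import asyncio

# Hypothetical import path; adjust to the actual module location.
from agentlightning.multi_provider import MultiProviderClient


async def main():
    client = MultiProviderClient()  # backends picked up lazily from env vars

    # High-reasoning model for the "Gradient" (critique) step.
    gradient = await client.chat.completions.create(
        model="openai-gpt-4o",
        messages=[{"role": "user", "content": "Critique this prompt: ..."}],
    )

    # Smaller, faster model for the "Apply Edit" step.
    edit = await client.chat.completions.create(
        model="groq-llama-3-8b",
        messages=[{"role": "user", "content": gradient.choices[0].message.content}],
    )
    print(edit.choices[0].message.content)


asyncio.run(main())
```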