[Integration]: Add Intelligent Semantic Routing with vLLM-SR #1735

Xunzhuo · 2025-11-06T07:05:40Z

Pull Request Description

[Integration]: Add Intelligent Semantic Routing with vLLM-SR


gemini-code-assist · 2025-11-06T07:05:58Z

Summary of Changes

Hello @Xunzhuo, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces extensive documentation for integrating Intelligent Semantic Routing (vLLM-SR) with AIBrix. The new guide explains how vLLM-SR enhances LLM inference by intelligently routing requests to the most suitable backend models, improving scalability, and optimizing costs. It provides clear instructions for deploying and testing this integration, ensuring users can easily leverage its advanced capabilities.

Highlights

New Feature Documentation: Introduced comprehensive documentation for integrating Intelligent Semantic Routing with vLLM-SR into the AIBrix ecosystem.
Deployment and Testing Guide: Added detailed, step-by-step instructions for deploying the vLLM Semantic Router, a demo LLM, and necessary Gateway API resources, along with methods for testing the deployment locally.
Semantic Router Capabilities: Outlined the key features of vLLM Semantic Router, including intelligent routing, AI-powered security, semantic caching, auto-reasoning, real-time analytics, and a scalable architecture.
Integration Benefits: Detailed the advantages of integrating vLLM Semantic Router with AIBrix, such as intelligent request routing, enhanced scalability, cost optimization, production-ready infrastructure, and simplified operations.


gemini-code-assist

Code Review

This pull request adds new documentation for the Intelligent Semantic Routing feature. The documentation is well-structured and provides a good overview and a step-by-step guide. I've identified a few areas for improvement to make the guide more user-friendly and complete, such as using HTTPS for git cloning, clarifying prerequisites, and making the example response more relevant. The changes to index.rst are correct.

gemini-code-assist · 2025-11-06T07:06:51Z

docs/source/features/semantic-routing.rst

+Prerequisites
+========================================
+
+Before starting, ensure you have the installed AIBrix components.


The 'Prerequisites' section is too vague. It should explicitly list all requirements. For instance:

A running Kubernetes cluster. If kind is recommended (as suggested by the cleanup step), you should provide the command to create it here (e.g., kind create cluster --name semantic-router-cluster).

kubectl installed and configured.

The AIBrix components installed.

Without the cluster creation step, the cleanup command kind delete cluster --name semantic-router-cluster on line 156 will fail for users who followed this guide.
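The prerequisite list above could be backed by a small preflight check. The following is a minimal sketch, not part of the PR: the function name `missing_tools` is hypothetical, and the tool list in the usage comment is an assumption drawn from the review comment (kubectl, plus kind if a local cluster is used).

```shell
#!/bin/sh
# Preflight sketch: report which required CLI tools are missing from PATH.
# `missing_tools` is a hypothetical helper, not something the guide ships.

# missing_tools TOOL...: print each tool that is not on PATH, one per line.
# Returns 0 when every tool is present, 1 otherwise.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example usage (commented out so that sourcing this file has no side effects):
#   missing_tools kubectl kind || echo "install the tools listed above first"
#   kind create cluster --name semantic-router-cluster
```

Keeping the check as a function that only prints names lets the guide decide separately whether a missing tool is fatal, for example when a reader already has a cluster and does not need kind.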

Prerequisites
=============

Let's get the === aligned with the title

gemini-code-assist · 2025-11-06T07:06:51Z

docs/source/features/semantic-routing.rst

+.. code-block:: bash
+
+   # Clone the semantic router repository
+   git clone git@github.com:vllm-project/semantic-router.git && cd semantic-router


Using git@github.com:... for cloning requires users to have SSH keys configured with GitHub. It's more user-friendly to provide the HTTPS URL, which works for everyone without any prior setup.

Suggested change

-   git clone git@github.com:vllm-project/semantic-router.git && cd semantic-router
+   git clone https://github.com/vllm-project/semantic-router.git && cd semantic-router

For a public-facing doc, let's change to HTTPS.

I will update this soon; we already provide a Helm chart.
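For readers who already copied the SSH form, the SSH-to-HTTPS rewrite discussed in this thread can be sketched as a one-line transformation. The helper name `to_https_url` is hypothetical; the only URL used is the one quoted in the review.

```shell
#!/bin/sh
# to_https_url URL: rewrite an scp-style SSH remote (git@host:owner/repo.git)
# to its HTTPS form. Any other input is printed unchanged.
# Hypothetical helper for illustration; not part of the PR.
to_https_url() {
    printf '%s\n' "$1" | sed -E 's#^git@([^:/]+):#https://\1/#'
}

# Example usage (commented out; cloning needs network access):
#   git clone "$(to_https_url git@github.com:vllm-project/semantic-router.git)"
```

The HTTPS form needs no SSH key registered with GitHub, which is the reason the reviewers give for preferring it in public documentation.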

gemini-code-assist · 2025-11-06T07:06:51Z

docs/source/features/semantic-routing.rst

+   x-vsr-injected-system-prompt: true
+   transfer-encoding: chunked
+
+   {"id":"chatcmpl-f390a0c6-b38f-4a73-b019-9374a3c5d69b","created":1762411088,"model":"vllm-llama3-8b-instruct","usage":{"prompt_tokens":42,"completion_tokens":48,"total_tokens":90},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I am your AI assistant, how can I help you today? To be or not to be that is the question. Alas, poor Yorick! I knew him, Horatio: A fellow of infinite jest Testing, testing 1,2,3"}}]}


The example response content seems to be a generic placeholder and doesn't relate to the math question asked in the curl request. While the headers (x-vsr-selected-category: math) correctly show the routing worked, a response that actually answers the question (or at least looks like a plausible answer from an LLM) would make the example more convincing and less confusing for the user.
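Because the placeholder body says nothing about routing, a reader may prefer to verify the routing decision from the response headers alone. The sketch below extracts one header value from a raw HTTP response on stdin; `header_value` is a hypothetical helper, and `$GATEWAY_URL` in the usage comment is an assumed placeholder rather than an endpoint taken from the guide.

```shell
#!/bin/sh
# header_value NAME: read a raw HTTP response on stdin and print the value of
# the first header called NAME (case-insensitive). Stops at the blank line
# that ends the header block, so the body is never inspected.
# Hypothetical helper for illustration; not part of the PR.
header_value() {
    awk -v name="$1" '
        BEGIN { name = tolower(name) }
        {
            sub(/\r$/, "")                  # drop the CR of a CRLF line ending
            if ($0 == "") exit              # blank line: end of headers
            i = index($0, ":")
            if (i > 0 && tolower(substr($0, 1, i - 1)) == name) {
                value = substr($0, i + 1)
                sub(/^[ \t]+/, "", value)   # trim leading whitespace
                print value
                exit
            }
        }'
}

# Example usage (commented out; GATEWAY_URL is an assumed placeholder):
#   curl -s -i "$GATEWAY_URL" ... | header_value x-vsr-selected-category
```

For the response quoted in the diff, this would be applied to the `x-vsr-selected-category` header that the review comment mentions.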

Signed-off-by: bitliu <bitliu@tencent.com>

googs1025 · 2025-11-06T11:23:33Z

docs/source/features/semantic-routing.rst

+   git clone git@github.com:vllm-project/semantic-router.git && cd semantic-router
+
+   # Deploy semantic router using Kustomize
+   kubectl apply -k deploy/kubernetes/aibrix/semantic-router


~~do we need to add file in deploy/kubernetes/aibrix/semantic-router ?~~

~~or maybe we can just use link in project semantic-router 🤔~~

I'd like to know whether it can be part of the aibrix kustomization manifest or Helm chart.
In a future release, aibrix may be able to install the semantic router with the main manifests.

Yes, makes sense to me.

nurali-techie · 2025-11-07T06:35:33Z

docs/source/features/semantic-routing.rst

+About vLLM AIBrix
+========================================
+
+`vLLM AIBrix <https://github.com/vllm-project/aibrix>`_ is an open-source initiative designed to provide essential building blocks to construct scalable GenAI inference infrastructure. AIBrix delivers a cloud-native solution optimized for deploying, managing, and scaling large language model (LLM) inference, tailored specifically to enterprise needs.


nit: I don't see a need for an "About vLLM AIBrix" section within the aibrix project.

nurali-techie · 2025-11-07T11:43:49Z

docs/source/features/semantic-routing.rst

+   kubectl delete -k deploy/kubernetes/aibrix/semantic-router
+
+   # Delete kind cluster
+   kind delete cluster --name semantic-router-cluster


This looks like a copy/paste mistake. The guide never instructs the user to create a kind cluster named semantic-router-cluster, so the delete command does not fit here.
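If the cleanup step is kept, one option is to guard it so it only acts on a cluster that kind actually lists. This is a sketch under assumptions: `has_line` and `delete_kind_cluster_if_present` are hypothetical helper names, and it relies on `kind get clusters` printing one cluster name per line.

```shell
#!/bin/sh
# has_line NAME: succeed if stdin contains a line exactly equal to NAME.
has_line() {
    grep -Fxq -- "$1"
}

# delete_kind_cluster_if_present NAME: delete the kind cluster only when
# `kind get clusters` lists it; otherwise just say so.
# Hypothetical helpers for illustration; not part of the PR.
delete_kind_cluster_if_present() {
    if kind get clusters 2>/dev/null | has_line "$1"; then
        kind delete cluster --name "$1"
    else
        echo "kind cluster '$1' not found; nothing to delete"
    fi
}

# Example usage (commented out so that sourcing this file has no side effects):
#   delete_kind_cluster_if_present semantic-router-cluster
```

The exact-line match matters here: a substring match would also hit a cluster named, say, semantic-router-cluster-2.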

nurali-techie · 2025-11-07T11:49:48Z

Tested OK ✔️

Hi, I created an aibrix cluster following the guide at https://github.com/vllm-project/aibrix/blob/main/development/vllm/README.md, then followed this PR's guide to add semantic-router. It's working as expected.

Thanks @Xunzhuo 👍

Xunzhuo · 2025-11-07T12:45:20Z

thanks for testing it @nurali-techie

Jeffwan · 2025-11-08T22:35:55Z

docs/source/features/semantic-routing.rst

+.. code-block:: bash
+
+   # Deploy demo LLM
+   kubectl apply -f deploy/kubernetes/aibrix/aigw-resources/base-model.yaml


How can this be made to work with the existing aibrix samples or demos in the quick start?

This is easy to do: just change the model configuration in the semantic router. I will make it work in a follow-up.

Jeffwan · 2025-11-08T22:36:30Z

docs/source/features/semantic-routing.rst

+Step 3: Create Gateway API Resources
+========================================
+
+Create the necessary Gateway API resources for the envoy gateway:


Are these per-deployment resources or global configuration?

This is per-Envoy configuration, but in AIBrix it is global (an extproc injected before gateway-plugin in Envoy Gateway).

Jeffwan · 2025-11-08T22:37:53Z

I think the initial version overall looks good to me. I would like to check whether that's helpful to make it part of the release manifest. @googs1025 @Xunzhuo @nurali-techie any ideas?

googs1025 · 2025-11-09T02:02:03Z

> I think the initial version overall looks good to me. I would like to check whether that's helpful to make it part of the release manifest. @googs1025 @Xunzhuo @nurali-techie any ideas?

Integration would be ideal! semantic-router doesn't seem to have an official release yet, so we can defer putting the integration into the Helm chart or manifest to a next step (perhaps once semantic-router is officially released). For now, we can let users try it out through the docs. 😄

nurali-techie · 2025-11-09T02:38:28Z

> I think the initial version overall looks good to me. I would like to check whether that's helpful to make it part of the release manifest. @googs1025 @Xunzhuo @nurali-techie any ideas?

@Jeffwan If this PR is part of the v0.5.0 release, then it's better to mention semantic-router in the release manifest. If need be, we can add the word "experimental" to tell users that they can try out semantic-router with this release and expect more to come in subsequent releases.

Jeffwan · 2025-11-11T00:04:47Z

Sounds good. v0.5.0 has been cut, so we can put it in v0.6.0. Considering @googs1025 has another PR on AI gateway integration, I think it's time to revisit the gateway architecture now. I will follow up in the Slack channel.

nurali-techie · 2025-11-12T13:45:49Z

> I think it's time to revisit the gateway arch now. I will follow up in slack channel

@Jeffwan true 👍 I am interested in getting involved here. Please tag me in future discussions whenever possible 🙏

Jeffwan · 2025-11-14T22:38:18Z

@Xunzhuo did you get a chance to address the comments? I think there are a few issues that need to be fixed.

Jeffwan · 2025-11-27T07:14:37Z

@Xunzhuo any updates? /cc @googs1025

gemini-code-assist bot reviewed Nov 6, 2025

Xunzhuo force-pushed the integrate-vllm-aibrix branch from c846888 to 3eb3cfa Compare November 6, 2025 07:07

Xunzhuo requested a review from googs1025 November 6, 2025 07:10

[Integration]: Add Intelligent Semantic Routing with vLLM-SR

8e52c35

Signed-off-by: bitliu <bitliu@tencent.com>

Xunzhuo force-pushed the integrate-vllm-aibrix branch from 3eb3cfa to 8e52c35 Compare November 6, 2025 07:14

Xunzhuo requested a review from Jeffwan November 6, 2025 07:31

googs1025 reviewed Nov 6, 2025

nurali-techie reviewed Nov 7, 2025

Jeffwan reviewed Nov 8, 2025

Jeffwan mentioned this pull request Nov 14, 2025

[Docs]: feature: envoy ai gateway integration #1733

Merged
