@Xunzhuo
Member

@Xunzhuo Xunzhuo commented Nov 6, 2025

Pull Request Description

[Integration]: Add Intelligent Semantic Routing with vLLM-SR


Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Xunzhuo, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces extensive documentation for integrating Intelligent Semantic Routing (vLLM-SR) with AIBrix. The new guide explains how vLLM-SR enhances LLM inference by intelligently routing requests to the most suitable backend models, improving scalability, and optimizing costs. It provides clear instructions for deploying and testing this integration, ensuring users can easily leverage its advanced capabilities.

Highlights

  • New Feature Documentation: Introduced comprehensive documentation for integrating Intelligent Semantic Routing with vLLM-SR into the AIBrix ecosystem.
  • Deployment and Testing Guide: Added detailed, step-by-step instructions for deploying the vLLM Semantic Router, a demo LLM, and necessary Gateway API resources, along with methods for testing the deployment locally.
  • Semantic Router Capabilities: Outlined the key features of vLLM Semantic Router, including intelligent routing, AI-powered security, semantic caching, auto-reasoning, real-time analytics, and a scalable architecture.
  • Integration Benefits: Detailed the advantages of integrating vLLM Semantic Router with AIBrix, such as intelligent request routing, enhanced scalability, cost optimization, production-ready infrastructure, and simplified operations.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds new documentation for the Intelligent Semantic Routing feature. The documentation is well-structured and provides a good overview and a step-by-step guide. I've identified a few areas for improvement to make the guide more user-friendly and complete, such as using HTTPS for git cloning, clarifying prerequisites, and making the example response more relevant. The changes to index.rst are correct.

Prerequisites
========================================

Before starting, ensure you have the installed AIBrix components.

Contributor

high

The 'Prerequisites' section is too vague. It should explicitly list all requirements. For instance:

  • A running Kubernetes cluster. If kind is recommended (as suggested by the cleanup step), you should provide the command to create it here (e.g., kind create cluster --name semantic-router-cluster).
  • kubectl installed and configured.
  • The AIBrix components installed.

Without the cluster creation step, the cleanup command kind delete cluster --name semantic-router-cluster on line 156 will fail for users who followed this guide.
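
A preflight check along these lines could back the prerequisites listed above. It is only a sketch: the helper name `missing_tools` is hypothetical (not part of AIBrix or semantic-router), and the cluster name is the one the guide's cleanup step already uses.

```shell
#!/bin/sh
# Hypothetical helper: prints each named tool that is NOT found on PATH,
# one per line. Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage sketch (commented out so that sourcing this file has no side effects):
#   missing=$(missing_tools kubectl kind)
#   [ -z "$missing" ] || { echo "missing tools: $missing" >&2; exit 1; }
#   kind create cluster --name semantic-router-cluster
#   kubectl cluster-info
```

Keeping the check in a function means the guide can name its required tools in one place and fail early with a readable message.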

Collaborator

Prerequisites
==========

Let's get the === aligned with the title

.. code-block:: bash

# Clone the semantic router repository
git clone git@github.com:vllm-project/semantic-router.git && cd semantic-router

Contributor

medium

Using git@github.com:... for cloning requires users to have SSH keys configured with GitHub. It's more user-friendly to provide the HTTPS URL, which works for everyone without any prior setup.

Suggested change
- git clone git@github.com:vllm-project/semantic-router.git && cd semantic-router
+ git clone https://github.com/vllm-project/semantic-router.git && cd semantic-router
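
For scripts that already carry the SSH form, the rewrite to HTTPS is mechanical. A minimal sketch, assuming a hypothetical helper named `ssh_to_https` (it is not a git command):

```shell
#!/bin/sh
# Hypothetical helper: converts an scp-style SSH remote (git@host:org/repo.git)
# into its HTTPS equivalent. Any other input is printed unchanged.
ssh_to_https() {
  case "$1" in
    git@*:*)
      _rest=${1#git@}          # host:org/repo.git
      _host=${_rest%%:*}       # host
      _repo=${_rest#*:}        # org/repo.git
      printf 'https://%s/%s\n' "$_host" "$_repo"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# Usage sketch:
#   git clone "$(ssh_to_https git@github.com:vllm-project/semantic-router.git)"
#
# git can also apply the same rewrite itself through its url.<base>.insteadOf
# setting:
#   git config --global url."https://github.com/".insteadOf "git@github.com:"
```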

Collaborator

For public-facing docs, let's change this to HTTPS.

Member Author

I will update this soon; we have already provided the Helm chart.

Collaborator

great.

x-vsr-injected-system-prompt: true
transfer-encoding: chunked

{"id":"chatcmpl-f390a0c6-b38f-4a73-b019-9374a3c5d69b","created":1762411088,"model":"vllm-llama3-8b-instruct","usage":{"prompt_tokens":42,"completion_tokens":48,"total_tokens":90},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I am your AI assistant, how can I help you today? To be or not to be that is the question. Alas, poor Yorick! I knew him, Horatio: A fellow of infinite jest Testing, testing 1,2,3"}}]}

Contributor

medium

The example response content seems to be a generic placeholder and doesn't relate to the math question asked in the curl request. While the headers (x-vsr-selected-category: math) correctly show the routing worked, a response that actually answers the question (or at least looks like a plausible answer from an LLM) would make the example more convincing and less confusing for the user.
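
Since the routing decision is carried in the response headers, it can be checked independently of the response body. The sketch below assumes a hypothetical helper named `header_value` and a response saved with headers included (for example via `curl -i`); the header names are the ones quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads a raw HTTP response on stdin and prints the value
# of the header named in $1 (case-insensitive). Stops at the blank line that
# separates headers from the body.
header_value() {
  _want=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  tr -d '\r' | awk -v want="$_want" '
    /^$/ { exit }                          # end of headers
    {
      i = index($0, ":")
      if (i == 0) next                     # status line or malformed line
      key = tolower(substr($0, 1, i - 1))
      if (key == want) {
        val = substr($0, i + 1)
        sub(/^[ \t]+/, "", val)            # trim leading whitespace
        print val
        exit
      }
    }'
}

# Usage sketch (response.txt is a hypothetical file saved from `curl -i`):
#   header_value x-vsr-selected-category < response.txt
```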

@Xunzhuo Xunzhuo force-pushed the integrate-vllm-aibrix branch from c846888 to 3eb3cfa Compare November 6, 2025 07:07
@Xunzhuo Xunzhuo requested a review from googs1025 November 6, 2025 07:10
Signed-off-by: bitliu <bitliu@tencent.com>
@Xunzhuo Xunzhuo force-pushed the integrate-vllm-aibrix branch from 3eb3cfa to 8e52c35 Compare November 6, 2025 07:14
@Xunzhuo Xunzhuo requested a review from Jeffwan November 6, 2025 07:31
git clone git@github.com:vllm-project/semantic-router.git && cd semantic-router

# Deploy semantic router using Kustomize
kubectl apply -k deploy/kubernetes/aibrix/semantic-router

Collaborator

@googs1025 googs1025 Nov 6, 2025

Do we need to add the files under deploy/kubernetes/aibrix/semantic-router?

Collaborator

@googs1025 googs1025 Nov 6, 2025

Or maybe we can just link to the semantic-router project 🤔
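
One way to read this suggestion is to point kubectl at the upstream directory rather than copying files into aibrix. Kustomize accepts remote targets of the form `<repo-url>//<path>?ref=<git-ref>`, so the sketch below only builds that string. The helper name and the `main` ref are assumptions, and whether the upstream directory builds cleanly as a remote target has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper: builds a kustomize remote target string.
#   $1 = repository URL, $2 = directory inside the repository, $3 = git ref
kustomize_remote_target() {
  printf '%s//%s?ref=%s\n' "$1" "$2" "$3"
}

# Usage sketch (the ref "main" is an assumption):
#   kubectl apply -k "$(kustomize_remote_target \
#     https://github.com/vllm-project/semantic-router \
#     deploy/kubernetes/aibrix/semantic-router main)"
```

Pinning the ref to a tag instead of a branch would keep the guide reproducible once semantic-router publishes releases.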

Collaborator

@Jeffwan Jeffwan Nov 8, 2025

I'd like to know whether it can be part of the aibrix kustomization manifest or Helm chart.
In a future release, aibrix may be able to install the semantic router with the main manifests.

Collaborator

Yes, makes sense to me.

About vLLM AIBrix
========================================

`vLLM AIBrix <https://github.com/vllm-project/aibrix>`_ is an open-source initiative designed to provide essential building blocks to construct scalable GenAI inference infrastructure. AIBrix delivers a cloud-native solution optimized for deploying, managing, and scaling large language model (LLM) inference, tailored specifically to enterprise needs.

Contributor

nit: I don't see a need for an "About vLLM AIBrix" section within the aibrix project.

kubectl delete -k deploy/kubernetes/aibrix/semantic-router

# Delete kind cluster
kind delete cluster --name semantic-router-cluster

Contributor

This looks like a copy/paste mistake. The guide never instructs users to create the kind cluster semantic-router-cluster, so the delete command does not fit here.
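
If the cleanup step is kept for readers who did use kind, it could be guarded so that it only runs when that cluster exists. A minimal sketch, assuming a hypothetical helper named `cluster_listed` that reads the one-name-per-line output of `kind get clusters`:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the cluster name in $1 appears as a whole
# line in the list read from stdin.
cluster_listed() {
  grep -qxF -- "$1"
}

# Usage sketch:
#   if kind get clusters 2>/dev/null | cluster_listed semantic-router-cluster; then
#     kind delete cluster --name semantic-router-cluster
#   fi
```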

@nurali-techie
Contributor

nurali-techie commented Nov 7, 2025

Tested OK ✔️

Hi, I created an aibrix cluster following the guide at https://github.com/vllm-project/aibrix/blob/main/development/vllm/README.md, then followed this PR's guide to add semantic-router. It's working as expected.

Thanks @Xunzhuo 👍

@Xunzhuo
Member Author

Xunzhuo commented Nov 7, 2025

thanks for testing it @nurali-techie

.. code-block:: bash

# Deploy demo LLM
kubectl apply -f deploy/kubernetes/aibrix/aigw-resources/base-model.yaml

Collaborator

How can we make it work with the existing aibrix samples or demos in the quick start?

Member Author

This is easy to do: just change the model configuration in the semantic router. I will make it work in a follow-up.

Step 3: Create Gateway API Resources
========================================

Create the necessary Gateway API resources for the envoy gateway:

Collaborator

Are these per-deployment resources or global configuration?

Member Author

This is per-Envoy configuration, but in AIBrix it is global (an extproc injected before gateway-plugin in Envoy Gateway).

@Jeffwan
Collaborator

Jeffwan commented Nov 8, 2025

I think the initial version overall looks good to me. I would like to check whether that's helpful to make it part of the release manifest. @googs1025 @Xunzhuo @nurali-techie any ideas?

@googs1025
Collaborator

> I think the initial version overall looks good to me. I would like to check whether that's helpful to make it part of the release manifest. @googs1025 @Xunzhuo @nurali-techie any ideas?

Integration would be ideal! semantic-router doesn't seem to have had an official release yet, so we can defer integrating it into the Helm chart or manifest to a later step (perhaps until semantic-router is officially released). For now, we can let users try it out through the docs. 😄

@nurali-techie
Contributor

nurali-techie commented Nov 9, 2025

> I think the initial version overall looks good to me. I would like to check whether that's helpful to make it part of the release manifest. @googs1025 @Xunzhuo @nurali-techie any ideas?

@Jeffwan If this PR is part of the v0.5.0 release, then it's better to mention semantic-router in the release manifest. If need be, we can add the word "experimental" to indicate to users that they can "try out" semantic-router with this release and expect more to come in subsequent releases.

@Jeffwan
Collaborator

Jeffwan commented Nov 11, 2025

Sounds good. v0.5.0 has been cut, so we can put it in v0.6.0. Considering @googs1025 has another PR on AI gateway integration, I think it's time to revisit the gateway arch now. I will follow up in slack channel

@nurali-techie
Contributor

> I think it's time to revisit the gateway arch now. I will follow up in slack channel

@Jeffwan True 👍 I am interested in getting involved here. Please tag me in future discussions whenever possible 🙏

@Jeffwan
Collaborator

Jeffwan commented Nov 14, 2025

@Xunzhuo did you get a chance to address the comments? I think there are a few issues that need to be fixed.

@Jeffwan
Collaborator

Jeffwan commented Nov 27, 2025

@Xunzhuo any updates? /cc @googs1025

5 participants