-
Notifications
You must be signed in to change notification settings - Fork 497
[Integration]: Add Intelligent Semantic Routing with vLLM-SR #1735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @Xunzhuo, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces extensive documentation for integrating Intelligent Semantic Routing (vLLM-SR) with AIBrix. The new guide explains how vLLM-SR enhances LLM inference by intelligently routing requests to the most suitable backend models, improving scalability, and optimizing costs. It provides clear instructions for deploying and testing this integration, ensuring users can easily leverage its advanced capabilities. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds new documentation for the Intelligent Semantic Routing feature. The documentation is well-structured and provides a good overview and a step-by-step guide. I've identified a few areas for improvement to make the guide more user-friendly and complete, such as using HTTPS for git cloning, clarifying prerequisites, and making the example response more relevant. The changes to index.rst are correct.
| Prerequisites | ||
| ======================================== | ||
|
|
||
| Before starting, ensure you have the installed AIBrix components. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'Prerequisites' section is too vague. It should explicitly list all requirements. For instance:
- A running Kubernetes cluster. If
kindis recommended (as suggested by the cleanup step), you should provide the command to create it here (e.g.,kind create cluster --name semantic-router-cluster). kubectlinstalled and configured.- The AIBrix components installed.
Without the cluster creation step, the cleanup command kind delete cluster --name semantic-router-cluster on line 156 will fail for users who followed this guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prerequisites
==========
Let's get the === aligned with the title
| .. code-block:: bash | ||
|
|
||
| # Clone the semantic router repository | ||
| git clone git@github.com:vllm-project/semantic-router.git && cd semantic-router |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using git@github.com:... for cloning requires users to have SSH keys configured with GitHub. It's more user-friendly to provide the HTTPS URL, which works for everyone without any prior setup.
| git clone git@github.com:vllm-project/semantic-router.git && cd semantic-router | |
| git clone https://github.com/vllm-project/semantic-router.git && cd semantic-router |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for public facing doc, let's change to https
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will update this soon, we provided the helm chart already
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great.
| x-vsr-injected-system-prompt: true | ||
| transfer-encoding: chunked | ||
|
|
||
| {"id":"chatcmpl-f390a0c6-b38f-4a73-b019-9374a3c5d69b","created":1762411088,"model":"vllm-llama3-8b-instruct","usage":{"prompt_tokens":42,"completion_tokens":48,"total_tokens":90},"object":"chat.completion","do_remote_decode":false,"do_remote_prefill":false,"remote_block_ids":null,"remote_engine_id":"","remote_host":"","remote_port":0,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I am your AI assistant, how can I help you today? To be or not to be that is the question. Alas, poor Yorick! I knew him, Horatio: A fellow of infinite jest Testing, testing 1,2,3"}}]} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The example response content seems to be a generic placeholder and doesn't relate to the math question asked in the curl request. While the headers (x-vsr-selected-category: math) correctly show the routing worked, a response that actually answers the question (or at least looks like a plausible answer from an LLM) would make the example more convincing and less confusing for the user.
c846888 to
3eb3cfa
Compare
Signed-off-by: bitliu <bitliu@tencent.com>
3eb3cfa to
8e52c35
Compare
| git clone git@github.com:vllm-project/semantic-router.git && cd semantic-router | ||
|
|
||
| # Deploy semantic router using Kustomize | ||
| kubectl apply -k deploy/kubernetes/aibrix/semantic-router |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to add file in deploy/kubernetes/aibrix/semantic-router ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe we can just use link in project semantic-router 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to know can it be part of the aibrix kustomization manifest or helm chart?
for future release, aibrix may be able to install the semantic router with main manifests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, make sense to me.
| About vLLM AIBrix | ||
| ======================================== | ||
|
|
||
| `vLLM AIBrix <https://github.com/vllm-project/aibrix>`_ is an open-source initiative designed to provide essential building blocks to construct scalable GenAI inference infrastructure. AIBrix delivers a cloud-native solution optimized for deploying, managing, and scaling large language model (LLM) inference, tailored specifically to enterprise needs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Don't see a need of About vLLM AIBrix within aibrix project.
| kubectl delete -k deploy/kubernetes/aibrix/semantic-router | ||
|
|
||
| # Delete kind cluster | ||
| kind delete cluster --name semantic-router-cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems copy/paste mistake. We never instructed to create kind cluster semantic-router-cluster. So having delete command is not fitting well here.
|
Tested OK ✔️ Hi, I have created aibrix cluster following guide -- https://github.com/vllm-project/aibrix/blob/main/development/vllm/README.md ; after that I followed this PR guide to add semantic-router. It's working as expected. Thanks @Xunzhuo 👍 |
|
thanks for testing it @nurali-techie |
| .. code-block:: bash | ||
|
|
||
| # Deploy demo LLM | ||
| kubectl apply -f deploy/kubernetes/aibrix/aigw-resources/base-model.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how to make it work with existing aibrix samples or demos in quick start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is easy to do, just change the model configuration in semantic router, I will make it work in a follow-up
| Step 3: Create Gateway API Resources | ||
| ======================================== | ||
|
|
||
| Create the necessary Gateway API resources for the envoy gateway: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is per deployment based resources or global configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is per-envoy configuration, but in AIBrix this is global. (An extproc injected before gateway-plugin in envoy gateway)
|
I think the initial version overall looks good to me. I would like to check whether that's helpful to make it part of the release manifest. @googs1025 @Xunzhuo @nurali-techie any ideas? |
Integration would be ideal and great! Currently, semantic-router release doesn't seem to have been officially released yet, so we can put integration into the Helm chart or manifest to the next step (perhaps until semantic-router is officially released). For now, we can let users try it out through docs. 😄 |
@Jeffwan If this PR is part of v0.5.0 release then it's better to mention about semantic-router in release manifest. If need be then we can add "experimental" word, to indicate user that you can "try out" semantic-router with this release and expect more to come in sub-sequent release. |
|
sounds good. v0.5.0 has been cut. We can put it in v0.6.0. Consider @googs1025 has another PR on ai gateway integration. I think it's time to revisit the gateway arch now. I will follow up in slack channel |
@Jeffwan true 👍 I am interested to get involved here. Plz tag me in the future discussion whenever possible 🙏 |
|
@Xunzhuo did you get a chance to address the comments? I think there're few issues need to be fixed |
|
@Xunzhuo any updates? /cc @googs1025 |
Pull Request Description
[Integration]: Add Intelligent Semantic Routing with vLLM-SR
Contribution Guidelines (Expand for Details)
We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:
Pull Request Title Format
Your PR title should start with one of these prefixes to indicate the nature of the change:
[Bug]: Corrections to existing functionality[CI]: Changes to build process or CI pipeline[Docs]: Updates or additions to documentation[API]: Modifications to aibrix's API or interface[CLI]: Changes or additions to the Command Line Interface[Misc]: For changes not covered above (use sparingly)Note: For changes spanning multiple categories, use multiple prefixes in order of importance.
Submission Checklist
By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.