Adding multi-turn support to all LLM based guardrails #65
Conversation
Pull request overview
This PR adds multi-turn conversation support to all LLM-based guardrails by extending the llm_base.py infrastructure. Previously, only the Jailbreak guardrail supported conversation history analysis; now all LLM-based guardrails can leverage conversation context for more robust detection across multiple turns.
Key changes:
- Extended `LLMConfig` with a `max_turns` parameter (default: 10) to control conversation history length (see the configuration sketch after this list)
- Modified `run_llm()` to accept conversation history and intelligently switch between single-turn and multi-turn formats
- Refactored the Jailbreak guardrail to use the common `create_llm_check_fn` factory instead of a custom implementation
- Updated Prompt Injection Detection to respect the `max_turns` configuration
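A minimal sketch of what opting into a shorter history window could look like, assuming `LLMConfig` is constructed directly and keeps its existing `model` and `confidence_threshold` fields (both assumptions here; only `max_turns` is taken from this PR):

```python
# Minimal sketch; field names other than max_turns are assumptions.
from guardrails.checks.text.llm_base import LLMConfig

config = LLMConfig(
    model="gpt-4.1-mini",      # guardrail LLM that runs the check (assumed field)
    confidence_threshold=0.7,  # assumed existing field
    max_turns=5,               # new in this PR: cap history at the 5 most recent turns
)
```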
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/guardrails/checks/text/llm_base.py | Added multi-turn support infrastructure: max_turns field in LLMConfig, _build_analysis_payload() helper, and conversation history handling in run_llm() and create_llm_check_fn() |
| src/guardrails/checks/text/jailbreak.py | Refactored to use create_llm_check_fn() factory, removing custom payload building and execution logic (~80 lines of code removed) |
| src/guardrails/checks/text/prompt_injection_detection.py | Updated _extract_user_intent_from_messages() and _slice_conversation_since_latest_user() to accept and respect max_turns parameter |
| tests/unit/checks/test_llm_base.py | Added comprehensive tests for multi-turn functionality, conversation history extraction, and payload building |
| tests/unit/checks/test_jailbreak.py | Updated tests to work with refactored implementation using the common factory pattern |
| tests/unit/checks/test_prompt_injection_detection.py | Added tests verifying max_turns configuration is properly applied |
| docs/ref/checks/llm_base.md | Updated documentation to describe multi-turn support, max_turns parameter, and usage patterns |
| docs/ref/checks/jailbreak.md | Updated to reflect new multi-turn capabilities and simplified configuration |
| docs/ref/checks/nsfw.md | Added max_turns parameter documentation and token usage example |
| docs/ref/checks/off_topic_prompts.md | Added max_turns parameter documentation and token usage example |
| docs/ref/checks/custom_prompt_check.md | Added max_turns parameter documentation and token usage example |
| docs/ref/checks/prompt_injection_detection.md | Added max_turns parameter documentation |
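To make the single-turn/multi-turn switch concrete, here is an illustrative sketch of what the `_build_analysis_payload()` helper could do. This is not the PR's actual code: the signature and the JSON payload shape are assumptions; only the helper's name and the single-turn fallback behavior come from the summary above.

```python
# Illustrative sketch -- the real _build_analysis_payload() in llm_base.py
# may differ; the signature and JSON payload shape are assumptions.
import json
from typing import Any


def _build_analysis_payload(
    latest_text: str,
    conversation_history: list[dict[str, Any]] | None,
    max_turns: int,
) -> str:
    """Build the text the guardrail LLM analyzes."""
    if not conversation_history:
        # Single-turn: behave exactly like the pre-PR guardrails.
        return latest_text
    # Multi-turn: keep only the most recent max_turns turns to bound token cost.
    recent_turns = conversation_history[-max_turns:]
    return json.dumps({"conversation": recent_turns, "latest_input": latest_text})
```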
@codex review
Codex Review: Didn't find any major issues. 🎉
Extending `llm_base.py` to always use the `conversation_history` from `ctx` to provide the conversation history to all LLM-based guardrails. Previously only the Jailbreak guardrail had custom multi-turn support. Adds `max_turns` to the config to control how much of the conversation is passed to the guardrail, balancing token cost with context.
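For example, `max_turns` could be tuned per guardrail to balance token cost against context. A hedged sketch, assuming a dict-style pipeline config; the guardrail names and surrounding keys are assumptions, and only `max_turns` inside `config` comes from this PR:

```python
# Hypothetical pipeline config; keys and guardrail names are assumptions.
pipeline_config = {
    "version": 1,
    "input": {
        "version": 1,
        "guardrails": [
            # Cheap content check: a short window keeps token usage low.
            {"name": "NSFW Text", "config": {"model": "gpt-4.1-mini", "max_turns": 3}},
            # Jailbreaks often build up across turns, so allow more history
            # (10 is the new default).
            {
                "name": "Jailbreak",
                "config": {
                    "model": "gpt-4.1-mini",
                    "confidence_threshold": 0.7,
                    "max_turns": 10,
                },
            },
        ],
    },
}
```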