Conversation

@steven10a
Collaborator

Extends llm-base.ts to always pull conversation_history from ctx and pass it to all LLM-based guardrails. Previously only the Jailbreak guardrail supported multi-turn input, via a custom implementation.

  • Conversation history allows for more robust detection
  • Users can set max_turns in the config to control how much of the conversation is passed to the guardrail, balancing token cost against context (see the config sketch after this list)
  • Updated documentation
  • Updated and added tests
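For illustration, a guardrail entry with the new option might look like the minimal sketch below. Only the max_turns field comes from this PR; the surrounding config shape, guardrail name, model, and threshold are assumptions, not the repo's actual schema.

```ts
// Minimal illustrative sketch, not the actual config schema of this repo.
// Only "max_turns" is introduced by this PR; everything else is assumed.
const jailbreakGuardrail = {
  name: "Jailbreak",
  config: {
    model: "gpt-4.1-mini",     // assumed: whichever model the check normally uses
    confidence_threshold: 0.7, // assumed: existing tuning field
    max_turns: 10,             // new: number of prior turns passed to the guardrail LLM
  },
};
```

Lower max_turns values trim the history sent with each check, trading context for token cost.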

Copilot AI review requested due to automatic review settings December 12, 2025 22:30
@steven10a changed the title from "Dev/steven/multi turn" to "Support multi-turn for all LLM based guardrails" on Dec 12, 2025

Copilot AI left a comment

Pull request overview

This PR extends multi-turn conversation support from the Jailbreak guardrail to all LLM-based guardrails. It centralizes conversation history handling in llm-base.ts, introduces configurable max_turns and include_reasoning parameters, and refactors existing guardrails to use this unified infrastructure. The changes enable more robust detection across all LLM guardrails while giving users control over token costs through configurable reasoning fields and conversation history limits.

Key Changes:

  • Unified multi-turn support infrastructure in llm-base.ts with automatic conversation history extraction and configurable turn limits
  • Added include_reasoning config to control whether detailed explanation fields are included in outputs (defaults to false for production cost savings); see the schema sketch after this list
  • Added max_turns config to limit conversation history size (defaults to 10 turns)
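As a rough illustration of the include_reasoning behavior described above, the output schema toggle could look like the sketch below. This assumes zod-style schemas; the real LLMReasoningOutput in llm-base.ts may be shaped differently, and the flagged / confidence / reason field names are placeholders.

```ts
import { z } from "zod";

// Sketch only: not the actual llm-base.ts schemas.
const baseOutput = z.object({
  flagged: z.boolean(),                 // did the guardrail trip?
  confidence: z.number().min(0).max(1), // model-reported confidence
});

// Reasoning variant adds a verbose explanation field. Requesting it only when
// include_reasoning is true keeps production token usage down.
const reasoningOutput = baseOutput.extend({
  reason: z.string(),
});

function outputSchemaFor(includeReasoning: boolean) {
  return includeReasoning ? reasoningOutput : baseOutput;
}
```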

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 1 comment.

File Description
src/checks/llm-base.ts Core infrastructure: added extractConversationHistory, buildAnalysisPayload, the LLMReasoningOutput schema, and the include_reasoning and max_turns config fields; updated runLLM and createLLMCheckFn to support multi-turn analysis (see the sketch after this table)
src/checks/jailbreak.ts Refactored to use createLLMCheckFn instead of custom implementation, removing duplicate conversation history logic
src/checks/prompt_injection_detection.ts Added conditional reasoning field inclusion based on include_reasoning; integrated max_turns parameter for conversation slicing
src/checks/hallucination-detection.ts Added conditional reasoning field inclusion with explicit field listing pattern
src/checks/user-defined-llm.ts Updated to use automatic reasoning handling via createLLMCheckFn
src/checks/topical-alignment.ts Updated to use automatic reasoning handling via createLLMCheckFn
src/checks/nsfw.ts Updated to use automatic reasoning handling via createLLMCheckFn
src/checks/moderation.ts Minor cleanup: removed checked_text field from one error path
src/__tests__/unit/llm-base.test.ts Comprehensive tests for new helper functions, reasoning control, and multi-turn behavior
src/__tests__/unit/prompt_injection_detection.test.ts Tests for include_reasoning and max_turns configuration options
src/__tests__/unit/checks/jailbreak.test.ts Updated tests to reflect refactored implementation using createLLMCheckFn
src/__tests__/unit/checks/hallucination-detection.test.ts New comprehensive test file for reasoning control and error handling
src/__tests__/unit/checks/user-defined-llm.test.ts Updated test to verify include_reasoning functionality
examples/basic/hello_world.ts Demonstrates include_reasoning: true in example configuration
docs/ref/checks/*.md Comprehensive documentation updates across all LLM guardrail docs explaining new include_reasoning and max_turns parameters with consistent performance claims
.gitignore Added PR_READINESS_CHECKLIST.md to ignored files
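As a rough sketch of the turn-limiting behavior summarized for llm-base.ts above: the function name, message shape, and exact slicing strategy below are assumptions; only the idea of extracting conversation history and the max_turns default of 10 come from this PR.

```ts
// Illustrative only: not the actual extractConversationHistory implementation.
type ConversationTurn = {
  role: "user" | "assistant" | "system";
  content: string;
};

// Keep just the most recent maxTurns entries so the analysis payload
// sent to the guardrail LLM stays bounded (max_turns defaults to 10).
function limitConversationHistory(
  history: ConversationTurn[],
  maxTurns = 10,
): ConversationTurn[] {
  return maxTurns > 0 ? history.slice(-maxTurns) : [];
}
```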


@steven10a
Collaborator Author

@codex review


@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@steven10a requested a review from Copilot December 12, 2025 22:55
@steven10a
Collaborator Author

@codex review


Copilot AI left a comment

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated no new comments.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. More of your lovely PRs please.
