Conversation

@steven10a
Collaborator

Allow users to toggle reasoning on and off for the LLM-based guardrails via the config file.

    • include_reasoning (optional): Whether to include reasoning/explanation fields in the guardrail output (default: false)
      • When false: The LLM only generates the essential fields (flagged and confidence), reducing token generation costs
      • When true: The LLM additionally returns detailed reasoning for its decisions
      • Use Case: Keep disabled for production to minimize costs; enable for development and debugging
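
As a rough illustration of the option described above, the snippet below builds a guardrail config entry with the toggle set. Only `include_reasoning` and its default of false come from this PR description; the surrounding keys (`name`, `config`, `model`, `confidence_threshold`), the guardrail name, the placeholder model name, and the helper `make_guardrail_entry` are assumptions made for the example and may not match the real config schema.

```python
import json


def make_guardrail_entry(name: str, debug: bool = False) -> dict:
    """Build one hypothetical guardrail config entry.

    The key layout is assumed for illustration; only `include_reasoning`
    is taken from the PR description.
    """
    return {
        "name": name,
        "config": {
            "model": "example-model",      # placeholder, not a real model name
            "confidence_threshold": 0.7,   # assumed key and value
            "include_reasoning": debug,    # false by default, per the PR
        },
    }


# Production: reasoning stays off to keep token generation low.
production_entry = make_guardrail_entry("Jailbreak")

# Development and debugging: turn reasoning on.
debug_entry = make_guardrail_entry("Jailbreak", debug=True)

print(json.dumps(debug_entry, indent=2))
```

Tying the flag to a single `debug` argument is just one way to keep the production and development configs from drifting apart.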

Updated docs and tests

Copilot AI review requested due to automatic review settings December 10, 2025 20:31
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copilot AI left a comment

Pull request overview

This PR adds the ability to parameterize whether LLM-based guardrails return reasoning/explanation fields, allowing users to reduce token generation costs in production while enabling detailed output for development and debugging.

Key Changes

  • Added include_reasoning field to LLMConfig (default: false) to control whether reasoning fields are included
  • Created LLMReasoningOutput class extending LLMOutput with a reason field
  • Updated guardrail implementations to conditionally use extended or base output models based on configuration
  • Updated tests to validate behavior with reasoning enabled/disabled
  • Updated documentation to reflect the new optional parameter
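
The model selection described in the list above might look roughly like the sketch below. The class names `LLMConfig`, `LLMOutput`, and `LLMReasoningOutput`, the `include_reasoning` field, and the `flagged`, `confidence`, and `reason` fields come from this PR; the use of dataclasses (the real models appear to be validated models), the `model` field, and the helper name `select_output_model` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Type


@dataclass
class LLMConfig:
    model: str                       # assumed field, for illustration only
    include_reasoning: bool = False  # default taken from the PR description


@dataclass
class LLMOutput:
    """Base output: only the essential fields."""
    flagged: bool
    confidence: float


@dataclass
class LLMReasoningOutput(LLMOutput):
    """Extended output that also carries the model's explanation."""
    reason: str


def select_output_model(config: LLMConfig) -> Type[LLMOutput]:
    """Return the extended model only when reasoning is requested."""
    return LLMReasoningOutput if config.include_reasoning else LLMOutput


default_model = select_output_model(LLMConfig(model="example-model"))
debug_model = select_output_model(
    LLMConfig(model="example-model", include_reasoning=True)
)
print(default_model.__name__, debug_model.__name__)
```

Because the extended class inherits from the base one, callers that only read `flagged` and `confidence` can treat both outputs the same way.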

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/guardrails/checks/text/llm_base.py | Added include_reasoning config field, LLMReasoningOutput class, and logic to select output model |
| src/guardrails/checks/text/jailbreak.py | Removed custom JailbreakLLMOutput class, now uses base/reasoning models conditionally |
| src/guardrails/checks/text/hallucination_detection.py | Updated to conditionally use HallucinationDetectionOutput based on reasoning config |
| src/guardrails/checks/text/prompt_injection_detection.py | Updated to conditionally use PromptInjectionDetectionOutput based on reasoning config |
| src/guardrails/checks/text/nsfw.py | Removed explicit output_model parameter, now uses default reasoning support |
| src/guardrails/checks/text/off_topic_prompts.py | Removed explicit output_model parameter, now uses default reasoning support |
| src/guardrails/checks/text/user_defined_llm.py | Removed explicit output_model parameter, now uses default reasoning support |
| src/guardrails/evals/core/benchmark_reporter.py | Fixed file path sanitization for model names containing "/" |
| tests/unit/checks/test_llm_base.py | Added tests for reasoning toggle behavior |
| tests/unit/checks/test_jailbreak.py | Updated tests to not expect reason field by default, added reasoning toggle tests |
| docs/ref/checks/*.md | Updated documentation to describe the include_reasoning parameter |
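
The summary only states that path sanitization was fixed for model names containing "/"; the actual change in benchmark_reporter.py is not shown here. A minimal sketch of one way to do this, assuming a hypothetical helper `sanitize_model_name` that replaces unsafe characters with underscores:

```python
import re


def sanitize_model_name(model_name: str) -> str:
    """Replace characters that are unsafe in a file name with underscores.

    Hypothetical helper: the replacement character and the set of
    characters treated as unsafe are assumptions, not the PR's code.
    """
    return re.sub(r'[\\/:*?"<>|]', "_", model_name)


# A model name with a "/" would otherwise be read as a nested directory.
print(sanitize_model_name("org/model-name"))  # org_model-name
```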

@steven10a steven10a requested a review from Copilot December 10, 2025 20:53
@steven10a
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.


@steven10a steven10a requested a review from Copilot December 10, 2025 21:15
@steven10a
Collaborator Author

@codex review

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@steven10a steven10a requested a review from Copilot December 10, 2025 21:39
@steven10a
Collaborator Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

    return (
        output_model(
            flagged=False,
            confidence=0.0,
        ),

P2: Handle empty responses with reasoning output

When include_reasoning=True, guardrails that use create_llm_check_fn pass LLMReasoningOutput (which requires a reason field) into run_llm. The empty-response fallback here instantiates output_model(flagged=False, confidence=0.0) without supplying that required field, so a blank or filtered completion raises a validation error and falls into the generic error path instead of returning a benign unflagged result (the previous behaviour). Provide a default reason or skip validation in this branch so that reasoning-enabled guardrails handle empty outputs gracefully.
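
One way to apply the "provide a default reason" suggestion is sketched below. It is not the fix that was merged: the models are plain dataclasses standing in for the real validated models, and the helper `empty_response_fallback` and the default reason text are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Type


@dataclass
class LLMOutput:
    flagged: bool
    confidence: float


@dataclass
class LLMReasoningOutput(LLMOutput):
    reason: str  # required, which is what breaks the original fallback


def empty_response_fallback(output_model: Type[LLMOutput]) -> LLMOutput:
    """Build a benign, unflagged result for a blank or filtered completion.

    Supplies a default `reason` only when the chosen model declares that
    field, so both the base and the reasoning model can be instantiated.
    """
    kwargs = {"flagged": False, "confidence": 0.0}
    if "reason" in {f.name for f in fields(output_model)}:
        kwargs["reason"] = "No response was returned by the model."
    return output_model(**kwargs)


base_result = empty_response_fallback(LLMOutput)
reasoning_result = empty_response_fallback(LLMReasoningOutput)
print(base_result)
print(reasoning_result)
```

Checking the model's declared fields, rather than branching on `include_reasoning`, keeps the fallback working for any output model that adds a `reason` field.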

@steven10a
Collaborator Author

@codex review

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Hooray!

Collaborator

@gabor-openai gabor-openai left a comment

LGTM thank you!

@gabor-openai gabor-openai merged commit 92246d9 into main Dec 12, 2025
3 checks passed
@gabor-openai gabor-openai deleted the dev/steven/reasoning_output branch December 12, 2025 23:06
3 participants