
fix(ralph-loop): prevent bypass via minimum work verification before accepting completion promise (#1921) #1923

Open
oinani0721 wants to merge 1 commit into code-yeongyu:dev from oinani0721:fix/ralph-loop-bypass-prevention


oinani0721 commented Feb 17, 2026

Summary

Fixes #1921.

  • Add minimum tool-call verification before accepting completion promise — rejects zero-work bypass attempts
  • Strengthen the RALPH_LOOP_TEMPLATE prompt with explicit anti-bypass rules
  • Fail-open design: verification API errors don't block legitimate completions

The Problem

AI agents can trivially bypass the Ralph Loop by immediately outputting <promise>DONE</promise> without performing any actual work (no tool calls, no file operations, nothing). The loop detects the promise and exits with "Task completed after 1 iteration(s)" — but zero work was done.

This is the inverse of #1233 (promise not detected). Here, the promise IS detected correctly — but it shouldn't be accepted because no work was performed.

The Solution

New: work-verifier.ts

countToolCallsInCurrentIteration() inspects the session messages for tool_use and tool_result parts in assistant messages since the last user message. Returns:

  • N > 0: Number of tool calls found (work verified)
  • 0: No tool calls found (bypass detected)
  • -1: API error (fail-open — don't block legitimate completions)
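A minimal sketch of the counting step described above. It assumes a simplified message shape (a `role` plus a list of typed `parts`) rather than the real session messages API, and the name `countToolPartsSinceLastUser` and its types are hypothetical. Because `tool_use` and `tool_result` parts are counted separately, one tool call together with its result may count as 2; only whether the count is zero matters for the bypass check.

```typescript
// Hypothetical, simplified message shapes; the real session API may differ.
type Part =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string }
  | { type: "tool_result"; content: string };

interface SessionMessage {
  role: "user" | "assistant";
  parts: Part[];
}

// Counts tool_use / tool_result parts in assistant messages that come after
// the most recent user message, i.e. in the current loop iteration only.
function countToolPartsSinceLastUser(messages: SessionMessage[]): number {
  // Index of the last user message; -1 means there is none yet.
  let lastUser = -1;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") {
      lastUser = i;
      break;
    }
  }

  let count = 0;
  for (const msg of messages.slice(lastUser + 1)) {
    if (msg.role !== "assistant") continue;
    for (const part of msg.parts) {
      if (part.type === "tool_use" || part.type === "tool_result") count++;
    }
  }
  return count;
}
```

With a text-only reply such as `<promise>DONE</promise>` after the last user message, the count is 0 and the promise would be rejected. Tool calls made before the last user message (in an earlier iteration) are ignored.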

Modified: ralph-loop-event-handler.ts

After completion promise is detected via session messages API:

  1. Call countToolCallsInCurrentIteration() to verify work
  2. If toolCallCount === 0: reject the promise, increment iteration, inject a "WORK REQUIRED" continuation prompt
  3. If toolCallCount > 0 or -1 (error): accept the promise normally

Transcript-detected completions skip the work check — transcript entries already contain tool_result data proving work was done.
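The acceptance rules above can be sketched as a pure decision function. This is an illustration, not the handler's code: `decideOnPromise`, `LoopState`, and the `"api"`/`"transcript"` source tag are hypothetical. The max-iterations branch comes from the "stop at max iterations with a warning" note in the cubic summary, and assumes the loop stops once the next iteration would exceed `max_iterations`.

```typescript
// Hypothetical types; the real handler's state and event shapes differ.
type CompletionSource = "api" | "transcript";

interface LoopState {
  iteration: number;
  max_iterations: number;
}

type Decision =
  | { kind: "accept" }
  | { kind: "reject"; nextIteration: number } // inject "WORK REQUIRED" prompt
  | { kind: "stop-max-iterations" }; // stop with a warning

// toolCallCount: N > 0 = work found, 0 = no work, -1 = verification error.
function decideOnPromise(
  source: CompletionSource,
  toolCallCount: number,
  state: LoopState,
): Decision {
  // Transcript completions already carry tool_result entries: no check.
  if (source === "transcript") return { kind: "accept" };
  // Work found, or verification failed (fail-open): accept the promise.
  if (toolCallCount > 0 || toolCallCount === -1) return { kind: "accept" };
  // Zero work: reject and advance, unless the budget is used up (assumed rule).
  const nextIteration = state.iteration + 1;
  if (nextIteration > state.max_iterations) return { kind: "stop-max-iterations" };
  return { kind: "reject", nextIteration };
}
```

Keeping the decision pure makes each tested behavior (reject on zero work, accept with tool calls, fail open on error) a one-line assertion.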

Modified: continuation-prompt-builder.ts

New buildNoWorkRejectionPrompt() that explicitly tells the agent its promise was rejected because no tool calls were detected, and instructs it to actually work before declaring completion.
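A sketch of what such a rejection prompt builder could look like. The template wording and the name `buildNoWorkRejectionPromptSketch` are hypothetical. The state field names `max_iterations`, `completion_promise`, `prompt`, and `ultrawork` follow the diff context quoted in the cubic review of this PR; `iteration` is an assumed extra field. Placeholders are filled in a single regex pass with a replacer function, so user-supplied text is inserted literally.

```typescript
// Hypothetical subset of the loop state; the real RalphLoopState may differ.
interface RejectionState {
  iteration: number; // assumed field
  max_iterations: number;
  completion_promise: string;
  prompt: string;
  ultrawork: boolean;
}

// Illustrative wording only, not the PR's actual template.
const NO_WORK_TEMPLATE = [
  "[WORK REQUIRED - iteration {{ITERATION}}/{{MAX}}]",
  "Your completion promise was rejected: no tool calls were detected in this iteration.",
  "Do the actual work (read files, edit code, run commands), then output <promise>{{PROMISE}}</promise>.",
  "",
  "Original task:",
  "{{PROMPT}}",
].join("\n");

// Replaces every {{KEY}} in one pass. The replacer function inserts values
// literally ("$&" or "$'" in user text is never expanded), and text inserted
// for one key is never rescanned for another placeholder.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match,
  );
}

function buildNoWorkRejectionPromptSketch(state: RejectionState): string {
  const rejectionPrompt = fillTemplate(NO_WORK_TEMPLATE, {
    ITERATION: String(state.iteration),
    MAX: String(state.max_iterations),
    PROMISE: state.completion_promise,
    PROMPT: state.prompt,
  });
  return state.ultrawork ? `ultrawork ${rejectionPrompt}` : rejectionPrompt;
}
```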

Modified: ralph-loop.ts template

Added three anti-bypass rules to the prompt template:

- **You MUST make at least one tool call before outputting the completion promise**
- The loop will REJECT your completion promise if no tool calls (file reads, edits, commands) were detected
- An immediate promise without work will be treated as a bypass attempt and rejected

Design Decisions

| Decision | Rationale |
|---|---|
| Work check only for API-detected completions | Transcript completions already contain tool_result entries |
| Fail-open on API error (-1) | Don't block legitimate completions due to transient API failures |
| Count since last user message | Current iteration only; previous iterations' work doesn't count |
| Both tool_use and tool_result counted | Different API response shapes may include either |
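The fail-open decision amounts to mapping any verification failure to -1 instead of letting it propagate. A minimal sketch, where `verifyWorkFailOpen` and the injected `fetchMessages` and `countToolParts` callbacks are hypothetical stand-ins for the session messages API and the counting logic. Note that in this sketch an exception thrown while counting is treated as fail-open too.

```typescript
// Hypothetical wrapper: the real handler's API client and names differ.
// Resolves to the tool-part count, or to -1 if fetching or counting throws.
async function verifyWorkFailOpen<M>(
  fetchMessages: () => Promise<M[]>,
  countToolParts: (messages: M[]) => number,
): Promise<number> {
  try {
    const messages = await fetchMessages();
    return countToolParts(messages);
  } catch {
    // Fail-open: a transient API error must not block a legitimate completion.
    return -1;
  }
}
```

A caller then treats -1 exactly like a positive count, so a timed-out messages request lets the completion promise through rather than trapping the loop.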

Tests

3 new test cases added (43 total, 0 failures):

| Test | What it verifies |
|---|---|
| should reject completion promise when no tool calls were made (bypass prevention) | Promise with text-only response → rejected, continuation injected with "WORK REQUIRED" |
| should accept completion promise when tool calls were made | Promise with tool_use parts → accepted normally |
| should fail open when work verification API errors | API error during verification → promise accepted (fail-open) |

bun test v1.3.9
 43 pass
 0 fail
 126 expect() calls

Files Changed

| File | Change |
|---|---|
| src/hooks/ralph-loop/work-verifier.ts | New: tool call counting logic |
| src/hooks/ralph-loop/ralph-loop-event-handler.ts | Modified: integrate work verification |
| src/hooks/ralph-loop/continuation-prompt-builder.ts | Modified: add no-work rejection prompt |
| src/hooks/ralph-loop/index.ts | Modified: re-export work-verifier |
| src/hooks/ralph-loop/index.test.ts | Modified: 3 new test cases |
| src/features/builtin-commands/templates/ralph-loop.ts | Modified: anti-bypass rules in template |

Summary by cubic

Prevents Ralph Loop bypasses by requiring at least one tool call before accepting a completion promise. Closes #1921 by rejecting zero‑work “DONE” outputs.

  • Bug Fixes

    • Verify minimum work on API-detected completions; reject if no tool_use/tool_result since the last user message.
    • Inject a “WORK REQUIRED” continuation and increment the iteration on rejection; stop at max iterations with a warning.
    • Fail-open on verification API errors to avoid blocking legitimate completions.
    • Skip verification for transcript-based completions (work already proven).
  • New Features

    • Added work-verifier.ts to count tool calls in the current iteration.
    • Added buildNoWorkRejectionPrompt to guide the agent after a rejected promise.
    • Strengthened RALPH_LOOP_TEMPLATE with explicit anti-bypass rules.

Written for commit 29203b4. Summary will update on new commits.

…accepting completion promise (code-yeongyu#1921)

Add work-verifier.ts that counts tool_use/tool_result blocks in the
current iteration's assistant messages. When the completion promise is
detected via the session messages API but zero tool calls were found,
the promise is rejected and a 'no work' continuation prompt is injected,
forcing the agent to actually perform work before completing.

Key changes:
- New work-verifier.ts with countToolCallsInCurrentIteration()
- Event handler integrates work verification before accepting promise
- New buildNoWorkRejectionPrompt() in continuation-prompt-builder
- Strengthened RALPH_LOOP_TEMPLATE with anti-bypass rules
- Fail-open behavior: verification API errors don't block completions
- Transcript-detected completions bypass work check (already proven work)
- 3 new test cases: bypass rejection, acceptance with tools, fail-open

Closes code-yeongyu#1921
github-actions (Contributor) commented:

Thank you for your contribution! Before we can merge this PR, we need you to sign our Contributor License Agreement (CLA).

To sign the CLA, please comment on this PR with:

I have read the CLA Document and I hereby sign the CLA

This is a one-time requirement. Once signed, all your future contributions will be automatically accepted.


I have read the CLA Document and I hereby sign the CLA


onani0721 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

cubic-dev-ai bot left a comment

1 issue found across 6 files

Confidence score: 2/5

  • There is a high-severity risk of prompt corruption in src/hooks/ralph-loop/continuation-prompt-builder.ts due to unsafe string replacement when state.prompt contains special characters.
  • This is a concrete user-impacting bug (corrupted prompts) and is the main reason for the lower score despite being a localized change.
  • Pay close attention to src/hooks/ralph-loop/continuation-prompt-builder.ts - replace the string substitution with a replacer function to treat the prompt as literal text.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/hooks/ralph-loop/continuation-prompt-builder.ts">

<violation number="1" location="src/hooks/ralph-loop/continuation-prompt-builder.ts:50">
P1: Unsafe string replacement allows prompt corruption via special characters. Use a replacer function instead: `.replace("{{PROMPT}}", () => state.prompt)` to treat the replacement as literal text.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Add one-off context when rerunning by tagging @cubic-dev-ai with guidance or docs links (including llms.txt)
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

cubic-dev-ai bot commented Feb 17, 2026

P1: Unsafe string replacement allows prompt corruption via special characters. Use a replacer function instead: .replace("{{PROMPT}}", () => state.prompt) to treat the replacement as literal text.
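The concern is standard JavaScript behavior: when the second argument of `String.prototype.replace` is a string, sequences such as `$&` (the matched text), `` $` `` and `$'` (the text before and after the match), and `$$` (a literal `$`) are expanded, whereas the return value of a replacer function is inserted verbatim. A small demonstration with a made-up prompt:

```typescript
const template = "Task:\n{{PROMPT}}";
// A user prompt that happens to contain replacement patterns.
const userPrompt = "Replace every $& with $$ in prices.txt";

// String replacement: "$&" becomes the matched "{{PROMPT}}", "$$" becomes "$".
const unsafe = template.replace("{{PROMPT}}", userPrompt);
// unsafe === "Task:\nReplace every {{PROMPT}} with $ in prices.txt"

// Replacer function: the returned string is inserted as-is.
const safe = template.replace("{{PROMPT}}", () => userPrompt);
// safe === "Task:\nReplace every $& with $$ in prices.txt"
```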

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/hooks/ralph-loop/continuation-prompt-builder.ts, line 50:

<comment>Unsafe string replacement allows prompt corruption via special characters. Use a replacer function instead: `.replace("{{PROMPT}}", () => state.prompt)` to treat the replacement as literal text.</comment>

<file context>
@@ -25,3 +39,15 @@ export function buildContinuationPrompt(state: RalphLoopState): string {
+	)
+		.replace("{{MAX}}", String(state.max_iterations))
+		.replace("{{PROMISE}}", state.completion_promise)
+		.replace("{{PROMPT}}", state.prompt)
+
+	return state.ultrawork ? `ultrawork ${rejectionPrompt}` : rejectionPrompt
</file context>
Suggested change
- .replace("{{PROMPT}}", state.prompt)
+ .replace("{{PROMPT}}", () => state.prompt)

Wongbuer commented:

I have read the CLA Document and I hereby sign the CLA



Development

Successfully merging this pull request may close these issues.

Bug: Agent can bypass Ralph/ULW Loop by immediately outputting completion promise without doing work
