fix(ralph-loop): prevent bypass via minimum work verification before accepting completion promise (#1921)#1923
Conversation
…accepting completion promise (code-yeongyu#1921) Add work-verifier.ts that counts tool_use/tool_result blocks in the current iteration's assistant messages. When the completion promise is detected via the session messages API but zero tool calls were found, the promise is rejected and a 'no work' continuation prompt is injected, forcing the agent to actually perform work before completing. Key changes: - New work-verifier.ts with countToolCallsInCurrentIteration() - Event handler integrates work verification before accepting promise - New buildNoWorkRejectionPrompt() in continuation-prompt-builder - Strengthened RALPH_LOOP_TEMPLATE with anti-bypass rules - Fail-open behavior: verification API errors don't block completions - Transcript-detected completions bypass work check (already proven work) - 3 new test cases: bypass rejection, acceptance with tools, fail-open Closes code-yeongyu#1921
|
Thank you for your contribution! Before we can merge this PR, we need you to sign our Contributor License Agreement (CLA). To sign the CLA, please comment on this PR with: This is a one-time requirement. Once signed, all your future contributions will be automatically accepted. I have read the CLA Document and I hereby sign the CLA onani0721 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. |
There was a problem hiding this comment.
1 issue found across 6 files
Confidence score: 2/5
- There is a high-severity risk of prompt corruption in
src/hooks/ralph-loop/continuation-prompt-builder.tsdue to unsafe string replacement whenstate.promptcontains special characters. - This is a concrete user-impacting bug (corrupted prompts) and is the main reason for the lower score despite being a localized change.
- Pay close attention to
src/hooks/ralph-loop/continuation-prompt-builder.ts- replace the string substitution with a replacer function to treat the prompt as literal text.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/hooks/ralph-loop/continuation-prompt-builder.ts">
<violation number="1" location="src/hooks/ralph-loop/continuation-prompt-builder.ts:50">
P1: Unsafe string replacement allows prompt corruption via special characters. Use a replacer function instead: `.replace("{{PROMPT}}", () => state.prompt)` to treat the replacement as literal text.</violation>
</file>
Since this is your first cubic review, here's how it works:
- cubic automatically reviews your code and comments on bugs and improvements
- Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
- Add one-off context when rerunning by tagging
@cubic-dev-aiwith guidance or docs links (includingllms.txt) - Ask questions if you need clarification on any suggestion
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| ) | ||
| .replace("{{MAX}}", String(state.max_iterations)) | ||
| .replace("{{PROMISE}}", state.completion_promise) | ||
| .replace("{{PROMPT}}", state.prompt) |
There was a problem hiding this comment.
P1: Unsafe string replacement allows prompt corruption via special characters. Use a replacer function instead: .replace("{{PROMPT}}", () => state.prompt) to treat the replacement as literal text.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/hooks/ralph-loop/continuation-prompt-builder.ts, line 50:
<comment>Unsafe string replacement allows prompt corruption via special characters. Use a replacer function instead: `.replace("{{PROMPT}}", () => state.prompt)` to treat the replacement as literal text.</comment>
<file context>
@@ -25,3 +39,15 @@ export function buildContinuationPrompt(state: RalphLoopState): string {
+ )
+ .replace("{{MAX}}", String(state.max_iterations))
+ .replace("{{PROMISE}}", state.completion_promise)
+ .replace("{{PROMPT}}", state.prompt)
+
+ return state.ultrawork ? `ultrawork ${rejectionPrompt}` : rejectionPrompt
</file context>
| .replace("{{PROMPT}}", state.prompt) | |
| .replace("{{PROMPT}}", () => state.prompt) |
|
I have read the CLA Document and I hereby sign the CLA |
Summary
Fixes #1921.
RALPH_LOOP_TEMPLATEprompt with explicit anti-bypass rulesThe Problem
AI agents can trivially bypass the Ralph Loop by immediately outputting
<promise>DONE</promise>without performing any actual work (no tool calls, no file operations, nothing). The loop detects the promise and exits with "Task completed after 1 iteration(s)" — but zero work was done.This is the inverse of #1233 (promise not detected). Here, the promise IS detected correctly — but it shouldn't be accepted because no work was performed.
The Solution
New:
work-verifier.tscountToolCallsInCurrentIteration()inspects the session messages fortool_useandtool_resultparts in assistant messages since the last user message. Returns:N > 0: Number of tool calls found (work verified)0: No tool calls found (bypass detected)-1: API error (fail-open — don't block legitimate completions)Modified:
ralph-loop-event-handler.tsAfter completion promise is detected via session messages API:
countToolCallsInCurrentIteration()to verify worktoolCallCount === 0: reject the promise, increment iteration, inject a "WORK REQUIRED" continuation prompttoolCallCount > 0or-1(error): accept the promise normallyTranscript-detected completions skip the work check — transcript entries already contain tool_result data proving work was done.
Modified:
continuation-prompt-builder.tsNew
buildNoWorkRejectionPrompt()that explicitly tells the agent its promise was rejected because no tool calls were detected, and instructs it to actually work before declaring completion.Modified:
ralph-loop.tstemplateAdded three anti-bypass rules to the prompt template:
Design Decisions
-1)tool_useandtool_resultcountedTests
3 new test cases added (43 total, 0 failures):
should reject completion promise when no tool calls were made (bypass prevention)should accept completion promise when tool calls were madeshould fail open when work verification API errorsFiles Changed
src/hooks/ralph-loop/work-verifier.tssrc/hooks/ralph-loop/ralph-loop-event-handler.tssrc/hooks/ralph-loop/continuation-prompt-builder.tssrc/hooks/ralph-loop/index.tssrc/hooks/ralph-loop/index.test.tssrc/features/builtin-commands/templates/ralph-loop.tsSummary by cubic
Prevents Ralph Loop bypasses by requiring at least one tool call before accepting a completion promise. Closes #1921 by rejecting zero‑work “DONE” outputs.
Bug Fixes
New Features
Written for commit 29203b4. Summary will update on new commits.