Problem
Different LLMs behaved inconsistently during agent execution:
- Some showed plans, others didn't
- Approval flow varied by model
- Output formatting differed
Observed Inconsistencies
| Model | Plan Shown | Waited for Approval | Output Format |
|---|---|---|---|
| GPT-5-mini | Yes (multiple) | No | Plain text |
| Claude Sonnet 4.5 | Yes (structured) | No | Markdown |
| Gemini | Yes | No | Plain text |
Expected Behavior
Regardless of which LLM provider is used:
- Same plan format displayed
- Same approval flow enforced
- Same tool calling interface
- Consistent output formatting (one possible normalized response shape is sketched below)
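As a rough sketch of what a provider-agnostic result could look like (the type and field names below are illustrative, not taken from the existing codebase):

```typescript
// Hypothetical normalized shape that every provider adapter would map into;
// these names are assumptions for illustration only.
interface Plan {
  title: string;
  steps: string[];
}

interface NormalizedAgentResponse {
  plan: Plan;                 // always rendered with the same plan template
  requiresApproval: boolean;  // decided by the application, not the model
  output: string;             // markdown, produced by one shared formatter
}
```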
Implementation Suggestions
The agent layer should normalize behavior, for example as sketched below:
- Wrap model responses in consistent format
- Enforce approval gates at application level (not model level)
- Standardize output through formatters
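A minimal adapter sketch, assuming the `NormalizedAgentResponse` shape above; the plan-extraction logic is only a placeholder, and the real parsing strategy is an open design question:

```typescript
// Hypothetical adapter: maps any provider's raw completion text into the
// shared NormalizedAgentResponse shape so the UI never sees provider
// differences. Nothing here is an existing function in the codebase.
const normalizeResponse = (rawText: string): NormalizedAgentResponse => {
  const lines = rawText.split("\n").map((l) => l.trim()).filter(Boolean);
  return {
    plan: { title: lines[0] ?? "Plan", steps: lines.slice(1) },
    requiresApproval: true,   // approval is always gated at the app level
    output: rawText.trim(),   // later passed through the shared formatter
  };
};
```

The approval gate itself can then sit on top of this adapter, independent of which model produced the plan: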
```typescript
// Application-level enforcement, not model-dependent
const executeWithApproval = async (plan: Plan): Promise<void> => {
  const approved = await showPlanAndWaitForApproval(plan);
  if (!approved) return;
  // Execute...
};
```
Priority
🟡 Medium - Affects user experience and predictability
Generated from model evaluation test