Improve tool calling response handling #39

MichaelAnders wants to merge 9 commits into Fast-Editor:main
Conversation
Force-pushed from bd054a2 to 6bcc803
- Add fallback parsing for Ollama models that return tool calls as JSON text in message content instead of using the structured tool_calls field (a sketch of this fallback follows the list)
- Return tool results directly to the CLI instead of making a follow-up LLM call, reducing latency and preventing hallucinated rewrites of output
- Add a dedicated Glob tool returning plain text (one path per line) instead of JSON, with workspace_list accepting both 'pattern' and 'patterns'
- Clarify why Glob is not aliased to workspace_list (format mismatch)
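A minimal sketch of that fallback, assuming an Ollama-style message shape; extractToolCalls and the field handling are illustrative, not the PR's exact code:

```js
// Prefer the structured tool_calls field; if it is absent, try to parse a
// JSON tool call out of message.content, where some Ollama models emit it.
function extractToolCalls(message) {
  if (Array.isArray(message.tool_calls) && message.tool_calls.length > 0) {
    return message.tool_calls;
  }
  if (typeof message.content === "string") {
    try {
      const parsed = JSON.parse(message.content.trim());
      if (parsed && typeof parsed.name === "string") {
        return [{ function: { name: parsed.name, arguments: parsed.arguments ?? {} } }];
      }
    } catch {
      // Ordinary prose, not JSON: no tool call here.
    }
  }
  return [];
}
```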
- Additional logging for tool call parsing and execution
- Hard-coded shell commands for reliable tool execution
- Deduplication of tool calls within a single response (sketched after this list)
- Collect and return results from all called tools
- Ollama uses the specified Ollama model
- Fix double-serialized JSON parameters from some providers (also sketched below)
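Sketches of the dedup and the double-serialization fix, assuming OpenAI-style call objects (function.name / function.arguments); both helper names are illustrative:

```js
// Two tool calls are duplicates when they target the same tool with the
// same arguments; keep only the first occurrence within a response.
function dedupeToolCalls(calls) {
  const seen = new Set();
  return calls.filter((call) => {
    const key = `${call.function.name}:${JSON.stringify(call.function.arguments)}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Some providers double-serialize arguments (a JSON string inside a JSON
// string), so unwrap up to two layers of string-encoded JSON.
function normalizeArguments(args) {
  let value = args;
  for (let i = 0; i < 2 && typeof value === "string"; i++) {
    try {
      value = JSON.parse(value);
    } catch {
      break; // not JSON; leave the string as-is
    }
  }
  return value;
}
```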
Tool results now loop back to the model for natural-language synthesis instead of being returned raw to the CLI; a sketch of the flow follows the list below. This fixes the bug where conversational messages (e.g. "hi") triggered tool calls and dumped raw output. Additional improvements:
- Context-aware tiered compression that scales with the model context window
- Empty response detection with retry-then-fallback
- _noToolInjection flag to prevent provider-level tool re-injection
- Auto-approve external file reads in the tool executor
- Conversation context search in workspace_search
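A simplified sketch of the loop-back flow; executeTool and invokeModel are stand-ins for the project's real executor and provider call, and only _noToolInjection is taken from the PR:

```js
// Run each tool call, append its result as a tool message, then make a
// second model pass so raw output is synthesized into natural language.
// _noToolInjection keeps the provider layer from re-adding tool definitions.
async function synthesizeToolResults(messages, toolCalls, executeTool, invokeModel) {
  for (const call of toolCalls) {
    const result = await executeTool(call);
    messages.push({ role: "tool", name: call.function.name, content: String(result) });
  }
  return invokeModel(messages, { _noToolInjection: true });
}
```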
Three concurrent runAgentLoop calls per user message waste GPU time with large models (~30s each). This adds a SUGGESTION_MODE_MODEL config to skip suggestion mode ("none") or redirect it to a lighter model; a sketch follows. Also adds ISO timestamps and mode tags to debug logs.
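A sketch of how the config could gate suggestion mode; SUGGESTION_MODE_MODEL and the "none" sentinel are from the PR, the surrounding logic is assumed:

```js
// "none" skips the extra runAgentLoop passes entirely; any other value
// redirects suggestion mode to that (typically lighter) model.
function resolveSuggestionModel(defaultModel) {
  const configured = process.env.SUGGESTION_MODE_MODEL;
  if (configured === "none") return null; // caller skips suggestion mode
  return configured || defaultModel;
}
```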
- Add ECONNREFUSED to retryable errors and check undici TypeError .cause.code so connection-refused errors get retried with backoff (see the sketch after this list)
- Wrap invokeModel in try/catch returning a structured 503 with a provider_unreachable error instead of a raw TypeError bubbling up to the Express error middleware
- Fix the suggestion mode early-return response shape (json -> body) to match router expectations
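A sketch of the retryable check from the first bullet; the PR's full set of retryable codes may differ:

```js
// undici (Node's fetch) surfaces connection-refused as a TypeError whose
// .cause carries the socket error, so check err.code and err.cause.code.
function isRetryableError(err) {
  const code = err?.code ?? err?.cause?.code;
  return code === "ECONNREFUSED"; // joins the existing retryable set
}
```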
… responses

The heuristic-based stripThinkingBlocks() matched standard markdown bullets (- item, * item) as "thinking block markers" and dropped all subsequent content. Replace it with stripThinkTags(), which only strips the <think>...</think> tags used by models like DeepSeek and Qwen for chain-of-thought reasoning (sketched below).
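A minimal sketch of stripThinkTags(); the exact regex is assumed, but it shows why markdown bullets now survive: only literal <think>…</think> spans are removed.

```js
// Strip <think>...</think> chain-of-thought blocks (DeepSeek, Qwen) and
// nothing else, so bullets like "- item" pass through untouched.
function stripThinkTags(text) {
  return text.replace(/<think>[\s\S]*?<\/think>/gi, "").trim();
}
```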
The dotenv override (override: NODE_ENV !== "test") requires NODE_ENV=test during test runs to prevent .env values from stomping on environment variables set by the tests (see the sketch below).
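A sketch of that dotenv call, using the standard override option:

```js
// Outside of tests, .env values override anything already in process.env;
// under NODE_ENV=test they do not, so variables set by the tests win.
require("dotenv").config({
  override: process.env.NODE_ENV !== "test",
});
```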
Force-pushed from 6bcc803 to a2d78ee
Can you please add a .env.example file as well?
Add TOPIC_DETECTION_MODEL, MODEL_DEFAULT, Ollama tuning (OLLAMA_TIMEOUT_MS, OLLAMA_KEEP_ALIVE), and an LLM Audit logging section to .env.example for discoverability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Done!
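A sketch of the added .env.example entries; the variable names are from the commit, while the grouping and blank values are illustrative:

```
# Model selection
TOPIC_DETECTION_MODEL=
MODEL_DEFAULT=

# Ollama tuning
OLLAMA_TIMEOUT_MS=
OLLAMA_KEEP_ALIVE=

# LLM Audit logging (see the commit for the actual entries)
```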
```js
// Guard: drop hallucinated tool calls when no tools were sent to the model.
// Some models (e.g. Llama 3.1) hallucinate tool_call blocks from conversation
// history even when the request contained zero tool definitions.
const toolsWereSent = Array.isArray(cleanPayload.tools) && cleanPayload.tools.length > 0;
```
@MichaelAnders
Looks like this one is repeated. Can you please check?
Good catch! You were right: the Ollama `else if` branch was incorrectly nested inside the OpenAI deduplication `if (toolCalls.length > 0)` block, which caused the OpenAI parsing + dedup code to appear twice (once at the outer level and again as the else fallback).

Fixed in 62a0710 by restructuring so all three provider branches (Anthropic, Ollama, OpenAI/Databricks) are siblings at the same nesting level, each with its own deduplication logic. The duplicate OpenAI block is removed. The corrected shape is sketched below.
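A structural sketch of the fix; the parse helpers are hypothetical stand-ins, and the point is that the branches are siblings with per-branch dedup:

```js
// No branch nests inside another; each runs its own deduplication pass.
if (provider === "anthropic") {
  toolCalls = dedupeToolCalls(parseAnthropicToolCalls(response)); // hypothetical helpers
} else if (provider === "ollama") {
  toolCalls = dedupeToolCalls(parseOllamaToolCalls(response)); // JSON-in-content fallback lives here
} else {
  // OpenAI / Databricks
  toolCalls = dedupeToolCalls(parseOpenAiToolCalls(response));
}
```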
When I run this PR I get the following error:

```
npm start

> lynkr@7.2.5 prestart
> node -e "if(process.env.HEADROOM_ENABLED==='true'&&process.env.HEADROOM_DOCKER_ENABLED!=='false'){process.exit(0)}else{process.exit(1)}" && docker compose --profile headroom up -d headroom 2>/dev/null || echo 'Headroom skipped (disabled or Docker not running)'
Headroom skipped (disabled or Docker not running)

> lynkr@7.2.5 start
> node index.js 2>&1 | npx pino-pretty --sync

[CONFIG] Tool execution mode: client
/Users/vishalveera.reddy/claude-code/src/orchestrator/index.js:2097
const toolsWereSent = Array.isArray(cleanPayload.tools) && cleanPayload.tools.length > 0;
      ^

SyntaxError: Identifier 'toolsWereSent' has already been declared
    at internalCompileFunction (node:internal/vm:73:18)
    at wrapSafe (node:internal/modules/cjs/loader:1153:20)
    at Module._compile (node:internal/modules/cjs/loader:1205:27)
    at Module._extensions..js (node:internal/modules/cjs/loader:1295:10)
    at Module.load (node:internal/modules/cjs/loader:1091:32)
    at Module._load (node:internal/modules/cjs/loader:938:12)
    at Module.require (node:internal/modules/cjs/loader:1115:19)
    at require (node:internal/modules/helpers:130:18)
    at Object.<anonymous> (/Users/vishalveera.reddy/claude-code/src/api/router.js:2:28)
    at Module._compile (node:internal/modules/cjs/loader:1241:14)

Node.js v20.9.0
```
`const toolsWereSent` is declared twice.
The Ollama else-if branch was incorrectly nested inside the OpenAI deduplication if-block, causing the OpenAI parsing + dedup code to appear twice. Restructured so Ollama is a peer branch alongside Anthropic and OpenAI/Databricks, each with its own dedup logic. Addresses PR Fast-Editor#39 review feedback from veerareddyvishal144.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary

Problems

Configuration for TOPIC_DETECTION_MODEL

Files Changed

- src/api/middleware/logging.js
- src/api/middleware/request-logging.js
- src/api/router.js
- src/config/index.js
- src/orchestrator/index.js
- src/tools/index.js
- src/tools/stubs.js
- src/tools/workspace.js

Test plan

- … SUGGESTION_MODE_MODEL=none)