feat: Add web search and fetch tools #109

@CarGDev

Summary

Add tools for web search and URL fetching to enable research capabilities.

Why It's Needed

  • Research: Look up documentation, APIs, solutions
  • Current Info: Access information published after the model's training cutoff
  • External Resources: Fetch README, docs, examples from URLs
  • Competitor Parity: Both OpenCode and Claude Code have this

Tools to Implement

WebSearch Tool

interface WebSearchInput {
  query: string
  limit?: number      // Default 5
  site?: string       // Limit to specific domain
}

interface WebSearchResult {
  results: {
    title: string
    url: string
    snippet: string
  }[]
}

Provider Options:

  1. DuckDuckGo - Free, no API key
  2. Brave Search API - Privacy-focused, affordable
  3. Exa - Semantic search, used by OpenCode
  4. SerpAPI - Google results, more expensive
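One way to keep the provider configurable is to put every backend behind a small common contract. The sketch below is illustrative only: `SearchProvider`, `ProviderRegistry`, and `buildQuery` are hypothetical names that do not appear in this proposal, and appending a `site:` operator assumes the chosen provider honors it.

```typescript
// Input/result shapes copied from the WebSearch interfaces above.
interface WebSearchInput {
  query: string
  limit?: number
  site?: string
}

interface WebSearchResult {
  results: { title: string; url: string; snippet: string }[]
}

// Hypothetical contract each backend (DuckDuckGo, Brave, Exa, SerpAPI) would implement.
interface SearchProvider {
  name: string
  search(input: WebSearchInput): Promise<WebSearchResult>
}

// Assumption: the provider understands the common `site:` operator.
function buildQuery(input: WebSearchInput): string {
  return input.site ? `${input.query} site:${input.site}` : input.query
}

class ProviderRegistry {
  private providers = new Map<string, SearchProvider>()

  register(provider: SearchProvider): void {
    this.providers.set(provider.name, provider)
  }

  async search(providerName: string, input: WebSearchInput): Promise<WebSearchResult> {
    const provider = this.providers.get(providerName)
    if (!provider) {
      throw new Error(`Unknown search provider: ${providerName}`)
    }
    // Apply the documented default of 5 and enforce it even if the backend over-returns.
    const limit = input.limit ?? 5
    const { results } = await provider.search({ ...input, limit })
    return { results: results.slice(0, limit) }
  }
}
```

With this shape, switching providers becomes a configuration change rather than a code change, and the default limit is enforced in one place.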

WebFetch Tool

interface WebFetchInput {
  url: string
  selector?: string   // CSS selector to extract specific content
  format?: "markdown" | "text" | "html"
}

interface WebFetchResult {
  content: string     // Converted to markdown by default
  title: string
  url: string
  truncated: boolean
  contentLength: number
}

Features:

  • HTML to Markdown conversion
  • Content truncation for large pages (configurable limit)
  • CSS selector extraction
  • Respect robots.txt
  • Handle redirects
  • Timeout handling
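Truncation, redirects, and timeouts from the list above could be sketched as follows. The helper names `truncateContent` and `fetchTextWithTimeout` are hypothetical, and treating `contentLength` as the length before truncation is an assumption, since the proposal does not define it. The sketch relies on the standard `fetch` and `AbortController` APIs (available in Node 18+ and browsers).

```typescript
interface TruncateResult {
  content: string
  truncated: boolean
  contentLength: number // Assumption: length of the content before truncation
}

// Cut content to at most maxLength UTF-16 code units.
// Note: a plain slice can split a surrogate pair at the boundary.
function truncateContent(content: string, maxLength: number): TruncateResult {
  const contentLength = content.length
  if (contentLength <= maxLength) {
    return { content, truncated: false, contentLength }
  }
  return { content: content.slice(0, maxLength), truncated: true, contentLength }
}

// Fetch a URL as text, aborting if headers plus body take longer than timeoutMs.
async function fetchTextWithTimeout(url: string, timeoutMs: number): Promise<string> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    // redirect: "follow" is the fetch default; stated explicitly for clarity.
    const response = await fetch(url, { signal: controller.signal, redirect: "follow" })
    if (!response.ok) {
      throw new Error(`Fetch failed: ${response.status} ${response.statusText}`)
    }
    // Reading the body inside the try keeps it under the same timeout.
    return await response.text()
  } finally {
    clearTimeout(timer)
  }
}
```

Keeping the body read inside the `try` matters: clearing the timer as soon as headers arrive would leave a slow body download unbounded.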

Usage Examples

User: "How do I use the Zod library?"

Agent: Let me search for Zod documentation.
[Uses WebSearch: "Zod typescript validation library documentation"]

Agent: Found the official docs. Let me fetch the getting started guide.
[Uses WebFetch: "https://zod.dev/docs/getting-started"]

Agent: Based on the docs, here's how to use Zod...

Implementation Notes

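// WebFetch with Turndown for HTML→Markdown
import TurndownService from "turndown"
import { JSDOM } from "jsdom"

async function fetchAndConvert(url: string): Promise<string> {
  const response = await fetch(url)
  if (!response.ok) {
    throw new Error(`Fetch failed: ${response.status} ${response.statusText}`)
  }
  const html = await response.text()

  const dom = new JSDOM(html)
  const document = dom.window.document
  const turndown = new TurndownService()

  // Remove scripts, styles, nav, footer before conversion
  document.querySelectorAll("script, style, nav, footer").forEach((el) => el.remove())

  // Prefer the main content container; fall back to the whole body
  const content = document.querySelector("main, article, .content") || document.body

  return turndown.turndown(content.innerHTML)
}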

Configuration

{
  "tools": {
    "webSearch": {
      "enabled": true,
      "provider": "duckduckgo",
      "maxResults": 5
    },
    "webFetch": {
      "enabled": true,
      "maxContentLength": 50000,
      "timeout": 30000
    }
  }
}
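A loader for this configuration might merge user settings over defaults and reject invalid values. This is a sketch under assumptions: `resolveToolsConfig` and `DEFAULT_TOOLS_CONFIG` are hypothetical names, the defaults simply mirror the example values above, and `timeout` is assumed to be in milliseconds.

```typescript
interface ToolsConfig {
  webSearch: { enabled: boolean; provider: string; maxResults: number }
  webFetch: { enabled: boolean; maxContentLength: number; timeout: number }
}

// Every field is optional in user-supplied config.
type ToolsConfigInput = {
  webSearch?: Partial<ToolsConfig["webSearch"]>
  webFetch?: Partial<ToolsConfig["webFetch"]>
}

// Defaults mirror the example configuration above.
const DEFAULT_TOOLS_CONFIG: ToolsConfig = {
  webSearch: { enabled: true, provider: "duckduckgo", maxResults: 5 },
  webFetch: { enabled: true, maxContentLength: 50000, timeout: 30000 }, // timeout assumed to be ms
}

function assertPositiveInteger(value: number, name: string): void {
  if (!Number.isInteger(value) || value < 1) {
    throw new Error(`${name} must be a positive integer, got ${value}`)
  }
}

// Merge user settings over defaults, then validate the numeric limits.
function resolveToolsConfig(user: ToolsConfigInput = {}): ToolsConfig {
  const config: ToolsConfig = {
    webSearch: { ...DEFAULT_TOOLS_CONFIG.webSearch, ...user.webSearch },
    webFetch: { ...DEFAULT_TOOLS_CONFIG.webFetch, ...user.webFetch },
  }
  assertPositiveInteger(config.webSearch.maxResults, "tools.webSearch.maxResults")
  assertPositiveInteger(config.webFetch.maxContentLength, "tools.webFetch.maxContentLength")
  assertPositiveInteger(config.webFetch.timeout, "tools.webFetch.timeout")
  return config
}
```

For example, `resolveToolsConfig({ webFetch: { timeout: 5000 } })` would override only the timeout and keep every other default.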

Acceptance Criteria

  • WebSearch tool with configurable provider
  • WebFetch tool with HTML→Markdown conversion
  • Content truncation for large pages
  • CSS selector extraction support
  • Timeout and error handling
  • Respect robots.txt
  • Rate limiting
  • Configuration options
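The rate limiting criterion is not specified further in this issue. One possible approach is a sliding-window limiter shared by both tools. The class name `SlidingWindowLimiter`, its parameters, and the injectable clock are all hypothetical and shown only to make the criterion concrete.

```typescript
// Allows at most maxRequests calls within any windowMs-long window.
class SlidingWindowLimiter {
  private timestamps: number[] = []
  private readonly maxRequests: number
  private readonly windowMs: number
  private readonly now: () => number

  // The clock is injectable so the limiter can be tested without real waiting.
  constructor(maxRequests: number, windowMs: number, now: () => number = () => Date.now()) {
    this.maxRequests = maxRequests
    this.windowMs = windowMs
    this.now = now
  }

  // Returns true and records the request if it fits in the window, false otherwise.
  tryAcquire(): boolean {
    const current = this.now()
    // Drop requests that have aged out of the window.
    this.timestamps = this.timestamps.filter((ts) => current - ts < this.windowMs)
    if (this.timestamps.length >= this.maxRequests) {
      return false
    }
    this.timestamps.push(current)
    return true
  }
}
```

A tool would call `tryAcquire()` before each search or fetch and return a "rate limited" error to the agent when it yields `false`.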

Effort Estimate

3 days

Privacy Consideration

  • Search queries may be logged by search providers
  • URL fetches may be logged by target servers
  • Consider adding privacy notice in docs
