feat: Add image support for vision-capable models #110

@CarGDev

Summary

Enable users to include images in conversations for vision-capable models.

Why It's Needed

  • UI/UX Feedback: "What's wrong with this screenshot?"
  • Diagram Understanding: Analyze architecture diagrams
  • Error Screenshots: Debug from error screenshots
  • Design Implementation: "Implement this design mockup"
  • Competitor Parity: Both OpenCode and Claude Code support images

Features

Image Input Methods

  1. File Path: @screenshot.png or --files image.jpg
  2. Paste from Clipboard: Ctrl+V in TUI
  3. Drag and Drop: Drop image file onto TUI
  4. URL: @https://example.com/image.png
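
A possible way to handle the `@` syntax for both file paths and URLs is sketched below. The helper name `extractImageRefs` and the `ImageRef` type are hypothetical and not part of the existing codebase; the sketch assumes a reference is only treated as an image when it ends in one of the supported extensions.

```typescript
// Hypothetical helper: pull @-prefixed image references out of a prompt.
type ImageRef =
  | { kind: "url"; value: string }
  | { kind: "file"; value: string }

const IMAGE_EXTENSIONS = /\.(png|jpe?g|gif|webp)$/i

function extractImageRefs(prompt: string): { text: string; refs: ImageRef[] } {
  const refs: ImageRef[] = []
  // Match "@" at the start of the prompt or after whitespace, followed by non-whitespace.
  const text = prompt.replace(
    /(^|\s)@(\S+)/g,
    (match: string, lead: string, token: string): string => {
      const isUrl = /^https?:\/\//i.test(token)
      // Ignore any query string or fragment when checking the extension.
      const pathPart = token.replace(/[?#].*$/, "")
      if (!IMAGE_EXTENSIONS.test(pathPart)) return match // not an image: leave as-is
      refs.push({ kind: isUrl ? "url" : "file", value: token })
      return lead
    }
  )
  return { text: text.trim(), refs }
}
```

Mentions such as `@alice` are left untouched because they carry no image extension, so the existing `@` behaviour for non-image files would not need to change.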

Supported Formats

  • PNG, JPG, JPEG, GIF, WebP
  • Max size: 20MB (configurable)
  • Auto-resize if too large
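
For the auto-resize step, the target dimensions could be computed as in this sketch. It assumes the dimension limit caps the longer edge and that the aspect ratio should be preserved; `fitWithin` and `Size` are illustrative names only.

```typescript
interface Size {
  width: number
  height: number
}

// Scale a size down so its longer edge is at most maxDimension,
// preserving the aspect ratio. Images already within the limit are returned unchanged.
function fitWithin(size: Size, maxDimension: number): Size {
  const longest = Math.max(size.width, size.height)
  if (longest <= maxDimension) return size // never upscale
  const scale = maxDimension / longest
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  }
}
```

If sharp is used for the actual resize, its `resize` options `fit: "inside"` and `withoutEnlargement: true` should give comparable behaviour without computing the target size by hand.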

Implementation

Message Format

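type ImageDetail = "auto" | "low" | "high"

interface TextContent {
  type: "text"
  text: string  // e.g. "What's in this image?"
}

interface ImageContent {
  type: "image_url"
  image_url: {
    url: string
    detail?: ImageDetail
  }
}

interface ImageMessage {
  role: "user"
  content: Array<TextContent | ImageContent>
}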
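
To cover the "multiple images per message" criterion, a message with this shape could be assembled roughly as follows. `buildUserMessage` is a hypothetical helper, and the types are redeclared here only to keep the sketch self-contained.

```typescript
type Detail = "auto" | "low" | "high"

type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string; detail?: Detail } }

interface UserMessage {
  role: "user"
  content: ContentPart[]
}

// Build one user message: the text part first, then one image part per URL.
function buildUserMessage(
  text: string,
  imageUrls: string[],
  detail: Detail = "auto"
): UserMessage {
  const images = imageUrls.map(
    (url): ContentPart => ({ type: "image_url", image_url: { url, detail } })
  )
  return { role: "user", content: [{ type: "text", text }, ...images] }
}
```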

Image Processing

import { readFile } from "node:fs/promises"

async function processImage(input: string): Promise<ImageContent> {
  let buffer: Buffer

  if (/^https?:\/\//i.test(input)) {
    // Fetch from URL; fail early on a non-2xx response
    const response = await fetch(input)
    if (!response.ok) {
      throw new Error(`Failed to fetch image: ${response.status} ${response.statusText}`)
    }
    buffer = Buffer.from(await response.arrayBuffer())
  } else {
    // Read from file
    buffer = await readFile(input)
  }

  // Resize if too large
  if (buffer.length > MAX_SIZE) {
    buffer = await resizeImage(buffer, MAX_DIMENSIONS)
  }

  // Encode as a base64 data URL
  const base64 = buffer.toString("base64")
  const mimeType = detectMimeType(buffer)

  return {
    type: "image_url",
    image_url: {
      url: `data:${mimeType};base64,${base64}`
    }
  }
}
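
`detectMimeType` is not defined in this issue. One way it might work is to check the file's leading bytes against the well-known signatures of the supported formats, as in the sketch below. The name `sniffImageMimeType` is a stand-in, and returning `undefined` for unknown data is an assumption about how unsupported files should be reported.

```typescript
// Hypothetical stand-in for detectMimeType: identify the format from magic bytes.
function sniffImageMimeType(bytes: Uint8Array): string | undefined {
  const startsWith = (signature: number[], offset = 0): boolean =>
    signature.every((byte, i) => bytes[offset + i] === byte)

  // PNG: 89 50 4E 47 0D 0A 1A 0A
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png"
  // JPEG: FF D8 FF
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg"
  // GIF: "GIF8" (covers GIF87a and GIF89a)
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "image/gif"
  // WebP: "RIFF", a 4-byte size, then "WEBP"
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp"
  }
  return undefined // not one of the supported formats
}
```

Sniffing the content rather than trusting the file extension also covers URLs and clipboard data, which may not carry an extension at all.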

Clipboard Support (TUI)

import clipboard from "clipboardy"

// On Ctrl+V, check for image in clipboard.
// getClipboardImage, addImageToContext, and showImagePreview are proposed helpers, not yet defined.
async function handlePaste() {
  const imageData = await getClipboardImage()
  if (imageData) {
    addImageToContext(imageData)
    showImagePreview(imageData)
  }
}

Provider Support

Provider | Vision Support
Copilot (GPT-4o)
Copilot (GPT-4)
Copilot (Claude)
Ollama (llava)
Ollama (bakllava)
Ollama (others)
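
Capability detection with a graceful fallback might look like the sketch below. The pattern list is an assumption for illustration only and is not taken from the table above; all function and type names are hypothetical. Unknown models are treated as text-only.

```typescript
// Assumption: name patterns for models that accept images (illustrative, not exhaustive).
const VISION_MODEL_PATTERNS: RegExp[] = [/^gpt-4o/i, /llava/i]

function supportsVision(model: string): boolean {
  return VISION_MODEL_PATTERNS.some((pattern) => pattern.test(model))
}

type Part =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } }

// If the model cannot take images, drop the image parts and return a warning
// for the TUI to display instead of failing the request.
function applyVisionFallback(
  model: string,
  parts: Part[]
): { parts: Part[]; warning?: string } {
  if (supportsVision(model)) return { parts }
  const kept = parts.filter((part) => part.type === "text")
  const dropped = parts.length - kept.length
  if (dropped === 0) return { parts }
  return {
    parts: kept,
    warning: `${model} does not support images; ${dropped} image(s) were omitted.`,
  }
}
```

A static pattern list is the simplest option. Querying the provider for model capabilities, where it exposes them, would be more robust.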

TUI Display

┌─────────────────────────────────────┐
│ You:                                │
│ What's wrong with this UI?          │
│                                     │
│ ┌──────────────┐                    │
│ │ 📷 image.png │                    │
│ │   (256x128)  │                    │
│ └──────────────┘                    │
└─────────────────────────────────────┘

Configuration

{
  "images": {
    "enabled": true,
    "maxSize": 20971520,
    "autoResize": true,
    "maxDimensions": 2048,
    "detail": "auto"
  }
}
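
A sketch of how this block could be typed and merged with defaults follows. It assumes `maxSize` is in bytes and `maxDimensions` in pixels; `resolveImagesConfig` and `DEFAULT_IMAGES_CONFIG` are hypothetical names.

```typescript
interface ImagesConfig {
  enabled: boolean
  maxSize: number // bytes
  autoResize: boolean
  maxDimensions: number // pixels
  detail: "auto" | "low" | "high"
}

const DEFAULT_IMAGES_CONFIG: ImagesConfig = {
  enabled: true,
  maxSize: 20 * 1024 * 1024, // 20971520 bytes
  autoResize: true,
  maxDimensions: 2048,
  detail: "auto",
}

// Merge user overrides over the defaults and reject obviously invalid limits.
function resolveImagesConfig(user: Partial<ImagesConfig> = {}): ImagesConfig {
  const merged: ImagesConfig = { ...DEFAULT_IMAGES_CONFIG, ...user }
  if (!Number.isInteger(merged.maxSize) || merged.maxSize <= 0) {
    throw new RangeError("images.maxSize must be a positive integer (bytes)")
  }
  if (!Number.isInteger(merged.maxDimensions) || merged.maxDimensions <= 0) {
    throw new RangeError("images.maxDimensions must be a positive integer (pixels)")
  }
  return merged
}
```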

Acceptance Criteria

  • Support @image.png syntax
  • Support --files with images
  • Clipboard paste support (Ctrl+V)
  • Auto-resize large images
  • Image preview in TUI
  • Provider capability detection
  • Graceful fallback if vision not supported
  • Multiple images per message

Effort Estimate

3 days

Dependencies

  • sharp (image processing)
  • clipboardy (clipboard access)
