feat: agent skill to convert chat conversations into eval cases #182

@christso

Problem

When iterating on agent prompts interactively (e.g. in VS Code Copilot, Claude Code, or any chat-based agent), useful test scenarios emerge organically from real conversations. Currently there's no way to capture these conversations and convert them into AgentV eval cases without manually writing YAML.

Proposal

Create an agent skill (not a CLI subcommand) that:

  1. Accepts a chat conversation transcript (e.g. markdown, JSON, or pasted text)
  2. Extracts test-worthy exchanges — identifying the user input, expected outcome, and optionally expected tool calls
  3. Generates AgentV eval YAML cases from them
  4. Optionally appends to an existing eval file or creates a new one
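As a rough illustration of steps 2 and 3, the sketch below pairs each user message with the assistant reply that follows it and renders the result as YAML. It is a deterministic stand-in for the LLM-powered extraction the skill would actually perform, and it assumes a messages array of `role`/`content` objects. The `input` and `expected_outcome` fields come from this proposal; the top-level `cases` key, the `id` field, and the function names are hypothetical and may not match the real AgentV schema.

```python
import json


def extract_cases(messages):
    """Pair each user message with the next assistant reply.

    Stand-in for LLM extraction: a real skill would summarise the
    expected outcome rather than copy the reply verbatim.
    """
    cases = []
    pending_input = None
    for message in messages:
        role = message["role"]
        content = message["content"].strip()
        if role == "user":
            # Remember the latest user turn until an assistant answers it.
            pending_input = content
        elif role == "assistant" and pending_input is not None:
            cases.append({
                "id": f"case-{len(cases) + 1}",  # hypothetical field
                "input": pending_input,
                "expected_outcome": content,
            })
            pending_input = None
    return cases


def render_yaml(cases):
    """Render cases as YAML text.

    json.dumps output is used for scalars because JSON strings are valid
    YAML double-quoted scalars, so no third-party YAML library is needed.
    The top-level `cases` key is an assumption, not the AgentV schema.
    """
    lines = ["cases:"]
    for case in cases:
        lines.append(f"  - id: {json.dumps(case['id'])}")
        lines.append(f"    input: {json.dumps(case['input'])}")
        lines.append(f"    expected_outcome: {json.dumps(case['expected_outcome'])}")
    return "\n".join(lines) + "\n"
```

Calling `render_yaml(extract_cases(messages))` on a two-turn conversation yields one case. Appending to an existing eval file (step 4) would additionally need to merge with the cases already present, which this sketch does not attempt.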

Why a skill, not a subcommand

  • Keeps AgentV core minimal
  • The conversion is inherently LLM-powered (extracting intent and expected outcomes from freeform chat), which makes it a natural fit for an agent skill
  • Users can customise the skill's prompt to match their eval style
  • Works with any agent that supports skills (Copilot CLI, Claude Code, etc.)

Acceptance Criteria

  • Skill accepts a conversation transcript and produces valid AgentV eval YAML
  • Generated cases include input, expected_outcome, and optionally an evaluators config
  • Works with common transcript formats (markdown chat, JSON messages array)
  • Documentation and example usage
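To make the transcript-format criterion concrete, here is a minimal sketch of a loader that normalises both formats into one list of messages. It assumes a JSON transcript is an array of objects with `role` and `content` keys, and that a markdown chat marks turns with `User:` or `Assistant:` prefixes (optionally bold or under a heading). Those conventions, the `SPEAKER` pattern, and the `load_transcript` name are all assumptions for illustration; real transcripts vary by tool.

```python
import json
import re

# Assumed turn markers: "User: ...", "**User:** ...", "## Assistant: ..."
SPEAKER = re.compile(
    r"^(?:#+\s*)?\**(user|assistant)\**\s*:\**\s*(.*)$",
    re.IGNORECASE,
)


def load_transcript(text):
    """Return a list of {"role", "content"} dicts from JSON or markdown."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, list):
        # JSON messages array: keep only the fields used downstream.
        return [{"role": m["role"], "content": m["content"]} for m in data]

    messages = []
    for line in text.splitlines():
        match = SPEAKER.match(line)
        if match:
            # A speaker prefix starts a new turn.
            messages.append({
                "role": match.group(1).lower(),
                "content": match.group(2),
            })
        elif messages:
            # Any other line continues the current turn.
            messages[-1]["content"] += "\n" + line
    for message in messages:
        message["content"] = message["content"].strip()
    return messages
```

Normalising to a single shape up front keeps the extraction prompt independent of where the transcript came from; unrecognised input simply produces an empty list rather than a guess.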

Labels: enhancement (New feature or request)