feat: add workspace setup and teardown scripts #177

@christso

Summary

Add support for custom setup and teardown scripts that run before and after each eval case when workspace_template is set.

Background

PR #180 adds workspace_template, which copies a directory to a temporary location for each eval. This issue extends that feature with custom script execution.

Proposed Config

- name: claude-eval
  provider: claude-code
  workspace_template: ./templates/workspace/
  workspace_setup: ./scripts/setup.sh      # runs after copy, before eval
  workspace_teardown: ./scripts/cleanup.sh # runs after eval, before deletion
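
One possible reading of the lifecycle implied by this config, as a minimal Python sketch rather than the actual implementation. `run_case`, the `evaluate` callback, and the `WORKSPACE_PATH` variable name are hypothetical; the issue does not say which language the runner uses or how the path is passed. Running teardown even when setup fails is also an assumption.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence


def run_case(
    template: Path,
    evaluate: Callable[[Path], object],
    setup_cmd: Optional[Sequence[str]] = None,
    teardown_cmd: Optional[Sequence[str]] = None,
):
    """Copy the template, run setup, evaluate, run teardown, then delete."""
    root = Path(tempfile.mkdtemp(prefix="eval-workspace-"))
    workspace = root / "workspace"
    shutil.copytree(template, workspace)

    # Assumption: the path is passed both as the last argument and as
    # a WORKSPACE_PATH environment variable (hypothetical name).
    env = {**os.environ, "WORKSPACE_PATH": str(workspace)}
    try:
        if setup_cmd:
            # Runs after copy, before eval. check=True raises on a
            # non-zero exit, which aborts this eval case.
            subprocess.run([*setup_cmd, str(workspace)], env=env, check=True)
        return evaluate(workspace)
    finally:
        try:
            if teardown_cmd:
                # Runs after eval, before deletion. A non-zero exit is
                # ignored here so it cannot fail the eval.
                subprocess.run([*teardown_cmd, str(workspace)], env=env, check=False)
        finally:
            # The temp copy is always removed, even if teardown could not start.
            shutil.rmtree(root, ignore_errors=True)
```

A command is given as an argument list, for example `["./scripts/setup.sh"]`, so any executable works without going through a shell.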

Use Cases

  1. Install dependencies: npm install, pip install -r requirements.txt
  2. Generate files: Create config files from templates
  3. Database setup: Seed test data
  4. Custom cleanup: Archive logs, save artifacts before deletion

Requirements

  • Scripts can be any executable (shell, python, node, etc.)
  • Scripts receive workspace path as argument or env var
  • Setup script failure should abort the eval case
  • Teardown script failure should warn but not fail the eval
  • Timeout support for scripts
  • Capture script output in eval results
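
The requirements above could be met along these lines. This is a hedged sketch, not the project's API: `HookResult`, `SetupFailed`, `run_hook`, `run_setup`, `run_teardown`, the 60-second default timeout, and the `WORKSPACE_PATH` variable name are all assumptions made for illustration.

```python
import os
import subprocess
import warnings
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class HookResult:
    exit_code: Optional[int]  # None when the script timed out
    stdout: str
    stderr: str
    timed_out: bool = False

    @property
    def ok(self) -> bool:
        return self.exit_code == 0 and not self.timed_out


class SetupFailed(RuntimeError):
    """Raised so the caller can abort the eval case."""

    def __init__(self, result: HookResult):
        super().__init__(
            f"setup script failed (exit code {result.exit_code}, "
            f"timed out: {result.timed_out})"
        )
        self.result = result


def _as_text(data) -> str:
    # On timeout, partial output may be None or bytes even in text mode.
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_hook(cmd: Sequence[str], workspace: str, timeout_s: float = 60.0) -> HookResult:
    """Run any executable with the workspace path as argument and env var."""
    env = {**os.environ, "WORKSPACE_PATH": workspace}  # hypothetical name
    try:
        proc = subprocess.run(
            [*cmd, workspace], env=env,
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        return HookResult(None, _as_text(exc.stdout), _as_text(exc.stderr), timed_out=True)
    return HookResult(proc.returncode, proc.stdout, proc.stderr)


def run_setup(cmd: Sequence[str], workspace: str, timeout_s: float = 60.0) -> HookResult:
    result = run_hook(cmd, workspace, timeout_s)
    if not result.ok:
        raise SetupFailed(result)  # setup failure aborts the eval case
    return result


def run_teardown(cmd: Sequence[str], workspace: str, timeout_s: float = 60.0) -> HookResult:
    result = run_hook(cmd, workspace, timeout_s)
    if not result.ok:
        # Teardown failure warns but does not fail the eval.
        warnings.warn(f"teardown script failed (exit code {result.exit_code})")
    return result
```

Returning a `HookResult` in every case lets the caller attach the captured stdout and stderr to the eval results, including the output of a failed setup via `SetupFailed.result`.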

Alternatives Considered

  • CLAUDE.md instructions: Agent runs setup, but non-deterministic
  • allagents integration: External tool, adds dependency
  • Built-in scripts: Simpler but covers most cases
