Summary
Add support for custom setup and teardown scripts that run before/after each eval case when using workspace_template.
Background
PR #180 adds workspace_template, which copies a directory to a temporary location for each eval case. This issue extends that with custom script execution.
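For context, the per-eval copy that workspace_template performs can be sketched as follows. This is a minimal illustration that assumes a copy into a fresh temporary directory. It is not the code from PR #180, and `create_workspace` / `delete_workspace` are hypothetical names.

```python
import shutil
import tempfile
from pathlib import Path


def create_workspace(template_dir: str) -> Path:
    """Copy the template directory into a fresh temp location for one eval case."""
    # mkdtemp creates a unique, empty directory; copytree requires a
    # destination that does not exist yet, so copy into a subdirectory.
    root = Path(tempfile.mkdtemp(prefix="eval-workspace-"))
    workspace = root / "workspace"
    shutil.copytree(template_dir, workspace)
    return workspace


def delete_workspace(workspace: Path) -> None:
    """Remove the temp root that create_workspace made (hypothetical helper)."""
    shutil.rmtree(workspace.parent, ignore_errors=True)
```

The proposed setup and teardown scripts would slot in between these two calls.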
Proposed Config
- name: claude-eval
  provider: claude-code
  workspace_template: ./templates/workspace/
  workspace_setup: ./scripts/setup.sh       # runs after copy, before eval
  workspace_teardown: ./scripts/cleanup.sh  # runs after eval, before deletion

Use Cases
- Install dependencies: npm install, pip install -r requirements.txt
- Generate files: Create config files from templates
- Database setup: Seed test data
- Custom cleanup: Archive logs, save artifacts before deletion
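Because scripts can be any executable, the dependency-install use case could be a small Python script. The helpers below are a sketch that assumes the runner passes the workspace path as the first argument, with a hypothetical `WORKSPACE_PATH` environment variable as a fallback. This issue has not settled either convention.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import List


def resolve_workspace(argv: List[str]) -> Path:
    """Workspace path from argv[1], else the hypothetical WORKSPACE_PATH env var."""
    if len(argv) > 1:
        return Path(argv[1])
    env_path = os.environ.get("WORKSPACE_PATH")
    if env_path:
        return Path(env_path)
    raise SystemExit("setup: no workspace path given")


def plan_commands(workspace: Path) -> List[List[str]]:
    """Choose install commands based on which manifest files the workspace contains."""
    commands: List[List[str]] = []
    if (workspace / "package.json").exists():
        commands.append(["npm", "install"])
    if (workspace / "requirements.txt").exists():
        # Same as `pip install -r requirements.txt`, but using this interpreter's pip.
        commands.append([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"])
    return commands


def main() -> int:
    workspace = resolve_workspace(sys.argv)
    for cmd in plan_commands(workspace):
        print("setup: running", " ".join(cmd), flush=True)
        # check=True raises CalledProcessError on a non-zero exit, so the
        # script itself exits non-zero and the runner can abort the case.
        subprocess.run(cmd, cwd=workspace, check=True)
    return 0
```

A real script would end with `sys.exit(main())` under the usual `if __name__ == "__main__":` guard. The printed lines are the kind of output the runner could capture into the eval results.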
Requirements
- Scripts can be any executable (shell, python, node, etc.)
- Scripts receive the workspace path as an argument or environment variable
- Setup script failure should abort the eval case
- Teardown script failure should warn but not fail the eval
- Timeout support for scripts
- Capture script output in eval results
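The requirements above could map onto a runner roughly like the sketch below. It rests on several assumptions: scripts are invoked directly as executables, with the workspace path as `argv[1]` plus a hypothetical `WORKSPACE_PATH` environment variable; the default timeout is 60 seconds; and teardown runs even when setup or the eval fails. `ScriptResult`, `run_workspace_script`, and `run_case` are hypothetical names, not an existing API.

```python
import os
import subprocess
import warnings
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class ScriptResult:
    """Captured outcome of one setup or teardown script run."""
    script: str
    exit_code: Optional[int]  # None when the script was killed by the timeout
    stdout: str
    stderr: str
    timed_out: bool = False

    @property
    def ok(self) -> bool:
        return self.exit_code == 0 and not self.timed_out


def _text(data) -> str:
    # TimeoutExpired carries bytes (or None) even when text=True was used.
    if data is None:
        return ""
    return data.decode(errors="replace") if isinstance(data, bytes) else data


def run_workspace_script(script: str, workspace: Path, timeout_s: float = 60.0) -> ScriptResult:
    """Run an executable with the workspace path as argv[1] and in the environment."""
    executable = str(Path(script).resolve())  # resolve before cwd switches to the workspace
    env = {**os.environ, "WORKSPACE_PATH": str(workspace)}  # hypothetical variable name
    try:
        proc = subprocess.run(
            [executable, str(workspace)],
            cwd=workspace,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising TimeoutExpired.
        return ScriptResult(script, None, _text(exc.stdout), _text(exc.stderr), timed_out=True)
    return ScriptResult(script, proc.returncode, proc.stdout, proc.stderr)


def run_case(
    workspace: Path,
    setup: Optional[str],
    teardown: Optional[str],
    run_eval: Callable[[Path], object],
) -> dict:
    """Setup, eval, teardown for one case; setup failure aborts, teardown failure warns."""
    results: dict = {"setup": None, "eval": None, "teardown": None}
    try:
        if setup:
            results["setup"] = run_workspace_script(setup, workspace)
            if not results["setup"].ok:
                results["error"] = f"setup script failed: {setup}"
                return results  # abort: the eval never runs
        results["eval"] = run_eval(workspace)
    finally:
        if teardown:
            results["teardown"] = run_workspace_script(teardown, workspace)
            if not results["teardown"].ok:
                warnings.warn(f"teardown script failed: {teardown}")
    return results
```

Keeping a `ScriptResult` for both scripts leaves stdout and stderr available for the eval results. Running teardown in `finally` is one possible design choice: it lets a cleanup script archive logs even after a failed setup. A stricter runner might skip teardown when setup never succeeded.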
Alternatives Considered
- CLAUDE.md instructions: The agent runs setup itself, but this is non-deterministic
- allagents integration: An external tool that adds a dependency
- Built-in scripts: Simpler but covers most cases
Related
- Depends on: PR #180, "feat: add workspace_template for isolated eval workspaces" (the workspace_template feature)