@robotdad

Adds recipe tests to check CLI functionality. The orchestration recipes require a fix from this PR; all recipes also need the new result-validator agent from that PR.

robotdad and others added 17 commits December 10, 2025 14:39
Add 01-basic-execution.yaml as Phase 0 proof-of-concept for recipe-based
testing approach. Tests basic recipe execution with foundation:explorer
agent and result-validator for verification.

Update .gitignore to exclude DECISIONS.md (working notes, not for git).

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Add 02-variable-substitution.yaml to test context variable definition
and substitution in recipe prompts. Tests string, number, and boolean
types with {{variable_name}} syntax.

Phase 0: Test 2 of 3
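
For illustration only, the `{{variable_name}}` substitution this test exercises can be sketched in Python. This is not the recipe engine's implementation: the `substitute` helper, the error behavior for undefined names, and the sample context are all assumptions made for the example.

```python
import re

# Hypothetical helper; the real recipe engine's substitution rules are not shown in this PR.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def substitute(template: str, context: dict) -> str:
    """Replace each {{name}} with str(context[name]); raise on undefined names."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in context:
            raise KeyError(f"undefined variable: {name}")
        return str(context[name])

    return _PLACEHOLDER.sub(replace, template)


# Sample context with the three types the commit mentions: string, number, boolean.
context = {"project": "demo", "max_items": 3, "verbose": True}
print(substitute("Scan {{project}} for {{max_items}} items (verbose={{verbose}})", context))
```

Failing loudly on an undefined name is a design choice of this sketch; an engine could equally leave the placeholder untouched.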

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Remove 02-variable-substitution.yaml: it tests recipe engine features,
not CLI integration capabilities.

Per specification v2.0.0: CLI tests focus on integration boundaries.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Add three CLI-focused smoke tests created in parallel:
- 02-tool-loading.yaml: Module loading → Tool availability
- 03-provider-loading.yaml: Provider loading → Model access
- 04-profile-resolution.yaml: Profile resolution → Config assembly

Each test validates ONE atomic CLI integration boundary per revised
specification v2.0.0. Phase 0 complete: 4 tests covering core CLI
integration points (session, tools, providers, profiles).

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Remove 03-provider-loading and 04-profile-resolution tests. Analysis showed:
- Test 03: Agent self-reported wrong model (3.5 vs 4.5), passed anyway
- Test 04: Agent just listed tools without using any

These don't validate CLI capabilities. Test 01 (agent responds) already
proves provider works. Test 02 (tool executes) already proves tool
loading works. Self-reporting adds no validation value.

Phase 0: 2 tests remaining (01-agent-spawning, 02-tool-loading)
Principle: Validate capabilities through concrete actions, not claims.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Add Phase 1 capability-focused feature tests:
- tool-bash.yaml: Execute bash commands
- tool-web.yaml: Web fetch operations
- tool-search.yaml: Grep pattern search
- collection-resolution.yaml: @mention path resolution
- agent-delegation.yaml: Sub-session spawning via task tool

Add test fixture: sample-code.py for search testing

Note: tool-filesystem.yaml missing - will add separately

All tests use concrete actions for capability validation.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Add final Phase 1 feature test:
- tool-filesystem.yaml: Tests read_file, glob, and write_file operations
- fixtures/sample.txt: Fixture for read_file testing

Phase 1 complete: 6 of 6 feature tests ready

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
The fixture had only 1 line but the test expected 3 lines with specific
content. Updated to include all three expected lines:
- Line 1: Sample content for filesystem testing
- Line 2: This file tests read_file operations
- Line 3: Line three contains test data

Fixes tool-filesystem.yaml validation.
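
A minimal sketch of the kind of check that would have caught this mismatch before the recipe ran. It assumes the fixture holds the three lines verbatim, with the `Line N:` labels included; the actual file contents may differ. The `missing_lines` helper is hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: the fixture contains these three lines verbatim, labels included.
EXPECTED_LINES = [
    "Line 1: Sample content for filesystem testing",
    "Line 2: This file tests read_file operations",
    "Line 3: Line three contains test data",
]


def missing_lines(fixture: Path, expected: list[str]) -> list[str]:
    """Return the expected lines that do not appear in the fixture file."""
    actual = fixture.read_text(encoding="utf-8").splitlines()
    return [line for line in expected if line not in actual]


with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "sample.txt"

    # Original state: only one line present, so two expected lines are missing.
    fixture.write_text(EXPECTED_LINES[0] + "\n", encoding="utf-8")
    print(len(missing_lines(fixture, EXPECTED_LINES)))

    # Fixed state: all three lines present, so nothing is missing.
    fixture.write_text("\n".join(EXPECTED_LINES) + "\n", encoding="utf-8")
    print(missing_lines(fixture, EXPECTED_LINES))
```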

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Update paths to be relative to workspace root (/Users/robotdad/Source/recipes)
instead of amplifier-app-cli/ directory. This allows running recipes from
workspace root while using workspace .amplifier settings.

Paths changed:
- tests/recipes/fixtures -> amplifier-app-cli/tests/recipes/fixtures

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Update fixtures_dir path to be relative to workspace root.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Recipes should use paths relative to the amplifier-app-cli root for portability.
The previous workspace-relative paths broke when the repo was used standalone.

Paths restored:
- amplifier-app-cli/tests/recipes/fixtures -> tests/recipes/fixtures
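
The portability problem can be sketched with plain path arithmetic. The checkout locations below are hypothetical, and the sketch assumes relative paths resolve against the directory the recipe is run from.

```python
from pathlib import PurePosixPath

# The two path styles from the commits above.
REPO_RELATIVE = PurePosixPath("tests/recipes/fixtures")
WORKSPACE_RELATIVE = PurePosixPath("amplifier-app-cli/tests/recipes/fixtures")


def resolve(run_dir: PurePosixPath, relative: PurePosixPath) -> PurePosixPath:
    """Join a relative fixture path onto the directory the recipe is run from."""
    return run_dir / relative


# Hypothetical checkout locations, used only for illustration.
workspace_root = PurePosixPath("/src/recipes")
standalone_repo = PurePosixPath("/src/amplifier-app-cli")

# Workspace-relative paths work from the workspace root...
print(resolve(workspace_root, WORKSPACE_RELATIVE))
# ...but double the repo directory when the repo is checked out on its own.
print(resolve(standalone_repo, WORKSPACE_RELATIVE))
# Repo-relative paths resolve correctly from the repo root in a standalone checkout.
print(resolve(standalone_repo, REPO_RELATIVE))
```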

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Rename smoke tests to descriptive names without arbitrary numbering:
- 01-basic-execution.yaml -> agent-spawning.yaml
- 02-tool-loading.yaml -> tool-loading.yaml

Phase 0 tests don't require ordering, so descriptive names are clearer.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Add two orchestrators for running test suites in parallel:
- run-smoke-tests.yaml: Executes 2 smoke tests concurrently
- run-feature-tests.yaml: Executes 6 feature tests concurrently

Features:
- Parallel execution with foreach + parallel: true (~6x speedup)
- Automatic result collection and aggregation
- 3-step pattern: run → synthesize → validate
- Clear verdicts: SMOKE_TESTS_PASS/FAIL, FEATURE_TESTS_PASS/FAIL
- Error resilience: continues even if some tests fail
- Comprehensive summary reports with per-test breakdown
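
The run → synthesize → validate pattern can be sketched in Python as below. This is not the orchestrator recipes themselves: `run_suite` is a hypothetical helper, and the lambda and `boom` callables are stand-ins for real test recipes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_suite(tests: dict, pass_verdict: str, fail_verdict: str) -> dict:
    """Run zero-argument test callables concurrently and aggregate one verdict."""

    def run_one(item):
        name, test = item
        try:
            return name, bool(test()), None
        except Exception as exc:
            # Error resilience: one failing test must not stop the others.
            return name, False, repr(exc)

    # Step 1 (run): execute every test in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(tests))) as pool:
        results = list(pool.map(run_one, tests.items()))

    # Step 2 (synthesize): build a per-test breakdown.
    report = {name: {"passed": ok, "error": err} for name, ok, err in results}

    # Step 3 (validate): collapse the breakdown into a single verdict.
    all_passed = all(entry["passed"] for entry in report.values())
    return {"verdict": pass_verdict if all_passed else fail_verdict, "tests": report}


def boom():
    raise RuntimeError("tool unavailable")


# Stand-in callables; the real smoke tests are recipes, not Python functions.
smoke = {"agent-spawning": lambda: True, "tool-loading": boom}
summary = run_suite(smoke, "SMOKE_TESTS_PASS", "SMOKE_TESTS_FAIL")
print(summary["verdict"])
print(summary["tests"]["tool-loading"]["error"])
```

Catching exceptions per test is what lets the suite report a full breakdown even when one test crashes.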

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Move run-smoke-tests.yaml and run-feature-tests.yaml from orchestrators/
subdirectory to tests/recipes/ root. Two files don't justify a subfolder,
and location at root makes the connection to smoke/ and features/ clear.

Updated paths:
- ../smoke/... -> smoke/...
- ../features/... -> features/...

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
The result-validator agent has been contributed to amplifier-collection-recipes
and is now available as recipes:result-validator. Removed the local workspace
copy since tests will use the collection version.

Changes:
- Removed .amplifier/agents/result-validator.md (duplicate)
- Agent now available via recipes collection

Related: Tests continue using agent: "result-validator" which will resolve
to the collection version when amplifier-collection-recipes is installed.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Updated all test recipes to reference result-validator using the
collection prefix (recipes:result-validator) now that the agent has
been moved from local workspace to the recipes collection.

Changes:
- Updated 10 recipe files to use recipes:result-validator
- Ensures proper agent resolution from collection

Files updated:
- tests/recipes/run-smoke-tests.yaml
- tests/recipes/run-feature-tests.yaml
- tests/recipes/smoke/agent-spawning.yaml
- tests/recipes/smoke/tool-loading.yaml
- tests/recipes/features/agent-delegation.yaml
- tests/recipes/features/collection-resolution.yaml
- tests/recipes/features/tool-bash.yaml
- tests/recipes/features/tool-filesystem.yaml
- tests/recipes/features/tool-search.yaml
- tests/recipes/features/tool-web.yaml
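
A mechanical rewrite like this one could be scripted roughly as follows. This is a sketch, not the method used in the commit: the `qualify_agent` helper and the sample recipe layout are hypothetical, and only the `agent: "result-validator"` form is taken from the commit messages above.

```python
import re

# Matches a bare `agent: result-validator` line, with or without quotes.
# References that already carry the collection prefix are not matched.
_BARE_AGENT = re.compile(
    r'^([ \t]*(?:-[ \t]+)?agent:[ \t]*)(["\']?)result-validator\2[ \t]*$',
    re.MULTILINE,
)


def qualify_agent(text: str) -> str:
    """Prefix bare result-validator references with the recipes collection."""
    return _BARE_AGENT.sub(r"\1\2recipes:result-validator\2", text)


# Hypothetical recipe fragment; the real recipe schema is not shown in this PR.
recipe = 'steps:\n  - id: validate\n    agent: "result-validator"\n'
updated = qualify_agent(recipe)
print(updated)

# Running the rewrite a second time changes nothing.
print(qualify_agent(updated) == updated)
```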

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>