206 changes: 206 additions & 0 deletions .claude/commands/add-integration-test.md
@@ -0,0 +1,206 @@
---
allowed-tools: Read, Write, Bash, Glob, Grep, Edit
description: Create a new integration test case following the testcases/ pattern
argument-hint: <test-name> <description>
---

I'll help you create a new integration test case following the established pattern in the `testcases/` directory, similar to the `eval-input-overrides` example from PR #1101.

## Understanding the Test Structure

Based on the existing test pattern, each integration test should have:
- `run.sh` - Main test execution script
- `pyproject.toml` - Python dependencies
- `entry-points.json` - Entry point configuration
- `uipath.json` - UiPath configuration
- `src/` directory containing:
  - Evaluation set JSON files
  - Input/configuration JSON files
  - `assert.py` - Validation script

## Step 1: Gather Information

I need to understand what you're testing. Please provide:
1. **Test Name**: A descriptive name for your test (e.g., "eval-multimodal-inputs")
2. **Test Purpose**: What feature or scenario are you testing?
3. **Evaluation Set**: What evaluations will run?
4. **Expected Behavior**: What should the test verify?

Let me check the existing testcases structure:

!ls -1 testcases/

## Step 2: Create Test Directory Structure

Based on your test name `${test-name}`, I'll create:

```bash
testcases/${test-name}/
├── run.sh
├── pyproject.toml
├── entry-points.json
├── uipath.json
└── src/
    ├── eval-set.json
    ├── config.json (if needed)
    └── assert.py
```

Let me read the reference implementation to understand the pattern:

!cat testcases/eval-input-overrides/run.sh
!cat testcases/eval-input-overrides/pyproject.toml
!cat testcases/eval-input-overrides/src/assert.py

## Step 3: Create the Test Files

I'll create each file following the established pattern:

### 1. run.sh - Test Execution Script
```bash
#!/bin/bash
set -e

echo "Syncing dependencies..."
uv sync

echo ""
echo "Running ${test-name} integration test..."
echo ""

# Create output directory
mkdir -p __uipath

# Run evaluations
uv run uipath eval main src/eval-set.json \
  --no-report \
  --output-file __uipath/output.json

echo ""
echo "Test completed! Verifying results..."
echo ""

# Run assertion script to verify results
uv run python src/assert.py

echo ""
echo "${test-name} integration test completed successfully!"
```

### 2. pyproject.toml - Dependencies
```toml
[project]
name = "${test-name}"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
"uipath>=2.4.0",
]

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
```

### 3. entry-points.json - Entry Points Configuration
```json
{
"main": "src/main.json"
}
```

### 4. uipath.json - UiPath Configuration
```json
{
"name": "${test-name}",
"version": "1.0.0"
}
```

### 5. src/eval-set.json - Evaluation Set
(You'll need to provide the specific evaluation configuration)

### 6. src/assert.py - Validation Script
```python
"""Assertions for ${test-name} testcase."""
import json
import os


def main() -> None:
    """Main assertion logic."""
    output_file = "__uipath/output.json"

    assert os.path.isfile(output_file), (
        f"Evaluation output file '{output_file}' not found"
    )
    print(f"✓ Found evaluation output file: {output_file}")

    with open(output_file, "r", encoding="utf-8") as f:
        output_data = json.load(f)

    print("✓ Loaded evaluation output")

    # Add your specific assertions here
    assert "evaluationSetResults" in output_data

    evaluation_results = output_data["evaluationSetResults"]
    assert len(evaluation_results) > 0, "No evaluation results found"

    print(f"✓ Found {len(evaluation_results)} evaluation result(s)")

    # Add test-specific validations

    print("\n✅ All assertions passed!")


if __name__ == "__main__":
    main()
```

## Step 4: Make run.sh Executable

!chmod +x testcases/${test-name}/run.sh

## Step 5: Test the Integration Test

Let's validate the test runs correctly:

!cd testcases/${test-name} && ./run.sh

## Step 6: Add to Documentation

Consider documenting your test in the project README or test documentation:
- What scenario it tests
- How to run it manually
- What it validates

---

## Summary

Your new integration test `${test-name}` has been created following the established pattern:

- **Directory Structure**: Matches the testcases/ pattern
- **Dependencies**: Configured in pyproject.toml
- **Test Script**: run.sh with proper error handling
- **Assertions**: Validation logic in assert.py
- **Configuration**: uipath.json and entry-points.json configured

## Next Steps

1. **Customize** the eval-set.json with your specific test data
2. **Update** assert.py with test-specific validations
3. **Run** the test: `cd testcases/${test-name} && ./run.sh`
4. **Document** the test purpose and usage
5. **Commit** the new test to version control

## Tips

- Keep tests focused on a single feature or scenario
- Use descriptive evaluation names in eval-set.json
- Add clear assertion messages for debugging
- Use echo statements between steps so run.sh reports progress as it runs
- Ensure all JSON files are properly formatted

Need help customizing any specific part of the test? Just ask!
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "uipath"
-version = "2.4.21"
+version = "2.4.22"
 description = "Python SDK and CLI for UiPath Platform, enabling programmatic interaction with automation services, process management, and deployment tools."
 readme = { file = "README.md", content-type = "text/markdown" }
 requires-python = ">=3.11"
80 changes: 80 additions & 0 deletions src/uipath/_cli/_evals/_eval_util.py
@@ -0,0 +1,80 @@
"""Utility functions for applying input overrides to evaluation inputs."""

import copy
import logging
from typing import Any

logger = logging.getLogger(__name__)


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge override into base dictionary.

    Args:
        base: The base dictionary to merge into
        override: The override dictionary to merge from

    Returns:
        A new dictionary with overrides recursively merged into base
    """
    result = copy.deepcopy(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            # Recursively merge nested dicts
            result[key] = deep_merge(result[key], value)
        else:
            # Direct replacement for non-dict or new keys
            result[key] = value
    return result


def apply_input_overrides(
    inputs: dict[str, Any],
    input_overrides: dict[str, Any],
    eval_id: str | None = None,
) -> dict[str, Any]:
    """Apply input overrides to inputs using direct field override.

    Format: Per-evaluation overrides (keys are evaluation IDs):
        {"eval-1": {"operator": "*"}, "eval-2": {"a": 100}}

    Deep merge is supported for nested objects:
        - {"filePath": {"ID": "new-id"}} - deep merges inputs["filePath"] with {"ID": "new-id"}

    Args:
        inputs: The original inputs dictionary
        input_overrides: Dictionary mapping evaluation IDs to their override values
        eval_id: The evaluation ID (required)

    Returns:
        A new dictionary with overrides applied
    """
    if not input_overrides:
        return inputs

    if not eval_id:
        logger.warning(
            "eval_id not provided, cannot apply input overrides. Input overrides require eval_id."
        )
        return inputs

    result = copy.deepcopy(inputs)

    # Check if there are overrides for this specific eval_id
    if eval_id not in input_overrides:
        logger.debug(f"No overrides found for eval_id='{eval_id}'")
        return result

    overrides_to_apply = input_overrides[eval_id]
    logger.debug(f"Applying overrides for eval_id='{eval_id}': {overrides_to_apply}")

    # Apply direct field overrides with recursive deep merge
    for key, value in overrides_to_apply.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            # Recursive deep merge for dict values
            result[key] = deep_merge(result[key], value)
        else:
            # Direct replacement for non-dict or new keys
            result[key] = value

    return result
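
For reference, a minimal usage sketch of the two helpers above; the sample inputs, evaluation IDs, and file names are invented for illustration, and the import path is inferred from the new module's location.

```python
# Illustration of the override semantics: scalars are replaced, nested dicts
# are deep-merged, and evaluations without an override entry keep their inputs.
from uipath._cli._evals._eval_util import apply_input_overrides

inputs = {"a": 1, "operator": "+", "filePath": {"ID": "old-id", "name": "doc.pdf"}}
overrides = {"eval-1": {"operator": "*", "filePath": {"ID": "new-id"}}}

merged = apply_input_overrides(inputs, overrides, eval_id="eval-1")
assert merged == {
    "a": 1,                                            # untouched key survives
    "operator": "*",                                   # scalar replaced
    "filePath": {"ID": "new-id", "name": "doc.pdf"},   # nested dict deep-merged
}

# An evaluation with no entry in the overrides map runs with its original inputs.
assert apply_input_overrides(inputs, overrides, eval_id="eval-2") == inputs
```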
16 changes: 14 additions & 2 deletions src/uipath/_cli/_evals/_runtime.py
@@ -67,6 +67,7 @@
 from .._utils._eval_set import EvalHelpers
 from .._utils._parallelization import execute_parallel
 from ._configurable_factory import ConfigurableRuntimeFactory
+from ._eval_util import apply_input_overrides
 from ._evaluator_factory import EvaluatorFactory
 from ._models._evaluation_set import (
     EvaluationItem,
@@ -262,6 +263,7 @@ class UiPathEvalContext:
     verbose: bool = False
     enable_mocker_cache: bool = False
     report_coverage: bool = False
+    input_overrides: dict[str, Any] | None = None
     model_settings_id: str = "default"


@@ -524,7 +526,10 @@ async def _execute_eval(
                 ),
             )
             agent_execution_output = await self.execute_runtime(
-                eval_item, execution_id, runtime
+                eval_item,
+                execution_id,
+                runtime,
+                input_overrides=self.context.input_overrides,
             )
         except Exception as e:
             if self.context.verbose:
@@ -759,6 +764,7 @@ async def execute_runtime(
         eval_item: EvaluationItem,
         execution_id: str,
         runtime: UiPathRuntimeProtocol,
+        input_overrides: dict[str, Any] | None = None,
     ) -> UiPathEvalRunExecutionOutput:
         log_handler = self._setup_execution_logging(execution_id)
         attributes = {
@@ -785,8 +791,14 @@

         start_time = time()
         try:
+            # Apply input overrides to inputs if configured
+            inputs_with_overrides = apply_input_overrides(
+                eval_item.inputs,
+                input_overrides or {},
+                eval_id=eval_item.id,
+            )
             result = await execution_runtime.execute(
-                input=eval_item.inputs,
+                input=inputs_with_overrides,
             )
         except Exception as e:
             end_time = time()
10 changes: 10 additions & 0 deletions src/uipath/_cli/cli_eval.py
@@ -1,6 +1,7 @@
 import ast
 import asyncio
 import os
+from typing import Any

 import click
 from uipath.core.tracing import UiPathTraceManager
@@ -113,6 +114,12 @@ def setup_reporting_prereq(no_report: bool) -> bool:
     default=20,
     help="Maximum concurrent LLM requests (default: 20)",
 )
+@click.option(
+    "--input-overrides",
+    cls=LiteralOption,
+    default="{}",
+    help='Input field overrides per evaluation ID: \'{"eval-1": {"operator": "*"}, "eval-2": {"a": 100}}\'. Supports deep merge for nested objects.',
+)
 def eval(
     entrypoint: str | None,
     eval_set: str | None,
@@ -126,6 +133,7 @@ def eval(
     model_settings_id: str,
     trace_file: str | None,
     max_llm_concurrency: int,
+    input_overrides: dict[str, Any],
 ) -> None:
     """Run an evaluation set against the agent.
@@ -141,6 +149,7 @@ def eval(
         model_settings_id: Model settings ID to override agent settings
         trace_file: File path where traces will be written in JSONL format
         max_llm_concurrency: Maximum concurrent LLM requests
+        input_overrides: Input field overrides mapping (direct field override with deep merge)
     """
     set_llm_concurrency(max_llm_concurrency)

@@ -178,6 +187,7 @@ def eval(
     eval_context.eval_ids = eval_ids
     eval_context.report_coverage = report_coverage
     eval_context.model_settings_id = model_settings_id
+    eval_context.input_overrides = input_overrides

     try:

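
Putting the pieces together, a hedged sketch of the end-to-end flow: the string passed to `--input-overrides` becomes a dict keyed by evaluation ID, and `execute_runtime` applies the matching entry to that evaluation's inputs before execution. `LiteralOption` is not shown in this diff, so the `ast.literal_eval` call below is only an assumption about how the flag value is parsed; the sample inputs are invented.

```python
import ast

from uipath._cli._evals._eval_util import apply_input_overrides

# Value a user would pass on the command line (format taken from the --help text).
raw = '{"eval-1": {"operator": "*"}, "eval-2": {"a": 100}}'

# Assumption: LiteralOption turns the literal string into a dict, e.g. via
# ast.literal_eval, before it is stored on eval_context.input_overrides.
input_overrides = ast.literal_eval(raw)

# execute_runtime() then applies the entry matching each evaluation's ID.
inputs = {"a": 2, "b": 3, "operator": "+"}
assert apply_input_overrides(inputs, input_overrides, eval_id="eval-2") == {
    "a": 100,
    "b": 3,
    "operator": "+",
}
```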