Rename fields in `CaseRecord` for consistency with langfuse evaluators #38

fcogidi · 2026-02-06T15:17:32Z

Summary

Rename fields in CaseRecord for consistency with langfuse evaluators

Clickup Ticket(s): N/A

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📝 Documentation update
🔧 Refactoring (no functional changes)
⚡ Performance improvement
🧪 Test improvements
🔒 Security fix

Changes Made

Rename `case` to `input`.
Rename `groundtruth` to `expected_output`.
Rename `analysis` to `output`.
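The three renames amount to a fixed mapping of top-level keys. Besides read-time aliases, one possible migration path for JSONL artifacts written before this change is a one-off rewrite. The sketch below is hypothetical: `migrate_row` and `migrate_jsonl` are not part of this PR. It assumes each line is a JSON object and that only the three top-level keys listed above need renaming.

```python
import json
from pathlib import Path

# Legacy CaseRecord key -> new key, as renamed in this PR.
RENAMED_KEYS = {"case": "input", "groundtruth": "expected_output", "analysis": "output"}


def migrate_row(row: dict) -> dict:
    """Return a copy of one record with legacy top-level keys renamed.

    Keys that are already new-style (or unrelated) pass through unchanged,
    and nested values are not touched.
    """
    return {RENAMED_KEYS.get(key, key): value for key, value in row.items()}


def migrate_jsonl(src: Path, dst: Path) -> int:
    """Rewrite every non-blank JSONL row of src into dst; return rows written.

    src and dst must be different files, because dst is opened for writing
    (and truncated) while src is still being read.
    """
    written = 0
    with src.open("r", encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines instead of failing on them
            fout.write(json.dumps(migrate_row(json.loads(line))) + "\n")
            written += 1
    return written
```

Because `migrate_row` leaves new-style keys alone, running the helper over a file that mixes old and new rows should be safe.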

Testing

Tests pass locally (uv run pytest tests/)
Type checking passes (uv run mypy <src_dir>)
Linting passes (uv run ruff check src_dir/)
Manual testing performed (describe below)

Manual testing details:
N/A

Screenshots/Recordings

N/A

Related Issues

N/A

Deployment Notes

N/A

Checklist

Code follows the project's style guidelines
Self-review of code completed
Documentation updated (if applicable)
No sensitive information (API keys, credentials) exposed

Copilot

Pull request overview

Renames CaseRecord fields to align with langfuse evaluator conventions (input, expected_output, output) and updates agent + tests accordingly.

Changes:

Renamed CaseRecord fields (case → input, groundtruth → expected_output, analysis → output).
Updated AML investigation agent logic to read/write/analyze using the new field names.
Updated AML investigation test cases to assert against the renamed fields.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
implementations/aml_investigation/agent.py	Updates analysis pipeline, resume logic, and metrics computation to use `input/expected_output/output`.
aieng-eval-agents/aieng/agent_evals/aml_investigation/data/cases.py	Renames the `CaseRecord` schema and updates case-building helpers accordingly.
aieng-eval-agents/tests/aieng/agent_evals/aml_investigation/data/test_cases.py	Updates tests to reference the renamed `CaseRecord` fields.

Copilot · 2026-02-06T15:19:49Z

aieng-eval-agents/aieng/agent_evals/aml_investigation/data/cases.py

+    input: CaseFile = Field(..., description="Metadata for the laundering case.")
+    expected_output: GroundTruth = Field(..., description="Ground truth information for the laundering case.")
+    output: AnalystOutput | None = Field(
        default=None,
        description="Optional analyst output for the laundering case. Typically populated after investigation.",
    )


Renaming these serialized fields will break parsing of any existing JSON/JSONL persisted with the old keys (case, groundtruth, analysis). Since the agent reads prior results via CaseRecord.model_validate_json(...) for resume behavior, legacy lines will fail validation and be skipped. Consider adding Pydantic v2 compatibility via validation_alias (e.g., AliasChoices('input', 'case'), AliasChoices('expected_output', 'groundtruth'), AliasChoices('output', 'analysis')) so existing artifacts remain readable while emitting the new field names.

Copilot · 2026-02-06T15:19:49Z

implementations/aml_investigation/agent.py

    input_records = _load_records(input_path)
-    existing_results = {record.case.case_id: record for record in _load_records(output_path)}
-    to_run = [r for r in input_records if existing_results.get(r.case.case_id, r).analysis is None]
+    existing_results = {record.input.case_id: record for record in _load_records(output_path)}


With the CaseRecord field rename, _load_records(output_path) will skip any legacy JSONL rows (old schema) as invalid, which can cause existing_results to be incomplete and make the resume logic re-run cases unnecessarily (and potentially overwrite/duplicate outputs). After adding schema aliases (or a migration path), it would also help to make legacy parsing failures more explicit here (e.g., fail fast or log a clear warning/count) so users don’t silently lose resume behavior.

Suggested change

-    existing_results = {record.input.case_id: record for record in _load_records(output_path)}
+    # Load existing results from the output file, and detect any lines that could not be parsed
+    existing_records = list(_load_records(output_path))
+    existing_results = {record.input.case_id: record for record in existing_records}
+    # Compare parsed records against total lines to surface potential legacy/invalid rows
+    if output_path.exists():
+        try:
+            with output_path.open("r", encoding="utf-8") as f:
+                total_lines = sum(1 for _ in f)
+        except OSError:
+            total_lines = None
+        if total_lines is not None and total_lines > len(existing_records):
+            logger.warning(
+                "Detected %d/%d records in %s that could not be parsed. "
+                "These may be legacy or invalid rows, and resume behavior may be affected.",
+                total_lines - len(existing_records),
+                total_lines,
+                output_path,
+            )
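The following is a variant of the suggested change that counts failures during the single parsing pass instead of re-reading the file to count lines. It is a sketch under assumptions: `load_jsonl_with_failures` is a hypothetical helper, not the project's `_load_records`, and it treats blank lines as neither records nor failures. This PR does not show whether `_load_records` also skips blank lines. If it does, the suggested `total_lines` comparison would count any blank lines as unparsed rows and could warn spuriously.

```python
import logging
from pathlib import Path
from typing import TypeVar

from pydantic import BaseModel, ValidationError

logger = logging.getLogger(__name__)

M = TypeVar("M", bound=BaseModel)


def load_jsonl_with_failures(path: Path, model: type[M]) -> tuple[list[M], int]:
    """Parse each non-blank line of a JSONL file as `model`.

    Returns the parsed records and the number of lines that failed validation,
    for example rows still written with the legacy CaseRecord keys.
    """
    records: list[M] = []
    failures = 0
    if not path.exists():
        return records, failures
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # blank lines are neither records nor failures
            try:
                records.append(model.model_validate_json(line))
            except ValidationError:
                # In Pydantic v2, malformed JSON is also reported as ValidationError.
                failures += 1
    if failures:
        logger.warning(
            "Could not parse %d of %d non-blank lines in %s; "
            "these may be legacy or invalid rows, and resume behavior may be affected.",
            failures,
            failures + len(records),
            path,
        )
    return records, failures
```

In the agent, this could stand in for the `_load_records(output_path)` call, e.g. `existing_records, n_failed = load_jsonl_with_failures(output_path, CaseRecord)`. A caller that prefers to fail fast could raise when `n_failed` is non-zero instead of relying on the warning.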

aieng-eval-agents/aieng/agent_evals/aml_investigation/data/cases.py

amrit110

Seems fine!

Rename fields in CaseRecord for consistency with langfuse evaluators

809e001

fcogidi requested review from amrit110 and lotif February 6, 2026 15:17

fcogidi self-assigned this Feb 6, 2026

fcogidi added the refactor Refactor or clean up code structure label Feb 6, 2026

fcogidi requested a review from Copilot February 6, 2026 15:18

Copilot AI reviewed Feb 6, 2026

amrit110 approved these changes Feb 9, 2026

lotif approved these changes Feb 9, 2026

fcogidi merged commit ae3b526 into main Feb 9, 2026
3 checks passed

fcogidi deleted the fco/rename_fields branch February 9, 2026 17:30

3 participants
