Fix: Enable LLM-as-Judge base model evaluation integration tests, Add cleanup mechanism for MC dataset integ test #5576

Merged
mollyheamazon merged 6 commits into aws:master from mollyheamazon:fix/dataset-cleanup-integ-tests on Feb 27, 2026
mollyheamazon (Contributor) commented on Feb 25, 2026

Issues Fixed

Integration tests for PR #5558 (LLM-as-Judge base model fix) were failing due to multiple issues:

  1. Missing IAM permissions for Bedrock model invocation
  2. MLflow experiment name handling causing pipeline failures
  3. Display utility crashing on None evaluation scores

Changes

1. Test Infrastructure Improvements

  • Enhanced error logging in integration tests to show detailed step failure information
  • Removed MLflow resource ARN from test configuration (not required for base model evaluation)
  • Added error handling for display utility failures to prevent test crashes
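The error handling for display failures described above can be sketched as a small wrapper. This is an illustrative sketch only: `show_results_safely` and `display_fn` are hypothetical names, not identifiers from this PR, and the actual tests may guard the call differently.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def show_results_safely(display_fn: Callable[[Any], None], results: Any) -> bool:
    """Call a display helper without letting its failure abort the test.

    Hypothetical helper: returns True if display succeeded, False otherwise.
    """
    try:
        display_fn(results)
        return True
    except Exception:
        # Log the full traceback so the failure stays visible in test output,
        # but do not re-raise: display is cosmetic, not part of the assertion.
        logger.exception("Display utility failed; continuing test")
        return False
```

With a wrapper like this, the test's assertions about the evaluation job itself still run even if rendering the results raises.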

2. MLflow Experiment Name Handling

  • Initially added automatic generation of mlflow_experiment_name when mlflow_resource_arn is provided
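One way the automatic name generation described above could look is sketched below. The function name `resolve_experiment_name`, the `prefix` parameter, and the name format are assumptions for illustration; the PR does not show the actual logic.

```python
import uuid
from typing import Optional


def resolve_experiment_name(
    mlflow_resource_arn: Optional[str],
    mlflow_experiment_name: Optional[str] = None,
    prefix: str = "eval",  # assumed prefix, not taken from the PR
) -> Optional[str]:
    """Return an experiment name, generating one only when an ARN is given."""
    if mlflow_experiment_name:
        # An explicit name always wins.
        return mlflow_experiment_name
    if mlflow_resource_arn:
        # Short random suffix keeps generated names unique across runs.
        return f"{prefix}-{uuid.uuid4().hex[:8]}"
    # No MLflow resource configured, so no experiment name is needed.
    return None
```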

3. Display Utility Robustness

  • Fixed _format_score() to handle None scores: it now returns "N/A" instead of raising a TypeError
  • Prevents crashes when evaluation metrics return None (e.g., model failures, malformed outputs)

4. Dataset Cleanup

  • Added cleanup mechanism for Model Customization dataset integration tests
  • Simplified conftest.py fixture management
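The cleanup idea above can be sketched as a context manager that records what a test creates and deletes it afterwards. `tracked_datasets` and `delete_fn` are hypothetical names; the PR does not show the actual fixture, which presumably lives in conftest.py.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def tracked_datasets(delete_fn: Callable[[str], None]) -> Iterator[List[str]]:
    """Yield a list for registering created dataset names; delete them on exit."""
    created: List[str] = []
    try:
        yield created
    finally:
        # Delete in reverse creation order, and keep going if one delete fails
        # so a single error does not leak the remaining datasets.
        for name in reversed(created):
            try:
                delete_fn(name)
            except Exception as exc:
                print(f"cleanup failed for {name}: {exc}")
```

The same try/finally shape carries over to a pytest fixture by yielding `created` from the fixture body, so cleanup runs whether the test passes or fails.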

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mollyheamazon changed the title from "Chore: Add cleanup mechanism for MC dataset integ test" to "Fix: Enable LLM-as-Judge base model evaluation integration tests, Add cleanup mechanism for MC dataset integ test" on Feb 27, 2026
mollyheamazon merged commit 1210ac1 into aws:master on Feb 27, 2026
15 of 20 checks passed