Skip to content

Conversation

@camilamacedo86
Copy link
Contributor

@camilamacedo86 camilamacedo86 commented Jan 28, 2026

Problem

When the operator-controller encounters configuration validation errors (like missing required fields or invalid namespace names), it
was showing confusing status messages:

  • Progressing: True, Reason: Retrying (keeps retrying forever)
  • Installed: False, Reason: Absent (wrong - it's not absent, it failed)
  • Error messages showed "terminal error: terminal error: ..." prefix

Additionally, the generic Blocked reason was used for all terminal errors, even though it was semantically intended for external
blockers (resource collisions, etc.), not user configuration errors.

Changes Made

1. Config errors are now permanent with specific reason (provider.go)

  • Wrap config validation errors with NewTerminalError(ReasonInvalidConfiguration, err)
  • Prevents infinite retries when user config is invalid
  • Shows Progressing: False, Reason: InvalidConfiguration instead of Retrying
  • New ReasonInvalidConfiguration constant distinguishes config errors from external blockers

2. Custom TerminalError with granular Reason support (terminal.go)

  • Added NewTerminalError(reason, err) to create terminal errors with specific reasons
  • Added ExtractTerminalReason(err) to retrieve the reason from errors
  • Preserves reason through error wrapping in wrapErrorWithResolutionInfo()
  • Falls back to ReasonBlocked for backward compatibility with plain reconcile.TerminalError()
  • Blocked now reserved for its intended purpose (external blockers)

3. Status shows "Failed" for config errors (common_controller.go)

  • Added determineFailureReason() function
  • Checks only the LATEST revision's Progressing condition (old errors don't matter)
  • Returns Failed when revision has Reason: Retrying (indicates error)
  • Returns Absent when revision has Reason: RollingOut (healthy progress)

4. BOXCUTTER sets status even on errors (boxcutter_reconcile_steps.go)

  • Don't set Installed condition for terminal errors
  • Progressing condition already provides all necessary information
  • Avoids redundant status reporting

5. Clean error messages (common_controller.go, clusterextension_controller.go)

  • Added UnwrapTerminal(err) helper to remove "terminal error:" prefix
  • Unwrap before displaying message to users
  • Error wrapping preserves both terminal type and reason

Result

Before:

Progressing: True, Reason: Retrying                                                                                                      
Message: "terminal error: invalid ClusterExtension configuration: missing field"                                                         
Installed: False, Reason: Absent  # Confusing!                                                                                           

After:

Progressing: False, Reason: InvalidConfiguration  # Clear & specific!                                                                    
Message: "invalid ClusterExtension configuration: missing field"  # Clean!                                                               
# No Installed condition - Progressing already says it all                                                                               

Testing

  • Unit tests verify TerminalError behavior with reason preservation
  • Unit tests verify determineFailureReason() logic for latest revision
  • Unit tests verify error unwrapping and message cleaning
  • E2E tests validate all config validation scenarios now show InvalidConfiguration
  • All tests passing with race detector

API Changes

  • Added ReasonInvalidConfiguration constant to api/v1/common_types.go
  • Registered in conditionsets.go ConditionReasons list

Copilot AI review requested due to automatic review settings January 28, 2026 21:37
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 28, 2026
@openshift-ci openshift-ci bot requested review from OchiengEd and pedjak January 28, 2026 21:38
@netlify
Copy link

netlify bot commented Jan 28, 2026

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit c2807f5
🔍 Latest deploy log https://app.netlify.com/projects/olmv1/deploys/697c96b443d92700084e8b8b
😎 Deploy Preview https://deploy-preview-2465--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refines how the controller reports installation status when configuration validation fails, ensuring Boxcutter behavior matches Helm by using Installed=False with Reason=Failed instead of Absent for config errors. It also adds unit and e2e coverage to lock in the new semantics.

Changes:

  • Updated setInstalledStatusFromRevisionStates to derive the Installed condition reason via a new determineFailureReason helper that distinguishes config validation errors from normal rollouts.
  • Introduced hasConfigValidationError to detect "invalid configuration:" / "invalid ClusterExtension configuration:" messages on rolling revisions and treat them as failures.
  • Extended unit tests and e2e scenarios to verify that, for config validation errors, the ClusterExtension Installed condition is False with Reason=Failed.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
test/e2e/features/install.feature Adds assertions that, during config validation failures in single/own-namespace scenarios, the ClusterExtension Installed condition is False with Reason=Failed.
internal/operator-controller/controllers/common_controller.go Refactors Installed-condition calculation to use determineFailureReason and hasConfigValidationError, mapping specific config error messages to ReasonFailed and otherwise using ReasonAbsent.
internal/operator-controller/controllers/common_controller_test.go Adds table-driven tests for setInstalledStatusFromRevisionStates to cover no revisions, non-config errors (Absent), and both config error message variants (Failed).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@camilamacedo86 camilamacedo86 changed the title WIP 🐛 Fix Boxcutter to show Failed (not Absent) for config errors TEST WIP Jan 28, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 28, 2026
@camilamacedo86 camilamacedo86 force-pushed the fix-config-errors-status branch from 416a487 to d7a076c Compare January 28, 2026 21:57
@camilamacedo86 camilamacedo86 changed the title TEST WIP WIP 🐛 Fix Boxcutter to show Failed (not Absent) for config errors Jan 28, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 28, 2026
Copilot AI review requested due to automatic review settings January 29, 2026 06:58
@camilamacedo86 camilamacedo86 force-pushed the fix-config-errors-status branch from d7a076c to 0421c38 Compare January 29, 2026 06:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link

codecov bot commented Jan 29, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.63%. Comparing base (fbe909f) to head (c2807f5).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
internal/shared/util/error/terminal.go 70.83% 5 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2465      +/-   ##
==========================================
+ Coverage   69.52%   73.63%   +4.10%     
==========================================
  Files         102      102              
  Lines        8231     8273      +42     
==========================================
+ Hits         5723     6092     +369     
+ Misses       2056     1717     -339     
- Partials      452      464      +12     
Flag Coverage Δ
e2e 47.07% <65.30%> (+0.14%) ⬆️
experimental-e2e 54.75% <79.59%> (+41.37%) ⬆️
unit 57.42% <42.85%> (-0.13%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@camilamacedo86 camilamacedo86 force-pushed the fix-config-errors-status branch from 0421c38 to d9786e0 Compare January 29, 2026 11:54
if len(revisionStates.RollingOut) == 0 {
setInstalledStatusConditionFalse(ext, ocv1.ReasonFailed, "No bundle installed")
} else {
setInstalledStatusConditionFalse(ext, ocv1.ReasonAbsent, "No bundle installed")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would set ReasonAbset always when we should say Failed if we see that we are retrying due to errors

c/c @pedjak @perdasilva ^

@camilamacedo86 camilamacedo86 force-pushed the fix-config-errors-status branch from d9786e0 to a5ba252 Compare January 29, 2026 12:02
Copilot AI review requested due to automatic review settings January 29, 2026 12:02
@camilamacedo86 camilamacedo86 force-pushed the fix-config-errors-status branch from a5ba252 to f059a7b Compare January 29, 2026 12:05
// Retrying indicates an error occurred (config, apply, validation, etc.)
// Use Failed for semantic correctness: installation failed due to error
return ocv1.ReasonFailed
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pedjak @perdasilva ^ We check the LATEST only

@camilamacedo86 camilamacedo86 changed the title WIP 🐛 Fix Boxcutter to show Failed (not Absent) for config errors WIP 🐛 Fix config validation errors to show Failed status Jan 29, 2026
@camilamacedo86 camilamacedo86 changed the title WIP 🐛 Fix config validation errors to show Failed status 🐛 Fix config validation errors to show Failed status Jan 29, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 29, 2026
@camilamacedo86 camilamacedo86 force-pushed the fix-config-errors-status branch from f059a7b to 0b603c4 Compare January 29, 2026 12:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

When configuration is invalid, mark it as a permanent error so the
system doesn't keep trying to install it over and over. This saves
resources and makes it clear the user needs to fix their config.

Assisted-by: Claude <noreply@anthropic.com>
@camilamacedo86 camilamacedo86 force-pushed the fix-config-errors-status branch from 2d2cbaf to 7c2fd16 Compare January 30, 2026 10:55
Copilot AI review requested due to automatic review settings January 30, 2026 10:55
camilamacedo86 and others added 5 commits January 30, 2026 10:57
Before: Invalid config showed status as "Absent" which was confusing
After: Shows "Failed" so users know something is wrong

Only checks the latest attempt, not old ones that don't matter anymore.

Assisted-by: Claude <noreply@anthropic.com>
BOXCUTTER runtime was returning early on config errors without
setting the status. Now it updates the status first so users can
see what went wrong.

Assisted-by: Claude <noreply@anthropic.com>
Tests check:
- Config errors are permanent (not retried)
- Status shows "Failed" when config is wrong
- Namespace names are validated
- AllNamespaces operators reject watchNamespace config

Includes both unit tests and end-to-end tests.

Assisted-by: Claude <noreply@anthropic.com>
When TerminalError is used, unwrap it before showing the error
message in status conditions. This prevents the 'terminal error:'
prefix from appearing in user-facing messages.

Changes:
- Add UnwrapTerminal() helper function to cleanly unwrap terminal errors
- Use the helper in setStatusProgressing and wrapErrorWithResolutionInfo
- Don't set Installed condition when Progressing is Blocked (terminal error)
  since the Progressing condition already provides all necessary information

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Previously, all terminal errors used the generic "Blocked" reason in
the Progressing condition. This wasn't semantically correct - "Blocked"
was intended for external blockers like resource collisions, not for
user configuration errors.

This commit introduces a custom TerminalError type that can carry a
specific Reason field, allowing more precise categorization of terminal
errors in status conditions.

Changes:
- Add ReasonInvalidConfiguration constant for configuration errors
- Create NewTerminalError(reason, err) to wrap errors with a reason
- Add ExtractTerminalReason(err) to retrieve the reason from errors
- Update setStatusProgressing to use extracted reason when available
- Fall back to ReasonBlocked for backward compatibility with plain
  reconcile.TerminalError() usage
- Update config validation to use ReasonInvalidConfiguration
- Update tests and e2e expectations accordingly

Result:
- Config errors now show: Progressing: False, Reason: InvalidConfiguration
- Generic terminal errors show: Progressing: False, Reason: Blocked
- More helpful and semantically correct status reporting

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@camilamacedo86 camilamacedo86 force-pushed the fix-config-errors-status branch from 7c2fd16 to 511c8c1 Compare January 30, 2026 10:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…teps.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings January 30, 2026 11:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@camilamacedo86
Copy link
Contributor Author

@pedjak @perdasilva

All addressed now. WDYT?
Could we get this one merged so that we can easily still working on the follow ups?

@perdasilva
Copy link
Contributor

/approve

@openshift-ci
Copy link

openshift-ci bot commented Jan 30, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: perdasilva

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 30, 2026
@tmshort
Copy link
Contributor

tmshort commented Jan 30, 2026

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 30, 2026
@openshift-merge-bot openshift-merge-bot bot merged commit ee4bab5 into operator-framework:main Jan 30, 2026
35 checks passed
@camilamacedo86 camilamacedo86 deleted the fix-config-errors-status branch January 31, 2026 00:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants