-
Notifications
You must be signed in to change notification settings - Fork 68
⚠️ ClusterExtentionRevision conditions #2340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the ClusterExtensionRevision conditions system to better distinguish between availability and progression states. The changes introduce a new "Progressing" condition type and update condition reasons to be more descriptive of the revision lifecycle states.
Key changes:
- Introduces
Progressingcondition type alongside refactoredAvailableandSucceededconditions - Replaces generic condition reasons with specific lifecycle states (e.g.,
RollingOut,RolledOut,Retrying,ProbesSucceeded) - Adds comprehensive E2E tests for ClusterExtensionRevision condition behavior
- Removes enum validation for
CollisionProtectionfield in CRDs
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| api/v1/clusterextensionrevision_types.go | Refactors condition type constants and adds Progressing type, updates condition reason constants to reflect specific lifecycle states |
| internal/operator-controller/controllers/clusterextensionrevision_controller.go | Implements condition setting logic with new helper functions and updates reconciliation flow to properly set Progressing/Available conditions |
| internal/operator-controller/controllers/clusterextensionrevision_controller_test.go | Updates unit tests to validate new condition reasons and adds test coverage for error scenarios |
| internal/operator-controller/applier/boxcutter.go | Adds handling for Retrying reason in progressing condition |
| test/e2e/cluster_extension_revision_test.go | Adds comprehensive E2E test covering revision conditions, availability probes, and archiving |
| test/e2e/e2e_suite_test.go | Adds Kubernetes clientset for pod exec operations in E2E tests |
| manifests/experimental.yaml, manifests/experimental-e2e.yaml, helm/olmv1/base/operator-controller/crd/experimental/olm.operatorframework.io_clusterextensionrevisions.yaml | Adds Progressing column to kubectl output and removes CollisionProtection enum validation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
793f821 to
ae7d78d
Compare
ae7d78d to
8a492cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
internal/operator-controller/controllers/clusterextensionrevision_controller.go
Show resolved
Hide resolved
8a492cb to
033aedf
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2340 +/- ##
==========================================
+ Coverage 74.45% 74.70% +0.24%
==========================================
Files 93 93
Lines 7300 7261 -39
==========================================
- Hits 5435 5424 -11
+ Misses 1433 1406 -27
+ Partials 432 431 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
033aedf to
537ad96
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)
internal/operator-controller/controllers/clusterextensionrevision_controller_test.go:915
- Corrected spelling of 'InTransistion' to 'InTransition'.
func (m mockRevisionResult) InTransistion() bool {
internal/operator-controller/controllers/clusterextensionrevision_controller_test.go:953
- Corrected spelling of 'InTransistion' to 'InTransition'.
func (m mockPhaseResult) InTransistion() bool {
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
537ad96 to
d9d9924
Compare
c3c0e14 to
fcb2a0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if err != nil { | ||
| if rres != nil { | ||
| l.Error(err, "revision reconcile failed") | ||
| l.V(1).Info("reconcile failure report", "report", rres.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@perdasilva here we need keep debug logs.
I will clarify that in the code and create a test
OR we must remove those reports they are too verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pedjak @perdasilva You both have been working with this more recently.
I also don’t find the report very valuable in the logs — it’s confusing and hard to read.
What do you think about removing the report entirely?
And instead, formatting the message in a clearer way, similar to what we do in other places, so it becomes more meaningful? It would either solve the bug above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've re-added it (but only in debug mode). Let's leave it like that for this PR and address this separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got up and was looking at this today.
I’m thinking the best solution might be similar to what we do for crddiff.
We need a format to extract the report information and normalize it into both machine-readable messages and human-readable logs. Then we can use it for any situation—info or error—while normalizing and stripping out only the data that is actually validatable.
This definitely needs to be done in a separate PR, and we’ll need to discuss a better long-term approach for handling this.
Thank you for the attention !!!
c/c @pedjak
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't necessarily disagree. But this is something we'll need to bake into the Boxcutter library itself. So, it's out of scope for this PR. I'll bring it up with Nico.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of error I think it would still very useful to log et error level, assuming that errors are happening occasionally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if there's a collision it will perma-occur. I think this was the original problem.
fcb2a0f to
2239a2c
Compare
2239a2c to
68a0700
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
68a0700 to
98b7083
Compare
98b7083 to
af87fe6
Compare
af87fe6 to
2e4f6dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2e4f6dd to
7fc2962
Compare
7fc2962 to
98112f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pedjak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update/extend PR title to summarize what this PR is about.
| // When Progressing is True and the Reason is Retrying, the ClusterExtension has encountered an error that could be resolved on subsequent reconciliation attempts. | ||
| // When Progressing is False and the Reason is Blocked, the ClusterExtension has encountered an error that requires manual intervention for recovery. | ||
| // <opcon:experimental:description> | ||
| // When Progressing is True and Reason is RollingOut, the ClusterExtension has one or more ClusterExtensionRevisions in active roll out. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to report in the PR description the changes to ClusterExtension conditions, similar to what you did for ClusterExtensionRevision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| // - When status is True and reason is RollingOut, the ClusterExtensionRevision rollout is actively making progress and is in transition. | ||
| // - When status is True and reason is Retrying, the ClusterExtensionRevision has encountered an error that could be resolved on subsequent reconciliation attempts. | ||
| // - When status is True and reason is Succeeded, the ClusterExtensionRevision has reached the desired state. | ||
| // - When status is False and reason is Blocked, the ClusterExtensionRevision has encountered an error that requires manual intervention for recovery. | ||
| // - When status is False and reason is Archived, the ClusterExtensionRevision is archived and not being actively reconciled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please update the table in the PR description, it looks it is out-of-date.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| // - When status is True and reason is ProbesSucceeded, the revision has been successfully rolled out and all objects pass their readiness probes. | ||
| // - When status is False and reason is ProbeFailure, one or more objects are failing their readiness probes during rollout. | ||
| // - When status is False and reason is ReconcileFailure, the revision has encountered a general reconciliation failure. | ||
| // - When status is False and reason is RevisionValidationFailure, the revision failed preflight validation checks. | ||
| // - When status is False and reason is PhaseValidationError, a phase within the revision failed preflight validation checks. | ||
| // - When status is False and reason is ObjectCollisions, objects in the revision collide with existing cluster objects that cannot be adopted. | ||
| // - When status is Unknown and reason is Reconciling, the revision has encountered an error that prevented it from observing the probes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the docs here and the table reported in the PR description are not in sync, the table has more rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's right:
ProbesSucceeded/True
ProbeFailure/False
Reconciling/Unknown
Migrated/Unknown
Archived/Unknown
or is it the specific text?
I'm going to push an update anyway because in one text we say revision in another ClusterExtensionRevision. So, I'll make that consistent.
| if rres != nil { | ||
| // Log detailed reconcile reports only in debug mode (V(1)) to reduce verbosity. | ||
| l.V(1).Info("reconcile report", "report", rres.String()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move this after we checked for the errors, because in the case of errors we should still log with l.Error(..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it still as debug for now because the report can be VERY verbose. If we still want it, let's address it it in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave it still as debug for now because the report can be VERY verbose. If we still want it, let's address it it in a follow-up.
Leaving as debug is fine for regular reconcile path (i.e. no errors). I was asking to print it to the debug level, only if there are no errors. In case of errors, I would still log it on the error level, so that could be caught in production (debug level is typically not set there).
| if err != nil { | ||
| if rres != nil { | ||
| l.Error(err, "revision reconcile failed") | ||
| l.V(1).Info("reconcile failure report", "report", rres.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of error I think it would still very useful to log et error level, assuming that errors are happening occasionally.
internal/operator-controller/controllers/clusterextensionrevision_controller.go
Outdated
Show resolved
Hide resolved
internal/operator-controller/controllers/clusterextensionrevision_controller_test.go
Outdated
Show resolved
Hide resolved
Co-authored-by: Per Goncalves da Silva <pegoncal@redhat.com> Co-authored-by: Predrag Knezevic <pknezevi@redhat.com Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
98112f6 to
2c73a3c
Compare
Description
Summary
This PR refactors the ClusterExtensionRevision status conditions to provide clearer, more actionable feedback about revision lifecycle states. The changes introduce a new Progressing condition, simplify condition reasons, and ensure that revision-level errors (retrying states) are properly surfaced to the parent ClusterExtension.
This PR does introduce API breaking changes. But, only to experimental APIs. It should be ok to override.
Key Changes
Simplified condition types:
Updated condition reasons to be more semantic and actionable:
API Changes:
Modified the Boxcutter applier to propagate retrying errors from ClusterExtensionRevision to the parent ClusterExtension. When a revision is in a Progressing=True state with Reason=Retrying, the error is now surfaced to the ClusterExtension, providing better visibility into failed reconciliation attempts. When the revision is Progressing=True/Succeeded the applier returns success.
Added end-to-end tests for ClusterExtensionRevision lifecycle scenarios to validate the new condition states and transitions.
ProgressionreasonClusterExtension Status Conditions
Progressing Condition (experimental only)
ClusterExtensionRevision Status Conditions
Progressing Condition
Indicates whether the revision is progressing to its next state. It follows the same semantic as the ClusterExtension and Deployment
Progressingcondition.Not a part of this PR yet, but
Progressingcan also beBlockedfor non-retryable errors.Available Condition
Indicates whether the revision's objects are available and passing the object status probes.
Succeeded Condition
Indicates whether the revision has successfully completed its rollout.
Migration Notes
Reviewer Checklist