
Conversation


@kimpenhaus kimpenhaus commented Nov 4, 2025

Hey Christoph @buehler

This is a comprehensive PR containing changes from integrating the operator into our cluster environment.

Summary

This PR introduces breaking changes to the KubeOps SDK, implementing a result pattern inspired by the Go operator implementation. Controllers and finalizers now return ReconciliationResult<TEntity>, enabling explicit success/failure states, centralized requeuing via RequeueAfter, and automatic finalizer lifecycle management. Additional improvements include extensible requeue mechanisms, const value support in source generators, and configurable leader election types.

Breaking Changes ⚠️

1. Result Pattern

Controller and finalizer interfaces now return Task<ReconciliationResult<TEntity>> instead of Task:

Before:

public interface IEntityController<TEntity>
{
    Task ReconcileAsync(TEntity entity, CancellationToken cancellationToken);
    Task DeletedAsync(TEntity entity, CancellationToken cancellationToken);
}

After:

public interface IEntityController<TEntity>
{
    Task<ReconciliationResult<TEntity>> ReconcileAsync(TEntity entity, CancellationToken cancellationToken);
    Task<ReconciliationResult<TEntity>> DeletedAsync(TEntity entity, CancellationToken cancellationToken);
}

The ReconciliationResult<TEntity> provides:

  • Success/failure status with error information
  • Optional RequeueAfter timespan for delayed reprocessing
  • Access to the updated entity after reconciliation (which allows, for example, changing the entity's state before finalizer detachment; this was not possible before, as the entity would already have been in a modified state)

Migration Example:

// Old implementation
public async Task ReconcileAsync(V1TestEntity entity, CancellationToken token)
{
    // ... reconciliation logic
}

// New implementation
public async Task<ReconciliationResult<V1TestEntity>> ReconcileAsync(V1TestEntity entity, CancellationToken token)
{
    // ... reconciliation logic

    // Success - requeue after 5 minutes
    return ReconciliationResult<V1TestEntity>.Success(entity, TimeSpan.FromMinutes(5));

    // Or failure with error message
    return ReconciliationResult<V1TestEntity>.Failure(entity, "Failed to process entity");
}

2. Namespace Reorganization

Types moved to new namespaces:

  • IEntityController<TEntity>: KubeOps.Abstractions.Controller → KubeOps.Abstractions.Reconciliation.Controller
  • IEntityFinalizer<TEntity>: KubeOps.Abstractions.Finalizer → KubeOps.Abstractions.Reconciliation.Finalizer
  • EntityRequeue: KubeOps.Abstractions.Queue → KubeOps.Abstractions.Reconciliation.Queue
  • IEntityRequeueFactory: KubeOps.Abstractions.Queue → KubeOps.Abstractions.Reconciliation.Queue

Migration: Update using statements in your controllers and finalizers.

3. Queue Interface Changes

The internal queue interface is now public and extensible:

public interface ITimedEntityQueue<TEntity>
{
    Task Enqueue(TEntity entity, RequeueType type, TimeSpan requeueIn, CancellationToken cancellationToken);
    Task Remove(TEntity entity, CancellationToken cancellationToken);
}

This enables implementing durable requeue mechanisms (e.g., backed by Redis, Service Bus, database) by overriding the default in-memory implementation.

New Features

1. Automatic Finalizer Management

Two new settings provide automatic finalizer lifecycle management:

builder.Services
    .AddKubernetesOperator(settings =>
    {
        // Automatically attach finalizers during reconciliation (default: true)
        settings.AutoAttachFinalizers = true;

        // Automatically detach finalizers after successful finalization (default: true)
        settings.AutoDetachFinalizers = true;
    });

Benefits:

  • No manual finalizer management required
  • Consistent finalizer handling across operators
  • Reduces boilerplate code
  • Can be disabled for custom finalization workflows
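
With both settings enabled, a finalizer only needs to report its outcome; attachment and detachment are handled by the SDK. A sketch of what this could look like under this PR's interfaces (the method name `FinalizeAsync` and the helper `CleanUpExternalResourceAsync` are assumptions, not taken from the diff, and the exact interaction between `RequeueAfter` and auto-detach is not spelled out in this description):

```csharp
// Sketch only: assumes the finalizer interface mirrors the controller
// interface shown above. CleanUpExternalResourceAsync is a hypothetical helper.
public class ExternalResourceFinalizer : IEntityFinalizer<V1TestEntity>
{
    public async Task<ReconciliationResult<V1TestEntity>> FinalizeAsync(
        V1TestEntity entity, CancellationToken cancellationToken)
    {
        var done = await CleanUpExternalResourceAsync(entity, cancellationToken);

        // With AutoDetachFinalizers enabled, no manual RemoveFinalizer/Update
        // call should be needed: the SDK detaches the finalizer on success.
        return done
            ? ReconciliationResult<V1TestEntity>.Success(entity)
            // Cleanup not finished yet: presumably keep the finalizer
            // attached and re-run finalization in 30 seconds.
            : ReconciliationResult<V1TestEntity>.Success(entity, TimeSpan.FromSeconds(30));
    }
}
```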

2. Const Value Support in Source Generator

The syntax receiver now supports constant values in Kubernetes entity attributes:

public static class Constants
{
    public const string ApiGroup = "mycompany.com";
    public const string ApiVersion = "v1";
}

[KubernetesEntity(
    Group = Constants.ApiGroup,  // Const values now supported
    ApiVersion = Constants.ApiVersion,
    Kind = "MyResource")]
public class V1MyResource : CustomKubernetesEntity<V1MyResourceSpec>
{
}

Benefits:

  • Centralized API group/version management
  • Compile-time constant validation
  • Better code organization for multi-resource operators

3. Leader Election Type Configuration

Introduction of LeaderElectionType enum for explicit leader election configuration:

public enum LeaderElectionType
{
    None = 0,    // No leader election (default)
    Single = 1,  // Single leader election using Kubernetes leases
    Custom = 2   // Custom user-defined leader election mechanism
}

Configuration:

builder.Services
    .AddKubernetesOperator(settings =>
    {
        settings.LeaderElectionType = LeaderElectionType.Single;
        settings.LeaderElectionLeaseDuration = TimeSpan.FromSeconds(15);
        settings.LeaderElectionRenewDeadline = TimeSpan.FromSeconds(10);
        settings.LeaderElectionRetryPeriod = TimeSpan.FromSeconds(2);
    });

Benefits:

  • Explicit configuration of leader election behavior
  • Support for custom leader election implementations
  • Clear distinction between single-instance and multi-instance deployments

4. Extensible Requeue Mechanism

Introduction of RequeueType enum and ITimedEntityQueue<TEntity> interface:

public enum RequeueType
{
    Added,
    Modified,
    Deleted
}

Use Cases:

  • Implement durable requeue using external storage (Redis, Service Bus, database)
  • Survive operator restarts
  • Implement custom requeue strategies
  • Add monitoring and metrics for requeue operations

Example Implementation:

public class DurableEntityQueue<TEntity> : ITimedEntityQueue<TEntity>
{
    // Hypothetical storage abstraction (e.g. Redis- or database-backed).
    private readonly IDurableEntityStorage<TEntity> _storage;

    public DurableEntityQueue(IDurableEntityStorage<TEntity> storage) => _storage = storage;

    public async Task Enqueue(TEntity entity, RequeueType type, TimeSpan requeueIn, CancellationToken cancellationToken)
    {
        // Store in Redis/Database with execution time
        await _storage.SaveAsync(entity, type, DateTime.UtcNow.Add(requeueIn));
    }

    public async Task Remove(TEntity entity, CancellationToken cancellationToken)
    {
        // Remove from external storage
        await _storage.DeleteAsync(entity);
    }
}

5. ReconciliationContext

New context object providing metadata about reconciliation triggers:

public sealed record ReconciliationContext<TEntity>
{
    public TEntity Entity { get; }
    public WatchEventType EventType { get; }
    public ReconciliationTriggerSource ReconciliationTriggerSource { get; }
}

Helps distinguish between API server events and operator-initiated requeues.
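
For instance, a controller could skip expensive external lookups when a reconciliation was triggered by its own requeue rather than by an API server event. A hypothetical sketch (the enum member names `Requeue` and the exact access pattern are assumptions; the PR description does not list them):

```csharp
// Hypothetical usage; ReconciliationTriggerSource's member names are assumed.
if (context.ReconciliationTriggerSource == ReconciliationTriggerSource.Requeue)
{
    // Operator-initiated requeue: external state was checked recently,
    // so a lighter reconciliation path may be sufficient.
}
else
{
    // API server event (context.EventType says whether it was
    // Added, Modified, or Deleted): run the full reconciliation.
}
```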

Implementation Details

Core Components

  1. ReconciliationResult (src/KubeOps.Abstractions/Reconciliation/ReconciliationResult{TEntity}.cs)

    • Immutable record type with success/failure semantics
    • Optional requeue after duration
    • Error message and exception support
  2. Reconciler (src/KubeOps.Operator/Reconciliation/Reconciler.cs)

    • Centralized reconciliation orchestration
    • Handles controller and finalizer invocation
    • Manages generation-based caching
    • Automatic finalizer attachment/detachment
    • Better testability
  3. ITimedEntityQueue (src/KubeOps.Operator/Queue/ITimedEntityQueue{TEntity}.cs)

    • Public interface for queue implementations
    • Async methods with cancellation token support
    • Extensibility point for custom implementations
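
To make the described semantics concrete, here is a minimal stand-in (not the SDK source, which lives in the file referenced above) for what an immutable result record with these properties could look like:

```csharp
// Minimal stand-in sketch, not the actual SDK type.
public sealed record ReconciliationResultSketch<TEntity>
{
    public required TEntity Entity { get; init; }  // updated entity, routed back to the SDK
    public bool IsSuccess { get; init; }           // success/failure semantics
    public TimeSpan? RequeueAfter { get; init; }   // optional delayed reprocessing
    public string? ErrorMessage { get; init; }     // error message support
    public Exception? Exception { get; init; }     // exception support

    public static ReconciliationResultSketch<TEntity> Success(TEntity entity, TimeSpan? requeueAfter = null) =>
        new() { Entity = entity, IsSuccess = true, RequeueAfter = requeueAfter };

    public static ReconciliationResultSketch<TEntity> Failure(TEntity entity, string message, Exception? exception = null) =>
        new() { Entity = entity, IsSuccess = false, ErrorMessage = message, Exception = exception };
}
```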

Alignment with Go Implementation

This implementation draws inspiration from controller-runtime (Go):

  • Result pattern for reconciliation outcomes
  • RequeueAfter concept for delayed reprocessing
  • Clear separation of success/error states
  • Flexible error handling strategies

Testing

  • ✅ Comprehensive unit tests for ReconciliationResult<TEntity>
  • ✅ Unit tests for ReconciliationContext<TEntity>
  • ✅ Integration tests for finalizer auto-attach/detach
  • ✅ Tests for const value support in syntax receiver
  • ✅ Queue functionality tests with new RequeueType
  • ✅ All existing integration tests updated and passing

Documentation

  • Updated controller examples with new result pattern
  • Added advanced configuration guide
  • Updated finalizer documentation with auto-attach/detach settings
  • Added caching documentation
  • Migration guide included in this PR description

Additional Notes

Migration Checklist

For operators upgrading to this version:

  • Update controller methods to return ReconciliationResult<TEntity>
  • Update finalizer methods to return ReconciliationResult<TEntity>
  • Update namespace imports for reconciliation types
  • Review automatic finalizer settings (defaults are enabled)
  • Review leader election configuration (default: None)
  • Consider using const values for entity attributes (optional)
  • Test requeue behavior with new result pattern
  • Review error handling using result pattern instead of exceptions

kimpenhaus added 30 commits June 4, 2025 07:35
# Conflicts:
#	src/KubeOps.Abstractions/KubeOps.Abstractions.csproj
…g (hybrid cache)

- Integrated FusionCache for robust caching in resource watchers.
- Enhanced default configuration with extensible settings in `OperatorSettings`.
- Improved concurrency handling using `SemaphoreSlim` for entity events.
- Updated tests and dependencies to reflect caching changes.
…nt entity locks

- Renamed `DefaultCacheConfiguration` to `DefaultResourceWatcherCacheConfiguration` for clarity.
- Introduced cache key prefix to improve cache segmentation.
- Removed `ConcurrentDictionary` for entity locks to simplify concurrency management.
- Refactored event handling logic for "added" and "modified" events to streamline codebase.
- Updated `ConfigureResourceWatcherEntityCache` to use `IFusionCacheBuilder` for extensibility.
- Moved resource watcher cache setup logic to `WithResourceWatcherCaching` extension.
- Added detailed XML comments for `EntityLoggingScope` to improve documentation.
- Removed redundant `DefaultResourceWatcherCacheConfiguration`.
- Renamed `WithResourceWatcherCaching` to `WithResourceWatcherEntityCaching` for clarity.
- Updated `CacheExtensions` to be `internal` to limit scope.
- Removed unused dependency on `ZiggyCreatures.Caching.Fusion`.
- Added a new `Caching` documentation page explaining resource watcher caching with FusionCache and configuration options (in-memory and distributed).
- Updated sidebar positions for `Deployment`, `Utilities`, and `Testing` to accommodate the new `Caching` page.
…usionCache details

- Improved explanations for in-memory and distributed caching setups.
- Added example code for customizing resource watcher cache with FusionCache.
- Included references to FusionCache and Redis documentation for further guidance.
# Conflicts:
#	src/KubeOps.Operator/Watcher/ResourceWatcher{TEntity}.cs
# Conflicts:
#	examples/Operator/Finalizer/FinalizerOne.cs
#	src/KubeOps.Abstractions/KubeOps.Abstractions.csproj
#	src/KubeOps.Operator/Builder/CacheExtensions.cs
#	src/KubeOps.Operator/Constants/CacheConstants.cs
#	src/KubeOps.Operator/KubeOps.Operator.csproj
#	src/KubeOps.Operator/Watcher/ResourceWatcher{TEntity}.cs
…ependency

- Removed redundant requeue logic and optimized entity cache operations during deletion in `ResourceWatcher`.
- Upgraded `ZiggyCreatures.FusionCache` to version `2.4.0`.
- Introduced `RequeueType` enumeration to specify requeue operation types (`Added`, `Modified`, `Deleted`).
- Implemented `RequeueTypeExtensions` for mapping `WatchEventType` to `RequeueType`.
- Updated requeue mechanism to include `RequeueType` in `EntityRequeue` and related methods.
- Refactored `TimedEntityQueue` and related classes to support `RequeueEntry` containing both the entity and its requeue type.
- Adjusted tests to incorporate `RequeueType` into entity requeue logic.
… reconciliation logic

- Created `IReconciler<TEntity>` interface and its implementation to handle entity creation, modification, and deletion.
- Updated `ResourceWatcher` and `EntityRequeueBackgroundService` to use `Reconciler` for reconciliation operations.
- Removed redundant FusionCache dependency from `ResourceWatcher` and related classes.
- Streamlined requeue mechanics and replaced entity finalization logic with `Reconciler` integration.
- Registered `IReconciler<TEntity>` and its implementation `Reconciler<TEntity>` in the service container.
- Ensured proper integration with existing requeue and entity processing workflows.
…-attach/detach options

- Added `AutoAttachFinalizers` and `AutoDetachFinalizers` settings in `OperatorSettings`, enabling automatic management of entity finalizers during reconciliation.
- Extended `Reconciler` to respect these settings for adding and removing finalizers.
- Introduced `EntityFinalizerExtensions` for streamlined finalizer handling and identifier generation.
- Updated relevant interfaces and documentation for improved clarity and usability.
…ant values

- Update `KubernetesEntitySyntaxReceiver` to utilize `SemanticModel` for attribute argument resolution, ensuring accurate value retrieval.
- Updated `EntityFinalizerExtensions` to correctly append "finalizer" when missing from the name.
- Added unit tests to validate finalizer identifier generation, including cases for length limits and naming consistency.
- Renamed test cases and entities for improved clarity and consistency.
- Added new tests for entities with no group values and entities with varying group definitions.
- Adjusted expected
…interface for improved flexibility

- Extracted `ITimedEntityQueue` interface from `TimedEntityQueue` implementation.
- Updated all usages, including services and tests, to rely on the interface.
- Added extension methods for requeue key management.
- Improved code consistency and maintainability across the queue system.
…r election

- Replaced `EnableLeaderElection` with `LeaderElectionType` in `OperatorSettings` for enhanced configurability.
- Added `LeaderElectionType` enum with options: None, Single, and Custom.
- Updated `OperatorBuilder` to handle leader election logic based on `LeaderElectionType`.
- Modified `EntityRequeueBackgroundService` to public visibility and implemented proper `Dispose` logic.
- Adjusted tests to reflect new leader election mechanism.
- Improved code maintainability and alignment with distributed system requirements.
@kimpenhaus kimpenhaus changed the title Introduce Result Pattern and Automatic Finalizer Management feat!: introduce result-pattern and automatic finalizer management Nov 4, 2025

buehler commented Nov 7, 2025

Hey @kimpenhaus
Wow. Thanks for this big contribution! It will surely take a while to comprehend what you've done :-)

One question to start with: the first change with the returned result pattern. This was implemented in a long-past version of the SDK (v6 or so), and I changed it because I thought it was more extensible to inject finalizer and requeue mechanisms instead of relying on return values. The return values are parsed by the SDK core engine, and thus feature implementations and enhancements must also touch the core. With the injection of such extensions (e.g. finalizers, requeue), you can provide those without touching the core. Or at least, that was my intention.

wdyt?

@kimpenhaus

Hey Christoph @buehler,

Yeah - take your time. I know it's a lot of work on your side. I appreciate the time you'll be investing - thanks for that. 🙏🏼

Regarding your point on the result pattern:

My intention is as follows:

  • We had trouble with changing the entity while finalizing. With automatic detaching and no real option to route back the changed entity, this leads to a 409 conflict as the entity has already changed when trying to remove the finalizer. Returning the entity solves this issue.

  • Regarding the flexibility you mentioned: from my point of view, you don't lose it - as retryAfter is optional. The idea was to optimize recurring code within the controller and finalizer, and remove code that can be centralized and might otherwise blur the logic in the controller and finalizer.

  • Finalizer attaching and detaching could be configured, which also helps remove redundant, recurring code from the finalizer.

  • I tried to orient the design around the Go implementation, which handles it in a similar way.

  • In my opinion, this helps to better follow the responsibilities of each component (that's why I also introduced the reconciler).

Looking forward to your feedback! 😊

# Conflicts:
#	src/KubeOps.Abstractions/Entities/KubernetesExtensions.cs
#	src/KubeOps.Operator/Watcher/ResourceWatcher{TEntity}.cs
#	test/KubeOps.Operator.Test/KubeOps.Operator.Test.csproj
#	test/KubeOps.Transpiler.Test/KubeOps.Transpiler.Test.csproj
@ralf-cestusio
I wanted to give some qualitative feedback (completely from a user perspective)

I have adapted our operator to use this PR and I really like how development feels.
Especially finalizer management has become a lot more expressive and easier to use.
The ability to handle detaching a finalizer with the result pattern (or scheduling a rerun of the finalizer in case it is not done yet) feels a lot more natural.

Normal reconcile code has also become more readable, and I managed to avoid all 409 errors, because we can harness updates more easily.


buehler commented Nov 11, 2025

Cool! Thanks for the insights. Definitely looking forward to the code :)

@buehler buehler self-requested a review November 12, 2025 10:17
@buehler buehler left a comment

Thank you for this big contribution! I really like the changes and I'm looking forward to seeing how people integrate them into their operators. Getting more aligned with the Go implementation certainly makes sense.

I do have some minor comments/questions. Feel free to comment on them and let us have a discussion :-)

Thanks again!

entity.Kind,
entity.Name(),
identifier);
return ReconciliationResult<TEntity>.Success(entity);

Is "not being able to finalize" a success or rather an error? Depends on the finalizer I guess. wdyt?

@kimpenhaus kimpenhaus Nov 14, 2025

The original idea I had was just to differentiate between exception == failure and no exception == success, but when @ralf-cestusio commented on one of my commits he raised the same question, which makes me indecisive about whether its usage is clear.

The difference between the two (in the end) is that a failure leads to an error log message; everything else should be nearly the same. Nevertheless, the usage should be clear to the consumer.

Based on the discussion about the cascading foreground deletion finalizer and the fact that a failure would end up in the error log, I guess success is the way to go here.

…ate reconciliation method signatures

- Marked entity-related classes as sealed for improved clarity and security.
- Adjusted reconciliation and finalizer method return types to use `ReconciliationResult` in dotnet templates.
- Simplified condition checks by replacing `IsFailure` with `!IsSuccess`.
- Updated related tests and logic to reflect the removal of `IsFailure` property.
…ialization and simplify object creation

- Replaced factory method with init-only properties in `RequeueEntry`.
- Enhanced instantiation of `TimedQueueEntry` with object initializer syntax.
- Added XML documentation for improved code readability.
if (scope.ServiceProvider.GetKeyedService<IEntityFinalizer<TEntity>>(identifier) is not
{ } finalizer)
{
logger.LogDebug(


I don't think this should be a warning.
There are valid cases where there is a finalizer on a resource but it's not handled by the operator.
A good example is a cascading foreground delete, where k8s itself sets a finalizer.


@ralf-cestusio @buehler I had a discussion about that with a colleague yesterday, before changing it to a warning. We also had cascading foreground deletion in mind, as we had seen that a couple of times when using Lens. But we came to the conclusion that it's not bad to at least see a warning in the logs. What do you think is wrong with a warning that prompts you to double-check whether all is well or not?


Good point. It depends on the use case. I think both arguments are valid. We could take the middle ground and use "Info" for the log severity? It is not necessarily a warning, but Debug also feels a little too low.

How about we use Info?


For me, warnings are an operational indicator that something is going wrong, and this does not meet that criteria. The valid case where it is logged is when a programming error occurred and we forgot to register a finalizer (and this would not even happen with auto-attach set to true, because in that case the finalizer would not have been added in the first place).
So I think Info would be the better level.


I am fine with information - updated it :)

@buehler buehler left a comment

Thanks! :)
One or two suggestions and discussions are open, but then I'll guess we're good to go!

… for reconciliations

- Introduced `RequeueStrategy` to configure reconciliation queuing behavior, allowing custom requeue strategies to be configured properly
- Enhanced `EntityLoggingScope` with additional metadata and public visibility, for use in custom leadership or requeue overloads
- Updated background services to support activity-based tracing and scoped logging, adding previously missing log information
- Adjusted `RequeueEntry` to use `struct` for performance benefits.

kimpenhaus commented Nov 14, 2025

hey Christoph @buehler - sorry for pushing new changes 🙈 I just saw that there were some logs missing (and an activity), so I tried to align that for consistency. Also, the way custom requeue mechanisms were "configured" didn't feel good, so I made it configurable the same way as finalizer handling.

This should be it for now (except for changes to the docs to reflect everything here, but that will be one last commit). I have two open points I'd like to discuss but will maybe move them to the discussion first.


buehler commented Nov 14, 2025

No worries. which parts do you want to discuss?

@kimpenhaus

No worries. which parts do you want to discuss?

  1. I think there is an issue between deserialization in the resource watcher (which is done through the Kubernetes.Client → KubernetesJson) and deserialization in the admission webhooks (validate/mutate), which is done by the default System.Text.Json. Why do I think that? We have (not sure if it's uncommon) an entity model containing some TimeSpan properties. In KubernetesJson there is a special converter enforcing the ISO 8601 duration format (e.g. PT1H for 1 hour). This can be properly serialized/deserialized by the KubernetesClient, but with the default System.Text.Json it will fail in the validation webhook, which expects the standard TimeSpan format, e.g. 01:00:00. My idea was to give the admission webhooks a special model binder using KubernetesJson for deserialization, but from what I saw this is currently not possible. I asked on the KubernetesClient side: client deserialization <-> validation webhook deserialization kubernetes-client/csharp#1683

  2. When deleting an entity, we wanted to set the state to terminating when the finalizer is triggered. This leads to a CRD change, which fires a modified event (with a deletion timestamp, a new resource version, and the same generation). We had an error in our code which made the finalizer fail - what happens now is an infinite loop :) What we figured out is that whenever the finalizer can't be detached but the CRD gets modified, this leads to an infinite loop. In the case where finalizing a CRD leads to deletion, the modified event is skipped (because the entity doesn't exist anymore). I have no idea/experience how to properly solve this, but I saw other CRDs having a status that reflects finalization.

  3. Not sure about this, but I think the requeue service and the resource watcher can lead to concurrent/parallel reconciliation, which might lead to conflicts. Not sure if that is intentional or a common use case and therefore accepted, but it feels like overhead to reconcile one on top of the other in parallel.

I know these are 3 points 🤣

@ralf-cestusio
I wanted to chime in on 3., the potential parallel execution of watcher and requeue.
There was a bug mentioning this a few weeks ago: #977
I feel it can lead to some rather hard-to-debug 409 errors, but I have not experienced this myself, so I find it hard to judge how disruptive this behavior is.
This PR is already very large, though, so I am not sure we should address this one in the same PR.

@kimpenhaus

I wanted to chime in on 3. the potential parallel execution of watcher and requeue. There was a bug mentioning this a few weeks ago: #977 I feel it can lead to some rather hard to debug 409 errors. But i have not experienced this myself so i find it hard to judge how disruptive this behavior is. But this PR is already very large so i am not sure we should address this one in the same PR.

thanks @ralf-cestusio - honestly I hadn't checked for existing issues. These 3 points weren't planned to go into this PR :) they were just some points I'd like to discuss.
