Skip to content

Conversation

@LukeAVanDrie
Copy link
Contributor

@LukeAVanDrie LukeAVanDrie commented Nov 7, 2025

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This PR simplifies the core queuing contracts within the Flow Control framework.

This is achieved through two primary, related changes:

1. Simplifying the SafeQueue Contract

The SafeQueue interface and its implementations are designed for unbounded, in-memory, synchronous flow control. This context allows for a much simpler, more idiomatic, and higher-performance contract:

  • Infallible Operations: Most methods (Add, Peek*, Drain, Cleanup) cannot logically fail, as OOM conditions are panics, and capacity is managed externally. Their error returns have been removed.
  • Idiomatic Empty State: PeekHead and PeekTail now return nil when a queue is empty, which is more idiomatic Go than returning ErrQueueEmpty.
  • Remove Remains Fallible: As it deals with externally provided handles, Remove remains the only fallible operation.

This refactoring removes unnecessary defensive checks and error handling from the hot path, simplifying both the queue implementations and the caller logic in the ShardProcessor.

2. Ensuring Correctness via Composition (ManagedQueue)

As part of simplifying the SafeQueue interface to use an infallible Add() method, a potential design conflict was identified. The higher-level ManagedQueue component requires a fallible Add operation to atomically reject requests when its parent shard is DRAINING.

To address this, this PR refactors ManagedQueue to favor composition over embedding:

  • ManagedQueue no longer embeds SafeQueue; it contains it as a field, delegating to it accordingly.
  • Its Add method maintains its own explicit, fallible contract (-> error), allowing it to perform the isDraining() check and the addition to the underlying queue as a single, atomic, lock-protected operation.
  • Its Cleanup and Drain methods follow the simplified SafeQueue contract.

Specific Changes:

  • contracts.ManagedQueue:
    • No longer embeds framework.SafeQueue; it now contains it via composition.
    • Add(item) still returns an error to atomically handle the ErrShardDraining case while Cleanup and Drain take on the simplified SafeQueue contract.
  • framework.SafeQueue:
    • Add(item) now returns no error.
    • PeekHead() and PeekTail() now return (types.QueueItemAccessor) and nil if empty.
    • Drain() and Cleanup() now return ([]types.QueueItemAccessor).
  • framework.ErrQueueEmpty: Removed.
  • All callers (controller, registry) have been updated to conform to these new, more robust contracts in addition to tests and mocks.

Which issue(s) this PR fixes:

Tracks #1794

Does this PR introduce a user-facing change?:

NONE

This commit refactors the `framework.SafeQueue` interface to remove
error returns from methods where failure is not expected in normal
operation (Add, PeekHead, PeekTail, Drain, Cleanup).

The interface now relies on the following premises:
- Queues are unbounded and in-memory.
- OOM is a panic, not a handled error.
- Internal callers are trusted not to provide nil items.
- PeekHead/PeekTail return nil for empty queues.

This change simplifies the interface and reduces unnecessary error
checking on the hot path.

This commit updates the queue implementations (ListQueue, MaxMinHeap),
mocks, and all callers within the `pkg/epp/flowcontrol/framework/...`
tree to conform to the new interface.

A subsequent commit will address callers outside this tree.
This commit refactors the `ManagedQueue` contract after the preceding
commit which altered the `SafeQueue` contract.

Motivation: The preceding commit made the generic
`framework.SafeQueue.Add` method infallible. This created a design
conflict, as the higher-level `ManagedQueue` decorator requires a
*fallible* `Add` operation to atomically reject requests when its parent
shard is draining.

This commit addresses this by refactoring `ManagedQueue` to
favor composition over embedding:
- `ManagedQueue` no longer embeds `SafeQueue`; it now contains it as a
  field, correctly modeling the "has-a" relationship.
- The `ManagedQueue.Add` method now has its own explicit, fallible
  contract, returning `contracts.ErrShardDraining`.
- The `Remove` and `Cleanup` operations follow the new, simpler
  `SafeQueue` contract.

This change also includes the necessary updates to all callers in the
`controller` and `registry` packages to conform to these new, more
robust contracts as well as updating tests and mocks.
@k8s-ci-robot k8s-ci-robot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Nov 7, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: LukeAVanDrie
Once this PR has been reviewed and has the lgtm label, please assign kfswain for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 7, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @LukeAVanDrie. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 7, 2025
@netlify
Copy link

netlify bot commented Nov 7, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 2374184
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/690e60c67fcc920008f5305b
😎 Deploy Preview https://deploy-preview-1836--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@danehans
Copy link
Contributor

danehans commented Nov 7, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Nov 7, 2025
@LukeAVanDrie
Copy link
Contributor Author

This is a fairly large diffbase. The core change is simply removing error return values that can never be reached in practice from the plugin contract. The bulk of the diff is modifying all call sites of the updated methods below and removing error handling logic.

  • SafeQueue.Add(item types.QueueItemAccessor) error --> Add(item types.QueueItemAccessor)
  • SafeQueue.Cleanup(predicate PredicateFunc) (cleanedItems []types.QueueItemAccessor, err error) --> Cleanup(predicate PredicateFunc) []types.QueueItemAccessor
  • SafeQueue.Drain() (drainedItems []types.QueueItemAccessor, err error) --> Drain() []types.QueueItemAccessor
  • SafeQueue.PeekHead() (item types.QueueItemAccessor, err error) --> PeekHead() types.QueueItemAccessor
  • SafeQueue.PeekTail() (item types.QueueItemAccessor, err error) --> PeekTail() types.QueueItemAccessor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants