Add DataProducer PrepareData and Admission control plugins #1796
Conversation
pkg/epp/requestcontrol/director.go
    loggerDebug := log.FromContext(ctx).V(logutil.DEBUG)
    for _, plugin := range d.requestControlPlugins.admitRequestPlugins {
        loggerDebug.Info("Running AdmitRequest plugin", "plugin", plugin.TypedName())
        if !plugin.Admit(ctx, request, pods) {
We should allow the plugin to return a string explaining the reason for rejection. We can then treat the empty string as the allow signal (less opinionated on this part, though).
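A minimal sketch of that suggestion, with placeholder types (the interface name and signature are illustrative, not the actual ones in this PR):

```go
package main

import (
	"context"
	"fmt"
)

// Placeholder types standing in for the real request/pod types.
type Request struct{ ID string }
type Pod struct{ Name string }

// AdmitRequest is a hypothetical variant of the admission interface where
// the plugin returns a rejection reason; "" means the request is admitted.
type AdmitRequest interface {
	Admit(ctx context.Context, request *Request, pods []*Pod) string
}

// runAdmitPlugins rejects the request on the first non-empty reason.
func runAdmitPlugins(ctx context.Context, plugins []AdmitRequest, req *Request, pods []*Pod) error {
	for _, p := range plugins {
		if reason := p.Admit(ctx, req, pods); reason != "" {
			return fmt.Errorf("request %s rejected: %s", req.ID, reason)
		}
	}
	return nil
}
```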
Updated, thanks!
pkg/epp/scheduling/types/types.go
    // Attributes provides a goroutine-safe implementation of AttributeMap.
    type Attributes struct {
        data sync.Map // key: attribute name (string), value: attribute value (opaque, Cloneable)
Typically I am all for using prebuilt libraries to handle this type of complexity.
But since writes to specific attributes will lock the entire data object, we may have high lock contention here. Did we explore having a lock per attribute key?
That would put locks at the granularity of a specific endpoint and a specific attribute, which should reduce lock contention and let our system be more performant.
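A rough sketch of the per-key locking idea, for illustration only (type and method names are made up):

```go
package main

import "sync"

// attrEntry pairs one attribute value with its own lock, so writers to
// different attribute keys do not contend with each other.
type attrEntry struct {
	mu    sync.RWMutex
	value any
	set   bool
}

// PerKeyAttributes locks at the granularity of a single attribute key.
// The outer mutex is held only while looking up or creating an entry.
type PerKeyAttributes struct {
	mu      sync.Mutex
	entries map[string]*attrEntry
}

func NewPerKeyAttributes() *PerKeyAttributes {
	return &PerKeyAttributes{entries: make(map[string]*attrEntry)}
}

func (a *PerKeyAttributes) entry(key string) *attrEntry {
	a.mu.Lock()
	defer a.mu.Unlock()
	e, ok := a.entries[key]
	if !ok {
		e = &attrEntry{}
		a.entries[key] = e
	}
	return e
}

// Put writes one attribute while holding only that attribute's lock.
func (a *PerKeyAttributes) Put(key string, value any) {
	e := a.entry(key)
	e.mu.Lock()
	defer e.mu.Unlock()
	e.value, e.set = value, true
}

// Get reads one attribute under its read lock.
func (a *PerKeyAttributes) Get(key string) (any, bool) {
	e := a.entry(key)
	e.mu.RLock()
	defer e.mu.RUnlock()
	return e.value, e.set
}
```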
Thanks for the suggestion. This attribute map is a per-request copy: we take a snapshot of the attributes so that we can use them in the scheduling layer. Given the per-request nature of the map, I don't think it will see contention, since updating it takes less than a microsecond. I think it's reasonable to use sync.Map here.
Can we scale test to ensure we don't have any regression? We can use the metrics we have here as a baseline: #1458
I plan to do this in the next PRs, since we are not actually using the map in this change. Thanks!
pkg/epp/requestcontrol/director.go
    return reqCtx, errutil.Error{Code: errutil.ServiceUnavailable, Msg: "failed to find candidate pods for serving the request"}
    }

    // TODO(rahulgurnani/lukevandrie): Perhaps, refactor/implement Admit plugin for Admission control.
this comment is important. what is the relation between the admission controller's Admit and the admitRequest plugins?
++
This would be a break in the contract of how flow control operates. This specific plugin is for request-specific semantics. Flow Control currently does not consider request-specific semantics, and there hasn't been a proposal suggesting that change. I think we should remove this TODO until we have strong reasoning to actually do this work.
my apologies, but I don't understand the intention of this new Admission plugin.
I thought we wanted to make the admission check pluggable, but it seems now that we have two types of admission checks, with two different interfaces.
this seems wrong.
Removed the comment. Thanks for the catch!
pkg/epp/requestcontrol/director.go
    result, err := d.scheduler.Schedule(ctx, reqCtx.SchedulingRequest, d.toSchedulerPodMetrics(candidatePods))
    // Prepare per request data
    // TODO(rahulgurnani): Add retries and timeout in the preparedata step.
    d.runPrepareDataPlugins(ctx, reqCtx.SchedulingRequest, snapshotOfCandidatePods)
why do we create a snapshot of candidate pods?
we should work with candidatePods and create a snapshot only when calling the scheduler.
this is true for both prep data and admit request.
helper functions in the director should not rely on the scheduler's internal representation of the endpoints.
both prep data and admit request are request-specific, so if we add request-specific data to the shared endpoints, we risk data corruption.
Snapshotting before these steps ensures that this data's lifecycle is confined to the context it is consumed in.
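For illustration, a minimal sketch of the snapshot-before-mutation pattern being described (the Endpoint type here is a stand-in, not the real one):

```go
package main

// Endpoint stands in for a shared endpoint record that lives across requests.
type Endpoint struct {
	Name       string
	Attributes map[string]string // request-scoped data must not be written here
}

// Clone produces a per-request copy; writes to the copy cannot corrupt the
// shared endpoint that other in-flight requests are also reading.
func (e *Endpoint) Clone() *Endpoint {
	attrs := make(map[string]string, len(e.Attributes))
	for k, v := range e.Attributes {
		attrs[k] = v
	}
	return &Endpoint{Name: e.Name, Attributes: attrs}
}

// snapshot clones every candidate before the request-scoped plugin steps
// (prepare data, admit request) run, so their writes stay request-local.
func snapshot(candidates []*Endpoint) []*Endpoint {
	out := make([]*Endpoint, 0, len(candidates))
	for _, e := range candidates {
		out = append(out, e.Clone())
	}
	return out
}
```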
the intention of converting the endpoints representation to the scheduler's internal structure was only for the purpose of sending it to the scheduler.
PodMetrics has MetricsState behind an atomic pointer, and reading the metrics is an atomic operation (all metrics are read in one operation).
I must be missing something, although I've read the proposal doc.
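A small sketch of the atomic-pointer read being described (field names are illustrative):

```go
package main

import "sync/atomic"

// MetricsState is an immutable snapshot of one endpoint's metrics.
type MetricsState struct {
	QueueSize   int
	KVCacheUtil float64
}

// PodMetrics keeps the latest MetricsState behind an atomic pointer, so a
// reader gets all metrics from one consistent snapshot in a single load.
type PodMetrics struct {
	metrics atomic.Pointer[MetricsState]
}

// Update publishes a fresh snapshot; readers never observe a partial update.
func (p *PodMetrics) Update(s *MetricsState) { p.metrics.Store(s) }

// GetMetrics is the one-operation read the comment above describes.
func (p *PodMetrics) GetMetrics() *MetricsState { return p.metrics.Load() }
```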
    return c
    }

    // WithPrepareDataPlugins sets the given plugins as the PrepareData plugins.
Suggested change:
- // WithPrepareDataPlugins sets the given plugins as the PrepareData plugins.
+ // WithDataProducers sets the given plugins as the DataProducer plugins.
Addressed
@nirrozenbaum I reverted the change. PrepareDataPlugin is the apt name because the plugin can both produce and consume data. Please review the last commit.
pkg/epp/requestcontrol/director.go
    result, err := d.scheduler.Schedule(ctx, reqCtx.SchedulingRequest, d.toSchedulerPodMetrics(candidatePods))
    // Run admit request plugins
    if !d.runAdmitRequestPlugins(ctx, reqCtx.SchedulingRequest, snapshotOfCandidatePods) {
can you give an example of how AdmissionPlugin uses the candidate pods?
IMO an admission check should include only the request (including its metadata, like headers) and the system state (like saturation or other system metrics).
@kfswain I really think we should think carefully about the interfaces and align on them.
it doesn't make sense to me to pass the candidate pods to an admission check.
I would be happy to get an example of an admission check that is per-request and depends on the candidate pods (other than saturation, which is checked separately).
I'm a bit concerned about over-complication here, and unless there is a real use case for that, we should probably wait with this addition (talking only about admission; the data producer/consumer is clearly needed).
separately, I think there is a missing struct, something like RequestState, that conceptually should be similar to the PluginState we have today. RequestState should be created at the beginning of the request, passed around to be filled by RequestDataProducers, and later consumed by ConsumerPlugins (probably instead of CycleState).
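A minimal sketch of that RequestState idea, assuming a simple keyed store (names are illustrative, not from this PR):

```go
package main

import "sync"

// RequestState sketches the per-request container proposed above: created
// when the request arrives, filled by data-producer plugins, and read later
// by consumer plugins (playing the role CycleState plays today).
type RequestState struct {
	mu   sync.RWMutex
	data map[string]any
}

func NewRequestState() *RequestState {
	return &RequestState{data: make(map[string]any)}
}

// Write is called by a data-producer plugin to publish a result.
func (s *RequestState) Write(key string, value any) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

// Read is called by a consumer plugin later in the same request.
func (s *RequestState) Read(key string) (any, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}
```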
I am mostly following the recommendation in https://docs.google.com/document/d/1EQwXL2pCuUyM1B917FUgP_8pFS3VF8F_bUfjy8IE7gM/edit?tab=t.vmaefhinvkl5#heading=h.5qzcvbilfn6d
I think we would be adding the first AdmitRequest plugin with the latency predictor.
The AdmitRequest plugin would consume data produced by the latency predictor (producer).
@kfswain please keep me honest here. Thanks!
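As a hedged illustration of that producer/consumer relation (all names here are hypothetical; the per-request state is modeled as a plain map):

```go
package main

import "context"

// latencyKey is a hypothetical key under which a latency-predictor data
// producer would store its estimate in per-request state.
const latencyKey = "latency-predictor/predicted-ms"

// slaAdmit is an illustrative AdmitRequest-style plugin that consumes the
// predictor's output: it rejects when predicted latency exceeds a budget.
type slaAdmit struct {
	budgetMs float64
}

// Admit returns "" to allow, or a non-empty rejection reason.
func (a *slaAdmit) Admit(ctx context.Context, state map[string]any) string {
	v, ok := state[latencyKey]
	if !ok {
		return "" // no prediction available; admit by default
	}
	if predicted, ok := v.(float64); ok && predicted > a.budgetMs {
		return "predicted latency exceeds SLA budget"
	}
	return ""
}
```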
Hi @kfswain, @nirrozenbaum, I updated the PR to not have the DAG validation and parallel execution changes. Instead, I made some changes to migrate the prefix cache match plugin to use the new prepare data plugin. I will follow up with the DAG-related changes in a follow-up PR; they are currently staged at rahulgurnani@18592b7
Add PrepareData and AdmitRequest plugins based on the recommendations in the evolving datalayer changes doc.
The prepare data plugins are executed sequentially in this PR.
Furthermore, the prefix cache match plugin is updated to implement the PrepareRequestData plugin with minimal changes.
In a follow-up PR, the prepare data plugins will be executed in dependency-graph order, validated on startup for cycles. The prefix cache match plugin will also be split into a separate scorer in the future.
The PR also refactors some of the director.go code.
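For readers skimming the PR, a rough sketch of the shape of the change (interface and helper names are illustrative, not the exact ones in the diff):

```go
package main

import (
	"context"
	"fmt"
)

// PrepareData sketches the plugin interface this PR introduces: a plugin that
// can produce per-request data and consume data from plugins that ran before
// it. The per-request state is modeled as a plain map for brevity.
type PrepareData interface {
	TypedName() string
	PrepareData(ctx context.Context, state map[string]any) error
}

// runPrepareDataPlugins executes the plugins sequentially, as this PR does;
// dependency-graph ordering and cycle validation are left to the follow-up PR.
func runPrepareDataPlugins(ctx context.Context, plugins []PrepareData, state map[string]any) error {
	for _, p := range plugins {
		if err := p.PrepareData(ctx, state); err != nil {
			// The request fails if a prepare data call fails.
			return fmt.Errorf("prepare data plugin %s failed: %w", p.TypedName(), err)
		}
	}
	return nil
}
```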
cc @BenjaminBraunDev
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR makes it easier to implement plugins that produce data consumed by other plugins: for instance, the latency predictor and the prefix match plugin.
Read the evolving datalayer changes doc for more details.
Which issue(s) this PR fixes:
Addresses #1743
Does this PR introduce a user-facing change?:
Yes. Enables users of IGW to write prepare data and admit request plugins.