Skip to content

Conversation

@danehans
Copy link
Contributor

@danehans danehans commented Oct 24, 2025

What type of PR is this?

/kind test
/area conformance-test

What this PR does / why we need it:

  • Updates the header-based-testing-filter to filter selects pods whose "IP" or "IP:port" matches any value in the "test-epp-endpoint-selection" header. If a port is provided, only an exact "IP:port" match is accepted.
  • Adds a conformance test that asserts data parallelism by using the "X-Echo-HTTP-Port" response header introduced in Conformance: Adds Port Response Header to Echo Server gateway-api#4230.
  • Bumps the EPP image tag in conformance to the tip of main, e.g. v20251105-cbb8928.

Which issue(s) this PR fixes:

Fixes #1680

Does this PR introduce a user-facing change?:

Conformance: Updates the header-based-testing-filter to filter selects pods whose "IP" or "IP:port" matches any value in the "test-epp-endpoint-selection" header. Adds test to exercise data parallelism routing. Bumps conformance EPP image tag to `v20251105-cbb8928`.

@k8s-ci-robot k8s-ci-robot added the area/conformance-test Issues or PRs related to Conformance tests. label Oct 24, 2025
@k8s-ci-robot
Copy link
Contributor

@danehans: The label(s) kind/test cannot be applied, because the repository doesn't have them.

In response to this:

What type of PR is this?
/kind test
/area conformance-test

What this PR does / why we need it:

  • Adds a conformance test that tests routing to endpoints with data parallelism enabled.
  • Bumps the EPP image tag in conformance to v20251023-d788a2c.

Which issue(s) this PR fixes:

Fixes #1680

Does this PR introduce a user-facing change?:

Conformance: Adds test to exercise data parallelism routing. Bumps confomrance EPP image tag to `v20251023-d788a2c`.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 24, 2025
@k8s-ci-robot k8s-ci-robot requested review from ahg-g and elevran October 24, 2025 21:39
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 24, 2025
@netlify
Copy link

netlify bot commented Oct 24, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 53bfda7
🔍 Latest deploy log https://app.netlify.com/projects/gateway-api-inference-extension/deploys/690e7c94850d4900086079a8
😎 Deploy Preview https://deploy-preview-1769--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@danehans
Copy link
Contributor Author

/cc @shmuelk @robscott @zetxqx

targetPorts:
- number: 3000
- number: 3002
- number: 3004
Copy link
Contributor

@shmuelk shmuelk Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is an interesting configuration, I don't think you could do this with a real vLLM server

I was wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shmuelk can you elaborate? Are you referring to the non-contiguous port numbers? If so, the reason for this configuration is because the backend is an echo server which listens on multiple ports (xref).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought that when one launched vLLM with Data Parallel in a configuration that listened on multiple ports, that the "vLLM Launcher" started up all of the processes.

I was wrong. They are started individually and each one can listen on any port one wants.

spec:
containers:
- name: echoserver-3000
image: gcr.io/k8s-staging-gateway-api/echo-basic:v20240412-v1.0.0-394-g40c666fd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than your use of non-contiguous ports here, why not not use llm-d-inference-sim which supports --data-parallel-size=N ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to swap out the backend to support a single conformance test. @robscott llm-d/llm-d-inference-sim#230 added support for echo'ing the port number. I don't see this supported in the echo-basic image that we currently use. I submitted kubernetes-sigs/gateway-api#4230 upstream to add the required functionality to the echo server. cc: @robscott

{
name: "DP routes only to all pods (EPP returns all; ranks balanced internally)",
podIPsToBeReturnedByEPP: []string{podIPs[0], podIPs[1], podIPs[2]},
expectAllRequestsRoutedWithinPodNames: []string{podNames[0], podNames[1], podNames[2]},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are only checking Pod IPs. Shouldn't youalso be checking Pod Ports?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't compare the port number in the response with the backend ports since the request/response is being proxied. We need the echo server to support echo'ing the port number back in a response header, i.e. llm-d/llm-d-inference-sim#232, or switch to vllm-sim. cc: @robscott @zetxqx

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 0d46ef5 updates the conformance test to compare the response IP:port with the the IP:port set in the "test-epp-endpoint-selection" header. Note that kubernetes-sigs/gateway-api#4230 is required for the echo server to return the port number in the "X-Echo-HTTP-Port" response header.


// Filter selects pods that match the IP addresses specified in the request header.
// Filter selects pods whose IPs match any value in the "test-epp-endpoint-selection" header.
// Values may be "IP" or "IP:port"; ports (ranks) are ignored here because DP fan-out happens later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to ignore the port? I would have thought you would add code that if a port was specified, it filters by both IP and port.

Without that how do you know that DP really worked and sent requests to a "non-base" of the model server?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit 0d46ef5 updated this method to filter selects pods whose IP or IP:port matches any value in the "test-epp-endpoint-selection" header. If a port is provided, only an exact IP:port match is accepted.

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 5, 2025
@danehans danehans requested a review from shmuelk November 5, 2025 20:22
@danehans
Copy link
Contributor Author

danehans commented Nov 5, 2025

/test pull-gateway-api-inference-extension-test-unit-main

@danehans danehans self-assigned this Nov 6, 2025
Copy link
Contributor

@zetxqx zetxqx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay review, looks good, just some nit comments.

I'm also setting up GKE to run against this conformance test as well.

// Update the CONTROLLER_TOOLS_VERSION in Makefile when bumping controller-tools.
sigs.k8s.io/controller-tools v0.19.0
sigs.k8s.io/gateway-api v1.4.0
sigs.k8s.io/gateway-api v1.3.1-0.20251106052652-079e4774d76b
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering can this be 1.4.0-something? this will be synced back to our internal pipeline, and it already updated to 1.4.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it would be a v1.4.0-something too, but ^ is how the dep gets resolved when I:

$ REQ_COMMIT=079e4774d76b909a8d43eae7dba570e2208cc9a7
$ go get sigs.k8s.io/gateway-api@$REQ_COMMIT

// - "IP:port" — selects only pods whose IP and port both match exactly.
// Ports correspond to data-parallel ranks or specific targetPorts.
//
// IPv6 addresses are supported, with or without brackets (e.g. "fd00::1" or "[fd00::1]:3002").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding the ipv6 support!

Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io>
Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io>
Copy link
Contributor

@zetxqx zetxqx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I run this test successfully with GKE as well.

command

go test -v ./conformance -args -debug -gateway-class gke-l7-regional-external-managed -cleanup-base-resources=false -allow-crds-mismatch=true -run-test GatewayFollowingEPPRoutingWithDataParallelism
=== RUN   TestConformance/InferencePoolResolvedRefsCondition
    conformance.go:66: Skipping InferencePoolResolvedRefsCondition: test explicitly skipped
--- PASS: TestConformance (33.63s)
    --- SKIP: TestConformance/EppUnAvailableFailOpen (0.00s)
    --- SKIP: TestConformance/GatewayFollowingEPPRouting (0.00s)
    --- PASS: TestConformance/GatewayFollowingEPPRoutingWithDataParallelism (31.70s)
        --- PASS: TestConformance/GatewayFollowingEPPRoutingWithDataParallelism/DP_routes_only_to_one_designated_pod_(any_rank) (0.77s)
        --- PASS: TestConformance/GatewayFollowingEPPRoutingWithDataParallelism/DP_routes_only_to_two_designated_pods;_one_has_a_fixed_rank (0.76s)
        --- PASS: TestConformance/GatewayFollowingEPPRoutingWithDataParallelism/DP_routes_to_all_pods;_one_pod_restricted_to_3000,3004 (0.93s)
    --- SKIP: TestConformance/GatewayWeightedAcrossTwoInferencePools (0.00s)
    --- SKIP: TestConformance/HTTPRouteInvalidInferencePoolRef (0.00s)
    --- SKIP: TestConformance/HTTPRouteMultipleGatewaysDifferentPools (0.00s)
    --- SKIP: TestConformance/InferencePoolAccepted (0.00s)
    --- SKIP: TestConformance/InferencePoolHTTPRoutePortValidation (0.00s)
    --- SKIP: TestConformance/InferencePoolInvalidEPPService (0.00s)
    --- SKIP: TestConformance/HTTPRouteMultipleRulesDifferentPools (0.00s)
    --- SKIP: TestConformance/InferencePoolResolvedRefsCondition (0.00s)
PASS
ok  	sigs.k8s.io/gateway-api-inference-extension/conformance	33.840s

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danehans, zetxqx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zetxqx
Copy link
Contributor

zetxqx commented Nov 8, 2025

/hold

in case others want to take another look @robscott @shmuelk

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 8, 2025
@zetxqx
Copy link
Contributor

zetxqx commented Nov 8, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2025
@shmuelk
Copy link
Contributor

shmuelk commented Nov 9, 2025

This looks good to me. As I am not an official reviewer, I'm not adding slash-lgtm and slash-approve

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/conformance-test Issues or PRs related to Conformance tests. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Conformance: Add a test for Gateway support of multiple targetPorts in an InferencePool

4 participants