-
Notifications
You must be signed in to change notification settings - Fork 190
Conformance: Adds Data Parallelism Test #1769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@danehans: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
| targetPorts: | ||
| - number: 3000 | ||
| - number: 3002 | ||
| - number: 3004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is an interesting configuration, I don't think you could do this with a real vLLM server
I was wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought that when one launched vLLM with Data Parallel in a configuration that listened on multiple ports, that the "vLLM Launcher" started up all of the processes.
I was wrong. They are started individually and each one can listen on any port one wants.
conformance/resources/base.yaml
Outdated
| spec: | ||
| containers: | ||
| - name: echoserver-3000 | ||
| image: gcr.io/k8s-staging-gateway-api/echo-basic:v20240412-v1.0.0-394-g40c666fd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than your use of non-contiguous ports here, why not not use llm-d-inference-sim which supports --data-parallel-size=N ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to swap out the backend to support a single conformance test. @robscott llm-d/llm-d-inference-sim#230 added support for echo'ing the port number. I don't see this supported in the echo-basic image that we currently use. I submitted kubernetes-sigs/gateway-api#4230 upstream to add the required functionality to the echo server. cc: @robscott
| { | ||
| name: "DP routes only to all pods (EPP returns all; ranks balanced internally)", | ||
| podIPsToBeReturnedByEPP: []string{podIPs[0], podIPs[1], podIPs[2]}, | ||
| expectAllRequestsRoutedWithinPodNames: []string{podNames[0], podNames[1], podNames[2]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are only checking Pod IPs. Shouldn't youalso be checking Pod Ports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't compare the port number in the response with the backend ports since the request/response is being proxied. We need the echo server to support echo'ing the port number back in a response header, i.e. llm-d/llm-d-inference-sim#232, or switch to vllm-sim. cc: @robscott @zetxqx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 0d46ef5 updates the conformance test to compare the response IP:port with the the IP:port set in the "test-epp-endpoint-selection" header. Note that kubernetes-sigs/gateway-api#4230 is required for the echo server to return the port number in the "X-Echo-HTTP-Port" response header.
|
|
||
| // Filter selects pods that match the IP addresses specified in the request header. | ||
| // Filter selects pods whose IPs match any value in the "test-epp-endpoint-selection" header. | ||
| // Values may be "IP" or "IP:port"; ports (ranks) are ignored here because DP fan-out happens later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to ignore the port? I would have thought you would add code that if a port was specified, it filters by both IP and port.
Without that how do you know that DP really worked and sent requests to a "non-base" of the model server?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 0d46ef5 updated this method to filter selects pods whose IP or IP:port matches any value in the "test-epp-endpoint-selection" header. If a port is provided, only an exact IP:port match is accepted.
|
/test pull-gateway-api-inference-extension-test-unit-main |
zetxqx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay review, looks good, just some nit comments.
I'm also setting up GKE to run against this conformance test as well.
| // Update the CONTROLLER_TOOLS_VERSION in Makefile when bumping controller-tools. | ||
| sigs.k8s.io/controller-tools v0.19.0 | ||
| sigs.k8s.io/gateway-api v1.4.0 | ||
| sigs.k8s.io/gateway-api v1.3.1-0.20251106052652-079e4774d76b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering can this be 1.4.0-something? this will be synced back to our internal pipeline, and it already updated to 1.4.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it would be a v1.4.0-something too, but ^ is how the dep gets resolved when I:
$ REQ_COMMIT=079e4774d76b909a8d43eae7dba570e2208cc9a7
$ go get sigs.k8s.io/gateway-api@$REQ_COMMIT| // - "IP:port" — selects only pods whose IP and port both match exactly. | ||
| // Ports correspond to data-parallel ranks or specific targetPorts. | ||
| // | ||
| // IPv6 addresses are supported, with or without brackets (e.g. "fd00::1" or "[fd00::1]:3002"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding the ipv6 support!
Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io>
Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io>
zetxqx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I run this test successfully with GKE as well.
command
go test -v ./conformance -args -debug -gateway-class gke-l7-regional-external-managed -cleanup-base-resources=false -allow-crds-mismatch=true -run-test GatewayFollowingEPPRoutingWithDataParallelism
=== RUN TestConformance/InferencePoolResolvedRefsCondition
conformance.go:66: Skipping InferencePoolResolvedRefsCondition: test explicitly skipped
--- PASS: TestConformance (33.63s)
--- SKIP: TestConformance/EppUnAvailableFailOpen (0.00s)
--- SKIP: TestConformance/GatewayFollowingEPPRouting (0.00s)
--- PASS: TestConformance/GatewayFollowingEPPRoutingWithDataParallelism (31.70s)
--- PASS: TestConformance/GatewayFollowingEPPRoutingWithDataParallelism/DP_routes_only_to_one_designated_pod_(any_rank) (0.77s)
--- PASS: TestConformance/GatewayFollowingEPPRoutingWithDataParallelism/DP_routes_only_to_two_designated_pods;_one_has_a_fixed_rank (0.76s)
--- PASS: TestConformance/GatewayFollowingEPPRoutingWithDataParallelism/DP_routes_to_all_pods;_one_pod_restricted_to_3000,3004 (0.93s)
--- SKIP: TestConformance/GatewayWeightedAcrossTwoInferencePools (0.00s)
--- SKIP: TestConformance/HTTPRouteInvalidInferencePoolRef (0.00s)
--- SKIP: TestConformance/HTTPRouteMultipleGatewaysDifferentPools (0.00s)
--- SKIP: TestConformance/InferencePoolAccepted (0.00s)
--- SKIP: TestConformance/InferencePoolHTTPRoutePortValidation (0.00s)
--- SKIP: TestConformance/InferencePoolInvalidEPPService (0.00s)
--- SKIP: TestConformance/HTTPRouteMultipleRulesDifferentPools (0.00s)
--- SKIP: TestConformance/InferencePoolResolvedRefsCondition (0.00s)
PASS
ok sigs.k8s.io/gateway-api-inference-extension/conformance 33.840s
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danehans, zetxqx The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/lgtm |
|
This looks good to me. As I am not an official reviewer, I'm not adding slash-lgtm and slash-approve |
What type of PR is this?
/kind test
/area conformance-test
What this PR does / why we need it:
main, e.g.v20251105-cbb8928.Which issue(s) this PR fixes:
Fixes #1680
Does this PR introduce a user-facing change?: