helm support for EPP standalone mode with Envoy Proxy #1821

capri-xiyue · 2025-11-05T21:17:45Z

What type of PR is this?

What this PR does / why we need it:
see #1778

Which issue(s) this PR fixes:

Does this PR introduce a user-facing change?:

NONE

netlify · 2025-11-05T21:17:51Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`f11f891`
🔍 Latest deploy log	https://app.netlify.com/projects/gateway-api-inference-extension/deploys/690e37caac8b750008105c37
😎 Deploy Preview	https://deploy-preview-1821--gateway-api-inference-extension.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

kfswain · 2025-11-05T21:41:25Z

Thanks @capri-xiyue! Can we make this a distinct helm chart? Since we call this current chart inferencePool it would be odd to have a mode in that chart that doesn't even use inferencePool.

capri-xiyue · 2025-11-05T21:59:45Z

@kfswain For this stage, EPP standalone mode still uses InferencePool. That's why I didn't put it in a distinct chart. Later, if we decide to have a standalone EPP without any k8s crd, I will refactor it to a distinct helm chart.

To clarify, this PR just removes gateway api dependency, inferencepool api is still used.

Thanks @capri-xiyue! Can we make this a distinct helm chart? Since we call this current chart inferencePool it would be odd to have a mode in that chart that doesn't even use inferencePool.

capri-xiyue · 2025-11-05T22:00:20Z

/assign @ahg-g

kfswain · 2025-11-05T22:04:29Z

@kfswain For this stage, EPP standalone mode still uses InferencePool. That's why I didn't put it in a distinct chart. Later, if we decide to have a standalone EPP without any k8s crd, I will refactor it to a distinct helm chart.

Thanks @capri-xiyue! Can we make this a distinct helm chart? Since we call this current chart inferencePool it would be odd to have a mode in that chart that doesn't even use inferencePool.

That may lead to confusing UX in the case where this is deployed in a cluster that has a GW controller, as it will attempt to reconcile on the InferencePool and integrate it into the GW system, is modifying EPP to just accept a selector not a viable path forward?

capri-xiyue · 2025-11-05T22:09:28Z

@kfswain For this stage, EPP standalone mode still uses InferencePool. That's why I didn't put it in a distinct chart. Later, if we decide to have a standalone EPP without any k8s crd, I will refactor it to a distinct helm chart.

Thanks @capri-xiyue! Can we make this a distinct helm chart? Since we call this current chart inferencePool it would be odd to have a mode in that chart that doesn't even use inferencePool.

That may lead to confusing UX in the case where this is deployed in a cluster that has a GW controller, as it will attempt to reconcile on the InferencePool and integrate it into the GW system, is modifying EPP to just accept a selector not a viable path forward?

I talked with @ahg-g before, modifying EPP to just accept a selector needs further discussion. Therefore he suggested me finalizing EPP with envoy proxy first with helm chart.

Curious now what will happen when a inference pool deployed in a cluster with two GW controller?(for example kgateway and istio), will it cause issues here? Initially I thought each GW controller is able to handle this case.

ahg-g · 2025-11-05T22:25:29Z

Yeah, my suggestion is to take a gradual approach, a gateway controller should not care about an inferencePool that is not referenced by an httpRoute.

config/charts/inferencepool/templates/epp-deployment.yaml

k8s-ci-robot · 2025-11-06T00:28:31Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: capri-xiyue
Once this PR has been reviewed and has the lgtm label, please ask for approval from ahg-g. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

capri-xiyue · 2025-11-06T03:30:30Z

As an update, I'm now working on another PR to modify EPP to just accept a selector and will refactor the helm chart to have a distinct one as no inference pool is needed in that PR. EPP refactor probably takes a little while as fix bunch of ut takes time. Will let you know when I send a PR. @kfswain

config/charts/inferencepool/values.yaml

danehans · 2025-11-07T19:42:21Z

IMHO we need to get the #1778 design gdoc that @ahg-g created into an md and added to docs/proposals.

danehans · 2025-11-07T19:44:35Z

config/charts/inferencepool/values.yaml


+  standalone:
+    replicas: 1
+    envoy:


We need to support other standalone providers, not just Envoy.

Probably fair to assume other proxies will want to be able to have a colocated(standalone) mode.

Are you just suggesting that this should be nested behind a provider equivalent structure? That seems reasonable enough.

But beyond that, I think a single implementation is a big enough slice of work for one PR

Yep. I think we can support other standalone providers. But Envoy can just be a start. Do you suggest "envoy" should be a nested structure for now or we can refactor it later? I think refactor it later should be fine.

By the way, this PR is just a gradual approach. Once #1779 is done, I will need to further factor it to a distinct helm chart as inferecenpool is no longer needed for standalone epp

site-src/guides/getting-started-latest.md

howardjohn

I don't think this is appropriate for this repo.

first, the project is "Gateway-api extension" so supporting running without gateway API seems out of scope for the project.

Second, this will either be hardcoded to envoy (which would be inappropriate since the repo is supposed to be implementation neutral) or lead to every single implementation needing to inline their implementation into this one chart. This in-tree approach has proven to not work out well in most projects (Kubernetes, etc).

I don't see why this isn't just a helm chart in a standalone repository outside of the core project

howardjohn · 2025-11-07T20:31:31Z

Put another way: if 100% of the code here is vendor specific why does it belong in the common open vendor neutral repo and not in a vendor repo? This feels like trying to put Istio code in gateway-api repo or something (just as an example) when it belongs in... the istio repo

ahg-g · 2025-11-07T22:38:41Z

The current repo hosts two related but distinct components: the APIs and the EPP (the inference scheduler); The APIs are what define "Gateway API Extension", the EPP is an implementation of it that doesn't necessarily only need to support the Gateway API extension (just like envoy is not limited to Gateway API) or only k8s for that matter.

The end goal is to land users on k8s and the gateway api, but many users are not there yet and find it operationally unjustified to start with the full fledged solution. A strategy where we allow users to start simple with a clear maturation path towards landing them on k8s + gateway api is a win for all.

With that being said, I agree with the point of splitting the vendor specific to a separate repo provided that we offer the necessary "hooks" to enable that. So my suggestion is to make the necessary changes in this chart to enable customizing it by an external one to allow those other deployment patterns.

ahg-g · 2025-11-07T23:55:04Z

config/charts/inferencepool/templates/_helpers.tpl

 {{- end -}}
+
+{{/*
+Envoy Common labels


are those really envoy specific? if not, we can call them something like sidecar-proxy?

ahg-g · 2025-11-07T23:57:50Z

config/charts/inferencepool/templates/epp-standalone-deployment.yaml

+                          port_value: 9002
+                    load_balancing_weight: 1
+---
+apiVersion: apps/v1


Why are we deploying this as a separate deployment instead of a sidecar to epp? the point of standalone epp is to have it deployed as a single pod.

ahg-g · 2025-11-08T00:03:47Z

With that being said, I agree with the point of splitting the vendor specific to a separate repo provided that we offer the necessary "hooks" to enable that. So my suggestion is to make the necessary changes in this chart to enable customizing it by an external one to allow those other deployment patterns.

@capri-xiyue isn't all needed here to enable the above is to allow adding a sidecar proxy to the epp deployment?

nirrozenbaum · 2025-11-10T09:47:53Z

config/charts/inferencepool/values.yaml

    # Set to true if the cluster is an Autopilot cluster.
    autopilot: false

+  standalone:


the fields under this section is the sidecar configuration?
can we name it in a descriptive name?
if I read provider.standalone.replicas=3, I'd assume my epp is going to run in standalone mode with 3 replicas.
but this replicas fields is for envoy not for epp.

capri-xiyue added 4 commits November 3, 2025 16:35

draft

67d8bc9

draft

35b63ad

workable version

b5f49b4

added docs

d8b3966

k8s-ci-robot requested review from liu-cong and nirrozenbaum November 5, 2025 21:17

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 5, 2025

fixed format

6d06c16

k8s-ci-robot assigned ahg-g Nov 5, 2025

ahg-g reviewed Nov 5, 2025

View reviewed changes

config/charts/inferencepool/templates/epp-deployment.yaml Outdated Show resolved Hide resolved

move epp-envoy to a standalone file

92d36bd

capri-xiyue requested a review from ahg-g November 6, 2025 00:28

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 6, 2025

ahg-g reviewed Nov 7, 2025

View reviewed changes

config/charts/inferencepool/values.yaml Outdated Show resolved Hide resolved

ahg-g reviewed Nov 7, 2025

View reviewed changes

config/charts/inferencepool/values.yaml Outdated Show resolved Hide resolved

make standalone a provider

f11f891

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 7, 2025

capri-xiyue requested a review from ahg-g November 7, 2025 18:18

danehans reviewed Nov 7, 2025

View reviewed changes

site-src/guides/getting-started-latest.md Show resolved Hide resolved

danehans mentioned this pull request Nov 7, 2025

Inference: Support agentgateway as a Standalone Provider agentgateway/agentgateway#620

Open

howardjohn suggested changes Nov 7, 2025

View reviewed changes

ahg-g reviewed Nov 8, 2025

View reviewed changes

nirrozenbaum reviewed Nov 10, 2025

View reviewed changes

helm support for EPP standalone mode with Envoy Proxy #1821

Are you sure you want to change the base?

helm support for EPP standalone mode with Envoy Proxy #1821

Uh oh!

Conversation

capri-xiyue commented Nov 5, 2025

Uh oh!

netlify bot commented Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for gateway-api-inference-extension ready!

Uh oh!

kfswain commented Nov 5, 2025

Uh oh!

capri-xiyue commented Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

capri-xiyue commented Nov 5, 2025

Uh oh!

kfswain commented Nov 5, 2025

Uh oh!

capri-xiyue commented Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ahg-g commented Nov 5, 2025

Uh oh!

Uh oh!

k8s-ci-robot commented Nov 6, 2025

Uh oh!

capri-xiyue commented Nov 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

danehans commented Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

danehans Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

kfswain Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

capri-xiyue Nov 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

howardjohn left a comment

Choose a reason for hiding this comment

Uh oh!

howardjohn commented Nov 7, 2025

Uh oh!

ahg-g commented Nov 7, 2025

Uh oh!

ahg-g Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

ahg-g Nov 7, 2025

Choose a reason for hiding this comment

Uh oh!

ahg-g commented Nov 8, 2025

Uh oh!

nirrozenbaum Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

netlify bot commented Nov 5, 2025 •

edited

Loading

capri-xiyue commented Nov 5, 2025 •

edited

Loading

capri-xiyue commented Nov 5, 2025 •

edited

Loading

capri-xiyue commented Nov 6, 2025 •

edited

Loading

danehans commented Nov 7, 2025 •

edited

Loading

capri-xiyue Nov 7, 2025 •

edited

Loading