forked from kubernetes-sigs/gateway-api-inference-extension
-
Notifications
You must be signed in to change notification settings - Fork 0
Add flow controller. #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
LukeAVanDrie
wants to merge
53
commits into
main
Choose a base branch
from
scheduler
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
ahg-g
reviewed
Apr 9, 2025
the file contains only two consts that are not used anywhere (same consts are defined in runserver.go Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Refactored the environment variable utility (pkg/epp/util/env) to enhance code quality, readability, and maintainability. Key changes: - Introduced generic helper functions `parseEnvWithValue` and `getEnvWithParser` to centralize common logic for fetching and parsing environment variables, significantly reducing code duplication. - Standardized logging messages for consistency across all `GetEnv<Type>` functions. - Added `GetEnvDuration`.
* refactor schdeuler filters package to simplify and improve readability and maintainability Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * filter refactor finalizing Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
…ernetes-sigs#810) current implementation leaves dangling go routines and structs which will consume resources and hold unused objects from being GCd Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* merge has capacity filter with sheddable filter. has capacity only use was for sheddable requests (passthrough for critical ones). Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * Update pkg/epp/scheduling/plugins/filter/filter_test.go Co-authored-by: Cong Liu <conliu@google.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: Cong Liu <conliu@google.com>
… setup (kubernetes-sigs#772) * Add inferencepool_lifecycle test. * Resolve setup issues and enable InferencePool test * correct Lint error Multiplication of durations * Fix missing containerPort, is missing * change gateway name from "gateway-conformance-app" to "conformance-gateway" * clarify why K8s types are needed. * Update conformance/conformance.go Co-authored-by: Lior Lieberman <liorlib7+riskified@gmail.com> * Update conformance/conformance.go Co-authored-by: Lior Lieberman <liorlib7+riskified@gmail.com> * remove for loop when adding SupportedFeatures * remove exessive logging * Update conformance/conformance.go Co-authored-by: Lior Lieberman <liorlib7+riskified@gmail.com> * move excess debug logs behind debug flag. * remove CONFORMANCE.GO prefix from logs. * change the pull logic and use default value from GatewayMustHaveAddress * fix mt.Sprintf can be replaced with string concatenation * add a function for logDebug * factor out ensureGatewayAvailableAndReady * removed todo comment in helper.go * remove CONFORMANCE.GO from log * error messages, should not be capitalized or end with punctuation --------- Co-authored-by: Lior Lieberman <liorlib7+riskified@gmail.com>
* Add prefix cache aware scheduling * Replace scheduler v2 with config v2 * Add score weight to XXScorerConfig * Address comments * Clean up * Change to use container/list lib * cleanup * Add TODO * make linter happy
Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * typo Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make linter happy Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * rebase handling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to NewSchedulerContext API change Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Populate the RequestId field of LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates to tests Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added PostResponse plugins to scheduler config Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added scheduler.OnResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added dispatcher.HandleResponse to handle responses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Refactored server response header handling to invoke PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Added simple test for PostResponse plugins Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Setup the logger in the SchedulerContext appropriately for reponses Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Updates due to rebase issues * merge functions in env utils (kubernetes-sigs#819) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * generalize scheduling cycle state concept (kubernetes-sigs#818) * generalize scheduling cycle state concept Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * typo Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make linter happy Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * make prefix state struct internal to package instead of public Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * remove Model field from LLMRequest (kubernetes-sigs#782) * remove Model field from LLMRequest Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * rebase handling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * Added the LLMResponse struct and RequestId to LLMRequest Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> * Insure that wanted response header messages have all of the response headers in them Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> --------- Signed-off-by: Shmuel Kallner <kallner@il.ibm.com> Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: Nir Rozenbaum <nirro@il.ibm.com>
* Add prefex aware routing proposal * Update, add a diagram * Add future work * Update to PR number, clarify terminologies
Signed-off-by: Daneyon Hansen <daneyon.hansen@solo.io>
…es-sigs#822) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
The reference will be from `_PULL_BASE_REF` variable from the cloud build: https://docs.prow.k8s.io/docs/jobs/ The change also fixes the commit label by using the right variable added in https://github.com/kubernetes/test-infra/pull/34755/files.
This commit adds a new `SaturationDetector` component responsible for determining if backend model servers are saturated. It bases its decision on observed metrics like queue depth and KV cache utilization, using configurable thresholds. The detector is designed to be a self-contained unit that can be leveraged by other components for admission control and capacity assessment. This is the first step in a larger refactoring to externalize and centralize saturation detection logic.
) * support extracting prompt from chat completions API Signed-off-by: Hang Yin <luying.yh@alibaba-inc.com> * typo fixes Signed-off-by: Hang Yin <luying.yh@alibaba-inc.com> * fix tests * supply more tests and heading boilerplate Signed-off-by: Hang Yin <luying.yh@alibaba-inc.com> --------- Signed-off-by: Hang Yin <luying.yh@alibaba-inc.com>
The TestMetricsRefresh test in pod_metrics_test.go was flaky due to a race condition. The `StopRefreshLoop` method would signal the metrics refresh goroutine to stop but did not wait for its actual termination. If the test updated the mock metrics client immediately after calling `StopRefreshLoop`, the refresh goroutine could, in rare cases, perform a final metrics fetch with the new data before fully exiting. This resulted in the test asserting against unexpected metric values. This commit resolves the issue by making adding a sleep for the metrics refresh interval in TestMetricsRefresh. Additionally, it adds the following for robustness in `StopRefreshLoop`. - `stopOnce` is used to ensure the `done` channel is only closed once (for idempotency and protection against concurrent calls). This change ensures that the refresh goroutine is guaranteed to have stopped before any test assertions are made, eliminating the race condition.
* Add inferencepool_lifecycle test. * Resolve setup issues and enable InferencePool test * removed todo comment in helper.go * Add InferencePoolLifecycle test * update comments in helper.go * remove Conformanc.go from log message * Remove lifecycle test. * Removed unused helper methods ( inference pool must have selector & must be deleted) * Set timeout values as constant * change timeout.go to timing.go
* Scheduler subsystem high level design proposal This sets down basic design principles of the current gateway scheduler. We also highlight who we are targeting as users, and why we prioritize the current approach. It also selects standard terminology for scheduling that the implementation should adopt. This is a high level design and thus sets general scope, without expecting to fully address all problems. * Review feedback --------- Co-authored-by: Kellen Swain <kfswain@google.com>
Fix TZ link
…netes-sigs#835) * small refactor of scheduler config handles how to register a plugin that implements multiple scheduler plugins interfaces with a single registration command Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * code review Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * minor change Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* feat: migrate epp metric server Signed-off-by: nayihz <smartczy@outlook.com> * feat: migrate bbr metric server Signed-off-by: nayihz <smartczy@outlook.com> * fix: metric reset not effect Signed-off-by: nayihz <smartczy@outlook.com> * fix: add the stability level to the help message of the metric * fix: refactor custom inferencepool metric Signed-off-by: nayihz <smartczy@outlook.com> --------- Signed-off-by: nayihz <smartczy@outlook.com>
…narios by using gateway api inference extension (kubernetes-sigs#812) * added common cases * added more details Signed-off-by: Xiyue Yu <xiyue@google.com> * fixed comments * changed file location * fixed typo * Update site-src/guides/serve-multiple-lora-adapters.md Co-authored-by: Cong Liu <conliu@google.com> * Update site-src/guides/serve-multiple-lora-adapters.md Co-authored-by: Cong Liu <conliu@google.com> * Update mkdocs.yml Co-authored-by: Rob Scott <rob.scott87@gmail.com> * Update site-src/guides/serve-multiple-lora-adapters.md Co-authored-by: Rob Scott <rob.scott87@gmail.com> * Update site-src/guides/serve-multiple-genai-models.md Co-authored-by: Rob Scott <rob.scott87@gmail.com> * added subsession * fixed wording --------- Signed-off-by: Xiyue Yu <xiyue@google.com> Co-authored-by: Cong Liu <conliu@google.com> Co-authored-by: Rob Scott <rob.scott87@gmail.com>
* code review Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * minor change Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * add support for multi cycle scheduling Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * minor change Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * moved plugins under plugins dir Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * few more changes Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * moved RunCycle logic into SchedulerProfile Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * minor changes Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * linter Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * minor change in unit-test Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
0b69491 to
16ed2e2
Compare
…ernetes-sigs#807) * Add inferencepool_lifecycle test. * Resolve setup issues and enable InferencePool test * correct Lint error Multiplication of durations * Fix missing containerPort, is missing * change gateway name from "gateway-conformance-app" to "conformance-gateway" * clarify why K8s types are needed. * Update conformance/conformance.go Co-authored-by: Lior Lieberman <liorlib7+riskified@gmail.com> * Update conformance/conformance.go Co-authored-by: Lior Lieberman <liorlib7+riskified@gmail.com> * remove for loop when adding SupportedFeatures * remove exessive logging * Update conformance/conformance.go Co-authored-by: Lior Lieberman <liorlib7+riskified@gmail.com> * move excess debug logs behind debug flag. * remove CONFORMANCE.GO prefix from logs. * change the pull logic and use default value from GatewayMustHaveAddress * fix mt.Sprintf can be replaced with string concatenation * add a function for logDebug * factor out ensureGatewayAvailableAndReady * removed todo comment in helper.go * remove CONFORMANCE.GO from log * Add InferencePoolLifecycle test * update comments in helper.go * Initial commit for InferencePoolNoMatchingPodsRouteStatus test * resolve lint issue. * error messages, should not be capitalized or end with punctuation * Add inferencepool_lifecycle test. * Resolve setup issues and enable InferencePool test * removed todo comment in helper.go * Add InferencePoolLifecycle test * update comments in helper.go * remove Conformanc.go from log message * Remove lifecycle test. * Removed unused helper methods ( inference pool must have selector & must be deleted) * add back HTTPRouteMustHaveParentStatusConditions * Set timeout values as constant * change timeout.go to timing.go * remove duplicate log * remove excess comments and logs * add comment / todo for Reconciled * Update conformance/utils/kubernetes/helpers.go Co-authored-by: Rob Scott <rob.scott87@gmail.com> * change test to HTTPRouteInvalidInferencePoolRef * use TODO: instead of TODO() * yaml and todos based on code review --------- Co-authored-by: Lior Lieberman <liorlib7+riskified@gmail.com> Co-authored-by: Rob Scott <rob.scott87@gmail.com>
…bernetes-sigs#832) * WIP tests for inferencepool_resolvedrefs_condition * update condition check * Add helper method for inf pool parrent status check * update manifests * update the test to match manifest * fix yaml files. * add SupportInferencePool * Add a helper function for HTTPRouteMustBeAcceptedAndResolved * Add a helper method InferencePoolMustBeAcceptedByParent * add todo for ensure http requests are routed correctly kubernetes-sigs#865 * remove extra space
…d InferenceModel (kubernetes-sigs#870) * Update docs about InferencePool * Update docs about InferenceModel
…etes-sigs#873) Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
* remove the PreCycle plugin from scheduler Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * Apply suggestions from code review Co-authored-by: Cong Liu <conliu@google.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> Co-authored-by: Cong Liu <conliu@google.com>
6bb4506 to
c998236
Compare
… E2E request validation (kubernetes-sigs#866) * WIP tests for inferencepool_resolvedrefs_condition * update condition check * Add helper method for inf pool parrent status check * update manifests * update the test to match manifest * fix yaml files. * add SupportInferencePool * Add a helper function for HTTPRouteMustBeAcceptedAndResolved * Add a helper method InferencePoolMustBeAcceptedByParent * add todo for ensure http requests are routed correctly kubernetes-sigs#865 * Add http tests * update to use echo server instead * fix echo server port. * Add env var to include namespace and pod name for echo server resposne. * factor out the common HTTPResponse builder * shorten wait time * remove extra space * fix yaml formatting * clean up yaml file remove white space and optional fields. * change naming convention to primary secondary consistently. * add helper method for "MakeRequestAndExpectNotFound/Success * use config instead of inferenceconfig
b7f210d to
55031a4
Compare
* small changes to saturation detector Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> * var rename Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com> --------- Signed-off-by: Nir Rozenbaum <nirro@il.ibm.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.