[Core] Cache vllm_is_batch_invariant
#28304
Conversation
`vllm_is_batch_invariant` is called many times during inference, which is noticeable in a profile. We should cache it so that we don't need to call `os.getenv()` repeatedly. Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
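A minimal sketch of the kind of change described here, assuming the helper simply reads an environment variable (the variable name below is illustrative, not necessarily the one vLLM uses):

```python
import functools
import os


@functools.cache  # computed once per process; later calls skip os.getenv()
def vllm_is_batch_invariant() -> bool:
    # Illustrative env var name; the real helper may read a different flag.
    return os.getenv("VLLM_BATCH_INVARIANT", "0") == "1"
```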
Code Review
This pull request introduces a performance optimization by caching the result of vllm_is_batch_invariant to avoid repeated os.getenv() calls. The approach is sound, but the implementation uses functools.cache, which is only available in Python 3.9+ and would break compatibility with Python 3.8. I've suggested using functools.lru_cache(maxsize=None) instead, which is equivalent and compatible. I've also pointed out that caching introduces global state that can affect test reliability and suggested adding a pytest fixture to clear the cache between tests to ensure a robust test suite.
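A sketch of the two suggestions above, using `functools.lru_cache(maxsize=None)` as a Python 3.8-compatible equivalent of `functools.cache`, plus an autouse pytest fixture that clears the cache between tests. The fixture name and the env var are hypothetical:

```python
import functools
import os

import pytest


@functools.lru_cache(maxsize=None)  # behaves like functools.cache, but works on Python 3.8
def vllm_is_batch_invariant() -> bool:
    return os.getenv("VLLM_BATCH_INVARIANT", "0") == "1"


@pytest.fixture(autouse=True)
def _reset_batch_invariant_cache():
    # Clear the memoized value so each test sees the current environment.
    vllm_is_batch_invariant.cache_clear()
    yield
    vllm_is_batch_invariant.cache_clear()
```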
Can you see if the batch invariant tests pass? We might need a more elegant solution; I removed the caching logic in #27856. It is hard to override `vllm_is_batch_invariant` in the same environment once it caches an env-var read, for example when you want to test one LLM instance with batch invariance and another without.
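A small self-contained sketch of that concern, reusing the illustrative cached helper from above: once the value is memoized, flipping the environment variable in the same process has no effect until the cache is cleared explicitly.

```python
import functools
import os


@functools.lru_cache(maxsize=None)
def vllm_is_batch_invariant() -> bool:
    # Illustrative flag name, as in the sketches above.
    return os.getenv("VLLM_BATCH_INVARIANT", "0") == "1"


os.environ["VLLM_BATCH_INVARIANT"] = "1"
assert vllm_is_batch_invariant() is True   # first call populates the cache

os.environ["VLLM_BATCH_INVARIANT"] = "0"
assert vllm_is_batch_invariant() is True   # stale: the memoized value wins

vllm_is_batch_invariant.cache_clear()      # needed before the new setting takes effect
assert vllm_is_batch_invariant() is False
```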
heheda12345 left a comment
LGTM!
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Signed-off-by: George D. Torres <gdavtor@gmail.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Signed-off-by: Bram Wasti <bwasti@meta.com>