
Conversation

@lgeiger
Contributor

@lgeiger lgeiger commented Nov 7, 2025

`vllm_is_batch_invariant` is called many times during inference, which is noticeable in a profile. We should cache the result so that we don't need to repeatedly call `os.getenv()`.
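
The change is essentially a small decorator; a minimal sketch of the idea looks like the following (the env var name and parsing are assumptions for illustration, not the exact vLLM code):

```python
import os
from functools import cache


@cache
def vllm_is_batch_invariant() -> bool:
    # Read the environment variable once; later calls return the cached
    # bool instead of hitting os.getenv() on every invocation.
    return os.getenv("VLLM_BATCH_INVARIANT", "0") == "1"
```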

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a performance optimization by caching the result of `vllm_is_batch_invariant` to avoid repeated `os.getenv()` calls. The approach is sound, but the implementation uses `functools.cache`, which is only available in Python 3.9+ and would break compatibility with Python 3.8. I've suggested using `functools.lru_cache(maxsize=None)` instead, which is equivalent and compatible. I've also pointed out that caching introduces global state that can affect test reliability and suggested adding a pytest fixture to clear the cache between tests to ensure a robust test suite.
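
Such a fixture might look like this (a sketch only; the import path is an assumption, and it relies on the decorated function exposing `cache_clear()`, which both `functools.cache` and `functools.lru_cache` provide):

```python
import pytest

# The import path is an assumption for illustration.
from vllm.model_executor.layers.batch_invariant import vllm_is_batch_invariant


@pytest.fixture(autouse=True)
def clear_batch_invariant_cache():
    # Clear the cache before and after each test so every test starts
    # with a fresh read of the environment variable.
    vllm_is_batch_invariant.cache_clear()
    yield
    vllm_is_batch_invariant.cache_clear()
```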

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


@PaulZhang12
Contributor

Can you check whether the batch invariant tests pass? We might need a more elegant solution; I removed the caching logic in #27856. It is hard to override `vllm_is_batch_invariant` in the same environment once it caches an env var read, for example if you want to test an LLM instance both with and without batch invariance.
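
One possible workaround in tests would be to clear the cache after monkeypatching the environment (a sketch; the import path and env var name are assumptions):

```python
import pytest

# Import path and env var name are assumptions for illustration.
from vllm.model_executor.layers.batch_invariant import vllm_is_batch_invariant


def set_batch_invariance(enabled: bool, monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setenv("VLLM_BATCH_INVARIANT", "1" if enabled else "0")
    # Drop the cached value so the next call re-reads the environment.
    vllm_is_batch_invariant.cache_clear()
```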

Collaborator

@heheda12345 heheda12345 left a comment

LGTM!

@heheda12345 heheda12345 enabled auto-merge (squash) November 11, 2025 07:56
@github-actions github-actions bot added the `ready` (ONLY add when PR is ready to merge/full CI is needed) label Nov 11, 2025
@heheda12345 heheda12345 merged commit ac0bb2c into vllm-project:main Nov 12, 2025
47 checks passed
@lgeiger lgeiger deleted the cache-batch-invariant branch November 12, 2025 09:18
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: George D. Torres <gdavtor@gmail.com>
bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Bram Wasti <bwasti@meta.com>