Commit b156c31

Set VLLM_DISABLE_SHARED_EXPERTS_STREAM=1 by default for TPU inference (#1021)
Signed-off-by: Xing Liu <xingliu14@gmail.com>
1 parent 6a1da81 commit b156c31

2 files changed: 13 additions, 0 deletions

tpu_inference/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,9 @@
 import os

+# The environment variables override should be imported before any other
+# modules to ensure that the environment variables are set before any
+# other modules are imported.
+import tpu_inference.env_override  # noqa: F401
 from tpu_inference import tpu_info as ti
 from tpu_inference.logger import init_logger

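The import ordering in this hunk matters for any code that reads an environment variable once, at import time. The commit does not state when vLLM reads this variable, so the following is only a minimal sketch of the general hazard. `FLAG` and `load_consumer` are hypothetical names used for illustration and are not part of vLLM or tpu-inference.

```python
import os

# Hypothetical flag name, used only for this illustration.
FLAG = "EXAMPLE_DISABLE_FEATURE"


def load_consumer():
    """Stand-in for importing a module that reads the flag at import time."""
    snapshot = os.environ.get(FLAG, "0") == "1"
    return lambda: snapshot  # the value is frozen at "import"


os.environ.pop(FLAG, None)   # start from a clean environment
early = load_consumer()      # "imported" before the override runs
os.environ[FLAG] = "1"       # the override arrives too late for `early`
late = load_consumer()       # "imported" after the override runs

print(early(), late())
```

Only the consumer loaded after the assignment sees the flag, which is why the override module is imported ahead of the other `tpu_inference` imports.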
tpu_inference/env_override.py

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright contributors to the tpu-inference project
+
+import os
+
+# Disable CUDA-specific shared experts stream for TPU
+# This prevents errors when trying to create CUDA streams on TPU hardware
+# The issue was introduced by vllm-project/vllm#26440
+os.environ["VLLM_DISABLE_SHARED_EXPERTS_STREAM"] = "1"
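Note that a plain assignment to `os.environ` replaces any value the user has already exported. If a user-supplied value were meant to take precedence, `os.environ.setdefault` would be one alternative. The sketch below only contrasts the two behaviors and is not what the commit does. `NAME` is a hypothetical variable chosen for illustration.

```python
import os

NAME = "EXAMPLE_OVERRIDE_DEMO"  # hypothetical variable, for illustration only

# Case 1: unconditional assignment replaces a user-provided value.
os.environ[NAME] = "0"            # pretend the user exported NAME=0
os.environ[NAME] = "1"            # the override wins
forced = os.environ[NAME]

# Case 2: setdefault leaves an existing value alone.
os.environ[NAME] = "0"            # pretend the user exported NAME=0
os.environ.setdefault(NAME, "1")  # no effect, NAME is already set
kept = os.environ[NAME]

# Case 3: setdefault fills in the value when the variable is unset.
del os.environ[NAME]
os.environ.setdefault(NAME, "1")
filled = os.environ[NAME]

print(forced, kept, filled)
```

Unconditional assignment guarantees the setting on TPU regardless of the caller's environment. `setdefault` trades that guarantee for user control.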
