Your current environment
First, launch an OpenAI-compatible server:

vllm serve /mnt/model/gpt-oss-120b \
    --port 8120 \
    --tensor-parallel-size 8 \
    --api-key "gpt_oss"

Testing code:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8120/v1",
    api_key="gpt_oss"
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather in a given city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"]
            },
        },
    }
]

response = client.chat.completions.create(
    model="/mnt/model/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools,
    tool_choice="required"
)
print(response.choices[0].message)

But I get the message back without any tool_calls parsed:
ChatCompletionMessage(content='[{ "name": "get_weather", "parameters": { "city": "Berlin" } }]', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content="The user asks current weather in Berlin. As AI, we don't have real-time data. We must say we can't retrieve real-time info, suggest checking a weather site or app. Follow policy.")
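For reference, this is roughly how I expected to consume the response on the client side (standard OpenAI Python client usage; the dispatch here is just a placeholder):

import json

msg = response.choices[0].message
if msg.tool_calls:
    for call in msg.tool_calls:
        # function.arguments is a JSON-encoded string
        args = json.loads(call.function.arguments)
        print("tool call:", call.function.name, args)
else:
    # this is the branch I actually hit: the tool call ends up as plain text in content
    print("no parsed tool_calls:", msg.content)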
How can I fix this? Should I pass something like --output-parser to vllm?
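For context, the vLLM docs describe enabling tool-call parsing for chat completions with --enable-auto-tool-choice plus --tool-call-parser; what I don't know is which parser name (if any) handles the gpt-oss output format, so the parser value below is only a guess:

vllm serve /mnt/model/gpt-oss-120b \
    --port 8120 \
    --tensor-parallel-size 8 \
    --api-key "gpt_oss" \
    --enable-auto-tool-choice \
    --tool-call-parser <parser-name>   # placeholder; check `vllm serve --help` for the supported parser names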
How would you like to use vllm
I want to run inference with gpt-oss-120b, but I don't know how to integrate it with vllm.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.