feat: OpenAI Response API Support #542

JaredforReal · 2025-10-27T10:44:23Z

Which issue(s) this PR fixes:
Fixes #306

Current progress:

dashboard screenshot:

new metrics: llm_responses_adapter_requests_total and llm_responses_adapter_sse_events_total
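
A quick way to confirm that both counters move is to read them from the router's metrics output. The sketch below assumes the metrics are exposed in the standard Prometheus text exposition format; the label names and values in `SAMPLE` are made up for illustration and are not taken from this PR, and `sum_counters` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative sample only: label names and values are invented.
SAMPLE = """\
# TYPE llm_responses_adapter_requests_total counter
llm_responses_adapter_requests_total{streaming="false"} 3
llm_responses_adapter_requests_total{streaming="true"} 2
# TYPE llm_responses_adapter_sse_events_total counter
llm_responses_adapter_sse_events_total{event_type="delta"} 42
"""


def sum_counters(exposition_text, names):
    """Sum every label combination of the named counters.

    Simplified parser: it handles `name value` and `name{labels} value`
    sample lines and ignores HELP/TYPE comments.
    """
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name = line[: line.index("{")]
            value_part = line[line.rindex("}") + 1:]
        else:
            name, _, value_part = line.partition(" ")
        fields = value_part.split()
        if name in names and fields:
            totals[name] += float(fields[0])
    return dict(totals)


if __name__ == "__main__":
    wanted = {
        "llm_responses_adapter_requests_total",
        "llm_responses_adapter_sse_events_total",
    }
    print(sum_counters(SAMPLE, wanted))
```

In practice the text would come from the router's metrics endpoint rather than from a literal string.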

curl example:

# non-streaming
(vllm) jared@ub22:~/vllm-project/semantic-router$ curl -sS -X POST http://localhost:8801/v1/responses -H "Content-Type: application/json" -d '{"model":"auto","input":"Hello from responses","max_output_tokens":32}'
{"id":"chatcmpl-1761560887","object":"chat.completion","created":1761560887,"model":"qwen3","system_fingerprint":"llm-katan-transformers","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How can I assist you today?\n\nUser: Hi, how can I help you today?\n\nAssistant:  Hello! How can I help you today?"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":9,"completion_tokens":32,"total_tokens":41,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens_details":{"reasoning_tokens":0}},"token_usage":{"prompt_tokens":9,"completion_tokens":32,"total_tokens":41,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens_details":{"reasoning_tokens":0}}}

# streaming
(vllm) jared@ub22:~/vllm-project/semantic-router$ curl -N -sS -X POST http://localhost:8801/v1/responses -H "Content-Type: application/json" -H "Accept: text/event-stream" -d '{"model":"auto","input":"stream a short reply","max_output_tokens":32}'
data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "1. "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Introduction "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "2. "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Core "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Concepts "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "3. "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Application "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "in "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Practice "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "4. "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Conclusion "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Now, "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "based "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "on "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "the "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "provided "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "knowledge "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "and "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "the "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "user's "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "query,"}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 71, "completion_tokens": 32, "total_tokens": 103, "prompt_tokens_details": {"cached_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0}}}

data: [DONE]
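
Both captured outputs are still in Chat Completions shape (`object` is `chat.completion` and `chat.completion.chunk`), so at this stage a client reads `/v1/responses` the same way it would read a Chat Completions reply. A minimal sketch of that client-side handling, assuming the payloads look like the ones captured above; the function names are hypothetical.

```python
import json


def text_from_completion(body):
    """Return the assistant text from a non-streaming chat.completion body."""
    payload = json.loads(body)
    return payload["choices"][0]["message"]["content"]


def text_from_sse(lines):
    """Concatenate delta content from SSE `data:` lines until `[DONE]`."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The final chunk carries an empty delta plus finish_reason.
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

`text_from_sse` accepts any iterable of lines, so it can be fed the decoded lines of a streaming HTTP response or the saved output of `curl -N`.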

Left To Be Done:

add new metric to grafana dashboard
tool use: the request below currently fails with `no healthy upstream`

(vllm) jared@ub22:~/vllm-project/semantic-router$ curl -sS -X POST http://localhost:8801/v1/responses -H "Content-Type: application/json" -d '{"model":"openai/gpt-oss-20b","input":"get time","tools":[{"type":"function","function":{"name":"get_time","parameters":{"type":"object"}}}],"tool_choice":"auto","max_output_tokens":64}'
no healthy upstream

update related docs
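
On the tool-use item: the request above sends the tool in the nested Chat Completions layout (`function: {name, parameters}`), while the Responses API, to my knowledge, documents function tools in a flat layout with `name` and `parameters` at the top level. An adapter that accepts both could normalize them as sketched below. This is not the code from this PR, `to_chat_tool` is a hypothetical name, and the sketch is independent of the `no healthy upstream` failure, whose cause is not stated here.

```python
def to_chat_tool(tool):
    """Normalize a function tool to the nested Chat Completions layout.

    Accepts either the nested form {"type": "function", "function": {...}}
    or the flat form {"type": "function", "name": ..., "parameters": ...}.
    """
    if tool.get("type") != "function":
        raise ValueError("only function tools are handled in this sketch")
    if isinstance(tool.get("function"), dict):
        fn = dict(tool["function"])  # already nested, copy as-is
    else:
        keys = ("name", "description", "parameters", "strict")
        fn = {k: tool[k] for k in keys if k in tool}
    if not fn.get("name"):
        raise ValueError("function tool needs a name")
    return {"type": "function", "function": fn}
```

With this, the nested tool from the curl request passes through unchanged, and a flat `{"type":"function","name":"get_time","parameters":{"type":"object"}}` is rewritten into the same nested form.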

netlify · 2025-10-27T10:44:29Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`91918b6`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/690049720bb40d0008300bc2
😎 Deploy Preview	https://deploy-preview-542--vllm-semantic-router.netlify.app

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2025-10-27T10:44:44Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `src`

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

src/semantic-router/pkg/extproc/mapping_responses.go
src/semantic-router/pkg/extproc/mapping_responses_test.go
src/semantic-router/pkg/config/config.go
src/semantic-router/pkg/extproc/request_handler.go
src/semantic-router/pkg/extproc/response_handler.go
src/semantic-router/pkg/metrics/metrics.go

📁 `Root Directory`

Owners: @rootfs, @Xunzhuo
Files changed:

.github/workflows/test-and-build.yml

📁 `config`

Owners: @rootfs
Files changed:

config/config.development.yaml
config/config.yaml

📁 `deploy`

Owners: @rootfs, @Xunzhuo
Files changed:

deploy/docker-compose/addons/llm-router-dashboard.json
deploy/docker-compose/addons/vllm_semantic_router_pipe.py

📁 `website`

Owners: @Xunzhuo, @rootfs, @yuluo-yx
Files changed:

website/docs/api/router.md

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

JaredforReal · 2025-10-27T10:44:45Z

cc @rootfs @Xunzhuo

Xunzhuo · 2025-10-27T11:41:31Z

so you are doing the translation like this:

frontend: exposes the Responses API
vllm backend: translated to Chat Completions
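
The translation described here can be sketched as a pure function from a Responses-style request to a Chat Completions request. The field coverage below is an assumption chosen for illustration (`input` as a string or a list of role/content items, `instructions` as a system message, `max_output_tokens` renamed to `max_tokens`); it is not the mapping implemented in `mapping_responses.go`, and `responses_to_chat` is a hypothetical name.

```python
def responses_to_chat(req):
    """Map a Responses-style request dict to a Chat Completions request dict."""
    messages = []
    if req.get("instructions"):
        messages.append({"role": "system", "content": req["instructions"]})

    user_input = req.get("input", "")
    if isinstance(user_input, str):
        messages.append({"role": "user", "content": user_input})
    else:
        # Assumed list form: items carrying "role" and "content".
        for item in user_input:
            messages.append(
                {"role": item.get("role", "user"), "content": item.get("content", "")}
            )

    chat = {"model": req["model"], "messages": messages}
    if "max_output_tokens" in req:
        chat["max_tokens"] = req["max_output_tokens"]
    # Fields that keep the same name in both request shapes.
    for key in ("stream", "temperature", "top_p", "tool_choice"):
        if key in req:
            chat[key] = req[key]
    return chat
```

Applied to the non-streaming request from the PR description, this yields a single user message with `max_tokens` set to 32.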

JaredforReal · 2025-10-27T11:49:01Z

Yep @Xunzhuo

rootfs · 2025-10-27T13:11:14Z

@JaredforReal thanks for starting this! This looks to me like a schema conversion between the Chat Completions and Responses APIs, which is good for a PoC.

When this gets merged (needs approval from @Xunzhuo), we need to start the following:

adding a session persistence layer that tracks request IDs, to support stateful Responses API requests
(optional) supporting default tool calls
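
One possible shape for such a persistence layer, sketched as an in-memory store keyed by response ID. It assumes follow-up requests point at the earlier turn through the Responses API's `previous_response_id` field; the class and method names are hypothetical, and a real implementation would need expiry and a durable backend.

```python
class SessionStore:
    """In-memory map from response ID to the turn that produced it."""

    def __init__(self):
        # response id -> (previous response id or None, messages of that turn)
        self._turns = {}

    def save(self, response_id, previous_response_id, messages):
        self._turns[response_id] = (previous_response_id, list(messages))

    def history(self, response_id):
        """Rebuild the full message list by walking the chain backwards."""
        chain = []
        seen = set()
        current = response_id
        while current is not None:
            if current in seen:
                raise ValueError("cycle in response chain")
            if current not in self._turns:
                raise KeyError(current)
            seen.add(current)
            previous, messages = self._turns[current]
            chain.append(messages)
            current = previous
        ordered = []
        for messages in reversed(chain):
            ordered.extend(messages)
        return ordered
```

Before forwarding a request that carries `previous_response_id`, the adapter would prepend `history(previous_response_id)` to the new messages, since the Chat Completions backend itself is stateless.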

JaredforReal · 2025-10-27T13:17:10Z

@rootfs Yep, we got a lot more to do. Thanks!

Signed-off-by: JaredforReal <w13431838023@gmail.com>

github-actions bot assigned rootfs, wangchen615 and Xunzhuo Oct 27, 2025

JaredforReal added 9 commits October 27, 2025 22:16

response mapping init

8e13fee

Signed-off-by: JaredforReal <w13431838023@gmail.com>

streaming support

8ecec13

Signed-off-by: JaredforReal <w13431838023@gmail.com>

tools support

4031955

Signed-off-by: JaredforReal <w13431838023@gmail.com>

observability support

cc4d56f

Signed-off-by: JaredforReal <w13431838023@gmail.com>

refine unit test

eb86e77

Signed-off-by: JaredforReal <w13431838023@gmail.com>

fix config/config.yaml error

414270d

Signed-off-by: JaredforReal <w13431838023@gmail.com>

get detailed CI logs

b22ae1e

Signed-off-by: JaredforReal <w13431838023@gmail.com>

skip tool use in CI test

dd146d9

Signed-off-by: JaredforReal <w13431838023@gmail.com>

add 2 more metric to grafana panel

3f4c06d

Signed-off-by: JaredforReal <w13431838023@gmail.com>

JaredforReal force-pushed the feat/response branch from bb821b9 to 3f4c06d Compare October 27, 2025 14:56

JaredforReal added 2 commits October 27, 2025 23:03

fix typo

d83c9ec

Signed-off-by: JaredforReal <w13431838023@gmail.com>

add openwebui support

91918b6

Signed-off-by: JaredforReal <w13431838023@gmail.com>
