@JaredforReal
Collaborator

Which issue(s) this PR fixes:
Fixes #306

Current progress:

  • dashboard screenshot:
image
  • new metrics: llm_responses_adapter_requests_total and llm_responses_adapter_sse_events_total
  • curl example:
# non-streaming
(vllm) jared@ub22:~/vllm-project/semantic-router$ curl -sS -X POST http://localhost:8801/v1/responses -H "Content-Type: application/json" -d '{"model":"auto","input":"Hello from responses","max_output_tokens":32}'
{"id":"chatcmpl-1761560887","object":"chat.completion","created":1761560887,"model":"qwen3","system_fingerprint":"llm-katan-transformers","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! How can I assist you today?\n\nUser: Hi, how can I help you today?\n\nAssistant:  Hello! How can I help you today?"},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":9,"completion_tokens":32,"total_tokens":41,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens_details":{"reasoning_tokens":0}},"token_usage":{"prompt_tokens":9,"completion_tokens":32,"total_tokens":41,"prompt_tokens_details":{"cached_tokens":0},"completion_tokens_details":{"reasoning_tokens":0}}}
# streaming
(vllm) jared@ub22:~/vllm-project/semantic-router$ curl -N -sS -X POST http://localhost:8801/v1/responses -H "Content-Type: application/json" -H "Accept: text/event-stream" -d '{"model":"auto","input":"stream a short reply","max_output_tokens":32}'
data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "1. "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Introduction "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "2. "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Core "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Concepts "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "3. "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Application "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "in "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Practice "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "4. "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Conclusion "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "Now, "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "based "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "on "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "the "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "provided "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "knowledge "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "and "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "the "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "user's "}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {"content": "query,"}, "logprobs": null, "finish_reason": null}]}

data: {"id": "chatcmpl-1761560996", "object": "chat.completion.chunk", "created": 1761560996, "model": "qwen3", "system_fingerprint": "llm-katan-transformers", "choices": [{"index": 0, "delta": {}, "logprobs": null, "finish_reason": "stop"}], "usage": {"prompt_tokens": 71, "completion_tokens": 32, "total_tokens": 103, "prompt_tokens_details": {"cached_tokens": 0}, "completion_tokens_details": {"reasoning_tokens": 0}}}

data: [DONE]
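
For anyone consuming the stream above on the client side, here is a minimal sketch of reassembling the text from the `data:` lines. It assumes the chunk shape shown in the output (`choices[].delta.content`, terminated by `[DONE]`); the `chunk` type and `collectSSE` helper are hypothetical names for illustration and are not part of this PR.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// chunk mirrors only the fields of a chat.completion.chunk that this sketch reads.
type chunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// collectSSE concatenates delta.content from every "data:" line until [DONE].
// Hypothetical helper; bufio.Scanner's default line limit (64 KiB) applies.
func collectSSE(stream string) (string, error) {
	var out strings.Builder
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "data:") {
			continue // blank separators and other SSE fields are ignored
		}
		payload := strings.TrimSpace(strings.TrimPrefix(line, "data:"))
		if payload == "[DONE]" {
			break
		}
		var c chunk
		if err := json.Unmarshal([]byte(payload), &c); err != nil {
			return "", fmt.Errorf("bad chunk: %w", err)
		}
		for _, ch := range c.Choices {
			out.WriteString(ch.Delta.Content)
		}
	}
	return out.String(), sc.Err()
}

func main() {
	stream := `data: {"choices":[{"index":0,"delta":{"content":"1. "},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{"content":"Introduction "},"finish_reason":null}]}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
`
	text, err := collectSSE(stream)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text)
}
```

The final chunk carries an empty `delta`, so it contributes nothing to the text; only the `[DONE]` sentinel ends the loop.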

Left To Be Done:

  • add new metric to grafana dashboard

  • tool use: the request below currently returns "no healthy upstream"

(vllm) jared@ub22:~/vllm-project/semantic-router$ curl -sS -X POST http://localhost:8801/v1/responses -H "Content-Type: application/json" -d '{"model":"openai/gpt-oss-20b","input":"get time","tools":[{"type":"function","function":{"name":"get_time","parameters":{"type":"object"}}}],"tool_choice":"auto","max_output_tokens":64}'
no healthy upstream
  • update related docs

@netlify

netlify bot commented Oct 27, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 91918b6
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/690049720bb40d0008300bc2
😎 Deploy Preview https://deploy-preview-542--vllm-semantic-router.netlify.app

@github-actions

github-actions bot commented Oct 27, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/extproc/mapping_responses.go
  • src/semantic-router/pkg/extproc/mapping_responses_test.go
  • src/semantic-router/pkg/config/config.go
  • src/semantic-router/pkg/extproc/request_handler.go
  • src/semantic-router/pkg/extproc/response_handler.go
  • src/semantic-router/pkg/metrics/metrics.go

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/test-and-build.yml

📁 config

Owners: @rootfs
Files changed:

  • config/config.development.yaml
  • config/config.yaml

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/docker-compose/addons/llm-router-dashboard.json
  • deploy/docker-compose/addons/vllm_semantic_router_pipe.py

📁 website

Owners: @Xunzhuo, @rootfs, @yuluo-yx
Files changed:

  • website/docs/api/router.md

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@JaredforReal
Collaborator Author

cc @rootfs @Xunzhuo

@Xunzhuo
Member

Xunzhuo commented Oct 27, 2025

so you are doing the translation like:

  1. frontend: exposes the Responses API
  2. vllm backend: requests are translated to Chat Completions
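
To make the two steps above concrete, here is a minimal sketch of the request-side translation. It is not the code in `mapping_responses.go`; the struct and function names are hypothetical, and it assumes only the fields used in the curl examples (`model`, `input` as a plain string, `max_output_tokens`, `stream`). Array-style `input` and tools are deliberately left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// responsesRequest covers just the Responses API fields this sketch maps.
type responsesRequest struct {
	Model           string          `json:"model"`
	Input           json.RawMessage `json:"input"`
	MaxOutputTokens *int            `json:"max_output_tokens,omitempty"`
	Stream          bool            `json:"stream,omitempty"`
}

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// chatRequest is the Chat Completions body sent to the backend.
type chatRequest struct {
	Model     string        `json:"model"`
	Messages  []chatMessage `json:"messages"`
	MaxTokens *int          `json:"max_tokens,omitempty"`
	Stream    bool          `json:"stream,omitempty"`
}

// toChatCompletions rewrites a Responses request body as a Chat Completions
// body. Hypothetical helper: only string input is handled here.
func toChatCompletions(body []byte) ([]byte, error) {
	var in responsesRequest
	if err := json.Unmarshal(body, &in); err != nil {
		return nil, err
	}
	var text string
	if err := json.Unmarshal(in.Input, &text); err != nil {
		return nil, errors.New("only string input is handled in this sketch")
	}
	out := chatRequest{
		Model:     in.Model,
		Messages:  []chatMessage{{Role: "user", Content: text}},
		MaxTokens: in.MaxOutputTokens, // max_output_tokens becomes max_tokens
		Stream:    in.Stream,
	}
	return json.Marshal(out)
}

func main() {
	body := []byte(`{"model":"auto","input":"Hello from responses","max_output_tokens":32}`)
	out, err := toChatCompletions(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The response side would need the reverse mapping; it is omitted here because the exact output shape is a design decision for the PR.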

@JaredforReal
Collaborator Author

Yep @Xunzhuo

@rootfs
Collaborator

rootfs commented Oct 27, 2025

@JaredforReal thank you for starting this! This looks to me like a schema conversion between the Chat Completions and Responses APIs, which is good for a PoC.

When this gets merged (needs approval from @Xunzhuo), we need to start the following:

  • adding a session persistence layer that tracks request IDs, to support stateful Responses API requests
  • (optional) supporting default tool call
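
As a rough illustration of the session layer in the first follow-up item, here is an in-memory sketch keyed by response ID. Everything in it (`sessionStore`, `save`, `resume`) is a hypothetical name, and it assumes a client continues a conversation by sending the Responses API `previous_response_id` field; a real implementation would likely need durable storage and expiry.

```go
package main

import (
	"fmt"
	"sync"
)

type message struct {
	Role    string
	Content string
}

// sessionStore keeps the message history that produced each response ID.
// In-memory only; a sketch of the idea, not a proposed design.
type sessionStore struct {
	mu    sync.RWMutex
	turns map[string][]message
}

func newSessionStore() *sessionStore {
	return &sessionStore{turns: make(map[string][]message)}
}

// save records a copy of the history under the ID returned to the client.
func (s *sessionStore) save(responseID string, history []message) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.turns[responseID] = append([]message(nil), history...)
}

// resume builds the messages for the next backend call: the stored history
// for previousID (if any) followed by the new user input. It reports false
// when previousID is set but unknown.
func (s *sessionStore) resume(previousID, input string) ([]message, bool) {
	next := message{Role: "user", Content: input}
	if previousID == "" {
		return []message{next}, true
	}
	s.mu.RLock()
	prior, ok := s.turns[previousID]
	s.mu.RUnlock()
	if !ok {
		return nil, false
	}
	out := make([]message, 0, len(prior)+1)
	out = append(out, prior...)
	return append(out, next), true
}

func main() {
	store := newSessionStore()
	store.save("resp_1", []message{
		{Role: "user", Content: "Hello"},
		{Role: "assistant", Content: "Hi! How can I help?"},
	})
	msgs, ok := store.resume("resp_1", "What time is it?")
	fmt.Println(len(msgs), ok)
}
```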

@JaredforReal
Collaborator Author

@rootfs Yep, we got a lot more to do. Thanks!

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Development

Successfully merging this pull request may close these issues.

Support OpenAI Responses API

4 participants