
Conversation

@aeroxy aeroxy commented Nov 15, 2025

This PR adds comprehensive support for Claude's web search functionality (web_search_20250305 tool) in the proxy server, including proper handling of both streaming and non-streaming modes.

Key Features Added:

  1. Web Search Request Detection & Translation

    • Detect Claude requests containing the web_search_20250305 tool type (see the sketch after this list)
    • Convert these requests to /chat/retrieve endpoint format:
      {
        "phase": "UNIFY",
        "query": "...",
        "enableIntention": false,
        "appCode": "COMPLEX_CHATBOT",
        "enableQueryRewrite": false
      }
  2. Endpoint Routing

    • Route web search requests to /chat/retrieve instead of standard /chat/completions
    • Standard Claude requests continue to use /chat/completions
  3. Response Translation

    • Convert /chat/retrieve responses back to proper Claude message format
    • Handle both streaming and non-streaming response modes
    • For streaming: Generate complete SSE event sequence for Claude clients:
      • event: message_start
      • event: content_block_start
      • Multiple event: content_block_delta events
      • event: content_block_stop
      • event: message_delta
      • event: message_stop
  4. Streaming Mode Compatibility

    • When upstream /messages request is streaming, convert complete /chat/retrieve JSON response to streaming SSE events
    • When upstream /messages request is non-streaming, return complete Claude message
    • Preserve original streaming behavior for standard OpenAI responses
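
For illustration, a minimal sketch of what the detection and /chat/retrieve payload construction could look like (function and type names here are hypothetical, not the PR's actual code, which lives in openai_claude_request.go):

package claude

import (
	"encoding/json"

	"github.com/tidwall/gjson"
)

// hasWebSearchTool reports whether any tool in the Claude request body
// has type web_search_20250305.
func hasWebSearchTool(claudeReq []byte) bool {
	found := false
	gjson.GetBytes(claudeReq, "tools").ForEach(func(_, tool gjson.Result) bool {
		if tool.Get("type").String() == "web_search_20250305" {
			found = true
			return false // stop iterating
		}
		return true
	})
	return found
}

// retrieveRequest mirrors the /chat/retrieve payload shown in item 1.
type retrieveRequest struct {
	Phase              string `json:"phase"`
	Query              string `json:"query"`
	EnableIntention    bool   `json:"enableIntention"`
	AppCode            string `json:"appCode"`
	EnableQueryRewrite bool   `json:"enableQueryRewrite"`
}

// buildRetrievePayload serializes the extracted query into the /chat/retrieve
// request format; the boolean flags default to false.
func buildRetrievePayload(query string) ([]byte, error) {
	return json.Marshal(retrieveRequest{
		Phase:   "UNIFY",
		Query:   query,
		AppCode: "COMPLEX_CHATBOT",
	})
}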

Technical Implementation:

  • Request Processing: Enhanced ConvertClaudeRequestToOpenAI to detect and translate web search requests
  • Response Processing: Updated streaming translator to handle web search response format
  • Executor Logic: Modified OpenAICompatExecutor to route web search requests appropriately
  • State Management: Proper parameter state handling for streaming SSE event sequences
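
As a rough sketch of the streaming path described above, replaying a complete /chat/retrieve answer as the Claude SSE event sequence could look like this (event names follow Anthropic's Messages streaming format; helper names and payload shapes are illustrative, not the PR's exact code):

package claude

import (
	"encoding/json"
	"fmt"
	"io"
)

// writeSSE emits one Claude-style SSE event: an "event:" line, a "data:" line
// with the JSON payload, then a blank line.
func writeSSE(w io.Writer, event string, payload any) error {
	data, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	_, err = fmt.Fprintf(w, "event: %s\ndata: %s\n\n", event, data)
	return err
}

// emitWebSearchSSE replays a complete text answer as the sequence listed in
// item 3: message_start, content_block_start, repeated content_block_delta,
// content_block_stop, message_delta, message_stop.
// (Error handling on intermediate writes is elided for brevity.)
func emitWebSearchSSE(w io.Writer, msgID, model, text string) error {
	writeSSE(w, "message_start", map[string]any{
		"type": "message_start",
		"message": map[string]any{
			"id": msgID, "type": "message", "role": "assistant",
			"model": model, "content": []any{},
		},
	})
	writeSSE(w, "content_block_start", map[string]any{
		"type": "content_block_start", "index": 0,
		"content_block": map[string]any{"type": "text", "text": ""},
	})
	const charsPerChunk = 100 // chunk size used to simulate streaming
	for i := 0; i < len(text); i += charsPerChunk {
		end := i + charsPerChunk
		if end > len(text) {
			end = len(text)
		}
		// Note: a production version would split on rune boundaries.
		writeSSE(w, "content_block_delta", map[string]any{
			"type": "content_block_delta", "index": 0,
			"delta": map[string]any{"type": "text_delta", "text": text[i:end]},
		})
	}
	writeSSE(w, "content_block_stop", map[string]any{"type": "content_block_stop", "index": 0})
	writeSSE(w, "message_delta", map[string]any{
		"type":  "message_delta",
		"delta": map[string]any{"stop_reason": "end_turn"},
	})
	return writeSSE(w, "message_stop", map[string]any{"type": "message_stop"})
}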

Files Changed:

  • internal/translator/openai/claude/openai_claude_request.go - Web search request detection and translation
  • internal/translator/openai/claude/openai_claude_response.go - Web search response conversion with streaming support
  • internal/runtime/executor/openai_compat_executor.go - Web search request routing and response handling

Compatibility:

  • ✅ Maintains backward compatibility for existing Claude and OpenAI requests
  • ✅ Proper streaming/non-streaming mode detection based on original request
  • ✅ Correct SSE event sequence generation for Claude clients
  • ✅ No breaking changes to existing functionality

Fixes web search handling so that complete JSON responses from /chat/retrieve are properly converted to streaming SSE events when the original upstream request was in streaming mode.

@gemini-code-assist

Summary of Changes

Hello @aeroxy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the proxy server's capabilities by introducing robust support for Claude's web search functionality. It establishes a seamless translation and routing mechanism for web search requests, converting them to an internal /chat/retrieve endpoint and then meticulously translating the responses back into Claude's native message format, ensuring full compatibility with both streaming and non-streaming interactions.

Highlights

  • Claude Web Search Integration: Adds comprehensive support for Claude's web_search_20250305 tool, enabling the proxy server to detect, translate, and process web search requests.
  • Request Translation and Routing: Claude web search requests are now identified, converted to the internal /chat/retrieve endpoint format, and routed appropriately, while standard requests continue to use /chat/completions.
  • Comprehensive Response Handling: Implements robust translation of /chat/retrieve responses back into the proper Claude message format, supporting both streaming (generating full SSE event sequences) and non-streaming modes.
  • Streaming Compatibility: Ensures correct streaming behavior for web search responses by converting complete JSON responses from /chat/retrieve into appropriate SSE events when the original upstream request was in streaming mode.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces comprehensive support for Claude's web search functionality by detecting, routing, and translating requests and responses appropriately for both streaming and non-streaming modes. The implementation is thorough and handles many edge cases, especially around generating the correct SSE event sequence for streaming web search responses. My review focuses on improving robustness, maintainability, and addressing a couple of potential issues. I've provided suggestions to refactor duplicated code, use safer JSON manipulation methods, remove redundant logic, and replace magic numbers with constants for better clarity.

func isWebSearchRequest(translated []byte) bool {
	// Check if the translated request has the web search marker
	// This looks for the "_web_search_request":true field we add
	return bytes.Contains(translated, []byte("\"_web_search_request\":true"))

high

Using bytes.Contains to check for the _web_search_request marker is fragile. It relies on the exact string representation, which could break if the JSON serialization changes (e.g., adds whitespace). Since gjson is already used in the project, it would be more robust to use it for this check.

Suggested change
return bytes.Contains(translated, []byte("\"_web_search_request\":true"))
return gjson.GetBytes(translated, "_web_search_request").Bool()

Comment on lines +30 to +34
result := make([]byte, len(webSearchRequest)+30)
copy(result, webSearchRequest[:len(webSearchRequest)-1]) // Copy everything except the closing brace
metadata := `,"_web_search_request":true}`
copy(result[len(webSearchRequest)-1:], metadata)
return result

high

The current implementation uses string manipulation and byte copying to add the _web_search_request field. This approach is brittle and can lead to invalid JSON if the input format changes. Since the rest of the file already uses the sjson library, it would be safer and more consistent to use it here as well. This also removes the need for the arbitrary buffer allocation of +30 bytes.

Suggested change
result := make([]byte, len(webSearchRequest)+30)
copy(result, webSearchRequest[:len(webSearchRequest)-1]) // Copy everything except the closing brace
metadata := `,"_web_search_request":true}`
copy(result[len(webSearchRequest)-1:], metadata)
return result
webSearchRequest, _ = sjson.SetBytes(webSearchRequest, "_web_search_request", true)
return webSearchRequest

Comment on lines 174 to 192
// Check if this is a web search request (has special marker we added in translator)
isWebSearch := isWebSearchRequest(translated)

var url string
if isWebSearch {
	url = strings.TrimSuffix(baseURL, "/") + "/chat/retrieve"
} else {
	url = strings.TrimSuffix(baseURL, "/") + "/chat/completions"
}

medium

The logic for determining if a request is for a web search and constructing the appropriate URL is duplicated in both Execute and ExecuteStream methods. To improve maintainability and avoid redundancy, consider extracting this logic into a private helper function.

For example, you could create a helper like this:

func (e *OpenAICompatExecutor) getRequestDetails(baseURL string, translated []byte) (string, bool) {
	isWebSearch := isWebSearchRequest(translated)
	path := "/chat/completions"
	if isWebSearch {
		path = "/chat/retrieve"
	}
	url := strings.TrimSuffix(baseURL, "/") + path
	return url, isWebSearch
}

Then, you can call it in both Execute and ExecuteStream:
url, isWebSearch := e.getRequestDetails(baseURL, translated)

Comment on lines +321 to +409
if query == "" {
// Try to find text after common search phrases
searchPhrases := []string{
"perform a web search for the query:",
"perform a web search for:",
"web search for the query:",
"web search for:",
"search for the query:",
"search for:",
"query:",
"search query:",
}
for _, phrase := range searchPhrases {
phraseLower := strings.ToLower(phrase)
if idx := strings.Index(strings.ToLower(text), phraseLower); idx >= 0 {
query = strings.TrimSpace(text[idx+len(phrase):])
// Remove any trailing punctuation that might be part of the instruction
query = strings.TrimRight(query, ".!?")
if query != "" {
return false // stop iteration
}
}
}

// If still no query found, check if the entire text is a search-like query
if query == "" && (strings.Contains(strings.ToLower(text), "search") ||
strings.Contains(strings.ToLower(text), "find") ||
strings.Contains(strings.ToLower(text), "what") ||
strings.Contains(strings.ToLower(text), "how") ||
strings.Contains(strings.ToLower(text), "why") ||
strings.Contains(strings.ToLower(text), "when") ||
strings.Contains(strings.ToLower(text), "where")) {
trimmed := strings.TrimSpace(text)
if len(trimmed) > 5 { // Basic sanity check
query = trimmed
return false // stop iteration
}
}
}

medium

This block for alternative query extraction is quite complex with multiple nested if statements and a hardcoded list of phrases. This can be simplified for better readability and maintainability.

  1. The searchPhrases slice can be defined as a package-level constant.
  2. The logic for checking keywords like search, find, what, etc., can be extracted into a helper function.
  3. The nested if query == "" checks can be streamlined.

Here's a suggested refactoring:

// At package level
var webSearchQueryPhrases = []string{
	"perform a web search for the query:",
	"perform a web search for:",
	"web search for the query:",
	"web search for:",
	"search for the query:",
	"search for:",
	"query:",
	"search query:",
}

func isSearchLikeQuery(text string) bool {
	lowerText := strings.ToLower(text)
	keywords := []string{"search", "find", "what", "how", "why", "when", "where"}
	for _, kw := range keywords {
		if strings.Contains(lowerText, kw) {
			return true
		}
	}
	return false
}

// Inside extractWebSearchQuery, replacing the block:
if query == "" {
    // Try to find text after common search phrases
    for _, phrase := range webSearchQueryPhrases {
        phraseLower := strings.ToLower(phrase)
        if idx := strings.Index(strings.ToLower(text), phraseLower); idx >= 0 {
            query = strings.TrimSpace(text[idx+len(phrase):])
            query = strings.TrimRight(query, ".!?")
            if query != "" {
                return false // stop iteration
            }
        }
    }

    // If still no query found, check if the entire text is a search-like query
    if query == "" && isSearchLikeQuery(text) {
        trimmed := strings.TrimSpace(text)
        if len(trimmed) > 5 { // Basic sanity check
            query = trimmed
            return false // stop iteration
        }
    }
}

// Send content_block_delta with the content in chunks (to simulate streaming)
if len(contentText) > 0 {
	// Break content into reasonable-sized chunks to simulate streaming (avoiding tiny chunks)
	charsPerChunk := 100

medium

There are a few magic numbers used for truncating search abstracts and chunking SSE data. It's a good practice to define these as named constants to improve readability and make them easier to modify in the future.

For example:

const (
	webSearchAbstractMaxLength = 500
	webSearchSSEChunkSize      = 100
)

This applies to:

  • The abstract length limit of 500 on lines 792 and 899.
  • The SSE chunk size of 100 on line 925.
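
For example, the chunking loop could then use the constant like this (contentText and emitTextDelta are stand-ins for the PR's actual variable and helper, shown only to illustrate the named constant):

for start := 0; start < len(contentText); start += webSearchSSEChunkSize {
	end := start + webSearchSSEChunkSize
	if end > len(contentText) {
		end = len(contentText)
	}
	// One content_block_delta event per chunk.
	emitTextDelta(contentText[start:end])
}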

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Comment on lines +28 to +34
webSearchRequest := createWebSearchRequestJSON(root)
// Add a metadata field to indicate this is a special web search request
result := make([]byte, len(webSearchRequest)+30)
copy(result, webSearchRequest[:len(webSearchRequest)-1]) // Copy everything except the closing brace
metadata := `,"_web_search_request":true}`
copy(result[len(webSearchRequest)-1:], metadata)
return result

P1: Trim added metadata when emitting web search payload

In ConvertClaudeRequestToOpenAI the web-search branch allocates result := make([]byte, len(webSearchRequest)+30) and copies the base JSON plus ,"_web_search_request":true} (lines 30‑33). Because the slice is never resliced to the actual number of bytes written, the returned payload contains trailing \x00 bytes, which makes the HTTP body invalid JSON (Go’s encoding/json errors out with Extra data). Consequently every translated web-search request sent to /chat/retrieve will be rejected before it reaches the provider. Build the payload via append or slice result to the written length before returning so the JSON is valid.
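
For illustration, an append-based version of that fix might look like this (a sketch of the suggestion, not the PR's code; the sjson.SetBytes approach suggested above is equally valid):

// Drop the closing brace, append the marker, and return a slice whose length
// matches the bytes actually written (no trailing zero bytes).
result := append(webSearchRequest[:len(webSearchRequest)-1], `,"_web_search_request":true}`...)
return result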

@luispater
Collaborator

I'm not quite clear on what the purpose of this PR is, where it's used, and which model provider offers the /chat/retrieve endpoint?


aeroxy commented Nov 17, 2025

I'm not quite clear on what the purpose of this PR is, where it's used, and which model provider offers the /chat/retrieve endpoint?

Hey, thanks for the question. It's the OpenAI-compatible standard for web search. I saw that a similar feature has already been implemented for the Codex provider, but not for the older OpenAI-compatible provider.

- Added web search detection in request handling
- Routes web search requests to /chat/retrieve endpoint
- Implemented streaming and non-streaming response conversion
- Added SSE event simulation for Claude Code compatibility
- Updated dependencies and gitignore

Files changed:
- internal/runtime/executor/openai_compat_executor.go (+217 lines)
- internal/translator/openai/claude/openai_claude_request.go (+167 lines)
- internal/translator/openai/claude/openai_claude_response.go (+228 lines)
- go.mod (dependency updates)
- .gitignore (added /refs/* and .DS_Store)

@luispater
Collaborator

Can you tell me which providers and clients support this feature?

How do I test the feature?


aeroxy commented Nov 17, 2025

Can you tell me which providers and clients support this feature?

How do I test the feature?

You may try it with Minimax / Kimi / iFlow / other OpenAI-compatible providers.
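
For testing, a Claude-format request along these lines should exercise the new path once the proxy is pointed at such a provider (the model name and field values here are an assumed example; web_search_20250305 is Anthropic's documented web search tool type):

{
  "model": "claude-sonnet-4",
  "max_tokens": 1024,
  "stream": true,
  "messages": [
    {"role": "user", "content": "Search the web for the latest Go release notes"}
  ],
  "tools": [
    {"type": "web_search_20250305", "name": "web_search", "max_uses": 5}
  ]
}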

@luispater
Collaborator

Can you tell me which providers and clients support this feature?
How do I test the feature?

You may try it with Minimax / Kimi / iFlow / Other Open AI Compatible providers

Which client?


aeroxy commented Nov 18, 2025 via email
