@@ -67,51 +67,94 @@ async def codebase_search(
6767
6868 include_content: Whether to include full file content in results (default: false).
6969
70- IMPORTANT - When to include content:
71- - For EXTERNAL repositories (not in your current working directory):
72- SET TO TRUE - you don't have file access, so you need the content.
73- - For CURRENT repository (the one you're working in):
74- SET TO FALSE - you already have file access via Read tool, so just get
75- file paths and read them directly for the latest content.
70+ CRITICAL RULE - When to include content:
71+ - CURRENT repository (user's working directory): include_content=false
72+ → You have file access via Read tool - get paths only, then read for latest content
73+ - EXTERNAL repositories (not in working directory): include_content=true
74+ → You cannot access files - must get content in search results
7675
77- How to identify current vs external repositories:
78- - Compare repository URLs from get_data_sources with your current git repo URL
79- - Current repo: Use include_content=false, then use Read tool on result paths
80- - External repos: Use include_content=true to get the content directly
76+ How to identify CURRENT vs EXTERNAL repositories (use ALL available clues):
8177
82- Note: Indexed content may be from a different branch than your local state.
78+ 1. **Repository name matching**:
79+ - Does the repo name match your current working directory name?
80+ - Example: Working in "/Users/bob/my-app" and repo name is "my-app" → likely CURRENT
81+
82+ 2. **Repository description analysis**:
83+ - Does the description match what you've observed in the codebase?
84+ - Check tech stack, architecture, features mentioned in description
85+ - Example: Description says "Python FastAPI server" and you see FastAPI files → likely CURRENT
86+
87+ 3. **User's question context**:
88+ - Does user say "this repo", "our code", "the current project", "my app"? → CURRENT
89+ - Does user reference "the X service", "external repo", "other project"? → EXTERNAL
90+
91+ 4. **URL matching** (when available):
92+ - Compare repo URL from get_data_sources with git remote URL
93+ - Note: May not always be available or matchable
94+
95+ 5. **Working context**:
96+ - Have you been reading/editing files that match this repo's structure?
97+ - Do file paths in your recent operations align with this repository?
98+
99+ **Default heuristic when uncertain**:
100+ - If user is asking about code in their working directory → CURRENT (include_content=false)
101+ - If user is asking about a different/external service → EXTERNAL (include_content=true)
102+ - When truly ambiguous, prefer include_content=false for repos that seem related to current work
83103
84104 Returns:
85105 Search results as JSON including source info, file paths, line numbers, and code snippets.
86106
87107 Examples:
88- 1. Search CURRENT repository (you have file access):
108+ 1. Search CURRENT repository (identified by directory name + context):
109+ # Working in "/Users/bob/codealive-mcp"
110+ # User asks: "Where is the search tool implemented in this project?"
111+ # Repo name from get_data_sources: "codealive-mcp"
112+ # → Name matches directory, user says "this project" → CURRENT
89113 codebase_search(
90- query="Where is user authentication handled?",
91- data_sources=["my-current-repo"],
92- include_content=false # Get paths only, then use Read tool
114+ query="Where is the search tool implemented?",
115+ data_sources=["codealive-mcp"],
116+ include_content=false # Current repo - get paths, use Read tool
93117 )
94- # Then read the files: Read(file_path="/path/from/results")
118+ # Then: Read(file_path="/Users/bob/codealive-mcp/src/tools/search.py")
95119
96- 2. Search EXTERNAL repository (no file access):
120+ 2. Search CURRENT repository (identified by description matching):
121+ # Working in Python FastMCP project
122+ # Description: "Python MCP server using FastMCP framework"
123+ # You've been reading FastMCP code in this directory → CURRENT
97124 codebase_search(
98- query="How does the payment service validate cards?",
99- data_sources=["external-payments-repo"],
100- include_content=true # Need content, can't read files directly
125+ query="How is the lifespan context managed?",
126+ data_sources=["my-mcp-server"],
127+ include_content=false # Description matches observed codebase
101128 )
102129
103- 3. Workspace-wide question across external repos:
130+ 3. Search EXTERNAL repository (different service):
131+ # Working in "frontend-app"
132+ # User asks: "How does the payments service handle refunds?"
133+ # Repo: "payments-service" → Different service → EXTERNAL
104134 codebase_search(
105- query="How do microservices talk to the billing API?",
106- data_sources=["backend-team"],
107- include_content=true # External workspace, include content
135+ query="How are refunds processed?",
136+ data_sources=["payments-service"],
137+ include_content=true # External service - need content
108138 )
109139
110- 4. Mixed query with known identifier:
140+ 4. Search EXTERNAL workspace (multiple external repos):
141+ # User asks about backend services, but you're in frontend repo
111142 codebase_search(
112- query="Where do we validate JWTs (AuthService)?",
113- data_sources=["repo123"],
114- include_content=false # Current repo, read files separately
143+ query="How do microservices authenticate API calls?",
144+ data_sources=["backend-workspace"],
145+ include_content=true # External workspace
146+ )
147+
148+ 5. Ambiguous case - use context:
149+ # User: "Check how authentication works in our API"
150+ # Working in "api-server" directory
151+ # Repo name: "company-api" (slightly different but plausible match)
152+ # Description: "REST API server with authentication"
153+ # → User says "our API", description matches → Likely CURRENT
154+ codebase_search(
155+ query="authentication implementation",
156+ data_sources=["company-api"],
157+ include_content=false # Context suggests current repo
115158 )
116159
117160 Note:
@@ -121,6 +164,10 @@ async def codebase_search(
121164 - Prefer natural-language questions; templates are unnecessary.
122165 - Start with "auto" for best semantic results; escalate to "deep" only if needed.
123166 - If you know precise symbols (functions/classes), include them to narrow scope.
167+
168+ CRITICAL: Always call get_data_sources() first to get repository names, descriptions, and URLs.
169+ Then use the heuristics above to determine include_content for each search.
170+ The description field is especially valuable for matching repositories to your working context.
124171 """
125172 context: CodeAliveContext = ctx.request_context.lifespan_context
126173
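The name-matching and question-context clues from the docstring above can be sketched in code. This is a minimal illustration, not part of the change: `DataSource`, `should_include_content`, and the phrase list are hypothetical names, and the real `get_data_sources` response may be shaped differently.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Phrases the docstring treats as pointing at the CURRENT repository.
# Illustrative only; a real client would likely need a broader list.
CURRENT_PHRASES = ("this repo", "this project", "our code",
                   "the current project", "my app")


@dataclass
class DataSource:
    # Hypothetical shape: field names are assumptions, not the actual schema.
    name: str
    description: str = ""


def should_include_content(source: DataSource, working_dir: str,
                           user_question: str) -> bool:
    """Return False when the repo looks CURRENT, True when it looks EXTERNAL."""
    # Clue 1: repository name matches the working directory name.
    dir_name = PurePosixPath(working_dir).name.lower()
    if source.name.strip().lower() == dir_name:
        return False

    # Clue 3: the user refers to the code as their own.
    question = user_question.lower()
    if any(phrase in question for phrase in CURRENT_PHRASES):
        return False

    # No clue points at the working directory: treat the repo as EXTERNAL.
    return True
```

The sketch covers only clues 1 and 3. Description analysis and working context depend on what the agent has observed, so they are left out rather than approximated.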
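The URL-matching clue compares a repo URL from `get_data_sources` with the local git remote URL, and the docstring notes the two may not always be matchable. One likely reason is that the same repository can appear in SSH and HTTPS forms. The sketch below normalizes both forms before comparing. The helper names are hypothetical, and lowercasing the whole URL is an assumption that suits hosts with case-insensitive paths.

```python
import re


def normalize_repo_url(url: str) -> str:
    """Reduce a git URL to 'host/owner/repo' for comparison."""
    url = url.strip().lower()

    # scp-like SSH form: git@host:owner/repo(.git)
    scp_match = re.match(r"^[\w.-]+@([\w.-]+):(.+)$", url)
    if scp_match:
        host, path = scp_match.groups()
    else:
        # Drop the scheme (https://, ssh://, ...) and any user@ prefix.
        rest = re.sub(r"^[a-z][a-z0-9+.-]*://", "", url)
        rest = rest.split("@", 1)[-1]
        host, _, path = rest.partition("/")

    # Drop a trailing slash and the optional .git suffix.
    path = path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host}/{path}"


def same_repository(url_a: str, url_b: str) -> bool:
    """True when two URLs point at the same host and repository path."""
    return normalize_repo_url(url_a) == normalize_repo_url(url_b)
```

A match is strong evidence for CURRENT. A mismatch is weaker evidence, since mirrors and forks have different URLs, so the other clues still apply.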