Commit eb71419

Refine include_content documentation and heuristics: clarify current vs. external repository handling, enhance examples, and update data source matching logic with additional contextual signals.
1 parent 8509bf8 commit eb71419

3 files changed: +121 -46 lines changed
src/codealive_mcp_server.py

Lines changed: 15 additions & 7 deletions
@@ -54,13 +54,21 @@
 - Remember that context from previous messages is maintained in the same conversation

 CRITICAL - include_content parameter usage:
-- For the CURRENT repository (user's working directory): Use include_content=false
-* You already have file access via Read tool
-* Get file paths from search, then read them directly for latest content
-- For EXTERNAL repositories (not in working directory): Use include_content=true
-* You cannot access these files directly
-* Content must be included in search results
-- Compare repository URLs from get_data_sources with current git repo to identify which is which
+You MUST intelligently determine if searching CURRENT or EXTERNAL repositories:
+
+- CURRENT repository (user's working directory): include_content=false
+* You have file access → Get paths from search, then use Read tool for latest content
+- EXTERNAL repositories (not in working directory): include_content=true
+* No file access → Must include content in search results
+
+Use these heuristics to identify CURRENT vs EXTERNAL (combine multiple signals):
+1. Repository/directory name matching (e.g., working in "my-app", repo named "my-app")
+2. Description matching observed codebase (tech stack, architecture, features)
+3. User's language ("this repo", "our code" = CURRENT; "the X service" = EXTERNAL)
+4. URL matching with git remote (when available)
+5. Working context (files you've been reading/editing match this repo)
+
+When uncertain, use context: Is user asking about their current work or a different service?

 Flexible data source usage:
 - You can use a workspace name as a single data source to search or chat across all its repositories at once
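
To make "combine multiple signals" concrete, here is a minimal Python sketch of one way the five heuristics above could feed a single include_content decision. It is not part of this commit: `RepoSignals`, `is_current_repo`, `include_content_for`, and the two-signal threshold are hypothetical names and values chosen purely for illustration.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class RepoSignals:
    """Hypothetical container: one boolean per heuristic in the instructions."""
    name_matches: bool = False             # 1. repo name equals directory name
    description_matches: bool = False      # 2. description fits observed codebase
    user_refers_to_current: bool = False   # 3. "this repo", "our code"
    url_matches: bool = False              # 4. URL matches git remote
    working_context_matches: bool = False  # 5. recently read/edited files match


def is_current_repo(signals: RepoSignals, min_signals: int = 2) -> bool:
    # Assumed rule: treat the repo as CURRENT when at least `min_signals`
    # heuristics agree. The threshold is an illustration, not project policy.
    return sum(astuple(signals)) >= min_signals


def include_content_for(signals: RepoSignals) -> bool:
    # CURRENT -> include_content=False (read files locally);
    # EXTERNAL -> include_content=True (content must come with results).
    return not is_current_repo(signals)
```

With this sketch, a repository whose name matches the directory and which the user calls "this repo" resolves to include_content=False, while a repository with no matching signal resolves to include_content=True.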

src/tools/datasources.py

Lines changed: 31 additions & 11 deletions
@@ -24,14 +24,16 @@ async def get_data_sources(ctx: Context, alive_only: bool = True) -> str:
 Returns:
 A formatted list of available data sources with the following information for each:
 - id: Unique identifier for the data source
-- name: Human-readable name of the repository or workspace, used in other API calls
-- description: Summary of the codebase contents to guide search and chat usage
+- name: Human-readable name - CRITICAL for matching with current working directory name
+- description: Summary of codebase contents - CRITICAL for identifying if this matches your
+current working codebase (compare tech stack, architecture, features you've observed)
 - type: The type of data source ("Repository" or "Workspace")
-- url: URL of the repository (for Repository type only)
-IMPORTANT: Use this URL to identify if a repository matches your current working directory.
-Compare with your local git remote URL to determine if it's the current or external repo.
+- url: Repository URL (for Repository type only) - useful for matching with git remote
 - state: The processing state of the data source (if alive_only=false)

+Use name + description + url together to determine if a repository is the CURRENT one
+you're working in versus an EXTERNAL repository.
+
 Examples:
 1. Get only ready-to-use data sources:
 get_data_sources()
@@ -44,12 +46,30 @@ async def get_data_sources(ctx: Context, alive_only: bool = True) -> str:
 Other states include "New" (just added), "Processing" (being indexed),
 "Failed" (indexing failed), etc.

-CRITICAL for optimizing include_content parameter:
-- Compare repository URLs with your current git remote URL (git config --get remote.origin.url)
-- If URLs match: This is your CURRENT repository
-→ Use include_content=false in codebase_search, then read files with Read tool
-- If URLs don't match: This is an EXTERNAL repository
-→ Use include_content=true in codebase_search to get content directly
+CRITICAL - Use ALL available information to identify CURRENT vs EXTERNAL repositories:
+
+Heuristic signals to combine (in order of reliability):
+1. **Name matching**: Does repo name match your current working directory name?
+Example: In "/Users/bob/my-app" and repo name is "my-app" → CURRENT
+
+2. **Description matching**: Does description match what you've observed in the codebase?
+- Tech stack (Python, JavaScript, FastAPI, React, etc.)
+- Architecture patterns (microservices, monolith, MCP server, etc.)
+- Key features mentioned
+Example: Description says "FastAPI MCP server" and you see FastAPI + MCP code → CURRENT
+
+3. **User context**: What is the user asking about?
+- "this repo", "our code", "my project" → CURRENT
+- "the payments service", "external API" → EXTERNAL
+
+4. **URL matching** (when available): Compare with git remote URL
+Note: May have format differences (SSH vs HTTPS), but hostname + path should match
+
+5. **Working history**: Have you been reading/editing files that align with this repo?
+
+**Decision rule**:
+- CURRENT repo → include_content=false in codebase_search (use Read tool for files)
+- EXTERNAL repo → include_content=true in codebase_search (no file access)

 Use the returned data source names with the codebase_search and codebase_consultant functions.
 """

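The URL-matching note in the docstring above (SSH and HTTPS remotes differ in format, but hostname + path should match) can be sketched as a small normalization step. The helpers `normalize_git_url` and `same_repository` are hypothetical and not part of this commit. The sketch assumes remotes are either URL-style (`https://`, `ssh://`) or scp-style (`git@host:owner/repo.git`), and `example-org/my-app` in the usage note is a placeholder.

```python
import re
from urllib.parse import urlparse


def normalize_git_url(url: str) -> str:
    """Reduce a git remote URL to 'host/path' so SSH and HTTPS forms compare equal."""
    url = url.strip()
    if "://" in url:
        # URL-style remote, e.g. https://host/owner/repo.git or ssh://git@host/owner/repo
        parsed = urlparse(url)
        host, path = parsed.hostname or "", parsed.path
    else:
        # scp-style remote, e.g. git@host:owner/repo.git
        match = re.match(r"^(?:[^@/]+@)?([^:/]+):(.+)$", url)
        if not match:
            return url  # unknown shape: leave untouched rather than guess
        host, path = match.group(1), match.group(2)
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host.lower()}/{path}"


def same_repository(data_source_url: str, git_remote_url: str) -> bool:
    # Compare only hostname + path, ignoring scheme, user, and ".git" suffix.
    return normalize_git_url(data_source_url) == normalize_git_url(git_remote_url)
```

For example, `same_repository("https://github.com/example-org/my-app", "git@github.com:example-org/my-app.git")` compares `github.com/example-org/my-app` on both sides. The path is left case-sensitive here, which may be stricter than some hosts require.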
src/tools/search.py

Lines changed: 75 additions & 28 deletions
@@ -67,51 +67,94 @@ async def codebase_search(

 include_content: Whether to include full file content in results (default: false).

-IMPORTANT - When to include content:
-- For EXTERNAL repositories (not in your current working directory):
-SET TO TRUE - you don't have file access, so you need the content.
-- For CURRENT repository (the one you're working in):
-SET TO FALSE - you already have file access via Read tool, so just get
-file paths and read them directly for the latest content.
+CRITICAL RULE - When to include content:
+- CURRENT repository (user's working directory): include_content=false
+→ You have file access via Read tool - get paths only, then read for latest content
+- EXTERNAL repositories (not in working directory): include_content=true
+→ You cannot access files - must get content in search results

-How to identify current vs external repositories:
-- Compare repository URLs from get_data_sources with your current git repo URL
-- Current repo: Use include_content=false, then use Read tool on result paths
-- External repos: Use include_content=true to get the content directly
+How to identify CURRENT vs EXTERNAL repositories (use ALL available clues):

-Note: Indexed content may be from a different branch than your local state.
+1. **Repository name matching**:
+- Does the repo name match your current working directory name?
+- Example: Working in "/Users/bob/my-app" and repo name is "my-app" → likely CURRENT
+
+2. **Repository description analysis**:
+- Does the description match what you've observed in the codebase?
+- Check tech stack, architecture, features mentioned in description
+- Example: Description says "Python FastAPI server" and you see FastAPI files → likely CURRENT
+
+3. **User's question context**:
+- Does user say "this repo", "our code", "the current project", "my app"? → CURRENT
+- Does user reference "the X service", "external repo", "other project"? → EXTERNAL
+
+4. **URL matching** (when available):
+- Compare repo URL from get_data_sources with git remote URL
+- Note: May not always be available or matchable
+
+5. **Working context**:
+- Have you been reading/editing files that match this repo's structure?
+- Do file paths in your recent operations align with this repository?
+
+**Default heuristic when uncertain**:
+- If user is asking about code in their working directory → CURRENT (include_content=false)
+- If user is asking about a different/external service → EXTERNAL (include_content=true)
+- When truly ambiguous, prefer include_content=false for repos that seem related to current work

 Returns:
 Search results as JSON including source info, file paths, line numbers, and code snippets.

 Examples:
-1. Search CURRENT repository (you have file access):
+1. Search CURRENT repository (identified by directory name + context):
+# Working in "/Users/bob/codealive-mcp"
+# User asks: "Where is the search tool implemented in this project?"
+# Repo name from get_data_sources: "codealive-mcp"
+# → Name matches directory, user says "this project" → CURRENT
 codebase_search(
-query="Where is user authentication handled?",
-data_sources=["my-current-repo"],
-include_content=false # Get paths only, then use Read tool
+query="Where is the search tool implemented?",
+data_sources=["codealive-mcp"],
+include_content=false # Current repo - get paths, use Read tool
 )
-# Then read the files: Read(file_path="/path/from/results")
+# Then: Read(file_path="/Users/bob/codealive-mcp/src/tools/search.py")

-2. Search EXTERNAL repository (no file access):
+2. Search CURRENT repository (identified by description matching):
+# Working in Python FastMCP project
+# Description: "Python MCP server using FastMCP framework"
+# You've been reading FastMCP code in this directory → CURRENT
 codebase_search(
-query="How does the payment service validate cards?",
-data_sources=["external-payments-repo"],
-include_content=true # Need content, can't read files directly
+query="How is the lifespan context managed?",
+data_sources=["my-mcp-server"],
+include_content=false # Description matches observed codebase
 )

-3. Workspace-wide question across external repos:
+3. Search EXTERNAL repository (different service):
+# Working in "frontend-app"
+# User asks: "How does the payments service handle refunds?"
+# Repo: "payments-service" → Different service → EXTERNAL
 codebase_search(
-query="How do microservices talk to the billing API?",
-data_sources=["backend-team"],
-include_content=true # External workspace, include content
+query="How are refunds processed?",
+data_sources=["payments-service"],
+include_content=true # External service - need content
 )

-4. Mixed query with known identifier:
+4. Search EXTERNAL workspace (multiple external repos):
+# User asks about backend services, but you're in frontend repo
 codebase_search(
-query="Where do we validate JWTs (AuthService)?",
-data_sources=["repo123"],
-include_content=false # Current repo, read files separately
+query="How do microservices authenticate API calls?",
+data_sources=["backend-workspace"],
+include_content=true # External workspace
+)
+
+5. Ambiguous case - use context:
+# User: "Check how authentication works in our API"
+# Working in "api-server" directory
+# Repo name: "company-api" (slightly different but plausible match)
+# Description: "REST API server with authentication"
+# → User says "our API", description matches → Likely CURRENT
+codebase_search(
+query="authentication implementation",
+data_sources=["company-api"],
+include_content=false # Context suggests current repo
 )

 Note:
@@ -121,6 +164,10 @@ async def codebase_search(
 - Prefer natural-language questions; templates are unnecessary.
 - Start with "auto" for best semantic results; escalate to "deep" only if needed.
 - If you know precise symbols (functions/classes), include them to narrow scope.
+
+CRITICAL: Always call get_data_sources() first to get repository names, descriptions, and URLs.
+Then use the heuristics above to determine include_content for each search.
+The description field is especially valuable for matching repositories to your working context.
 """
 context: CodeAliveContext = ctx.request_context.lifespan_context
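
The workflow the docstring asks for (call get_data_sources() first, then pick include_content per repository) can be sketched with the name-matching heuristic alone. The function `plan_include_content` is hypothetical and not part of this commit. The sketch assumes the data sources have already been parsed into dictionaries with a `name` key, which the diff does not confirm, since get_data_sources is declared to return a string.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def plan_include_content(data_sources: List[dict], working_dir: str) -> Dict[str, bool]:
    """Map each data source name to an include_content value using name matching only."""
    # The last path component stands in for the "current working directory name".
    dir_name = PurePosixPath(working_dir).name.lower()
    plan = {}
    for source in data_sources:
        is_current = source["name"].lower() == dir_name
        # CURRENT -> False (read files locally); anything else -> True.
        plan[source["name"]] = not is_current
    return plan


# Usage with the names from examples 1 and 3 in the docstring:
sources = [{"name": "codealive-mcp"}, {"name": "payments-service"}]
plan = plan_include_content(sources, "/Users/bob/codealive-mcp")
```

Name matching is only the first of the five heuristics. A fuller version would also weigh the description, the user's wording, the URL, and the working context before defaulting a repository to EXTERNAL.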
