-
Notifications
You must be signed in to change notification settings - Fork 6
Send chat data source names via names payload #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @sciapan, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors how data sources are handled and communicated within the chat consultant and search tools. The core change involves transitioning from using internal data source IDs to more user-friendly data source names in API requests and function parameters. This update streamlines the interface for specifying data sources, improves clarity in the codebase, and prepares for future enhancements, while also ensuring backward compatibility for existing integrations. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively refactors the codebase to use data source names instead of IDs when communicating with the backend, simplifying client-side logic. The changes are well-tested and cover all relevant tool implementations and their documentation. I've identified one high-severity issue regarding an inconsistency in how data source names are processed in the search tool compared to the chat tool, which could lead to bugs. I've also included a couple of medium-severity suggestions to improve code maintainability in the utility functions. Overall, this is a good change that improves the tool's interface.
| # Normalize data source IDs (handles Claude Desktop serialization issues) | ||
| data_source_ids = normalize_data_source_ids(data_source_ids) | ||
| # Normalize data source names (handles Claude Desktop serialization issues) | ||
| data_source_names = normalize_data_source_names(data_sources) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an inconsistency here compared to tools/chat.py. The codebase_consultant function uses both normalize_data_source_names and format_data_source_names to process the data_sources. This ensures names are cleaned (e.g., whitespace stripped and empty values removed). This function only normalizes but doesn't format, which could lead to sending un-trimmed or empty names to the API.
To ensure consistent behavior, you should also apply format_data_source_names here. Note that you'll need to import format_data_source_names from utils for this change to work.
| data_source_names = normalize_data_source_names(data_sources) | |
| data_source_names = normalize_data_source_names(data_sources) | |
| data_source_names = format_data_source_names(data_source_names) |
| elif isinstance(ds, dict): | ||
| name = ds.get("name") or ds.get("id") | ||
| if isinstance(name, str): | ||
| name = name.strip() | ||
| if name: | ||
| formatted.append(name) | ||
| elif name is not None: | ||
| formatted.append(str(name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic inside this elif block can be simplified to improve readability and handle both string and non-string names more consistently.
| elif isinstance(ds, dict): | |
| name = ds.get("name") or ds.get("id") | |
| if isinstance(name, str): | |
| name = name.strip() | |
| if name: | |
| formatted.append(name) | |
| elif name is not None: | |
| formatted.append(str(name)) | |
| elif isinstance(ds, dict): | |
| name = ds.get("name") or ds.get("id") | |
| if name is not None: | |
| name_str = str(name).strip() | |
| if name_str: | |
| formatted.append(name_str) |
| return formatted | ||
|
|
||
|
|
||
| # Backward compatibility aliases for legacy imports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good practice to add a TODO comment here to remind developers to remove these backward compatibility aliases in the future, once all call sites have been updated to use the new _names functions.
| # Backward compatibility aliases for legacy imports | |
| # TODO: Remove these backward compatibility aliases once all call sites are updated. | |
| # Backward compatibility aliases for legacy imports |
🔍 Code ReviewHello sciapan (@sciapan), Thanks for your work on sending chat data source names! This change introduces important functionality for handling data source identification, enhancing how chat interactions reference their origins. Overall, the implementation introduces a flexible approach to data source naming, but it also highlights critical areas for robustness and security that need attention. 🎯 Key Changes & Highlights✨ **Flexible Data Source Naming**The introduction of 🚨 Issues & Suggestions🐛 Missing JSON Error HandlingThe `normalize_data_source_names` function lacks error handling for invalid JSON strings, potentially causing application crashes.Problem: Fix: # Example of fix:
if stripped.startswith('['):
try:
parsed = json.loads(stripped)
if isinstance(parsed, list):
return parsed
except json.JSONDecodeError:
# Fallback to treating as a plain string if JSON parsing fails
pass🔐 Backend Security BurdenClient-side flexibility in data source naming places a significant security burden on the backend, requiring robust validation.Problem: Fix:
🏗️ Deprecate
|
| codebase_consultant( | ||
| question="How do the microservices communicate?", | ||
| data_sources=["workspace_123", "repo_456"] | ||
| data_sources=["workspace:platform", "repo:payments"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
некорректные примеры - нейронка наверняка подумает, что это спец. префиксы. вот так можно сделать:
data_sources=["repo or workspace name"]
| data_sources: List of data source names to search in (required). | ||
| Can be workspace names (search all repositories in the workspace) | ||
| or individual repository names for targeted searches. | ||
| Example: ["enterprise-platform", "workspace:payments-team"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
снова лишний префикс (workspace:)
| 3. Workspace-wide question: | ||
| codebase_search(query="How do microservices talk to the billing API?", data_source_ids=["workspace456"]) | ||
| codebase_search(query="How do microservices talk to the billing API?", data_sources=["workspace:backend-team"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
префикс
| 4. Mixed query with a known identifier: | ||
| codebase_search(query="Where do we validate JWTs (AuthService)?", data_source_ids=["repo123"]) | ||
| codebase_search(query="Where do we validate JWTs (AuthService)?", data_sources=["repo-main"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"main" лучше убрать из текста, чтобы не сбивать ллмку с толку о том, что это ветка
Summary
namesfield instead ofdataSourcesformat_data_source_namesto return plain name strings and refresh related unit testsTesting
https://chatgpt.com/codex/tasks/task_e_68eff26921a0832ba5f07cdb3d1e3f9a