MCPCat
diff --git a/plans/issue-tracking.md b/plans/issue-tracking.md
Lines changed: 234 additions & 0 deletions
@@ -0,0 +1,234 @@
+# MCPcat Issue Tracking Implementation Plan
+
+## Context & Background
+
+### What is Issue Tracking?
+
+Issue tracking groups similar errors together so users can triage and resolve problems efficiently. Instead of viewing thousands of individual error events, users see dozens of unique "Issues" - each representing a distinct problem in their code.
+
+For example, if a tool fails with "Cannot read property 'x' of undefined" 500 times, that creates one Issue with 500 occurrences, not 500 separate events to investigate.
+
+### Why We're Building This
+
+MCPcat currently tracks tool usage analytics but doesn't provide error debugging capabilities. Users need to:
+
+- Identify which errors are occurring in their MCP servers
+- Understand error frequency and impact
+- See stack traces and context to debug issues
+- Track error resolution over time
+
+With this feature, MCPcat covers both usage analytics and error monitoring for MCP servers, rather than analytics alone.
+
+### Architecture Approach
+
+We're following Sentry's proven approach to error grouping:
+
+**SDKs Capture Raw Data:**
+
+- Exception chains (root error + all causes)
+- Stack traces with frame details
+- Basic "in_app" detection (user code vs library code)
+- No fingerprinting or grouping logic
+
+**Backend Handles Grouping:**
+
+- Applies multiple fingerprinting strategies
+- Generates hashes from exception types and stack frames
+- Matches to existing Issues or creates new ones
+- Manages Issue lifecycle (open, resolved, ignored)
+
+This division of responsibility keeps SDKs lightweight while allowing backend flexibility to improve grouping algorithms without SDK updates.
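To make the backend's role concrete, here is a minimal sketch of hash-based fingerprinting. It is a deliberately simplified stand-in, not Sentry's actual algorithm and not a committed MCPcat design. The hash covers exception types plus the module (or filename) and function of in-app frames. It falls back to all frames when none are in-app, and it uses the message only when there is no stack trace at all. The `fingerprint` function and the trimmed types are illustrative assumptions.

```typescript
import { createHash } from "node:crypto";

// Trimmed versions of the payload types from the data structure section.
type Frame = {
  filename: string;
  function: string;
  module?: string;
  lineno?: number;
  in_app: boolean;
};
type Exception = { type: string; value: string; stacktrace?: { frames: Frame[] } };
type ExceptionChain = { exceptions: Exception[] };

// Simplified, illustrative grouping key (not Sentry's real algorithm):
// exception types + in-app frame identities. Messages and line numbers are
// left out so cosmetic differences don't split one bug into many Issues.
function fingerprint(chain: ExceptionChain): string {
  const parts: string[] = [];
  for (const exc of chain.exceptions) {
    parts.push(`type:${exc.type}`);
    const frames = exc.stacktrace?.frames ?? [];
    const inApp = frames.filter((f) => f.in_app);
    // If no frame is in-app (error raised entirely inside a library), use all frames.
    for (const f of inApp.length > 0 ? inApp : frames) {
      parts.push(`frame:${f.module ?? f.filename}:${f.function}`);
    }
    // Without any stack trace, the message is the only remaining signal.
    if (frames.length === 0) {
      parts.push(`value:${exc.value}`);
    }
  }
  return createHash("sha256").update(parts.join("\n")).digest("hex");
}
```

Leaving messages and line numbers out is what lets hundreds of occurrences of the same `TypeError` collapse into one Issue even when the message names a different property or the code shifts by a few lines. The trade-off is that unrelated errors of the same type thrown from the same function will merge. Because this logic lives on the backend, it can be tuned without an SDK release.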
+
+### Key Architecture Decisions
+
+1. **Automatic Capture**: No user configuration required - errors are captured transparently when they occur
+2. **Additive Integration**: New `exception_chains` field added to existing events, no breaking changes
+3. **Language Parity**: Same JSON schema across TypeScript, Python, and Go for backend compatibility
+4. **Privacy First**: No source code content or local variables initially (can add with opt-in later)
+5. **Backend Fingerprinting**: SDKs send raw data, backend determines how to group (allows algorithm improvements)
+
+### Learning Resources
+
+**Sentry Documentation:**
+
+- [Error Grouping Overview](https://docs.sentry.io/concepts/data-management/event-grouping/)
+- [Default Grouping Algorithms](https://docs.sentry.io/concepts/data-management/event-grouping/#default-error-grouping-algorithms)
+- [Event Payload Structure](https://develop.sentry.dev/sdk/event-payloads/)
+- [Exception Interface](https://develop.sentry.dev/sdk/event-payloads/exception/)
+- [Stack Trace Interface](https://develop.sentry.dev/sdk/event-payloads/stacktrace/)
+
+**Sentry SDK Source Code** (reference implementations):
+
+- [Python SDK - Exception Capture](https://github.com/getsentry/sentry-python/blob/master/sentry_sdk/integrations/excepthook.py)
+- [JavaScript SDK - Stack Trace Parsing](https://github.com/getsentry/sentry-javascript/tree/develop/packages/browser/src/stack-parsers.ts)
+
+**Research Document:**
+
+- See `plans/mcpcat_error_tracing_research.md` for detailed analysis of Sentry's approach
+
+---
+
+## 1. Overview
+
+Add automatic exception tracking to MCPcat SDKs. When errors occur, SDKs will capture exception chains with stack traces and send them to the backend as a new `exception_chains` field on existing events. Backend will handle fingerprinting and grouping into Issues.
+
+**Key Principles:**
+
+- Automatic capture, no configuration
+- Additive only - no breaking changes
+- Same JSON schema across all languages
+- SDKs capture raw data, backend does fingerprinting
+
+## 2. Data Structure
+
+Add `exception_chains` field to existing event payload:
+
+```typescript
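+// Added to the existing event payload type ("EventPayload" is a placeholder name)
+type EventPayload = {
+  // ... existing event fields ...
+  exception_chains?: ExceptionChain[]  // NEW, optional
+}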
+
+type ExceptionChain = {
+  exceptions: Exception[]  // [root, cause1, cause2, ...] - root is the thrown (outermost) error, then its causes: reverse chronological
+}
+
+type Exception = {
+  type: string            // e.g., "TypeError", "ValueError"
+  value: string           // error message
+  module?: string         // where exception is defined
+  mechanism: {
+    type: string          // "generic", "promise_rejection", "panic"
+    handled: boolean      // true if caught, false if unhandled
+    source?: string       // "tool_execution", "session_operation"
+  }
+  stacktrace?: {
+    frames: Frame[]       // [oldest, ..., newest] - chronological: the frame that raised the error is last
+  }
+}
+
+type Frame = {
+  filename: string        // relative/module-based filename
+  abs_path?: string       // absolute path
+  function: string        // function name or "<anonymous>"
+  module?: string         // language-specific module
+  lineno?: number         // line number
+  colno?: number          // column number
+  in_app: boolean         // true = user code, false = library code
+}
+```
+
+**Limits** (match Sentry):
+
+- Max 10 exceptions per chain
+- Max 50 frames per stack trace
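One way an SDK could enforce these limits before sending, assuming the orderings from the type definitions above (root exception first, oldest frame first). Keeping the newest 50 frames is an assumption made for this sketch; trimming the middle of the stack and keeping both ends is another reasonable choice. `applyLimits` is a hypothetical helper name.

```typescript
// Limits stated in the plan.
const MAX_EXCEPTIONS_PER_CHAIN = 10;
const MAX_FRAMES_PER_STACKTRACE = 50;

type Frame = { filename: string; function: string; lineno?: number; in_app: boolean };
type Exception = { type: string; value: string; stacktrace?: { frames: Frame[] } };
type ExceptionChain = { exceptions: Exception[] };

// Hypothetical helper: returns a trimmed copy and never mutates its input.
function applyLimits(chain: ExceptionChain): ExceptionChain {
  // Exceptions are root-first, so this keeps the thrown error and
  // drops the deepest causes.
  const exceptions = chain.exceptions.slice(0, MAX_EXCEPTIONS_PER_CHAIN).map((exc) => {
    if (!exc.stacktrace) return exc;
    // Frames are oldest-first; slice(-N) keeps the newest N, i.e. the frames
    // closest to where the error was raised (an assumption of this sketch).
    const frames = exc.stacktrace.frames.slice(-MAX_FRAMES_PER_STACKTRACE);
    return { ...exc, stacktrace: { ...exc.stacktrace, frames } };
  });
  return { ...chain, exceptions };
}
```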
+
+**Deferred fields** (add later):
+
+- `pre_context`, `context_line`, `post_context` - source code lines (requires reading files)
+- `vars` - local variables (PII concerns)
+
+## 3. TypeScript SDK
+
+**Core Task**: Create `src/modules/exceptions.ts` with exception capture logic
+
+**Key Functions:**
+
+- `captureExceptionChain(error: unknown): ExceptionChain | null` (`unknown` rather than `Error`, since non-Error values can be thrown)
+- Parse V8 stack traces from `error.stack`
+- Unwrap `Error.cause` chains recursively
+- Detect "in_app": exclude paths containing `/node_modules/`
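A sketch of V8 stack parsing with the `/node_modules/` in-app rule. Beyond what the plan specifies, it also treats `node:` internal frames as library code and converts the `file://` URLs that ES modules produce into paths; both are assumptions. `parseV8Stack` and its `cwd` parameter (used to derive the relative `filename`) are hypothetical. V8 prints the newest frame first, so the result is reversed to match the oldest-first order of `frames`.

```typescript
import * as path from "node:path";
import { fileURLToPath } from "node:url";

type Frame = {
  filename: string;
  abs_path?: string;
  function: string;
  lineno?: number;
  colno?: number;
  in_app: boolean;
};

// "at fn (location)" and the anonymous form "at location".
const FRAME_WITH_FN = /^\s*at (.+?) \((.+)\)$/;
const FRAME_WITHOUT_FN = /^\s*at (.+)$/;
// Greedy path match, so only the last two ":<digits>" groups are line and column.
const LOCATION = /^(.+):(\d+):(\d+)$/;

function parseV8Stack(stack: string, cwd: string = process.cwd()): Frame[] {
  const frames: Frame[] = [];
  for (const line of stack.split(/\r?\n/)) {
    const withFn = FRAME_WITH_FN.exec(line);
    const location = withFn ? withFn[2] : FRAME_WITHOUT_FN.exec(line)?.[1];
    if (location === undefined) continue; // message line or unrecognized format
    const loc = LOCATION.exec(location);
    if (!loc) continue; // e.g. "native" or "<anonymous>" locations with no position

    let file = loc[1];
    if (file.startsWith("file://")) file = fileURLToPath(file); // ES module frames
    const absolute = path.isAbsolute(file);
    frames.push({
      filename: absolute ? path.relative(cwd, file) : file,
      abs_path: absolute ? file : undefined,
      // Normalize "async fn" to "fn" so async and sync frames look the same.
      function: withFn ? withFn[1].replace(/^async /, "") : "<anonymous>",
      lineno: Number(loc[2]),
      colno: Number(loc[3]),
      // Plan's rule: node_modules is library code. Treating "node:" internals
      // as library code too is an extra assumption.
      in_app: !/[\\/]node_modules[\\/]/.test(file) && !file.startsWith("node:"),
    });
  }
  // V8 lists the newest frame first; the payload wants oldest first.
  return frames.reverse();
}
```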
+
+**Integration Points:**
+
+- Tool call execution (primary) - wrap in try-catch
+- Event publishing error handling
+- Session operations error handling
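For the primary integration point, a wrapper around a tool handler could look like the following. `withExceptionCapture`, `ToolHandler`, and the `onException` callback are hypothetical names, not existing MCPcat or MCP SDK APIs. The wrapper rethrows the original error so the server's current behavior is unchanged, and it swallows any failure inside the capture callback so instrumentation can never replace the user's error. In this sketch, deciding whether a tool error counts as `handled: true` or `false` is left to the callback.

```typescript
// Hypothetical shape of an async tool handler.
type ToolHandler<A, R> = (args: A) => Promise<R>;

function withExceptionCapture<A, R>(
  handler: ToolHandler<A, R>,
  onException: (error: unknown, source: "tool_execution") => void,
): ToolHandler<A, R> {
  return async (args: A) => {
    try {
      // `return await` keeps async rejections inside this try block.
      return await handler(args);
    } catch (error) {
      try {
        onException(error, "tool_execution");
      } catch {
        // A failure in capture must never replace the user's original error.
      }
      throw error; // rethrow so the server's existing error handling is unchanged
    }
  };
}
```

The `return await` matters: returning the handler's promise without awaiting it would let a rejection bypass the `catch` block entirely.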
+
+**Edge Cases:**
+
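+- Unhandled promise rejections - capture via `process.on("unhandledRejection", ...)` and set `mechanism.type = "promise_rejection"`, `handled = false`. Registering this listener suppresses Node's default crash on unhandled rejections, so the SDK should preserve that outcome when it is the only listener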
+- Errors without stack traces - still capture type/value, omit stacktrace
+- Non-Error objects thrown - convert to string, set type = "NonError"
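The chain walk and the edge cases above can be sketched as follows. Unlike the single-argument signature listed earlier, this version takes the mechanism and a stack parser as parameters so it can be tested in isolation; those parameters, the `toException` helper, and returning `null` only when capture itself throws are assumptions of the sketch. The `seen` set borrows the circular-reference guard that the plan lists for the Python SDK, since `Error.cause` links can form cycles too.

```typescript
type Frame = { filename: string; function: string; lineno?: number; colno?: number; in_app: boolean };
type Mechanism = { type: string; handled: boolean; source?: string };
type Exception = {
  type: string;
  value: string;
  mechanism: Mechanism;
  stacktrace?: { frames: Frame[] };
};
type ExceptionChain = { exceptions: Exception[] };
type StackParser = (stack: string) => Frame[];

const MAX_EXCEPTIONS = 10; // per-chain limit from the plan

function toException(value: unknown, mechanism: Mechanism, parseStack: StackParser): Exception {
  if (!(value instanceof Error)) {
    // Non-Error throwables (strings, numbers, plain objects, undefined).
    let text: string;
    try {
      text = String(value);
    } catch {
      text = Object.prototype.toString.call(value); // e.g. Object.create(null)
    }
    return { type: "NonError", value: text, mechanism };
  }
  const exc: Exception = { type: value.name || "Error", value: value.message, mechanism };
  // Errors without a stack keep type/value; stacktrace is simply omitted.
  const frames = typeof value.stack === "string" ? parseStack(value.stack) : [];
  if (frames.length > 0) exc.stacktrace = { frames };
  return exc;
}

function captureExceptionChain(
  thrown: unknown,
  mechanism: Mechanism,
  parseStack: StackParser,
): ExceptionChain | null {
  try {
    const exceptions: Exception[] = [];
    const seen = new Set<unknown>(); // guards against cycles like a.cause = b, b.cause = a
    let current: unknown = thrown;
    while (exceptions.length < MAX_EXCEPTIONS && !seen.has(current)) {
      seen.add(current);
      exceptions.push(toException(current, mechanism, parseStack));
      // Read .cause structurally so this compiles without the ES2022 lib typings.
      const next = current instanceof Error ? (current as { cause?: unknown }).cause : undefined;
      if (next === undefined || next === null) break;
      current = next;
    }
    return { exceptions };
  } catch {
    // Capture must never throw into user code (e.g. a throwing getter on `cause`).
    return null;
  }
}
```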
+
+## 4. Python SDK
+
+**Core Task**: Create `src/mcpcat/modules/exceptions.py` with exception capture logic
+
+**Key Functions:**
+
+- `capture_exception_chain(exc: BaseException) -> dict | None`
+- Use `traceback.extract_tb()` to get frames
+- Handle `__cause__` (explicit) and `__context__` (implicit) chains recursively
+- Detect "in_app": exclude paths containing `/site-packages/` or `/dist-packages/`
+
+**Integration Points:**
+
+- Tool execution wrappers (primary)
+- FastMCP monkey patches
+- Session operations error handling
+
+**Edge Cases:**
+
+- ExceptionGroups (Python 3.11+) - leave TODO, treat as single exception for now
+- Exceptions without `__traceback__` - still capture type/value, omit stacktrace
+- Circular exception references - track seen exception IDs to prevent loops
+
+## 5. Go SDK
+
+**Core Task**: Create `internal/exceptions/capture.go` with exception capture logic
+
+**Key Functions:**
+
+- `CaptureException(err error) *ExceptionChain` - for errors
+- `CapturePanic(recovered interface{}) *ExceptionChain` - for panics
+- Use `runtime.Stack()` to capture current stack trace
+- Unwrap errors via `errors.Unwrap()` recursively (note: it only follows `Unwrap() error`, so errors built with `errors.Join` end the chain)
+- Detect "in_app": compare package path to module name from go.mod
+
+**Integration Points:**
+
+- Tool execution hooks (primary) - wrap with defer/recover for panics, check returned errors
+- Session operations error handling
+
+**Edge Cases:**
+
+- Errors without stack traces - capture stack at handling point (limitation: not origin point)
+- Panics with non-error values - convert to string via `fmt.Sprintf("%v", recovered)`
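+- Goroutine panics - a panic can only be recovered in the goroutine where it occurs, so only goroutines the SDK wraps with defer/recover are captured; a panic anywhere else still crashes the process uncaptured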
+
+**Future Consideration** (leave TODO):
+
+- Detect error libraries like `pkg/errors` that preserve stack traces at origin
+
+## 6. Future Work (Not in Scope)
+
+- **Source code context**: Capture actual source lines (requires file I/O, privacy concerns)
+- **Local variables**: Capture frame variable values (PII concerns, needs redaction integration)
+- **Configuration options**: Enable/disable, sampling, max depths, etc.
+- **Manual capture API**: `mcpcat.captureException(error, options)`
+- **Python ExceptionGroups**: Proper handling of multiple exceptions
+- **Go error origin stacks**: Use library-provided stacks instead of capture-point stacks
+
+## 7. Implementation Notes
+
+**TypeScript**:
+
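+- V8 stack format: `at functionName (filename:line:col)`, listed newest frame first (reverse to oldest-first before sending)
+- Format variants to handle: anonymous frames without parentheses (`at filename:line:col`), `at async fn (...)`, `at new Ctor (...)`, `file://` URLs for ES modules, and `node:` internal paths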
+
+**Python**:
+
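+- Check `__cause__` before `__context__`, and skip `__context__` when `__suppress_context__` is true (set by `raise ... from ...`, including `from None`)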
+- Normalize paths for site-packages detection (venv, user, system)
+- Use `os.path.normpath()` for path comparison
+
+**Go**:
+
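+- Get the module name declared in go.mod at initialization via `debug.ReadBuildInfo()` (`Main.Path`) rather than reading go.mod from disk, since deployed binaries usually ship without it; cache it for "in_app" detection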
+- Stack format: function signature on one line, file:line on next
+- Panics vs errors need different handling (panics use defer/recover)
+
+---
+
+This plan provides the architectural decisions and integration points. Senior engineers will determine implementation details appropriate to each language's idioms.