# MCPcat Issue Tracking Implementation Plan

## Context & Background

### What is Issue Tracking?

Issue tracking groups similar errors together so users can triage and resolve problems efficiently. Instead of viewing thousands of individual error events, users see dozens of unique "Issues", each representing a distinct problem in their code.

For example, if a tool fails with "Cannot read property 'x' of undefined" 500 times, that creates one Issue with 500 occurrences, not 500 separate events to investigate.

### Why We're Building This

MCPcat currently tracks tool usage analytics but doesn't provide error debugging capabilities. Users need to:

- Identify which errors are occurring in their MCP servers
- Understand error frequency and impact
- See stack traces and context to debug issues
- Track error resolution over time

This feature makes MCPcat a complete observability platform for MCP servers, covering both analytics and error monitoring.

### Architecture Approach

We're following Sentry's proven approach to error grouping:

**SDKs Capture Raw Data:**

- Exception chains (top-level error + all of its causes)
- Stack traces with frame details
- Basic "in_app" detection (user code vs. library code)
- No fingerprinting or grouping logic

**Backend Handles Grouping:**

- Applies multiple fingerprinting strategies
- Generates hashes from exception types and stack frames
- Matches events to existing Issues or creates new ones
- Manages the Issue lifecycle (open, resolved, ignored)

This division of responsibility keeps SDKs lightweight and lets the backend improve its grouping algorithms without requiring SDK updates.

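The plan leaves the backend's fingerprinting strategies open. To make "generates hashes from exception types and stack frames" concrete, here is a minimal TypeScript sketch of one possible strategy. It is loosely inspired by Sentry's stack-trace-based grouping but is not Sentry's actual algorithm. The function name `fingerprintChain` and the choice of hashed fields are assumptions for illustration. The hash covers the exception type plus the filename and function of in-app frames, and it uses the message only when there are no frames.

```typescript
import { createHash } from "node:crypto";

// Trimmed copies of the plan's payload types (only the fields used here).
type Frame = { filename: string; function: string; in_app: boolean; lineno?: number };
type Exception = { type: string; value: string; stacktrace?: { frames: Frame[] } };
type ExceptionChain = { exceptions: Exception[] };

// Hypothetical strategy: hash each exception's type plus the filename and
// function of its in-app frames. Line numbers and messages are left out so
// that dynamic messages and unrelated line shifts do not split an Issue.
function fingerprintChain(chain: ExceptionChain): string {
  const hash = createHash("sha256");
  for (const exc of chain.exceptions) {
    hash.update(`type:${exc.type}\n`);
    const frames = exc.stacktrace?.frames ?? [];
    const appFrames = frames.filter((f) => f.in_app);
    // If nothing is marked in_app, use every frame rather than none.
    for (const f of appFrames.length > 0 ? appFrames : frames) {
      hash.update(`frame:${f.filename}:${f.function}\n`);
    }
    // Without any stack frames, the message is the only other signal.
    if (frames.length === 0) {
      hash.update(`value:${exc.value}\n`);
    }
  }
  return hash.digest("hex");
}
```

Dropping line numbers and messages from the hash keeps two kinds of events in the same Issue: events whose messages differ only in a dynamic value, and events whose line numbers moved because of an unrelated edit. The cost is that two genuinely different errors of the same type raised along the same call path will also merge. That is the kind of trade-off that motivates having multiple strategies.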
### Key Architecture Decisions

1. **Automatic Capture**: No user configuration required - errors are captured transparently when they occur
2. **Additive Integration**: New `exception_chains` field added to existing events, no breaking changes
3. **Language Parity**: Same JSON schema across TypeScript, Python, and Go for backend compatibility
4. **Privacy First**: No source code content or local variables initially (can add with opt-in later)
5. **Backend Fingerprinting**: SDKs send raw data, backend determines how to group (allows algorithm improvements)

### Learning Resources

**Sentry Documentation:**

- [Error Grouping Overview](https://docs.sentry.io/concepts/data-management/event-grouping/)
- [Default Grouping Algorithms](https://docs.sentry.io/concepts/data-management/event-grouping/#default-error-grouping-algorithms)
- [Event Payload Structure](https://develop.sentry.dev/sdk/event-payloads/)
- [Exception Interface](https://develop.sentry.dev/sdk/event-payloads/exception/)
- [Stack Trace Interface](https://develop.sentry.dev/sdk/event-payloads/stacktrace/)

**Sentry SDK Source Code** (reference implementations):

- [Python SDK - Exception Capture](https://github.com/getsentry/sentry-python/blob/master/sentry_sdk/integrations/excepthook.py)
- [JavaScript SDK - Stack Trace Parsing](https://github.com/getsentry/sentry-javascript/tree/develop/packages/browser/src/stack-parsers.ts)

**Research Document:**

- See `plans/mcpcat_error_tracing_research.md` for detailed analysis of Sentry's approach

---

## 1. Overview

Add automatic exception tracking to MCPcat SDKs. When errors occur, SDKs will capture exception chains with stack traces and send them to the backend as a new `exception_chains` field on existing events. Backend will handle fingerprinting and grouping into Issues.

**Key Principles:**

- Automatic capture, no configuration
- Additive only - no breaking changes
- Same JSON schema across all languages
- SDKs capture raw data, backend does fingerprinting

## 2. Data Structure

Add an `exception_chains` field to the existing event payload:

```typescript
type EventPayload = {
  // ... existing event fields ... (the type name EventPayload is illustrative)
  exception_chains?: ExceptionChain[] // NEW, optional
}

type ExceptionChain = {
  exceptions: Exception[] // [top-level error, its cause, the cause's cause, ...] - reverse chronological
}

type Exception = {
  type: string // e.g., "TypeError", "ValueError"
  value: string // error message
  module?: string // module where the exception type is defined
  mechanism: {
    type: string // "generic", "promise_rejection", "panic"
    handled: boolean // true if caught, false if unhandled
    source?: string // "tool_execution", "session_operation"
  }
  stacktrace?: {
    frames: Frame[] // [oldest, ..., newest] - chronological, most recent call last
  }
}

type Frame = {
  filename: string // relative/module-based filename
  abs_path?: string // absolute path
  function: string // function name or "<anonymous>"
  module?: string // language-specific module
  lineno?: number // line number
  colno?: number // column number
  in_app: boolean // true = user code, false = library code
}
```

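Exceptions and frames use opposite orderings: the exception list starts with the top-level error, while each frame list starts with the oldest call. A concrete value may help. The following is a hypothetical payload for a tool handler whose error wraps a lower-level connection error. Every name, path, and number in it is invented, and the types are restated so the snippet compiles on its own.

```typescript
// Types restated from the schema, without comments.
type Frame = {
  filename: string;
  abs_path?: string;
  function: string;
  module?: string;
  lineno?: number;
  colno?: number;
  in_app: boolean;
};
type Exception = {
  type: string;
  value: string;
  module?: string;
  mechanism: { type: string; handled: boolean; source?: string };
  stacktrace?: { frames: Frame[] };
};
type ExceptionChain = { exceptions: Exception[] };

// Hypothetical example: every name, path, and number below is invented.
const examplePayload: { exception_chains?: ExceptionChain[] } = {
  // ... existing event fields would sit alongside this one ...
  exception_chains: [
    {
      exceptions: [
        {
          // exceptions[0]: the top-level error that reached the SDK.
          type: "Error",
          value: "lookup_user failed",
          mechanism: { type: "generic", handled: false, source: "tool_execution" },
          stacktrace: {
            frames: [
              // Oldest call first ...
              { filename: "src/server.ts", function: "handleToolCall", lineno: 42, colno: 5, in_app: true },
              // ... and the frame that threw last.
              { filename: "src/tools/lookupUser.ts", function: "lookupUser", lineno: 17, colno: 11, in_app: true },
            ],
          },
        },
        {
          // exceptions[1]: the cause attached to the error above.
          type: "Error",
          value: "connect ECONNREFUSED 127.0.0.1:5432",
          mechanism: { type: "generic", handled: false, source: "tool_execution" },
          stacktrace: {
            frames: [
              { filename: "node_modules/db-driver/lib/client.js", function: "Client.connect", lineno: 132, colno: 9, in_app: false },
            ],
          },
        },
      ],
    },
  ],
};
```
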
**Limits** (match Sentry):

- Max 10 exceptions per chain
- Max 50 frames per stack trace

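The plan fixes the limits but not which entries survive truncation. Here is a minimal sketch that assumes the SDK keeps the top-level error with its nearest causes. It also assumes that, within each stack trace, the SDK keeps the newest frames (those closest to where the error was thrown). `applyLimits` and the constant names are hypothetical, and the types are trimmed to the fields the helper touches.

```typescript
// Trimmed copies of the plan's payload types.
type Frame = { filename: string; function: string; in_app: boolean };
type Exception = { type: string; value: string; stacktrace?: { frames: Frame[] } };
type ExceptionChain = { exceptions: Exception[] };

const MAX_EXCEPTIONS_PER_CHAIN = 10;
const MAX_FRAMES_PER_STACKTRACE = 50;

// Hypothetical helper: returns a truncated copy and leaves the input untouched.
function applyLimits(chain: ExceptionChain): ExceptionChain {
  return {
    // exceptions[0] is the top-level error, so slicing from the start keeps
    // it together with its nearest causes.
    exceptions: chain.exceptions.slice(0, MAX_EXCEPTIONS_PER_CHAIN).map((exc) =>
      exc.stacktrace
        ? {
            ...exc,
            // Frames are oldest first, so a negative slice keeps the newest
            // frames, i.e. those closest to where the error was thrown.
            stacktrace: { frames: exc.stacktrace.frames.slice(-MAX_FRAMES_PER_STACKTRACE) },
          }
        : exc,
    ),
  };
}
```
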
**Deferred fields** (add later):

- `pre_context`, `context_line`, `post_context` - source code lines (requires reading files)
- `vars` - local variables (PII concerns)

## 3. TypeScript SDK

**Core Task**: Create `src/modules/exceptions.ts` with exception capture logic

**Key Functions:**

- `captureExceptionChain(error: Error): ExceptionChain | null`
- Parse V8 stack traces from `error.stack`
- Unwrap `Error.cause` chains recursively
- Detect "in_app": mark frames whose path contains `/node_modules/` as library code (`in_app: false`)

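Here is a sketch of the `/node_modules/` rule for Node.js. It assumes three cases the plan does not mention are worth handling. The first is `file://` URLs, which Node.js reports for ES modules. The second is Windows backslash separators. The third is Node.js built-in locations prefixed with `node:`, which are treated as library code. `isInApp` is a hypothetical name.

```typescript
import { fileURLToPath } from "node:url";

// Hypothetical helper implementing the plan's rule plus a few assumed cases.
function isInApp(location: string): boolean {
  // Node.js built-ins ("node:fs", "node:internal/...") are library code.
  if (location.startsWith("node:")) return false;
  // ES modules report file:// URLs; convert them to filesystem paths.
  const path = location.startsWith("file://") ? fileURLToPath(location) : location;
  // Normalize Windows separators so one substring check covers both styles.
  return !path.replace(/\\/g, "/").includes("/node_modules/");
}
```
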
**Integration Points:**

- Tool call execution (primary) - wrap in try-catch
- Event publishing error handling
- Session operations error handling

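For the primary integration point, one way to "wrap in try-catch" is a higher-order function around each tool handler. This sketch assumes handlers are async functions and that reporting happens through a callback. `withExceptionCapture`, `ToolHandler`, and `CapturedError` are hypothetical names, not existing MCPcat or MCP SDK APIs.

```typescript
// Hypothetical shapes; the real SDK's handler and event types may differ.
type ToolHandler<A, R> = (args: A) => Promise<R>;
type CapturedError = { error: unknown; source: "tool_execution" };

// Wraps a tool handler so that a thrown error or rejected promise is
// reported and then rethrown, leaving the server's own error handling intact.
function withExceptionCapture<A, R>(
  handler: ToolHandler<A, R>,
  report: (captured: CapturedError) => void,
): ToolHandler<A, R> {
  return async (args: A): Promise<R> => {
    try {
      // `return await` keeps a rejected promise inside this try/catch.
      return await handler(args);
    } catch (error) {
      try {
        report({ error, source: "tool_execution" });
      } catch {
        // A failure while reporting must not replace the tool's own error.
      }
      throw error;
    }
  };
}
```

Rethrowing rather than swallowing keeps the integration additive. The tool's observable behavior is unchanged, and the captured error can be converted into an `ExceptionChain` and attached to the existing event for that tool call.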
**Edge Cases:**

- Unhandled promise rejections - set `mechanism.type = "promise_rejection"`, `handled = false`
- Errors without stack traces - still capture type/value, omit stacktrace
- Non-Error objects thrown - convert to string, set `type = "NonError"`

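Here is a sketch of `captureExceptionChain` that covers the cause chain and the last two edge cases. It deviates from the signature listed in the Key Functions in three assumed ways. First, it accepts `unknown`, so non-Error values can be handled inside it. Second, it takes the `mechanism` explicitly. Third, the frame parser is passed in as `parseStack` to keep the example self-contained. A `Set` of already-visited values guards against cyclic `cause` links, and the 10-exception limit is applied while walking.

```typescript
// Trimmed copies of the plan's payload types.
type Frame = { filename: string; function: string; lineno?: number; colno?: number; in_app: boolean };
type Mechanism = { type: string; handled: boolean; source?: string };
type Exception = { type: string; value: string; mechanism: Mechanism; stacktrace?: { frames: Frame[] } };
type ExceptionChain = { exceptions: Exception[] };

const MAX_EXCEPTIONS_PER_CHAIN = 10;

function captureExceptionChain(
  thrown: unknown,
  mechanism: Mechanism,
  parseStack: (stack: string) => Frame[],
): ExceptionChain {
  const exceptions: Exception[] = [];
  const seen = new Set<unknown>(); // guards against cyclic cause links
  let current: unknown = thrown;

  while (exceptions.length < MAX_EXCEPTIONS_PER_CHAIN && !seen.has(current)) {
    seen.add(current);
    if (!(current instanceof Error)) {
      // Non-Error values carry no stack or cause: record them and stop.
      exceptions.push({ type: "NonError", value: stringifyThrown(current), mechanism });
      break;
    }
    const frames = current.stack ? parseStack(current.stack) : [];
    exceptions.push({
      type: current.name || "Error",
      value: current.message,
      mechanism,
      // Omit stacktrace entirely when no frames were recovered.
      ...(frames.length > 0 ? { stacktrace: { frames } } : {}),
    });
    // Read `cause` without relying on the ES2022 lib typings.
    const cause = (current as { cause?: unknown }).cause;
    if (cause === undefined || cause === null) break;
    current = cause;
  }
  return { exceptions };
}

// Best-effort string conversion for non-Error thrown values.
function stringifyThrown(value: unknown): string {
  if (typeof value === "string") return value;
  try {
    return JSON.stringify(value) ?? String(value);
  } catch {
    // JSON.stringify throws on circular objects and BigInt values.
    return String(value);
  }
}
```

For the promise-rejection edge case, the same function could be called from a `process.on("unhandledRejection", ...)` listener with `{ type: "promise_rejection", handled: false }`. In Node.js's default `--unhandled-rejections=throw` mode, registering that listener stops Node from raising the rejection as an uncaught exception. The SDK would therefore need to decide whether to preserve the crash behavior.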
## 4. Python SDK

**Core Task**: Create `src/mcpcat/modules/exceptions.py` with exception capture logic

**Key Functions:**

- `capture_exception_chain(exc: BaseException) -> dict | None`
- Use `traceback.extract_tb()` to get frames
- Handle `__cause__` (explicit) and `__context__` (implicit) chains recursively
- Detect "in_app": mark frames whose path contains `/site-packages/` or `/dist-packages/` as library code (`in_app: false`)

**Integration Points:**

- Tool execution wrappers (primary)
- FastMCP monkey patches
- Session operations error handling

**Edge Cases:**

- ExceptionGroups (Python 3.11+) - leave TODO, treat as single exception for now
- Exceptions without `__traceback__` - still capture type/value, omit stacktrace
- Circular exception references - track seen exception IDs to prevent loops

## 5. Go SDK

**Core Task**: Create `internal/exceptions/capture.go` with exception capture logic

**Key Functions:**

- `CaptureException(err error) *ExceptionChain` - for errors
- `CapturePanic(recovered interface{}) *ExceptionChain` - for panics
- Use `runtime.Stack()` to capture the current stack trace (`runtime.Callers()` with `runtime.CallersFrames()` is an alternative that yields structured frames without text parsing)
- Unwrap errors via `errors.Unwrap()` recursively (note that `errors.Unwrap` returns nil for multi-errors implementing `Unwrap() []error`, such as those built by `errors.Join`)
- Detect "in_app": compare each frame's package path against the module name from go.mod

**Integration Points:**

- Tool execution hooks (primary) - wrap with defer/recover for panics, check returned errors
- Session operations error handling

**Edge Cases:**

- Errors without stack traces - capture stack at handling point (limitation: not origin point)
- Panics with non-error values - convert to string via `fmt.Sprintf("%v", recovered)`
- Goroutine panics - only capturable inside the goroutine that recovers them (`recover` can't catch panics across goroutines)

**Future Consideration** (leave TODO):

- Detect error libraries like `pkg/errors` that preserve stack traces at origin

## 6. Future Work (Not in Scope)

- **Source code context**: Capture actual source lines (requires file I/O, privacy concerns)
- **Local variables**: Capture frame variable values (PII concerns, needs redaction integration)
- **Configuration options**: Enable/disable, sampling, max depths, etc.
- **Manual capture API**: `mcpcat.captureException(error, options)`
- **Python ExceptionGroups**: Proper handling of multiple exceptions
- **Go error origin stacks**: Use library-provided stacks instead of capture-point stacks

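## 7. Implementation Notes

**TypeScript**:

- V8 stack format: `at functionName (filename:line:col)`, or `at filename:line:col` when the function has no name
- V8 also emits variants: `async` and `new` prefixes, native frames without a location such as `at Array.map (<anonymous>)`, and `eval` frames; ES module frames in Node.js use `file://` URLs
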
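Here is a minimal line-based parser for the V8 formats above, assuming `error.stack` has not been customized via `Error.prepareStackTrace`. It handles named, anonymous, `async`, and `new` frames. It skips native frames that carry no line number and does not special-case `eval` frames. It also reverses V8's newest-first order to match the schema's oldest-first `frames`. `parseV8Stack` is a hypothetical name, and `in_app` here uses only the plan's `/node_modules/` rule.

```typescript
// Trimmed copy of the plan's Frame type.
type Frame = { filename: string; function: string; lineno?: number; colno?: number; in_app: boolean };

// "at [async ][new ]fn (location:line:col)"; the prefixes are dropped from fn.
const NAMED_FRAME = /^\s*at (?:async )?(?:new )?(.+?) \((.+?):(\d+):(\d+)\)$/;
// "at [async ]location:line:col", used when V8 has no function name.
const ANONYMOUS_FRAME = /^\s*at (?:async )?(.+?):(\d+):(\d+)$/;

function parseV8Stack(stack: string): Frame[] {
  const parsed: Frame[] = [];
  for (const line of stack.split("\n")) {
    const named = NAMED_FRAME.exec(line);
    const anonymous = named ? null : ANONYMOUS_FRAME.exec(line);
    const parts = named
      ? [named[1], named[2], named[3], named[4]]
      : anonymous
        ? ["<anonymous>", anonymous[1], anonymous[2], anonymous[3]]
        : null;
    // Skip the message line and native frames such as "at Array.map (<anonymous>)".
    if (parts === null) continue;
    const [fn, filename, lineno, colno] = parts;
    parsed.push({
      filename,
      function: fn,
      lineno: Number(lineno),
      colno: Number(colno),
      // Only the plan's basic /node_modules/ rule.
      in_app: !filename.includes("/node_modules/"),
    });
  }
  // V8 lists the newest call first; the schema wants oldest first.
  return parsed.reverse();
}
```

V8 records only `Error.stackTraceLimit` frames (10 by default), so reaching the 50-frame cap would require raising that limit.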
**Python**:

- Check `__cause__` before `__context__`, and skip `__context__` when `__suppress_context__` is true (it is set by `raise ... from ...`)
- Normalize paths for site-packages detection (venv, user, system)
- Use `os.path.normpath()` for path comparison

**Go**:

- Read the module name at initialization and cache it for "in_app" detection; since go.mod is usually not shipped alongside a compiled binary, `debug.ReadBuildInfo()` (which reports the main module path) is a more reliable runtime source
- Stack format (`runtime.Stack()` output): function signature on one line, file:line on the next
- Panics vs errors need different handling (panics use defer/recover)

---

This plan provides the architectural decisions and integration points. Senior engineers will determine implementation details appropriate to each language's idioms.