Commit 7ccffc4

feat: add structured error tracking with stack trace parsing
1 parent 5e36d5f commit 7ccffc4

11 files changed

+2492
-20
lines changed

plans/issue-tracking.md

Lines changed: 234 additions & 0 deletions
@@ -0,0 +1,234 @@
1+
# MCPcat Issue Tracking Implementation Plan
2+
3+
## Context & Background
4+
5+
### What is Issue Tracking?
6+
7+
Issue tracking groups similar errors together so users can triage and resolve problems efficiently. Instead of viewing thousands of individual error events, users see dozens of unique "Issues" - each representing a distinct problem in their code.
8+
9+
For example, if a tool fails with "Cannot read property 'x' of undefined" 500 times, that creates one Issue with 500 occurrences, not 500 separate events to investigate.
10+
11+
### Why We're Building This
12+
13+
MCPcat currently tracks tool usage analytics but doesn't provide error debugging capabilities. Users need to:
14+
15+
- Identify which errors are occurring in their MCP servers
16+
- Understand error frequency and impact
17+
- See stack traces and context to debug issues
18+
- Track error resolution over time
19+
20+
This feature makes MCPcat a complete observability platform for MCP servers - both analytics and error monitoring.
21+
22+
### Architecture Approach
23+
24+
We're following Sentry's proven approach to error grouping:
25+
26+
**SDKs Capture Raw Data:**
27+
28+
- Exception chains (root error + all causes)
29+
- Stack traces with frame details
30+
- Basic "in_app" detection (user code vs library code)
31+
- No fingerprinting or grouping logic
32+
33+
**Backend Handles Grouping:**
34+
35+
- Applies multiple fingerprinting strategies
36+
- Generates hashes from exception types and stack frames
37+
- Matches to existing Issues or creates new ones
38+
- Manages Issue lifecycle (open, resolved, ignored)
39+
40+
This division of responsibility keeps SDKs lightweight while allowing backend flexibility to improve grouping algorithms without SDK updates.
41+
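As a rough illustration of the backend side of this split, the sketch below builds one fingerprint from exception types and in-app frames. The plan does not specify the real algorithm, so `fingerprint` and `fnv1a` are hypothetical names, and FNV-1a merely stands in for whatever digest the backend ends up using. Messages and line numbers are deliberately left out so that the same bug with different values still groups together.

```typescript
// Hypothetical backend-side sketch; not the actual MCPcat grouping algorithm.
type Frame = { filename: string; function: string; in_app: boolean; lineno?: number };
type Exception = { type: string; value?: string; stacktrace?: { frames: Frame[] } };

// FNV-1a (32-bit) over UTF-16 code units. A stand-in for a real digest so the
// example has no dependencies.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function fingerprint(exceptions: Exception[]): string {
  const parts: string[] = [];
  for (const ex of exceptions) {
    parts.push(ex.type);
    const frames = ex.stacktrace?.frames ?? [];
    const inApp = frames.filter((f) => f.in_app);
    // Assumption: fall back to all frames when none are marked in_app.
    for (const f of inApp.length > 0 ? inApp : frames) {
      parts.push(`${f.filename}:${f.function}`);
    }
  }
  return fnv1a(parts.join("|"));
}
```

Because the SDK only ships raw frames, a strategy like this can be replaced on the backend without touching any SDK release.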
42+
### Key Architecture Decisions
43+
44+
1. **Automatic Capture**: No user configuration required - errors are captured transparently when they occur
45+
2. **Additive Integration**: New `exception_chains` field added to existing events, no breaking changes
46+
3. **Language Parity**: Same JSON schema across TypeScript, Python, and Go for backend compatibility
47+
4. **Privacy First**: No source code content or local variables initially (can add with opt-in later)
48+
5. **Backend Fingerprinting**: SDKs send raw data, backend determines how to group (allows algorithm improvements)
49+
50+
### Learning Resources
51+
52+
**Sentry Documentation:**
53+
54+
- [Error Grouping Overview](https://docs.sentry.io/concepts/data-management/event-grouping/)
55+
- [Default Grouping Algorithms](https://docs.sentry.io/concepts/data-management/event-grouping/#default-error-grouping-algorithms)
56+
- [Event Payload Structure](https://develop.sentry.dev/sdk/event-payloads/)
57+
- [Exception Interface](https://develop.sentry.dev/sdk/event-payloads/exception/)
58+
- [Stack Trace Interface](https://develop.sentry.dev/sdk/event-payloads/stacktrace/)
59+
60+
**Sentry SDK Source Code** (reference implementations):
61+
62+
- [Python SDK - Exception Capture](https://github.com/getsentry/sentry-python/blob/master/sentry_sdk/integrations/excepthook.py)
63+
- [JavaScript SDK - Stack Trace Parsing](https://github.com/getsentry/sentry-javascript/tree/develop/packages/browser/src/stack-parsers.ts)
64+
65+
**Research Document:**
66+
67+
- See `plans/mcpcat_error_tracing_research.md` for detailed analysis of Sentry's approach
68+
69+
---
70+
71+
## 1. Overview
72+
73+
Add automatic exception tracking to MCPcat SDKs. When errors occur, SDKs will capture exception chains with stack traces and send them to the backend as a new `exception_chains` field on existing events. The backend will handle fingerprinting and grouping errors into Issues.
74+
75+
**Key Principles:**
76+
77+
- Automatic capture, no configuration
78+
- Additive only - no breaking changes
79+
- Same JSON schema across all languages
80+
- SDKs capture raw data, backend does fingerprinting
81+
82+
## 2. Data Structure
83+
84+
Add `exception_chains` field to existing event payload:
85+
86+
```typescript
// "EventPayload" is an illustrative name for the existing event type.
type EventPayload = {
  // ... existing event fields ...
  exception_chains?: ExceptionChain[] // NEW, optional
}

type ExceptionChain = {
  exceptions: Exception[] // [root, cause1, cause2, ...] - reverse chronological
}

type Exception = {
  type: string // e.g., "TypeError", "ValueError"
  value: string // error message
  module?: string // where exception is defined
  mechanism: {
    type: string // "generic", "promise_rejection", "panic"
    handled: boolean // true if caught, false if unhandled
    source?: string // "tool_execution", "session_operation"
  }
  stacktrace?: {
    frames: Frame[] // [oldest, ..., newest] - chronological, most recent call last
  }
}

type Frame = {
  filename: string // relative/module-based filename
  abs_path?: string // absolute path
  function: string // function name or "<anonymous>"
  module?: string // language-specific module
  lineno?: number // line number
  colno?: number // column number
  in_app: boolean // true = user code, false = library code
}
```
120+
121+
**Limits** (match Sentry):
122+
123+
- Max 10 exceptions per chain
124+
- Max 50 frames per stack trace
125+
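A minimal sketch of how an SDK might enforce these limits before sending. The plan does not say which entries to drop, so two assumptions are made here: the chain keeps its first 10 exceptions (the root and its nearest causes), and each stack trace keeps its newest 50 frames. `applyLimits` is a hypothetical helper name.

```typescript
const MAX_EXCEPTIONS = 10;
const MAX_FRAMES = 50;

type Frame = { filename: string; function: string; in_app: boolean };
type Exception = { type: string; value: string; stacktrace?: { frames: Frame[] } };
type ExceptionChain = { exceptions: Exception[] };

// Returns a trimmed copy; the input chain is not mutated.
function applyLimits(chain: ExceptionChain): ExceptionChain {
  const exceptions = chain.exceptions.slice(0, MAX_EXCEPTIONS).map((ex) => {
    if (!ex.stacktrace) return ex;
    // Frames run oldest -> newest, so slice(-MAX_FRAMES) keeps the newest calls.
    return { ...ex, stacktrace: { frames: ex.stacktrace.frames.slice(-MAX_FRAMES) } };
  });
  return { exceptions };
}
```

Keeping the newest frames favors the code closest to the failure, which is usually what a reader of the Issue needs first.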
126+
**Deferred fields** (add later):
127+
128+
- `pre_context`, `context_line`, `post_context` - source code lines (requires reading files)
129+
- `vars` - local variables (PII concerns)
130+
131+
## 3. TypeScript SDK
132+
133+
**Core Task**: Create `src/modules/exceptions.ts` with exception capture logic
134+
135+
**Key Functions:**
136+
137+
- `captureExceptionChain(error: Error): ExceptionChain | null`
138+
- Parse V8 stack traces from `error.stack`
139+
- Unwrap `Error.cause` chains recursively
140+
- Detect "in_app": exclude paths containing `/node_modules/`
141+
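The parsing and `in_app` steps above can be sketched roughly as follows. This is a simplified illustration, not the planned implementation: `parseV8Stack` is a hypothetical name, and the regular expression only covers the two most common V8 frame shapes (`at fn (file:line:col)` and `at file:line:col`). Eval frames and other variants are skipped.

```typescript
type Frame = {
  filename: string;
  function: string;
  lineno?: number;
  colno?: number;
  in_app: boolean;
};

// Matches "at fn (file:line:col)" and "at file:line:col".
const V8_FRAME = /^\s*at (?:(.+?) \()?(.+?):(\d+):(\d+)\)?$/;

function parseV8Stack(stack: string): Frame[] {
  const frames: Frame[] = [];
  for (const line of stack.split("\n")) {
    const m = V8_FRAME.exec(line);
    if (!m) continue; // skips the "TypeError: message" header and unknown formats
    const filename = m[2];
    frames.push({
      filename,
      function: m[1] ?? "<anonymous>",
      lineno: Number(m[3]),
      colno: Number(m[4]),
      // Simplification: only forward-slash paths are checked here.
      in_app: !filename.includes("/node_modules/"),
    });
  }
  // V8 prints the newest call first; the schema wants oldest first.
  return frames.reverse();
}
```

For example, `parseV8Stack(new Error("x").stack ?? "")` would return the current call stack as `Frame` objects, oldest call first.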
142+
**Integration Points:**
143+
144+
- Tool call execution (primary) - wrap in try-catch
145+
- Event publishing error handling
146+
- Session operations error handling
147+
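One possible shape for the primary integration point is sketched below. `withErrorCapture`, `describeThrown`, `ToolEvent`, and its `tool` and `is_error` fields are all hypothetical; only `exception_chains` comes from this plan. The error is rethrown after capture so the tool call behaves exactly as it did before.

```typescript
type Exception = {
  type: string;
  value: string;
  mechanism: { type: string; handled: boolean; source?: string };
};
type ExceptionChain = { exceptions: Exception[] };
// Hypothetical event shape, for illustration only.
type ToolEvent = { tool: string; is_error: boolean; exception_chains?: ExceptionChain[] };

// Minimal stand-in for the real capture logic (no stack trace, no cause chain).
function describeThrown(thrown: unknown): ExceptionChain {
  const type = thrown instanceof Error ? thrown.name : "NonError";
  const value = thrown instanceof Error ? thrown.message : String(thrown);
  // handled: true because the wrapper's catch block saw the error.
  return { exceptions: [{ type, value, mechanism: { type: "generic", handled: true, source: "tool_execution" } }] };
}

async function withErrorCapture<T>(
  tool: string,
  run: () => Promise<T>,
  publish: (event: ToolEvent) => void,
): Promise<T> {
  try {
    const result = await run();
    publish({ tool, is_error: false });
    return result;
  } catch (thrown) {
    publish({ tool, is_error: true, exception_chains: [describeThrown(thrown)] });
    throw thrown; // rethrow so callers see the original failure
  }
}
```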
148+
**Edge Cases:**
149+
150+
- Unhandled promise rejections - set `mechanism.type = "promise_rejection"`, `handled = false`
151+
- Errors without stack traces - still capture type/value, omit stacktrace
152+
- Non-Error objects thrown - convert to string, set type = "NonError"
153+
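The cause unwrapping and the non-Error edge case can be sketched together. `unwrapCauses` and `getCause` are hypothetical helper names, and stack trace parsing is left out to keep the example short. The `seen` set guards against circular `cause` references, which the plan only calls out for Python but which can also occur in JavaScript.

```typescript
type Mechanism = { type: string; handled: boolean };
type Exception = { type: string; value: string; mechanism: Mechanism };
type ExceptionChain = { exceptions: Exception[] };

const MAX_EXCEPTIONS = 10;

// Read `cause` defensively so the sketch compiles even when the TS lib
// target predates ES2022.
function getCause(err: Error): unknown {
  return (err as unknown as { cause?: unknown }).cause;
}

function toException(thrown: unknown, mechanism: Mechanism): Exception {
  if (thrown instanceof Error) {
    return { type: thrown.name, value: thrown.message, mechanism };
  }
  // Non-Error values (strings, numbers, plain objects) are stringified.
  return { type: "NonError", value: String(thrown), mechanism };
}

function unwrapCauses(thrown: unknown, handled = true): ExceptionChain {
  const mechanism: Mechanism = { type: "generic", handled };
  const exceptions: Exception[] = [];
  const seen = new Set<unknown>();
  let current: unknown = thrown;
  while (
    current !== undefined &&
    current !== null &&
    !seen.has(current) &&
    exceptions.length < MAX_EXCEPTIONS
  ) {
    seen.add(current);
    exceptions.push(toException(current, mechanism));
    current = current instanceof Error ? getCause(current) : undefined;
  }
  return { exceptions };
}
```

The result lists the thrown value first and its causes after it, matching the `[root, cause1, cause2, ...]` order in the schema.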
154+
## 4. Python SDK
155+
156+
**Core Task**: Create `src/mcpcat/modules/exceptions.py` with exception capture logic
157+
158+
**Key Functions:**
159+
160+
- `capture_exception_chain(exc: BaseException) -> dict | None`
161+
- Use `traceback.extract_tb()` to get frames
162+
- Handle `__cause__` (explicit) and `__context__` (implicit) chains recursively
163+
- Detect "in_app": exclude paths containing `/site-packages/` or `/dist-packages/`
164+
165+
**Integration Points:**
166+
167+
- Tool execution wrappers (primary)
168+
- FastMCP monkey patches
169+
- Session operations error handling
170+
171+
**Edge Cases:**
172+
173+
- ExceptionGroups (Python 3.11+) - leave TODO, treat as single exception for now
174+
- Exceptions without `__traceback__` - still capture type/value, omit stacktrace
175+
- Circular exception references - track seen exception IDs to prevent loops
176+
177+
## 5. Go SDK
178+
179+
**Core Task**: Create `internal/exceptions/capture.go` with exception capture logic
180+
181+
**Key Functions:**
182+
183+
- `CaptureException(err error) *ExceptionChain` - for errors
184+
- `CapturePanic(recovered interface{}) *ExceptionChain` - for panics
185+
- Use `runtime.Stack()` to capture current stack trace
186+
- Unwrap errors via `errors.Unwrap()` recursively
187+
- Detect "in_app": compare package path to the module path declared in go.mod
188+
189+
**Integration Points:**
190+
191+
- Tool execution hooks (primary) - wrap with defer/recover for panics, check returned errors
192+
- Session operations error handling
193+
194+
**Edge Cases:**
195+
196+
- Errors without stack traces - capture stack at handling point (limitation: not origin point)
197+
- Panics with non-error values - convert to string via `fmt.Sprintf("%v", recovered)`
198+
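- Goroutine panics - `recover()` only works in a deferred function on the panicking goroutine, so each goroutine that runs tool code needs its own defer/recover (panics can't be caught across goroutines)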
199+
200+
**Future Consideration** (leave TODO):
201+
202+
- Detect error libraries like `pkg/errors` that preserve stack traces at origin
203+
204+
## 6. Future Work (Not in Scope)
205+
206+
- **Source code context**: Capture actual source lines (requires file I/O, privacy concerns)
207+
- **Local variables**: Capture frame variable values (PII concerns, needs redaction integration)
208+
- **Configuration options**: Enable/disable, sampling, max depths, etc.
209+
- **Manual capture API**: `mcpcat.captureException(error, options)`
210+
- **Python ExceptionGroups**: Proper handling of multiple exceptions
211+
- **Go error origin stacks**: Use library-provided stacks instead of capture-point stacks
212+
213+
## 7. Implementation Notes
214+
215+
**TypeScript**:
216+
217+
- V8 stack format: `at functionName (filename:line:col)`
218+
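- Multiple format variants across Chrome and Node.js, e.g. `at filename:line:col` (no function name), `at new ClassName (...)`, and `at async functionName (...)`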
219+
220+
**Python**:
221+
222+
- Check `__cause__` before `__context__`; skip `__context__` when `__suppress_context__` is true (set by `raise ... from None`)
223+
- Normalize paths for site-packages detection (venv, user, system)
224+
- Use `os.path.normpath()` for path comparison
225+
226+
**Go**:
227+
228+
- Read the module path from go.mod at initialization, cache for "in_app" detection
229+
- Stack format: function signature on one line, file:line on next
230+
- Panics vs errors need different handling (panics use defer/recover)
231+
232+
---
233+
234+
This plan provides the architectural decisions and integration points. Senior engineers will determine implementation details appropriate to each language's idioms.