Skip to content

Conversation

@NathanFlurry
Copy link
Member

No description provided.

@vercel
Copy link

vercel bot commented Nov 7, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Ready Ready Preview Comment Nov 13, 2025 2:48am
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 13, 2025 2:48am
rivet-inspector Ignored Ignored Preview Nov 13, 2025 2:48am
rivet-site Ignored Ignored Preview Nov 13, 2025 2:48am

Copy link
Member Author

NathanFlurry commented Nov 7, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Nov 7, 2025

PR Review: Close WebSockets on Out-of-Sequence Messages

Summary

This PR enhances hibernatable WebSocket reliability by detecting and closing connections when out-of-sequence messages are received (regressed, duplicate, or skipped message indices). The changes are generally well-implemented with good attention to detail.

Positive Aspects

  1. Improved Error Handling: The change properly detects three types of sequence violations:

    • Regressed indices (going backwards)
    • Duplicate indices
    • Skipped indices (gaps)
  2. Clear Logging: The structured logging is excellent, providing all necessary context for debugging:

    • Previous, expected, and received indices
    • Sequence type classification
    • Gap calculation for skipped messages
  3. Proper Close Codes: Using WebSocket close code 1008 (Policy Violation) is appropriate for this scenario per RFC 6455.

  4. Early Return Pattern: The code correctly returns early after closing the WebSocket to prevent processing invalid messages.

  5. Code Organization: Moving the onMessage handler call to the end (line 711) is a good architectural decision, ensuring all validation happens before processing.

Potential Issues & Suggestions

Critical: WebSocket Cleanup

Issue: When closing the WebSocket with wsContext.close(1008, closeReason) at line 672, the connection is terminated but the cleanup logic in the close event handler (lines 714-728) may not execute immediately or reliably.

Risk: This could leave stale entries in:

  • this.#hibernatableWebSocketAckQueue
  • actorHandler.actor[PERSIST_SYMBOL].hibernatableWebSockets

Recommendation: Consider explicitly calling cleanup after closing:

// Close the WebSocket and skip processing
wsContext.close(1008, closeReason);
this.#cleanupHibernatableWebSocket(
    actorId,
    requestIdBuf,
    requestId,
    "close",
    undefined
);
return;

Minor: Edge Case - First Message Handling

The code only checks sequence when currentEntry exists (line 633). The first message for a hibernatable WebSocket creates the initial entry without validation. This is likely correct behavior, but worth confirming that rivetMessageIndex should always start at a known value (presumably 0 or 1).

Question: Should there be validation that the first message has an expected starting index?

Documentation

The comment at line 617-620 is good, but could be enhanced to mention the new validation behavior:

// Handle hibernatable WebSockets:
// - Validate message sequence (close on out-of-sequence)
// - Check for out of sequence messages
// - Save msgIndex for WS restoration
// - Queue WS acks

Testing Recommendations

This change would benefit from driver tests covering:

  1. Duplicate message index - Send two messages with same index, verify connection closes
  2. Regressed message index - Send message with lower index than previous, verify connection closes
  3. Skipped message index - Send messages with gap (e.g., 1, 2, 5), verify connection closes
  4. Normal sequence - Send messages 1, 2, 3 in order, verify connection stays open

Consider adding these to /rivetkit-typescript/packages/rivetkit/src/driver-test-suite/tests/.

Performance Consideration

The code now blocks message processing for out-of-sequence messages, which is correct. However, there's no rate limiting on log spam if a buggy client repeatedly sends bad sequences. Consider adding log throttling if this becomes an issue in production.

Security Considerations

✅ The change improves security by preventing potential replay or sequence manipulation attacks on hibernatable WebSockets.

Code Quality

The code follows TypeScript best practices and maintains consistency with the existing codebase style. The structured logging approach aligns with the project's logging patterns.

Recommendation

Approve with minor changes: Address the cleanup concern and consider adding tests. The core logic is sound and improves system reliability.

@pkg-pr-new
Copy link

pkg-pr-new bot commented Nov 7, 2025

More templates

@rivetkit/actor

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/actor@3404

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3404

@rivetkit/core

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/core@3404

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3404

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3404

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3404

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3404

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3404

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3404

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3404

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3404

commit: 775d512

@claude
Copy link

claude bot commented Nov 8, 2025

PR Review: Close WebSockets on Out-of-Sequence Messages

Summary

This PR improves the robustness of hibernatable WebSocket handling by detecting and terminating connections that receive out-of-sequence messages. The changes add proper validation to catch message index regressions, duplicates, and gaps.


Positive Aspects

1. Improved Error Detection
The PR correctly identifies three distinct types of message sequencing issues:

  • Regressed indices (message goes backward)
  • Duplicate indices (same message twice)
  • Skipped indices (gaps in sequence)

2. Clear WebSocket Close Codes
Using distinct close reasons (ws.message_index_regressed, ws.message_index_duplicate, ws.message_index_skip) will help with debugging and monitoring.

3. Proper Control Flow
Moving the wsHandlerPromise.then((x) => x.onMessage?.(event, wsContext)) call to the end (line 711) ensures that messages are only processed after validation passes. The early return on line 673 correctly prevents processing of invalid messages.

4. Better Logging
The warning log now clearly states "closing connection" and includes the closeReason field for better observability.


Issues & Concerns

🔴 Critical: Message Processing Still Occurs After Close

Location: rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:672-673, 711

Issue: After calling wsContext.close(1008, closeReason) and return on lines 672-673, the message handler at line 711 will still execute for subsequent messages because the event listener isn't removed.

Why This Matters:

  • WebSocket close is asynchronous - the connection doesn't terminate immediately
  • The return statement only exits the current message handler invocation
  • If another message arrives before the close completes, it will hit line 711 and be processed
  • This could allow out-of-sequence messages to still reach the application code

Recommended Fix:

// Option 1: Remove the event listener before closing
websocket.removeEventListener("message", messageHandler);
wsContext.close(1008, closeReason);
return;

// Option 2: Add a flag to prevent further processing
if (this.#closedWebSockets.has(requestId)) {
    return;
}

You'll need to store a reference to the message handler function to use removeEventListener, or maintain a Set of closed WebSocket request IDs.


🟡 Moderate: First Message Handling

Location: rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:632-683

Issue: When currentEntry doesn't exist (line 632), this is the first message for this WebSocket. The code initializes the queue with the received messageIndex (line 679-682), but doesn't validate that the first message has a valid starting index.

Potential Problems:

  • If hibernation restoration sends a high message index (e.g., resuming from index 1000), there's no validation
  • If the first message arrives with index 0 or 1, both might be valid depending on your protocol

Questions to Consider:

  1. Should the first message always start at a specific index (e.g., 0 or 1)?
  2. When restoring from hibernation, how do you know what the valid starting index should be?
  3. Is there risk of accepting the wrong starting point?

Recommended Action:
Add validation for the initial message index:

} else {
    // First message for this WebSocket
    const expectedStartIndex = 0; // or 1, depending on your protocol
    if (event.rivetMessageIndex !== expectedStartIndex) {
        logger().warn({
            msg: "first message has unexpected index",
            requestId,
            actorId,
            receivedIndex: event.rivetMessageIndex,
            expectedIndex: expectedStartIndex,
        });
        // Decide: accept it, reject it, or validate against hibernatableWs.msgIndex
    }
    this.#hibernatableWebSocketAckQueue.set(requestId, {
        requestIdBuf,
        messageIndex: event.rivetMessageIndex,
    });
}

🟡 Moderate: Race Condition in State Update Order

Location: rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:677, 685-696

Observation: The code updates currentEntry.messageIndex (line 677) before updating hibernatableWs.msgIndex (line 687). While this might be intentional, it creates a brief window where the two indices are out of sync.

Potential Impact:

  • If another part of the system reads these values between lines 677 and 687, they'll see inconsistent state
  • May not be an issue if the event loop guarantees atomicity, but worth documenting

Recommendation:
Either:

  1. Document why this order is safe, or
  2. Update both in the same order consistently, or
  3. Perform the hibernatableWs update first since it's the source of truth

🟢 Minor: WebSocket Close Code 1008

Location: rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:672

Observation: The PR uses close code 1008 (Policy Violation). This is semantically correct for protocol violations.

Note: According to RFC 6455:

  • 1008 indicates policy violation - appropriate here
  • 1002 (Protocol Error) could also be valid
  • 1007 (Invalid frame payload data) is for data type mismatches

The choice of 1008 is reasonable, but ensure client-side code handles this appropriately.


Test Coverage Concerns

Missing Test Cases

The existing test file (raw-websocket.ts) doesn't include tests for:

  1. Out-of-sequence message handling (the main feature of this PR)
  2. Hibernatable WebSocket restoration with message indices
  3. Message index validation edge cases

Recommended Tests:

test("should close connection on duplicate message index", async (c) => {
    // Send message with index N, then send another with index N
    // Expect WebSocket to close with code 1008
});

test("should close connection on regressed message index", async (c) => {
    // Send message with index N, then send one with index N-1
    // Expect WebSocket to close with code 1008
});

test("should close connection on skipped message index", async (c) => {
    // Send message with index N, then send one with index N+2
    // Expect WebSocket to close with code 1008
});

test("should accept sequential message indices", async (c) => {
    // Send messages with indices 0, 1, 2, 3...
    // Expect all to be processed normally
});

Note: These tests would require either:

  • Access to inject messages with specific rivetMessageIndex values, or
  • Mocking/stubbing capabilities in the test infrastructure

Performance Considerations

Positive:

  • The validation logic is O(1) - just integer comparisons
  • No impact on hot path performance
  • Logging only occurs on error conditions

Neutral:

  • The additional closeReason and sequenceType variable assignments (lines 638-652) are minimal overhead

Security Considerations

Positive:

  • Closing connections on protocol violations prevents potential exploits
  • Message index validation helps prevent replay attacks or message reordering attacks
  • Using specific close codes helps with security monitoring

Potential Concern:

  • If an attacker can inject messages with specific indices, they could force-close legitimate WebSocket connections (DoS)
  • Ensure message indices are cryptographically signed or otherwise authenticated at a lower layer

Code Quality

Strengths:

  • Clear variable names (closeReason, sequenceType)
  • Good logging with all relevant context
  • Follows existing code patterns
  • Comments explain the intent (lines 617-620)

Minor Style Notes:

  • The if-else-if-else chain (lines 641-652) is clear and readable
  • Consider extracting the close logic into a helper method if this pattern is reused elsewhere

Recommendations Summary

Must Address:

  1. ❗ Fix the race condition where messages can still be processed after wsContext.close() is called

Should Address:
2. Validate the first message index when initializing the ack queue
3. Add test coverage for out-of-sequence message handling
4. Document or fix the state update order between currentEntry and hibernatableWs

Nice to Have:
5. Consider extracting the close logic into a helper method
6. Document the expected message index protocol (0-based vs 1-based, etc.)


Overall Assessment

This is a solid improvement to the hibernatable WebSocket reliability. The core logic is sound, but there's a critical issue with the async nature of WebSocket closing that needs to be addressed. Once the message processing race condition is fixed, this will significantly improve protocol robustness.

Recommendation: Request changes to address the critical issue, then approve.

@claude
Copy link

claude bot commented Nov 8, 2025

Code Review: WebSocket Out-of-Sequence Message Handling

Thank you for this PR! This change adds important safeguards for hibernatable WebSocket message sequencing. Here's my detailed review:

Strengths

  1. Correct Security Posture: Using WebSocket close code 1008 (Policy Violation) is appropriate for protocol violations like out-of-sequence messages. This is the right choice over 1011 (Internal Server Error).

  2. Early Return Pattern: The change correctly returns early after closing the connection on sequence violations, preventing the out-of-sequence message from being processed (actor-driver.ts:673).

  3. Improved Message Ordering: Moving the wsHandlerPromise call to the end (actor-driver.ts:711) ensures all hibernation validation happens before message processing, which is the correct flow.

  4. Better Error Classification: The logic now properly distinguishes between three types of sequence violations:

    • Regressed (message index went backwards)
    • Duplicate (same index twice)
    • Gap/skip (indices jumped forward)
  5. Logging Improvements: Enhanced warning messages now include the closeReason and sequenceType, making debugging easier.

🔍 Concerns & Questions

1. Potential Race Condition (Medium Priority)

// Line 711: Message handler called AFTER validation
wsHandlerPromise.then((x) => x.onMessage?.(event, wsContext));

Issue: If the WebSocket is closed at line 672, there's a potential race where wsHandlerPromise.then() at line 711 could still execute if the promise resolves between the close call and the function return.

Recommendation: Consider checking if the WebSocket is still open before calling the handler:

// After all hibernation logic
if (wsContext.readyState === WebSocket.OPEN) {
    wsHandlerPromise.then((x) => x.onMessage?.(event, wsContext));
}

2. Change in Behavior for Non-Sequential Messages (High Priority - Breaking Change)

Previous behavior: Warned about out-of-sequence messages but still processed them and updated to the highest seen index.

New behavior: Immediately closes the connection on any sequence violation.

Concerns:

  • This is a breaking change that could affect clients experiencing network issues (packet reordering, retransmissions)
  • Are there legitimate scenarios where messages might arrive out of order? (e.g., network layer reordering)
  • Should there be different handling for "regression/duplicate" vs "gap"?
    • Regression/duplicate: likely indicates a serious protocol error → close immediately ✅
    • Gap: might indicate lost messages → could potentially request retransmission instead of closing?

Questions:

  • Has this been tested with real-world network conditions?
  • Is there telemetry to monitor how often this occurs in production?

3. Missing Test Coverage (High Priority)

I couldn't find test cases for the new out-of-sequence detection logic. This is critical functionality that should be tested.

Recommendation: Add tests to verify:

  • WebSocket closes on message index regression
  • WebSocket closes on duplicate message index
  • WebSocket closes on message index gaps
  • Proper close code (1008) and reason strings are used
  • Message handler is NOT called after close

Consider adding to rivetkit-typescript/packages/rivetkit/src/driver-test-suite/tests/ with a fixture actor that can simulate these conditions.

4. Inconsistency in Sequence Validation (Low Priority)

The validation only applies when:

  1. Actor handler exists (actor-driver.ts:622)
  2. Actor has hibernatable WebSocket tracking (actor-driver.ts:629)
  3. Previous entry exists in ack queue (actor-driver.ts:633)

Questions:

  • What happens for the first message (no previous entry)? It bypasses validation entirely.
  • Should we validate that the first message starts at index 0 or 1?
  • What if event.rivetMessageIndex is negative or non-numeric?

5. Type Safety (Low Priority)

invariant(event.rivetMessageIndex, "missing rivetMessageIndex");
// ...
if (event.rivetMessageIndex !== previousIndex + 1) {

The rivetMessageIndex is typed as number? but there's no validation that it's a valid positive integer after the invariant check.

📊 Performance Considerations

The changes appear performance-neutral:

  • No new allocations in the hot path
  • Logic moved but not significantly expanded
  • Early return on errors is actually more efficient

🔒 Security Assessment

Positive: This change improves security by preventing out-of-sequence message processing, which could potentially be exploited to:

  • Cause state inconsistencies in hibernatable actors
  • Bypass message acknowledgment logic
  • Trigger race conditions

Close code rationale: 1008 is appropriate and follows RFC 6455 guidelines for protocol violations.

📝 Recommendations

  1. Critical: Add comprehensive test coverage for out-of-sequence detection
  2. High: Consider if this breaking change needs:
    • Gradual rollout strategy
    • Metrics/monitoring for close events with these reasons
    • Documentation update for client library maintainers
  3. Medium: Add readyState check before calling message handler after validation
  4. Low: Consider adding validation for the initial message index
  5. Low: Document the expected message index behavior and edge cases

🎯 Overall Assessment

The core logic is sound and improves reliability of hibernatable WebSockets. The main concerns are:

  1. Lack of test coverage
  2. Potential breaking change impact on existing clients
  3. Minor race condition possibility

Recommendation: Request tests before merging, and consider adding telemetry to monitor the impact in production.


Great work on improving the robustness of WebSocket message handling! Let me know if you'd like me to help write tests for this functionality.

@claude
Copy link

claude bot commented Nov 9, 2025

Code Review

Thank you for this PR! This is a critical improvement for handling out-of-sequence WebSocket messages in the hibernatable WebSocket system. Here's my detailed review:

✅ Strengths

  1. Improved Security & Correctness: Closing connections on out-of-sequence messages prevents potential state corruption and security issues from duplicate/regressed/skipped messages.

  2. Clear Differentiation: The code now distinguishes between three types of sequence violations with specific close reasons:

    • ws.message_index_regressed - message index went backwards
    • ws.message_index_duplicate - same message index received twice
    • ws.message_index_skip - message index jumped forward (gap)
  3. Better Logging: Enhanced log message clearly indicates the action being taken ("closing connection") and provides detailed diagnostic information.

  4. Proper Event Ordering: Moving the wsHandlerPromise.then(...onMessage) call to the end (line 714) ensures that the hibernation logic and validation complete before message processing, which is correct given the early return on sequence violations.

🔍 Potential Issues & Suggestions

1. Race Condition Risk (Medium Priority)

The current code has moved the onMessage handler invocation to after all validation. However, there's a potential issue:

// Line 675: Close the WebSocket
wsContext.close(1008, closeReason);
return;

// Line 714: This still gets called for valid messages
wsHandlerPromise.then((x) => x.onMessage?.(event, wsContext));

Question: After calling wsContext.close(), is the WebSocket immediately closed, or is there a delay? If there's a delay, could subsequent messages arrive before the connection fully closes, potentially causing the same out-of-sequence message to be processed again?

Suggestion: Consider whether you need to track closed connections to prevent processing messages after a close has been initiated.

2. Test Coverage (High Priority)

I searched for test coverage and did not find any tests specifically for this hibernatable WebSocket message sequencing logic. This is a critical correctness and security feature that should have comprehensive test coverage.

Recommendation: Add tests for:

  • Message index regression (n → n-1)
  • Duplicate message index (n → n)
  • Message index gap (n → n+2)
  • Normal sequential messages (n → n+1)
  • First message initialization (currentEntry is undefined)
  • WebSocket close behavior on sequence violations

3. WebSocket Close Code (Low Priority)

You're using close code 1008 (Policy Violation). This is a reasonable choice, but worth documenting:

  • 1008 is typically used for policy violations
  • Alternative could be 1002 (Protocol Error) since this is essentially a protocol-level sequencing issue

Question: Is there a documented standard for WebSocket close codes in your system? Consider adding a comment or constant to clarify why 1008 was chosen.

4. Removed Behavior Change (Medium Priority - Needs Verification)

The old code had this logic:

// Update to the highest seen index
if (event.rivetMessageIndex > previousIndex) {
    currentEntry.messageIndex = event.rivetMessageIndex;
}

The new code always updates to the received index:

// Update to the next index
currentEntry.messageIndex = event.rivetMessageIndex;

Question: In the old version, if messages arrived out of order but the connection wasn't closed, it would track the highest index seen. Was this "forgiving" behavior intentional, or was it a bug? The new behavior is stricter (correct for sequential processing), but this is a behavioral change worth confirming.

5. Message Processing After Close (Low Priority - Clarification)

At line 714, you're still calling onMessage for all messages, including those that might trigger a close. Since you have an early return at line 676, this should be safe, but it's worth verifying that the return statement properly prevents the code from reaching line 714 in the out-of-sequence case.

Status: ✅ This appears correct on review, but worth double-checking.

📝 Code Quality

  • ✅ Follows TypeScript best practices
  • ✅ Consistent with existing code style
  • ✅ Good use of invariants for critical assertions
  • ✅ Structured logging with appropriate metadata

🔒 Security Considerations

  • ✅ Prevents potential state corruption from out-of-sequence messages
  • ✅ Proper connection termination with meaningful close reasons
  • ⚠️ Consider: Could an attacker deliberately send out-of-sequence messages to cause denial of service by repeatedly closing connections? (Probably fine, but worth considering rate limiting at a higher level)

🎯 Performance Considerations

  • ✅ Minimal overhead - only processes sequence checking for hibernatable WebSockets
  • ✅ Early return prevents unnecessary processing of invalid messages
  • ✅ No obvious performance concerns

📊 Summary

Overall Assessment: This is a solid improvement that enhances the robustness of the hibernatable WebSocket system. The main concern is the lack of test coverage for this critical functionality.

Recommended Actions Before Merge:

  1. Add comprehensive test coverage for message sequencing scenarios
  2. Verify behavior regarding the removed "highest index" tracking
  3. Consider documenting WebSocket close code choices

Optional Improvements:

  • Add constants for WebSocket close codes to improve code readability
  • Consider adding metrics/monitoring for sequence violations to track if this is happening in production

Great work on improving the reliability of this system! 🚀

@claude
Copy link

claude bot commented Nov 9, 2025

PR Review: Close WebSockets on Out of Sequence Messages

Summary

This PR improves the handling of hibernatable WebSocket message sequencing by actively closing connections when out-of-sequence messages are detected, rather than just logging warnings. The change also moves message processing to occur after validation.

Positive Changes

1. Stronger Sequence Validation

The PR changes from a permissive approach (accepting out-of-sequence messages with warnings) to a strict approach (closing connections on sequence violations). This is a good defensive practice for hibernatable WebSockets where message ordering is critical.

2. Clear Error Categorization

The code now properly categorizes sequence violations into three types:

  • ws.message_index_regressed - message index went backwards
  • ws.message_index_duplicate - duplicate message index
  • ws.message_index_skip - gap in message indices

3. Improved Control Flow

Moving the wsHandlerPromise.then((x) => x.onMessage?.(event, wsContext)) call to the end (line 714) ensures that:

  • Validation happens before message processing
  • Out-of-sequence messages are never processed
  • The comment accurately explains the reasoning

4. Better Log Message

The warning message now includes "closing connection" which makes the action being taken clear.

Issues & Concerns

1. Logical Issue: Index Update Should Not Happen on Error ⚠️

Lines 679-680:

// Update to the next index
currentEntry.messageIndex = event.rivetMessageIndex;

This updates the message index even for out-of-sequence messages, which happens AFTER the sequence check but BEFORE the early return. However, looking more carefully at the control flow:

  • Line 640-677: Check for out-of-sequence and early return if violated
  • Line 679-680: Only reached if sequence is valid

Actually, this is correct - the early return on line 676 prevents the update from happening for invalid messages. My initial concern was unfounded.

2. WebSocket Close Code Choice 🤔

Line 675:

wsContext.close(1008, closeReason);

Using close code 1008 (Policy Violation) is reasonable, but you might also consider:

  • 1002 (Protocol Error) - since this is a protocol-level sequencing violation
  • 1011 (Internal Server Error) - used elsewhere in the file (line 603)

The choice of 1008 is defensible, but it's worth confirming this aligns with your WebSocket protocol design.

3. Race Condition Concern ⚠️

Critical Issue: Moving wsHandlerPromise.then() from line 617 to line 714 creates a potential race condition.

Before: Message handler was invoked immediately (line 617), then hibernation logic ran.

After: Hibernation logic runs first, potentially closes the connection, THEN attempts to invoke the message handler.

The problem: If wsContext.close() is called on line 675, the connection is closing/closed, but then line 714 still tries to process the message through the handler. Depending on the handler implementation, this could:

  • Process invalid messages in a closing/closed state
  • Cause errors in the message handler
  • Lead to unexpected behavior

Recommendation: The early return on line 676 should prevent this, but verify that:

  1. The return statement properly skips the handler invocation on line 714
  2. Add a comment explaining why the early return is critical

Actually, looking again at the code structure - the return on line 676 IS inside the event listener callback, so it will prevent line 714 from executing. This is correct. However, adding a comment would make this more explicit.

4. Duplicate Message Index Handling 🤔

For duplicate message indices (line 647-651), closing the connection might be overly strict. Duplicates can occur legitimately due to:

  • Network retransmissions
  • Client-side retry logic
  • Load balancer behavior

Consideration: Should duplicate messages be:

  • Ignored (skip processing but don't close)?
  • Deduped and processed once?
  • Treated as errors (current approach)?

This depends on your protocol requirements. If idempotency is expected at a higher layer, the current approach might be too aggressive.

5. Message Index Progression After Close 🔍

The previous code updated messageIndex to the highest seen value (lines in old code):

// Update to the highest seen index
if (event.rivetMessageIndex > previousIndex) {
    currentEntry.messageIndex = event.rivetMessageIndex;
}

The new code updates it to the current (potentially out-of-sequence) index if valid, but only if it's exactly previousIndex + 1.

This is more correct for strict sequencing, but confirm this won't cause issues if:

  • The connection is being closed anyway
  • There are any cleanup paths that check the index

Minor Observations

6. Code Organization

The reordering of code (moving the msgIndex update from lines 634-644 to 688-699) improves readability by grouping related operations together.

7. Comment Accuracy

The updated comment on line 713 accurately describes the intent: "Process the message after all hibernation logic and validation in case the message is out of order"

Though the wording could be clearer: "...in case the message was out of order" → but we've already validated it's NOT out of order by this point. Consider: "Process the message after validation to ensure out-of-sequence messages are not processed"

Testing Recommendations

Given the criticality of this change, ensure you have test coverage for:

  1. Normal sequence: Messages arriving in order (1, 2, 3, 4...)
  2. Regression: Messages arriving with decreasing indices (3, 2)
  3. Duplicates: Same message index twice (2, 2)
  4. Gaps: Skipped indices (1, 2, 4)
  5. First message: Behavior when currentEntry doesn't exist yet
  6. Connection closure: Verify closed connections don't process messages
  7. Race conditions: Rapid message arrival during connection closing

Recommendation

Overall: Approve with minor suggestions

The core change is solid and improves system reliability by enforcing strict message ordering. The main concerns are:

  1. Consider whether duplicate message detection should close the connection or just skip processing
  2. Verify close code 1008 is appropriate for your protocol
  3. Add a test suite covering the scenarios above
  4. Consider improving the comment on line 713

This is a good defensive improvement for hibernatable WebSocket reliability.

@claude
Copy link

claude bot commented Nov 13, 2025

Code Review

Summary

This PR improves the handling of out-of-sequence WebSocket messages in the hibernatable WebSocket system by immediately closing connections when sequence violations are detected, rather than trying to handle them gracefully.

Positive Changes ✅

  1. Improved Security Posture: Closing connections on sequence violations (regression, duplicates, gaps) prevents potential protocol attacks or corrupted state.

  2. Better Code Organization: Moving the onMessage handler to the end of the function (line 714) ensures all validation and hibernation logic completes before processing the message. This is a critical improvement.

  3. Clearer Error Handling: The three distinct close reasons (ws.message_index_regressed, ws.message_index_duplicate, ws.message_index_skip) provide better observability for debugging.

  4. Simplified Logic: Removed the conditional logic that tried to handle out-of-sequence messages by taking the "highest seen index" - the new approach is more deterministic.

Issues and Concerns 🔍

1. Critical Race Condition (High Priority)

The original code called onMessage immediately at line 616 (before validation), while the new code moves it to line 714 (after validation). However, there's a potential race condition:

// Line 616: invariants check
invariant(event.rivetRequestId, "missing rivetRequestId");
invariant(event.rivetMessageIndex, "missing rivetMessageIndex");

// Lines 624-711: Hibernation logic that may close the connection

// Line 714: Process message AFTER potential close
wsHandlerPromise.then((x) => x.onMessage?.(event, wsContext));

Problem: If the connection is closed at line 675, the onMessage handler at line 714 will still execute because the return statement only exits the message event listener, not the entire function flow. The promise chain at line 714 is already queued and will execute even after wsContext.close() is called.

Recommendation: Add a check before calling onMessage:

if (wsContext.readyState !== WebSocket.CLOSED && wsContext.readyState !== WebSocket.CLOSING) {
    wsHandlerPromise.then((x) => x.onMessage?.(event, wsContext));
}

Or use a flag to track whether the message should be processed.

2. Inconsistent State After Close (Medium Priority)

When closing on line 675, the hibernatable WebSocket tracking state is not cleaned up. The entry remains in:

  • this.#hibernatableWebSocketAckQueue
  • hibernatableWs.msgIndex is not updated
  • hibernatableWs.lastSeenTimestamp is not updated

However, the close event handler (line 717) should trigger cleanup via #cleanupHibernatableWebSocket. Verify that this cleanup happens consistently.

3. WebSocket Close Code Usage (Low Priority)

Using close code 1008 (Policy Violation) is reasonable, but consider:

  • 1003 (Unsupported Data) - for duplicate/regression
  • 1002 (Protocol Error) - for sequence violations

The current 1008 is acceptable but might be too generic. The close reason strings are good for debugging.

4. Missing First Message Handling (Medium Priority)

When currentEntry doesn't exist (line 681-686), the code initializes tracking with messageIndex: event.rivetMessageIndex.

Issue: If the first message received has rivetMessageIndex > 0, this suggests messages were missed, but no validation occurs. Should the first message always be index 0 or 1?

Recommendation: Add validation for the first message:

} else {
    // First message for this connection
    if (event.rivetMessageIndex !== 0) {
        logger().warn({
            msg: "first hibernatable websocket message has non-zero index",
            requestId,
            actorId,
            firstMessageIndex: event.rivetMessageIndex,
        });
    }
    this.#hibernatableWebSocketAckQueue.set(requestId, {
        requestIdBuf,
        messageIndex: event.rivetMessageIndex,
    });
}

5. Timing Concern (Low Priority)

The hibernatableWs.lastSeenTimestamp update (line 691) now happens after validation, which is correct. However, ensure that hibernation/eviction logic doesn't inadvertently close a connection that has received an out-of-sequence message but hasn't been validated yet. The current implementation looks safe since validation happens synchronously.

Testing Recommendations 🧪

Add tests for:

  1. Out-of-sequence message detection (regression, duplicate, gap)
  2. Connection closure on sequence violations
  3. No message processing after connection closure
  4. First message with non-zero index
  5. Race conditions between message validation and hibernation

Example test case locations:

  • rivetkit-typescript/packages/rivetkit/src/driver-test-suite/tests/
  • Consider creating driver-test-hibernatable-websocket-sequence.ts

Performance Considerations ⚡

No significant performance concerns. The changes:

  • Add minimal conditional logic
  • Remove the "take highest index" conditional (slight improvement)
  • Early return on validation failure (good for performance)

Security Considerations 🔒

Positive: Closing on sequence violations prevents:

  • Replay attacks
  • Message reordering attacks
  • State corruption from duplicate processing

Consider: Ensure that legitimate network reordering scenarios don't trigger false positives. WebSocket should maintain message order, but if proxies/load balancers are involved, verify this assumption holds.

Style & Conventions ✨

The code follows the existing patterns in the file well:

  • Structured logging with tracing-style parameters ✅
  • Lowercase log messages ✅
  • Consistent error handling ✅

Overall Assessment

This is a good security improvement but has one critical issue (race condition on line 714) that should be addressed before merging. The fundamental approach of closing on sequence violations is sound and improves system reliability.

Recommendation: Request changes to address the race condition, then approve.

Suggested Changes

  1. Critical: Add connection state check before calling onMessage at line 714
  2. Recommended: Add first-message index validation
  3. Optional: Consider more specific WebSocket close codes
  4. Testing: Add driver test coverage for sequence violation scenarios

Review completed following CLAUDE.md conventions

@NathanFlurry NathanFlurry force-pushed the 11-06-chore_rivetkit_close_websockets_on_out_of_sequence_messages branch from aaad3ae to 775d512 Compare November 13, 2025 02:47
@NathanFlurry NathanFlurry force-pushed the 11-06-fix_rivetkit_properly_apply_clientendpoint_from_metadata_lookup branch from 6e38356 to 5270367 Compare November 13, 2025 02:47
@claude
Copy link

claude bot commented Nov 13, 2025

Code Review: Close WebSockets on Out-of-Sequence Messages

Summary

This PR improves the robustness of hibernatable WebSocket handling by detecting and closing connections when message indices are out of sequence (regressed, duplicate, or skipped). The implementation is solid with minor suggestions for improvement.


✅ Strengths

  1. Improved Correctness: The change correctly identifies three types of sequence violations:

    • Regression (index goes backward)
    • Duplicate (same index received twice)
    • Gap/skip (index jumps forward)
  2. Better Ordering: Moving the onMessage handler call to the end (line 714) is correct - it ensures validation happens before message processing, and early return on error prevents processing invalid messages.

  3. Appropriate Error Handling: Using WebSocket close code 1008 (Policy Violation) is semantically correct for sequence violations.

  4. Good Logging: The structured logging includes all relevant debugging information (previousIndex, expectedIndex, receivedIndex, gap, sequenceType, closeReason).

  5. Early Exit Pattern: The return after wsContext.close() prevents further processing of out-of-sequence messages.


🔍 Issues & Suggestions

1. Race Condition Risk (Medium Priority)

The current implementation has a potential race condition. The code now validates and closes BEFORE calling onMessage, but there's a timing issue:

Line 714: wsHandlerPromise.then((x) => x.onMessage?.(event, wsContext));

If the WebSocket is closed at line 675, the onMessage handler at line 714 might still fire asynchronously after the connection is closed. Consider:

// After line 676, add a flag check or use wsContext.readyState
if (wsContext.readyState !== wsContext.OPEN) {
    return;
}

// Or wrap the final onMessage call:
if (wsContext.readyState === wsContext.OPEN) {
    wsHandlerPromise.then((x) => x.onMessage?.(event, wsContext));
}

2. Inconsistent State Management (Low Priority)

When closing due to out-of-sequence messages, the code doesn't update hibernatableWs.msgIndex or currentEntry.messageIndex before closing. This is probably fine since the connection is being terminated, but it leaves the state inconsistent if any cleanup code reads these values. Consider whether cleanup handlers need accurate state.

3. Missing Test Coverage (High Priority)

I searched for tests covering hibernatable WebSocket behavior and found none specifically testing:

  • Message index regression
  • Duplicate message indices
  • Skipped message indices
  • Proper connection closure on sequence violations

Recommendation: Add tests to verify:

// Test cases needed:
// 1. Message index 1, 2, 3 -> success
// 2. Message index 1, 2, 1 -> closes with "ws.message_index_regressed"
// 3. Message index 1, 2, 2 -> closes with "ws.message_index_duplicate"  
// 4. Message index 1, 2, 5 -> closes with "ws.message_index_skip"
// 5. onMessage should not fire after close on sequence violation

4. Close Reason String Format (Low Priority)

The close reasons use dot-notation (ws.message_index_regressed). Verify this format is:

  • Consistent with other close reasons in your codebase
  • Parseable by monitoring/logging infrastructure
  • Compatible with WebSocket clients expecting these reasons

🛡️ Security Considerations

Positive: This change improves security by preventing potential replay attacks or message reordering attacks. Out-of-sequence messages could indicate:

  • Network tampering
  • Client bugs causing retransmission
  • Server-side hibernation/restoration bugs

Closing the connection on sequence violations is the correct defensive approach.


⚡ Performance Considerations

Neutral/Positive:

  • The validation logic is O(1) and runs before message processing
  • Early return prevents wasted processing on invalid messages
  • The conditional checks are straightforward and won't impact performance

📋 Code Quality

Minor: Consider extracting the sequence validation into a helper method to reduce complexity in the event handler:

private validateMessageSequence(
    requestId: string,
    messageIndex: number,
    wsContext: any
): boolean {
    const currentEntry = this.#hibernatableWebSocketAckQueue.get(requestId);
    if (!currentEntry) return true; // First message
    
    const previousIndex = currentEntry.messageIndex;
    if (messageIndex === previousIndex + 1) return true; // Valid sequence
    
    // Handle out-of-sequence...
    return false;
}

🎯 Recommendations

Before Merging:

  1. ✅ Add test coverage for sequence violation scenarios
  2. ✅ Address the race condition in the onMessage handler
  3. ✅ Verify close reason string format is consistent with codebase conventions

Optional:
4. Consider refactoring sequence validation into a helper method
5. Document the expected behavior when sequence violations occur in code comments


Overall, this is a solid improvement to connection reliability. The main concern is test coverage and the potential race condition with the async onMessage handler.

@claude
Copy link

claude bot commented Nov 13, 2025

Code Review

Summary

This PR improves the robustness of hibernatable WebSocket message handling by detecting out-of-sequence messages and closing connections when sequencing violations occur. The changes move message processing to happen after validation, preventing potentially corrupted state from being processed.

Strengths

  1. Critical Bug Fix: Moving wsHandlerPromise message processing to the end (line 714) is essential. Previously, messages were processed before sequence validation, which could lead to race conditions and state corruption.

  2. Clear Categorization: The PR properly categorizes out-of-sequence scenarios into regressed, duplicate, and gap/skip cases.

  3. Appropriate WebSocket Close Code: Using code 1008 (Policy Violation) is semantically correct for protocol violations.

  4. Good Logging: The warning logs provide comprehensive debugging information including gap size calculation.

Potential Issues and Suggestions

1. Race Condition with Early Return (Medium Priority)

The early return at line 676 prevents message processing, but wsContext.close() is asynchronous. If close does not complete immediately, subsequent messages might still arrive and be processed. Consider adding a flag to track connections being terminated.

2. Missing Test Coverage (High Priority)

The PR lacks tests for out-of-sequence detection logic. Recommended test cases:

  • Duplicate message index
  • Regressed message index
  • Skipped message index
  • Valid sequential messages
  • Edge cases: first message with non-zero index, very large gaps

3. Invariant Assumptions (Low Priority)

Lines 617-618 use invariants on rivetRequestId and rivetMessageIndex, but these fields are marked experimental and optional in websocket-interface.ts. Non-hibernatable WebSockets might not provide these fields.

4. Documentation Gap (Low Priority)

The significant behavior change (closing on out-of-sequence) should be documented with comments explaining the rationale.

5. State Management Order (Low Priority)

State is updated before message processing. If onMessage throws, state is still advanced. Confirm this is intended behavior.

Security Considerations

Positive: Closing on sequence violations prevents replay attacks and state corruption. Validation happens before user handlers execute.

Performance Considerations

Minimal impact: O(1) sequence check only for hibernatable WebSockets.

Recommendation

Conditional Approval - The core fix is solid, but recommend:

  1. Add test coverage for out-of-sequence scenarios (blocker)
  2. Consider race condition with terminating connections (nice-to-have)
  3. Clarify invariant behavior documentation

Review generated by Claude Code

@graphite-app
Copy link
Contributor

graphite-app bot commented Nov 13, 2025

Merge activity

  • Nov 13, 8:47 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 13, 8:48 PM UTC: CI is running for this pull request on a draft pull request (#3463) due to your merge queue CI optimization settings.
  • Nov 13, 8:50 PM UTC: Merged by the Graphite merge queue via draft PR: #3463.

@graphite-app graphite-app bot closed this Nov 13, 2025
@graphite-app graphite-app bot deleted the 11-06-chore_rivetkit_close_websockets_on_out_of_sequence_messages branch November 13, 2025 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants