
@fcostaoliveira (Collaborator) commented Sep 24, 2025

This PR introduces connection error tracking and a configurable auto-reconnect mechanism to memtier_benchmark. The goal is to make benchmarks more resilient when running against clusters where TLS errors, timeouts, or dropped connections can otherwise abort runs prematurely.

Connection Error Tracking

  • Tracks and reports total connection errors per client group
  • Adds connection error metrics in both text and JSON outputs (Connection Errors and Connection Errors/sec)
  • Connection errors are displayed in live progress reports only when errors occur (backwards compatible):
    [RUN #1 45%, 900 secs] 45 threads 200 conns 12 conn errors: 1000000 ops...
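A minimal sketch of the conditional display follows. `format_progress` and `conn_errors_per_sec` are hypothetical helpers, not memtier_benchmark's actual code. The line layout is copied from the example line above and stops at the ops count, since the rest of the real line is elided there. The per-second rate is assumed to be total errors averaged over elapsed time.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not memtier_benchmark's actual code: builds a live
// progress line whose layout is copied from the example above. The
// "conn errors" field is appended only when at least one connection error
// has been recorded, so error-free runs print the same line as before.
std::string format_progress(unsigned run, unsigned percent, unsigned secs,
                            unsigned threads, unsigned conns,
                            unsigned long conn_errors, unsigned long long ops) {
    char buf[128];  // large enough for every field at its maximum width
    std::snprintf(buf, sizeof(buf), "[RUN #%u %u%%, %u secs] %u threads %u conns",
                  run, percent, secs, threads, conns);
    std::string line(buf);
    if (conn_errors > 0) {
        line += " " + std::to_string(conn_errors) + " conn errors";
    }
    line += ": " + std::to_string(ops) + " ops";
    return line;
}

// Assumed definition of "Connection Errors/sec": total errors averaged over
// the elapsed run time (0 when no time has elapsed yet).
double conn_errors_per_sec(unsigned long conn_errors, double elapsed_secs) {
    return elapsed_secs > 0.0 ? static_cast<double>(conn_errors) / elapsed_secs : 0.0;
}
```

With `conn_errors == 0`, the sketch omits the field entirely. This is the backwards-compatible behaviour the bullet describes.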
    

New CLI Options for Reconnection

  • --reconnect-on-error → Enable automatic reconnection when connection errors occur (default: disabled)
  • --max-reconnect-attempts=N → Limit reconnection retries (default: 0, unlimited)
  • --reconnect-backoff-factor=F → Exponential backoff multiplier for retry delays (default: 0, no backoff)
  • --connection-timeout=SECS → Timeout in seconds before considering a connection attempt failed (default: 0, disabled)
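The options above do not spell out the exact delay formula. One plausible reading of `--reconnect-backoff-factor` is sketched below as an assumption: a base delay multiplied by the factor once per previous failed attempt, with a factor of 0 meaning a constant delay. The `reconnect_options` struct mirrors the documented defaults. The `base_delay_secs` parameter and its value of 1 second are hypothetical, because no base delay is stated.

```cpp
#include <cmath>

// Hypothetical mirror of the new CLI options and their documented defaults.
struct reconnect_options {
    bool reconnect_on_error = false;        // --reconnect-on-error (disabled)
    unsigned max_reconnect_attempts = 0;    // --max-reconnect-attempts (0 = unlimited)
    double reconnect_backoff_factor = 0.0;  // --reconnect-backoff-factor (0 = no backoff)
    double connection_timeout_secs = 0.0;   // --connection-timeout (0 = disabled)
};

// Assumed delay before reconnection attempt number `attempt` (1-based).
// base_delay_secs is an illustrative assumption, not a memtier_benchmark default.
double reconnect_delay_secs(const reconnect_options& opts, unsigned attempt,
                            double base_delay_secs = 1.0) {
    if (opts.reconnect_backoff_factor <= 0.0 || attempt <= 1) {
        return base_delay_secs;  // no backoff, or first attempt: base delay
    }
    // Exponential backoff: base * factor^(attempt - 1)
    return base_delay_secs *
           std::pow(opts.reconnect_backoff_factor, static_cast<double>(attempt - 1));
}
```

For example, with a factor of 2 the assumed delays for attempts 1 to 4 are 1, 2, 4 and 8 seconds.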

Client Behavior

  • On BEV_EVENT_ERROR, BEV_EVENT_EOF, or connection timeout → update error stats, attempt reconnection (if enabled)
  • Supports exponential backoff for retries with configurable backoff factor
  • Backoff delay resets on successful connection
  • Reconnection attempts are logged to stderr for visibility
  • If the maximum number of attempts is exceeded, or reconnection is disabled, the client terminates gracefully
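The decision logic above can be sketched as a small state machine. This is a hypothetical illustration, not the client code from this PR. `conn_event` stands in for libevent's `BEV_EVENT_ERROR` / `BEV_EVENT_EOF` flags plus a connection timeout, so the example needs no libevent. Two further assumptions are made here: the attempt counter resets on a successful connection (the PR states this only for the backoff delay), and the log message text is illustrative. The caller would turn the returned attempt number into a delay.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-ins for the triggering conditions described above.
enum class conn_event { connected, error, eof, timeout };
enum class conn_action { none, reconnect, terminate };

// Hypothetical per-connection bookkeeping; not memtier_benchmark's real class.
class reconnect_tracker {
public:
    reconnect_tracker(bool reconnect_on_error, unsigned max_attempts)
        : enabled_(reconnect_on_error), max_attempts_(max_attempts) {}

    conn_action on_event(conn_event ev) {
        if (ev == conn_event::connected) {
            attempts_ = 0;  // assumed: success restarts the backoff sequence
            return conn_action::none;
        }
        ++errors_;  // every error, EOF, or timeout counts as a connection error
        if (!enabled_) return conn_action::terminate;
        if (max_attempts_ > 0 && attempts_ >= max_attempts_) return conn_action::terminate;
        ++attempts_;  // caller schedules reconnect attempt number attempts_
        std::fprintf(stderr, "connection error, reconnect attempt %u\n", attempts_);
        return conn_action::reconnect;
    }

    std::uint64_t errors() const { return errors_; }
    unsigned attempts() const { return attempts_; }

private:
    bool enabled_;
    unsigned max_attempts_;  // 0 = unlimited
    unsigned attempts_ = 0;
    std::uint64_t errors_ = 0;
};
```

With reconnection disabled, the first error terminates the client. With `--max-reconnect-attempts=2`, the third consecutive failure terminates it. With `0`, the client retries indefinitely.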

Backwards Compatibility

All new features are opt-in and fully backwards compatible:

  • Default behavior unchanged: Reconnection is disabled by default (--reconnect-on-error must be explicitly enabled)
  • Configuration defaults: --reconnect-on-error is off by default, and the numeric options default to 0 (disabled, or unlimited for --max-reconnect-attempts)
  • Output format: Connection errors only appear in progress display when errors actually occur
  • Existing workflows: No impact on existing scripts, benchmarks, or automation

@kamran-redis

Any reason --reconnect-on-error should not be enabled by default other than backward compatibility?

@fcostaoliveira fcostaoliveira changed the title WIP/Open for Feedback: Connection Error Tracking & Auto-Reconnect Connection Error Tracking & Auto-Reconnect Nov 17, 2025
@fcostaoliveira (Collaborator, Author)

Any reason --reconnect-on-error should not be enabled by default other than backward compatibility?

@kamran-redis As you mentioned, it's just for backwards compatibility. All features added in this PR default to backwards-compatible behaviour.

paulorsousa previously approved these changes Nov 18, 2025
@fcostaoliveira fcostaoliveira merged commit 7ae9f65 into master Nov 18, 2025
39 checks passed
@fcostaoliveira fcostaoliveira deleted the error.reconnect branch November 18, 2025 14:44


4 participants