Skip to content

Conversation

@jaredmdobson
Copy link
Contributor

@jaredmdobson jaredmdobson commented Aug 28, 2025

I'm not trying to cause problems i swear 😂 i know this is as beast and we can schedule a call to go over it or whatever is needed.

I really needed connection pooling and the test file was insanely big 😂 and so i had to 'yak shave' 🪒 to get there.

mysql:
  host: "localhost"
  port: 9307
  user: "root"
  password: "admin"
  pool_size: 3 # Reduced for tests to avoid connection exhaustion
  max_overflow: 2
  charset: "utf8mb4" # Explicit charset for MariaDB compatibility
  collation: "utf8mb4_unicode_ci" # Explicit collation for MariaDB compatibility

Also for mariadb this adds in breaking changes as you must specify the charset and the collation for the connection pool to work. So we'd need to increment a major or minor version etc.

I'm working on fixing the build.

- Updated `CLAUDE.md` to clarify the usage of the test script for full suite testing.
- Enhanced `run_tests.sh` to accept optional pytest parameters, allowing for more flexible test execution.
- Introduced a new method in `DataTestMixin` for normalizing datetime comparisons between MySQL and ClickHouse, improving accuracy in test assertions.
- Refactored integration tests to ensure proper setup and teardown for MariaDB configurations, addressing known timing issues in replication tests.
…xecution

- Introduced a new `PARALLEL_TESTING.md` file detailing the implementation of parallel test execution, achieving significant runtime reduction from 60-90 minutes to 10-15 minutes.
- Updated `docker-compose-tests.yaml` to optimize health check parameters for the MySQL service.
- Enhanced `pytest.ini` with new markers for parallel-safe and serial-only tests.
- Modified `requirements-dev.txt` to include `pytest-xdist` for enabling parallel execution.
- Refactored `run_tests.sh` to support parallel execution and CI reporting, allowing for flexible test runs with various options.
- Improved test isolation in `conftest.py` to ensure unique database names for each test, preventing conflicts during parallel execution.
- Updated integration tests to utilize the new parallel testing framework and ensure proper database context handling.
- Adjusted health check parameters in `docker-compose-tests.yaml` for the MySQL service to improve reliability during testing.
- Added `pytest-xdist` version 3.8.0 to `requirements-dev.txt` and `pyproject.toml` to support parallel test execution.
- Updated `requirements-dev.txt` to ensure compatibility with the latest testing frameworks.
- Refactored test setup in `conftest.py` to ensure unique database names for improved isolation in tests.
- Removed obsolete integration test files to streamline the test suite and enhance maintainability.
…el execution

- Updated `.gitignore` to include patterns for binlog data directories to prevent clutter.
- Enhanced `CLAUDE.md` with detailed testing architecture and critical fixes for parallel execution, including database isolation and connection pooling configurations.
- Modified `Dockerfile` to ensure proper permissions for the binlog directory, addressing Docker volume mount issues.
- Refactored `run_tests.sh` to support intelligent parallel execution and CI reporting, optimizing test runs.
- Implemented critical directory creation logic in `config.py` to ensure binlog directory writability, preventing race conditions during parallel test execution.
- Updated various test files to utilize the new `IsolatedBaseReplicationTest` for improved test isolation and reliability.
- Cleaned up obsolete files in the `binlog_json_parser` directory to streamline the codebase.
- Deleted `test-report.html` and `test-results.xml` as they are no longer needed.
- Updated `.gitignore` to include new patterns for `test-report.html` and `test-results.xml` to prevent future clutter.
- Added `.pytest_cache/` to `.gitignore` to prevent caching files from cluttering the repository.
@bakwc
Copy link
Owner

bakwc commented Sep 1, 2025

Could you please split it into several PRs? Like tests separation - first PR, feature1 implementation - second PR, feature2 implementation 3d PR? Currently it's pretty hard to review so big PR.

- Resolved database timing issues by implementing a complete dynamic database isolation system, allowing tests to run safely in parallel.
- Enhanced `CLAUDE.md` with detailed descriptions of the new isolation features and centralized configuration management.
- Updated `docker-compose-tests.yaml` for improved MySQL service configuration, including health checks and volume management.
- Refactored `run_tests.sh` to include pre-test infrastructure monitoring and support for intelligent parallel execution.
- Improved test setup in `conftest.py` to ensure unique database names and streamlined cleanup processes.
- Removed the obsolete `PARALLEL_TESTING.md` file and integrated its content into existing documentation.
- Updated various integration tests to utilize the new isolation framework and ensure proper database context handling.
…liability

- Implemented a centralized TestIdManager to resolve subprocess isolation issues, resulting in a 4x improvement in test pass rate (from 18.8% to 69.9%).
- Updated CLAUDE.md to reflect the new status and improvements in test infrastructure, including detailed descriptions of recent fixes and enhancements.
- Refactored run_tests.sh to streamline test execution and improve performance monitoring.
- Enhanced dynamic configuration management to ensure proper isolation and prevent database context issues during parallel execution.
- Migrated several integration tests to utilize the enhanced configuration framework, ensuring better reliability and consistency in test results.
- Improved error handling and logging in various test files to facilitate debugging and maintainability.
…n and configuration

- Refactored `run_tests.sh` to change the phase of infrastructure monitoring to post-startup and removed redundant Docker service startup command.
- Updated `rules.mdc` to set `alwaysApply` to false, enhancing configuration management for test execution.
- Improved code readability and organization in `converter.py` by standardizing string quotes and optimizing import statements.
… and organization

- Expanded .gitignore to include additional log files, environment variables, and editor-specific directories to prevent clutter in the repository.
- Updated CLAUDE.md to reflect recent changes in test infrastructure, including detailed descriptions of fixes and enhancements related to test reliability and performance.
- Refactored run_tests.sh to improve performance monitoring and streamline test execution processes.
- Enhanced comments and documentation throughout the codebase to clarify the purpose and functionality of various components, ensuring better maintainability.
- Marked tasks for improving source code documentation and fixing critical process startup issues as done.
- Updated the status of individual failing tests to in-progress.
- Refactored test runners in `conftest.py` to use `python3` and absolute paths for better compatibility in container environments.
- Added debug logging in `BaseReplicationTest` to improve error handling and visibility during test execution.
- Updated `docker-compose-tests.yaml` to create a named volume for binlog data and ensure proper permissions for the binlog directory.
- Improved directory creation logic in `binlog_replicator.py` and `db_replicator.py` to handle missing parent directories more robustly.
- Refactored integration tests in `test_basic_process_management.py` and `test_parallel_initial_replication.py` to utilize isolated configurations for better test isolation and reliability.
- Updated task status in `tasks.json` to reflect progress in fixing individual failing tests.
- Standardized string formatting across command initialization in `runner.py` for better consistency.
- Enhanced the structure of the `DbReplicatorRunner` class by using multi-line arguments for improved readability.
- Updated logging messages in both `runner.py` and `utils.py` to use consistent string formatting.
- Improved import organization in `runner.py` and `utils.py` for better clarity and maintainability.
- Added helper methods in `base_replication_test.py` to streamline replication setup and target database creation in tests.
- Added detailed documentation to the `run` method in `ProcessRunner` to clarify the importance of test isolation during pytest execution.
- Implemented critical checks to ensure test ID logic only runs in testing environments, preventing unnecessary warnings in production.
- Improved comments to explain the rationale behind the test isolation system and its impact on database operations during parallel test execution.
- Updated CLAUDE.md to reflect current test status: 126 passed, 47 failed, 11 skipped (68.5% pass rate).
- Implemented critical fixes for process startup reliability, including increased timeouts and enhanced error diagnostics.
- Improved database detection logic to handle temporary and final database transitions more effectively.
- Enhanced dynamic isolation features for parallel test execution, ensuring worker-specific database management.
- Removed outdated documentation files and consolidated relevant information into existing guides for clarity.
…ocesses

- Marked multiple tasks as done in tasks.json, reflecting the completion of test categorization and error handling improvements.
- Enhanced directory creation logic in binlog_replicator.py and db_replicator.py to ensure robust handling of parent directories, preventing startup failures.
- Improved error diagnostics and logging for directory creation to facilitate better debugging during test execution.
- Removed outdated and flaky tests to streamline the test suite and improve overall reliability.
- Added support for mapping the MySQL 'boolean' type to 'Bool' in the converter, improving type handling consistency.
…ltime

- Added error handling for OperationalError (Error 1236) to detect binlog index file corruption.
- Implemented automatic deletion of the corrupted binlog directory and clean exit for process restart.
- Enhanced logging for better diagnostics during recovery attempts.
- Added a new module for handling MySQL binlog corruption (Error 1236) with automatic recovery functionality.
- Integrated recovery logic into both DbReplicatorRealtime and BinlogReplicator to streamline error handling and process restart.
- Updated .gitignore to exclude the binlog directory instead of files for better management.
… security and clarity

- Updated DbReplicator to pass raw primary key values to mysql_api, eliminating manual quote handling for parameterized queries.
- Enhanced MySQLApi to use parameterized queries for pagination, preventing SQL injection and improving query safety.
- Added detailed logging for query execution and parameters to aid in debugging and error handling.
- Refactored directory creation handling to ensure robust creation of parent directories, preventing potential startup failures.
- Enhanced logging for directory creation errors to provide clearer diagnostics during execution.
- Cleaned up whitespace for better code readability.
…bReplicator

- Enhanced the `recreate_database` method in ClickhouseApi to include retry logic for dropping and creating databases, improving robustness against concurrent operations.
- Updated logging to provide clearer insights during database creation and error handling.
- Modified DbReplicator to conditionally run real-time replication based on the `initial_only` flag, ensuring better control over replication processes.
- Improved logging for replication completion to include execution time, aiding in performance monitoring.
- Updated the bug report for the critical replication issue, clarifying the status and latest findings regarding the infinite loop on the `api_key` table.
- Improved logging in the `perform_initial_replication` method to track table processing and error handling, allowing for better diagnostics during replication.
- Added exception handling to ensure that individual table failures do not halt the entire replication process, enhancing robustness.
- Implemented detailed logging for worker processes, including primary key advancement tracking and iteration counts, to aid in debugging.
- Enhanced SQL query logging in MySQLApi to provide better visibility into executed queries and parameters, improving overall error handling.
- Replaced print statements with logging calls in binlog_replicator.py, clickhouse_api.py, and other modules to enhance consistency and debuggability.
- Improved error handling in ClickhouseApi to ensure database qualification is always required, preventing UNKNOWN_TABLE errors.
- Enhanced logging in DbReplicatorInitial to track worker processes and primary key advancements, providing better diagnostics for replication issues.
- Updated MySQLApi to log query results and primary key ranges for improved visibility into data operations.
- Streamlined log forwarding from subprocesses to the main logger, ensuring real-time visibility of worker outputs.
- Simplified the initial replication process by removing unnecessary error handling for individual table failures, ensuring all tables are processed without interruption.
- Enhanced logging to confirm successful completion of all tables during initial replication, improving visibility into the replication status.
- Updated logging configuration in main.py to output to stdout for real-time visibility, addressing previous buffering issues with stderr.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.