-
Notifications
You must be signed in to change notification settings - Fork 690
Add opt-in column name caching for MySQL 5.7 compatibility #633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add opt-in column name caching for MySQL 5.7 compatibility #633
Conversation
This commit introduces an optional feature to fetch and cache column names from INFORMATION_SCHEMA when binlog metadata is missing (common in MySQL 5.7). Changes: - Added module-level cache dictionary to store column names by table - Implemented _fetch_column_names_from_schema() method in TableMapEvent - Modified _sync_column_info() to use cached column names when available - Added use_column_name_cache parameter to BinLogStreamReader (default: False) - Propagated parameter through BinLogPacketWrapper to event classes Benefits: - Resolves UNKNOWN_COL0, UNKNOWN_COL1 placeholders in MySQL 5.7 - Cache provides ~800x performance improvement over repeated queries - Opt-in design maintains backward compatibility - Module-level cache persists for process lifetime Related to issue julien-duponchelle#612
|
That sound very good, I'm going to look at it during the week end. |
|
Seem the unit test got a problem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds an opt-in column name caching mechanism to resolve the UNKNOWN_COL0/UNKNOWN_COL1 placeholder issue when using MySQL 5.7, which lacks binlog metadata support for column names.
Key changes:
- Introduces
use_column_name_cacheparameter (default: False) to maintain backward compatibility - Implements module-level caching that queries INFORMATION_SCHEMA.COLUMNS when binlog metadata is missing
- Provides significant performance improvement through persistent in-memory caching
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pymysqlreplication/row_event.py | Adds module-level cache, implements column name fetching from INFORMATION_SCHEMA, and integrates fallback logic into _sync_column_info() |
| pymysqlreplication/packet.py | Propagates use_column_name_cache parameter through BinLogPacketWrapper initialization |
| pymysqlreplication/binlogstream.py | Adds use_column_name_cache parameter to BinLogStreamReader and passes it to packet wrapper |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pymysqlreplication/row_event.py
Outdated
| import logging | ||
| logging.info(f"Cached column names for {cache_key}: {len(column_names)} columns") |
Copilot
AI
Nov 4, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import statements should be placed at the module level, not within function bodies. Move the 'import logging' statement to the top of the file with other imports to follow Python best practices and avoid repeated import overhead on each function call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with copilot.
But also we have a enable_logging flag that we pass binlogstream.py if we want to send logging data we should use it.
|
I did a quick review to help move forward let me know if you need any help. I know it's a feature people want. |
- Move logging import to module level (was inside function) - Respect enable_logging flag for all logging statements - Only log when enable_logging=True is explicitly enabled - Propagate enable_logging through BinLogPacketWrapper to TableMapEvent
|
thank you @julien-duponchelle. The last commit addresses to your latest feedback. |
Make use_column_name_cache parameter optional with default False. This fixes TypeError when parameter is not provided.
|
I see that I have still issues and need to test more my implementations, so Im closing this PR for now and coming back with a more bugfree version and not rubbing also your time |
No worry about my time, i'm happy to get contribution |
Summary
This PR adds an optional feature to resolve the UNKNOWN_COL0, UNKNOWN_COL1 placeholder issue when using MySQL 5.7, which does not include column name metadata in binlog events.
Problem
MySQL 5.7 does not support the
binlog_row_metadata=FULLconfiguration option (available only in MySQL 8.0.14+). When binlog events lack column name metadata, the library generates placeholder names likeUNKNOWN_COL0,UNKNOWN_COL1, etc., making it difficult to work with the data.Solution
This PR introduces an opt-in caching mechanism that queries
INFORMATION_SCHEMA.COLUMNSto fetch actual column names when binlog metadata is missing.Key Features:
use_column_name_cache=False(default) maintains backward compatibilityImplementation Details:
_COLUMN_NAME_CACHEdictionary inrow_event.py_fetch_column_names_from_schema()method inTableMapEventclass_sync_column_info()to use cached column names as fallbackuse_column_name_cacheparameter toBinLogStreamReaderBinLogPacketWrapperto event classesUsage Example
Testing
Related Issues
Related to #612
Backward Compatibility
This change is fully backward compatible. The feature is disabled by default and must be explicitly enabled by setting
use_column_name_cache=True.