Commit 1f42c42

Add documentation on MySQL 5.7 support
1 parent 13ddcc4 commit 1f42c42

4 files changed

+87
-13
lines changed

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ Contents
2626

2727
installation
2828
limitations
29+
mysql57_support
2930
binlogstream
3031
events
3132
examples

docs/mysql57_support.rst

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
1+
.. _mysql57_support:
2+
3+
MySQL 5.7, MySQL 8.0+ and `use_column_name_cache`
4+
==================================================
5+
6+
In MySQL 5.7 and earlier, the binary log events for row-based replication do not include column name metadata. This means that `python-mysql-replication` cannot map column values to their names directly from the binlog event.
7+
8+
MySQL 8.0.1 introduced the `binlog_row_metadata` system variable, which controls how much table metadata is written to the binary log. Its default value, `MINIMAL`, gives the same behavior as MySQL 5.7: column names are not logged.
9+
10+
The Problem
11+
-----------
12+
13+
When column metadata is not present in the binlog (as in MySQL 5.7 and earlier, or when `binlog_row_metadata` is set to `MINIMAL` in MySQL 8.0+), the `values` dictionary in a `WriteRowsEvent`, `UpdateRowsEvent`, or `DeleteRowsEvent` will contain integer keys corresponding to the column index, not the column names.
14+
15+
For example, for a table `users` with columns `id` and `name`, an insert event might look like this:
16+
17+
.. code-block:: python
18+
19+
    {0: 1, 1: 'John Doe'}
20+
21+
This makes replication logic harder to write and maintain, because your code must know the column order of every table it handles and must be updated whenever that order changes.
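Without the cache, the mapping has to be done by hand. The following is a minimal sketch of that manual approach, assuming you maintain the column order yourself; `USERS_COLUMNS` and `name_values` are hypothetical names used only for illustration and are not part of `python-mysql-replication`.

```python
# Hypothetical, hand-maintained column order for the `users` table.
# It must match the table definition at the time the event was written.
USERS_COLUMNS = ["id", "name"]


def name_values(values, columns):
    """Return a copy of an index-keyed row dict, keyed by column name instead."""
    return {columns[index]: value for index, value in values.items()}


# Index-keyed values, shaped like the insert event shown above.
row = {0: 1, 1: "John Doe"}
named_row = name_values(row, USERS_COLUMNS)
print(named_row)
```

The weakness of this approach is that any `ALTER TABLE` that reorders, adds, or drops columns silently invalidates the hand-written list.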
22+
23+
The Solution: `use_column_name_cache`
24+
-------------------------------------
25+
26+
To address this, `python-mysql-replication` provides the `use_column_name_cache` parameter for the `BinLogStreamReader`.
27+
28+
When you set `use_column_name_cache=True`, the library queries the `INFORMATION_SCHEMA.COLUMNS` table for a table's column names the first time it encounters an event for that table. The names are then cached in memory, so later events for the same table do not trigger another query.
29+
30+
This allows you to receive row data with column names as keys.
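The fetch-once-then-cache behavior described in this section can be sketched as follows. This is an illustration of the general technique, not the library's actual implementation: `ColumnNameCache`, `COLUMNS_QUERY`, and `fake_fetch` are hypothetical names, and the lookup function is injected so the sketch runs without a database.

```python
from typing import Callable, Dict, List, Tuple

# A query of this shape could supply the names (assumption: the library's
# real query may differ). ORDINAL_POSITION preserves the column order.
COLUMNS_QUERY = (
    "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s "
    "ORDER BY ORDINAL_POSITION"
)


class ColumnNameCache:
    """Look up column names once per (schema, table) and reuse the result."""

    def __init__(self, fetch_columns: Callable[[str, str], List[str]]):
        self._fetch_columns = fetch_columns
        self._cache: Dict[Tuple[str, str], List[str]] = {}

    def get(self, schema: str, table: str) -> List[str]:
        key = (schema, table)
        if key not in self._cache:
            # First event for this table: ask the database exactly once.
            self._cache[key] = self._fetch_columns(schema, table)
        return self._cache[key]


# Stand-in for a real database lookup; records how often it is called.
calls = []


def fake_fetch(schema, table):
    calls.append((schema, table))
    return ["id", "name"]


cache = ColumnNameCache(fake_fetch)
first = cache.get("test", "users")
second = cache.get("test", "users")  # served from memory, no second lookup
```

A cache like this is never invalidated, so in this sketch a schema change made while the stream is running would leave stale names in memory.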
31+
32+
MySQL 8.0+ with `binlog_row_metadata=FULL`
33+
------------------------------------------
34+
35+
In MySQL 8.0.1 and later, you can set `binlog_row_metadata` to `FULL`. When this setting is enabled, the column names are included directly in the binlog events, and `use_column_name_cache` is not necessary.
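To decide at startup whether the cache is needed, you could inspect the server variable. The sketch below assumes a DB-API style cursor whose `fetchone()` returns a `(name, value)` tuple, as the default `pymysql` cursor does; `needs_column_name_cache` and `FakeCursor` are hypothetical helpers for illustration, and the fake cursor stands in for a real connection.

```python
def needs_column_name_cache(cursor):
    """Return True unless the server writes full row metadata to the binlog."""
    cursor.execute("SHOW GLOBAL VARIABLES LIKE 'binlog_row_metadata'")
    row = cursor.fetchone()
    if row is None:
        # The variable does not exist before MySQL 8.0.1 (e.g. MySQL 5.7).
        return True
    return str(row[1]).upper() != "FULL"


class FakeCursor:
    """Minimal stand-in for a database cursor, used only for this sketch."""

    def __init__(self, row):
        self._row = row
        self.last_query = None

    def execute(self, query):
        self.last_query = query

    def fetchone(self):
        return self._row


mysql57 = needs_column_name_cache(FakeCursor(None))
mysql80_minimal = needs_column_name_cache(FakeCursor(("binlog_row_metadata", "MINIMAL")))
mysql80_full = needs_column_name_cache(FakeCursor(("binlog_row_metadata", "FULL")))
```

The result could then be passed as the `use_column_name_cache` argument when constructing the `BinLogStreamReader`.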
36+
37+
Example
38+
-------
39+
40+
Here is how to enable the column name cache when needed:
41+
42+
.. code-block:: python
43+
44+
    from pymysqlreplication import BinLogStreamReader
45+
    from pymysqlreplication.row_event import WriteRowsEvent
46+
    mysql_settings = {'host': '127.0.0.1', 'port': 3306, 'user': 'root', 'passwd': ''}
47+
48+
    # Enable the column name cache for MySQL 5.7 or MySQL 8.0+ with binlog_row_metadata=MINIMAL
49+
    stream = BinLogStreamReader(
50+
        connection_settings=mysql_settings,
51+
        server_id=100,
52+
        use_column_name_cache=True
53+
    )
54+
55+
    for binlogevent in stream:
56+
        if isinstance(binlogevent, WriteRowsEvent):
57+
            # Now you can access values by column name
58+
            user_id = binlogevent.rows[0]["values"]["id"]
59+
            user_name = binlogevent.rows[0]["values"]["name"]
60+
            print(f"New user: id={user_id}, name={user_name}")
61+
62+
    stream.close()
63+
64+
Important Considerations
65+
------------------------
66+
67+
* **Performance:** Enabling `use_column_name_cache` adds one extra database query for each table, issued the first time that table appears in the binlog. The results are cached, so the performance impact should be minimal after that initial query.
68+
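* **Permissions:** The MySQL user used for replication must be able to see the replicated tables in `INFORMATION_SCHEMA.COLUMNS`. MySQL shows a user only the rows for objects on which that user holds some privilege, so granting `SELECT` on the replicated tables is sufficient.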
69+
* **Default Behavior:** This feature is disabled by default (`use_column_name_cache=False`) to maintain backward compatibility and to avoid making extra queries unless explicitly requested.

pymysqlreplication/tests/base.py

Lines changed: 5 additions & 4 deletions
@@ -1,11 +1,12 @@
1-
import pymysql
21
import copy
3-
from pymysqlreplication import BinLogStreamReader
4-
import os
52
import json
3+
import os
4+
import unittest
5+
6+
import pymysql
67
import pytest
78

8-
import unittest
9+
from pymysqlreplication import BinLogStreamReader
910

1011

1112
def get_databases():

pymysqlreplication/tests/test_basic.py

Lines changed: 12 additions & 9 deletions
@@ -1,19 +1,19 @@
11
import io
22
import time
33
import unittest
4+
from unittest.mock import patch
5+
6+
from pymysql.protocol import MysqlPacket
47

5-
from pymysqlreplication.json_binary import JsonDiff, JsonDiffOperation
6-
from pymysqlreplication.tests import base
78
from pymysqlreplication import BinLogStreamReader
8-
from pymysqlreplication.gtid import GtidSet, Gtid
9-
from pymysqlreplication.event import *
109
from pymysqlreplication.constants.BINLOG import *
1110
from pymysqlreplication.constants.NONE_SOURCE import *
12-
from pymysqlreplication.row_event import *
11+
from pymysqlreplication.event import *
12+
from pymysqlreplication.gtid import Gtid, GtidSet
13+
from pymysqlreplication.json_binary import JsonDiff, JsonDiffOperation
1314
from pymysqlreplication.packet import BinLogPacketWrapper
14-
from pymysql.protocol import MysqlPacket
15-
from unittest.mock import patch
16-
15+
from pymysqlreplication.row_event import *
16+
from pymysqlreplication.tests import base
1717

1818
__all__ = [
1919
"TestBasicBinLogStreamReader",
@@ -276,7 +276,10 @@ def test_fetch_column_names_from_schema(self):
276276
        if not self.isMySQL57AndMore():
277277
            self.skipTest("Test for MySQL 5.7+ where binlog_row_metadata can be MINIMAL")
278278

279-
        self.execute("SET SESSION binlog_row_metadata = 'MINIMAL'")
279+
        # Minimal is only supported for MySQL 8 and later
280+
        if self.isMySQL80AndMore():
281+
            self.execute("SET SESSION binlog_row_metadata = 'MINIMAL'")
282+
280283
        query = "CREATE TABLE test_column_cache (id INT NOT NULL AUTO_INCREMENT, data VARCHAR (50) NOT NULL, PRIMARY KEY (id))"
281284
        self.execute(query)
282285
        self.execute("INSERT INTO test_column_cache (data) VALUES('Hello')")
