1. Background
In a master-slave replication setup, a damaged binlog on the primary prevented the replica from synchronizing data, requiring rebuilding the replica. The clone feature introduced in MySQL 8.0.17 was used for recovery. For two days after recovery, master-replica data conflict errors occurred.
Analysis of the binlog showed that both primary and replica executed the same statement at the same time. The application team confirmed scheduled jobs should only run on the primary.
Investigation found that scheduled events on the cloned replica were also enabled, so both nodes ran the scheduled job simultaneously, causing replication errors. Disabling the scheduled event on the replica resolved the issue.
2. Reproducing the issue
Test environment: GreatSQL 8.0.32-24
greatsql> SELECT VERSION(); +-----------+ | VERSION() | +-----------+ | 8.0.32-24 | +-----------+ 1 row in set (0.00 sec)
1. Set up a master-slave environment
Omitted.
2. Create event
greatsql> create database test; greatsql> use test; greatsql> CREATE TABLE `test` ( `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'ID', `now` datetime DEFAULT NULL COMMENT 'Time', PRIMARY KEY (`id`)); greatsql> CREATE EVENT event_test ON SCHEDULE EVERY 1 MINUTE ON COMPLETION PRESERVE ENABLE COMMENT 'Insert records into the test table every one minute' DO INSERT INTO test VALUES(NULL, now());
3. Check event status
On the primary, the event status is ENABLED by default.
greatsql> show events; +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ | Db | Name | Definer | Time zone | Type | Execute at | Interval value | Interval field | Starts | Ends | Status | Originator | character_set_client | collation_connection | Database Collation | +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ | test | event_test | root@% | SYSTEM | RECURRING | NULL | 1 | MINUTE | 2023-10-12 17:11:14 | NULL | ENABLED | 1 | utf8mb4 | utf8mb4_unicode_ci | utf8mb4_unicode_ci | +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ 1 row in set (0.00 sec)
On the replica, the event status is SLAVESIDE_DISABLED by default.
greatsql> show events; +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+--------------------+------------+----------------------+----------------------+--------------------+ | Db | Name | Definer | Time zone | Type | Execute at | Interval value | Interval field | Starts | Ends | Status | Originator | character_set_client | collation_connection | Database Collation | +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+--------------------+------------+----------------------+----------------------+--------------------+ | test | event_test | root@% | SYSTEM | RECURRING | NULL | 1 | MINUTE | 2023-10-12 17:11:14 | NULL | SLAVESIDE_DISABLED | 1 | utf8mb4 | utf8mb4_unicode_ci | utf8mb4_unicode_ci | +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+--------------------+------------+----------------------+----------------------+--------------------+ 1 row in set (0.00 sec)
4. Inspect data
greatsql> select * from test.test; +----+---------------------+ | id | now | +----+---------------------+ | 1 | 2023-08-08 16:39 | | 2 | 2023-08-08 16:39 | | 3 | 2023-08-08 16:39 | +----+---------------------+ 3 rows in set (0.00 sec)
5. Clone the replica
# Install clone plugin on both master and replica
greatsql> install plugin clone soname 'mysql_clone.so';
# Perform clone on the replica
greatsql> set global clone_valid_donor_list='172.17.137.162:6001'; greatsql> clone instance from root@'172.17.137.162':6001 identified by 'greatsql';
6. Reconfigure replication
greatsql> change master to master_user='root',master_password='greatsql',master_host='172.17.137.162',master_port=6001,master_auto_position=1; Query OK, 0 rows affected, 7 warnings (0.04 sec) greatsql> start slave; Query OK, 0 rows affected, 1 warning (0.04 sec)
7. Check replication status
greatsql> show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for source to send event Master_Host: 172.17.137.162 Master_User: root Master_Port: 6001 Connect_Retry: 60 Master_Log_File: binlog.000001 Read_Master_Log_Pos: 2959 Relay_Log_File: relaylog.000002 Relay_Log_Pos: 395 Relay_Master_Log_File: binlog.000001 Slave_IO_Running: Yes Slave_SQL_Running: No Last_Errno: 1062 Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'e8bf88f9-2acd-11ee-a98a-00163e605c74:8' at master log binlog.000001, end_log_pos 2606. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any. Skip_Counter: 0 Exec_Master_Log_Pos: 2307 Relay_Log_Space: 1242 Until_Condition: None greatsql> select * from performance_schema.replication_applier_status_by_worker limit 1; *************************** 1. row *************************** CHANNEL_NAME: WORKER_ID: 1 THREAD_ID: NULL SERVICE_STATE: OFF LAST_ERROR_NUMBER: 1062 LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'e8bf88f9-2acd-11ee-a98a-00163e605c74:8' at master log binlog.000001, end_log_pos 2606; Could not execute Write_rows event on table test.test; Duplicate entry '5' for key 'test.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 2606 LAST_ERROR_TIMESTAMP: 2023-08-08 16:39.033240 APPLYING_TRANSACTION: e8bf88f9-2acd-11ee-a98a-00163e605c74:8 APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2023-08-08 16:45.795753 APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2023-08-08 16:45.795753 APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2023-08-08 16:39.032510 1 row in set (0.00 sec)
The replica error shows a primary key conflict.
8. Check scheduled event status on the replica
The event on the replica is currently ENABLED.
greatsql> show events; +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ | Db | Name | Definer | Time zone | Type | Execute at | Interval value | Interval field | Starts | Ends | Status | Originator | character_set_client | collation_connection | Database Collation | +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ | test | event_test | root@% | SYSTEM | RECURRING | NULL | 1 | MINUTE | 2023-08-08 15:58:45 | NULL | ENABLED | 1 | utf8mb4 | utf8mb4_unicode_ci | utf8mb4_unicode_ci | +------+------------+---------+-----------+-----------+------------+----------------+----------------+---------------------+------+---------+------------+----------------------+----------------------+--------------------+ 1 row in set (0.00 sec)
Because the scheduled event on the replica was executing and wrote data, it caused the primary key conflict.
9. Resolution
greatsql> alter event event_test DISABLE; Query OK, 0 rows affected (0.01 sec)
Disable the event on the replica, skip the primary key conflict error, and then restart replication.
Summary
- If the primary runs scheduled jobs, after cloning a replica, disable scheduled jobs on the replica to avoid simultaneous execution that can cause replication failures.
- When cloning, if the donor contains replication configuration, the cloned recipient will copy that information and may automatically start replication after the instance restarts. To avoid this, add skip-slave-start to the recipient's configuration to prevent automatic start after reboot.
ALLPCB