背景

线上很多实例都是mysql5.7.17版本，但是这个版本有一个严重的bug可能会造成主从数据不一致或者其他错误,官方在5.7.19解决了这个问题，以下是官方的描述:

英文：Replication: In certain cases, the master could write to the binary log a last_committed value which was smaller than it should have been. 
     This could cause the slave to execute in parallel transactions which should not have been, leading to inconsistencies or other errors. (Bug #84471, Bug #25379659)

中文：在某些场景下，master记录到binlog里面的last_committed要比他本应该记的last_committed小，这将会导致主从数据不一致或者其他错误

疑问

1.这个bug一定会导致数据不一致吗
2.如何复现这个bug
3.如何用这个bug复现主从不一致
4.有没有可能主从复制两个线程正常，但是数据已经不一致了
5.官方是如何解决的这个bug
6.评估当前线上mysql5.7.17是否需要升级

课前知识（并行复制-COMMIT_ORDER）

master端：
    1.binlog组提交的三个阶段
            trans_commit_stmt
                ha_commit_trans
                    MYSQL_BIN_LOG::prepare
                        xxxx
                    MYSQL_BIN_LOG::commit
                        MYSQL_BIN_LOG::ordered_commit
                            FLUSH        /*阶段1*/
                            SYNC         /*阶段2*/
                            COMMIT    /*阶段3*/
    2.主库哪些事物可以放到一个组里提交
       ordered_commit:FLUSH----->ordered_commit:COMMIT:process_commit_stage_queue，这个阶段的事物肯定互相无影响，会分配相同last_committed及不同的sequence_number
    3.一起提交的事物组如何区别
        last_committed相同的是一个组，组内区别不同事物根据sequence_number区别
    4.last_committed与sequence_number关系
        last_committed是上一组sequence_number最大值
    5.主库binlog记录格式
        #200628 19:27:52 server id 1001  end_log_pos 259 CRC32 0x5682e65e   GTID    last_committed=0    sequence_number=1
        #200628 19:28:00 server id 1001  end_log_pos 534 CRC32 0x999fefc6   GTID    last_committed=0    sequence_number=2
        #200628 19:28:08 server id 1001  end_log_pos 809 CRC32 0x9a7ad89c   GTID    last_committed=0    sequence_number=3
slave端：
    1.slave端有哪些线程参与
        io线程接收master binlog写入自己relay log
        coordinator线程读取自己的relay log进行分发
        worker线程获取分发到的日志进行回放
    2.并行的单位是什么
        以GTID或者匿名GTID事物为单位进行分发
    3.哪些事物可以并行
        当前有空闲的worker(逻辑见下方源码)
        事物的last_committed <= lwm_estimate (逻辑见下方源码)

# slave端能否并行的2处关键代码及注释解释
1.获取空闲worker
Mts_submode_logical_clock::get_free_worker(Relay_log_info *rli)
{
  for (Slave_worker **it= rli->workers.begin(); it != rli->workers.end(); ++it)
  {
    Slave_worker *w_i= *it;
    if (w_i->jobs.len == 0)
      return w_i;
  }
  return 0;
}
2.GAQ中lwm_estimate解释
              the last time index containg lwm
                  +------+
                  | LWM  |
                  |  |   |
                  V  V   V
   GAQ:   xoooooxxxxxXXXXX...X
                ^   ^
                |   | LWM+1
                |
                +- tne new current_lwm(一定要保证之前所有的事物都已经commited，而不能找最左面最大的"x"，因为有可能前面存在未完成的事物"o")
         <---- logical (commit) time ----
here `x' stands for committed, `X' for committed and discarded from the running range of the queue, `o' for not committed.
3.对比last_committed与lwm_estimate
if (!is_new_group)
{
  longlong lwm_estimate= estimate_lwm_timestamp();   /*返回last_lwm_timestamp*/
  /*last_committed为本次要分发的last_committed与主库binlog中的last_committed对应,lwm_estimate为GAQ队列里面标记为已经完成commit的并且前面所有事物都已经完成提交的seq_number*/
  /*当 a "<=" b 返回 true, 否者返回 false，也就是说 last_committed<=lwm_estimate可以并行分发，如果同一组last_committed肯定<lwm_estimate可以并行分发，如果是2组交界会出现last_committed>=lwm_estimate，如果last_committed>lwm_estimate，说明上一组事物还没有完全commit,此时不能并行分发，如果last_committed=lwm_estimate此时上一组最后一个事物已经完成了，可以并行分发*/
  /*rli->gaq->assigned_group_index为正在分发的group在GAQ中的位置，rli->gaq->entry为GAQ中的头，GAQ队列为空的时候这两个相等*/
  if (!clock_leq(last_committed, lwm_estimate) && rli->gaq->assigned_group_index != rli->gaq->entry)
  {
    if (wait_for_last_committed_trx(rli, last_committed, lwm_estimate))
    {
      DBUG_RETURN(-1);
    }
    if (gap_successor)
      last_lwm_timestamp= sequence_number - 1;
    DBUG_ASSERT(!clock_leq(sequence_number, estimate_lwm_timestamp()));
  }
  delegated_jobs++;
  DBUG_ASSERT(!force_new_group);
}

疑问1分析(这个bug一定会导致数据不一致吗)

一、测试case
1.搭建主从测试环境，开启并行复制(COMMIT_ORDER)
2.sysbench压测oltp_read_write.lua 24小时
3.观察并行复制是否生效
4.pt-checksum校验数据是否一致
二、测试结果
pt-checksum校验后发现数据并没有出现不一致，也就是说这个bug并不能百分百能出现，只有在特定场景下才能复现，例如下面疑问3疑问4人为造出的场景或者其他特殊场景下

疑问2分析(如何复现这个bug)

1.模拟主库组提交并打印组提交binlog
session1:
    mysql> SET DEBUG_SYNC = "bgc_after_enrolling_for_commit_stage SIGNAL insert1_ready WAIT_FOR continue_commit_stage";
    mysql> INSERT INTO t1 VALUES(1, 1);
session2:
    mysql> INSERT INTO t1 VALUES(2, 1);
session3:
    mysql> INSERT INTO t1 VALUES(3, 1);
session4:
    mysql> SET DEBUG_SYNC = "now SIGNAL continue_commit_stage";
#binglog输出日志（正常日志）:
    #200701 12:15:25 server id 1001  end_log_pos 259 CRC32 0x57f565f3   GTID    last_committed=0    sequence_number=1
    #200701 12:15:43 server id 1001  end_log_pos 517 CRC32 0x29626007   GTID    last_committed=0    sequence_number=2
    #200701 12:15:46 server id 1001  end_log_pos 775 CRC32 0xea48f246   GTID    last_committed=0    sequence_number=3

2.模拟bug中提出的组提交过程中实际写入binlog中的last_committed比应该写入binlog的last_committed小
session1:
    mysql> SET DEBUG_SYNC = "bgc_after_enrolling_for_commit_stage SIGNAL insert1_ready WAIT_FOR continue_commit_stage";
    mysql> INSERT INTO t1 VALUES(1, 1);
session2:
    mysql> SET DEBUG_SYNC = "now WAIT_FOR insert1_ready";
    mysql> SET DEBUG_SYNC = "reached_finish_commit WAIT_FOR insert2_finish";
    mysql> INSERT INTO t1 VALUES(2, 1);
session3:
    mysql> SET DEBUG_SYNC = "now SIGNAL continue_commit_stage";
    mysql> UPDATE t1 SET c2 = 2 WHERE c1 = 2;
    mysql> SET DEBUG_SYNC = "now SIGNAL insert2_finish";
#binlog输出日志（异常日志, last_committed应该是上一组最大sequence_number，应该是2而不应该是1）
    #200701 12:38:01 server id 1001  end_log_pos 259 CRC32 0x424f636d   GTID    last_committed=0    sequence_number=1
    #200701 12:38:32 server id 1001  end_log_pos 517 CRC32 0x75f98373   GTID    last_committed=0    sequence_number=2
    #200701 12:40:50 server id 1001  end_log_pos 775 CRC32 0x3a792138   GTID    last_committed=1    sequence_number=3

疑问3分析(如何用这个bug复现主从不一致)

一、思路
1.让主库实现组提交，并出现bug中提出的实际写入binlog中的last_committed比应该写入binlog的last_committed小
2.让从库实现并行分发 /*这块自己改代码,让worker干活慢点，保证并行分发，否则事物会一直分发给worker 0*/
3.让从库并行的worker满足：先分发给事物的worker晚于后分发事物的worker完成事物提交  /*这块改代码让前面的worker慢点干活，后面的worker正常执行，确保后来的worker先干完，事物分发是在前面做的，这里面人为调整worker快慢理论不影响正确性*/
二、验证
# 主库
-------------session1-------------
truncate table t1;
flush logs;
#快速执行两条语句让lwm_estimate变化，否则其初始值为0，下面的语句并发不起来
delete from emp limit 1;
delete from emp limit 1;
SET DEBUG_SYNC = "bgc_after_enrolling_for_commit_stage SIGNAL insert1_ready WAIT_FOR continue_commit_stage";
INSERT INTO t1 VALUES(1, 1);

-------------session2-------------
SET DEBUG_SYNC = "now WAIT_FOR insert1_ready";
SET DEBUG_SYNC = "reached_finish_commit WAIT_FOR insert2_finish";
INSERT INTO t1 VALUES(2, 1);

-------------session3-------------
SET DEBUG_SYNC = "now SIGNAL continue_commit_stage";
UPDATE t1 SET c2 = 2 WHERE c1 = 2;
SET DEBUG_SYNC = "now SIGNAL insert2_finish";

#主库binlog
    #200702 15:13:47 server id 1001  end_log_pos 259 CRC32 0xd219aa0c   GTID    last_committed=0    sequence_number=1
    #200702 15:13:47 server id 1001  end_log_pos 530 CRC32 0x2cd5ba1c   GTID    last_committed=1    sequence_number=2
    #200702 15:13:47 server id 1001  end_log_pos 801 CRC32 0xf592e3b9   GTID    last_committed=2    sequence_number=3
    #200702 15:13:53 server id 1001  end_log_pos 1059 CRC32 0xb1735f8d  GTID    last_committed=2    sequence_number=4
    #200702 15:13:57 server id 1001  end_log_pos 1317 CRC32 0xbea30d9c  GTID    last_committed=3    sequence_number=5

# 从库
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000216
          Read_Master_Log_Pos: 1520
               Relay_Log_File: localhost-relay-bin.000569
                Relay_Log_Pos: 1207
        Relay_Master_Log_File: mysql-bin.000216
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 0 failed executing transaction 'e5a31538-b488-11ea-b6f5-0920853da06e:28169' at master log mysql-bin.000216, end_log_pos 1489. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 994
              Relay_Log_Space: 2031
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 0 failed executing transaction 'e5a31538-b488-11ea-b6f5-0920853da06e:28169' at master log mysql-bin.000216, end_log_pos 1489. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1001
                  Master_UUID: e5a31538-b488-11ea-b6f5-0920853da06e
             Master_Info_File: /Users/gaochao/Desktop/mysql_souce_debug/multi/3316/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 200702 15:14:08
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: e5a31538-b488-11ea-b6f5-0920853da06e:10-28169
            Executed_Gtid_Set: e5a31538-b488-11ea-b6f5-0920853da06e:1-28168
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

疑问4分析(有没有可能主从复制链路正常，但是数据已经不一致了)

# 结论：可能出现主从复制链路正常，但是数据已经不一致了，下面为复现步骤
# 主库操作及输出
-------------session1-------------
truncate table t1;
flush logs;
#快速执行两条语句让lwm_estimate变化(lwm_estimate在上一次出现无法并行的情况下才会变化)，否则其初始值为0一直小于等于last_committed，导致后续的测试语句无法并发而无法复现
INSERT INTO t1 VALUES(3, 1);
delete from emp limit 1;
SET DEBUG_SYNC = "bgc_after_enrolling_for_commit_stage SIGNAL insert1_ready WAIT_FOR continue_commit_stage";
INSERT INTO t1 VALUES(1, 1);
-------------session2-------------
SET DEBUG_SYNC = "now WAIT_FOR insert1_ready";
SET DEBUG_SYNC = "reached_finish_commit WAIT_FOR insert2_finish";
update t1 set c2=100 where c1=3;
-------------session3-------------
SET DEBUG_SYNC = "now SIGNAL continue_commit_stage";
update t1 set c2=200 where c1=3;
SET DEBUG_SYNC = "now SIGNAL insert2_finish";
# 主库binlog输出
    #200703 11:06:47 server id 1001  end_log_pos 259 CRC32 0x004fd38a   GTID    last_committed=0    sequence_number=1
    #200703 11:06:47 server id 1001  end_log_pos 517 CRC32 0x236a84cc   GTID    last_committed=1    sequence_number=2
    #200703 11:06:47 server id 1001  end_log_pos 788 CRC32 0x5ab78655   GTID    last_committed=2    sequence_number=3
    #200703 11:06:55 server id 1001  end_log_pos 1046 CRC32 0x78d7f95c  GTID    last_committed=2    sequence_number=4
    #200703 11:07:00 server id 1001  end_log_pos 1314 CRC32 0xf9ea7b41  GTID    last_committed=3    sequence_number=5
# 主库最后结果
mysql> select * from t1;
+----+------+
| c1 | c2   |
+----+------+
|  1 |    1 |
|  3 |  200 |
+----+------+
2 rows in set (0.00 sec)

# 从库操作及输出
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000294
          Read_Master_Log_Pos: 1517
               Relay_Log_File: localhost-relay-bin.000057
                Relay_Log_Pos: 1730
        Relay_Master_Log_File: mysql-bin.000294
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1517
              Relay_Log_Space: 2028
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1001
                  Master_UUID: e5a31538-b488-11ea-b6f5-0920853da06e
             Master_Info_File: /Users/gaochao/Desktop/mysql_souce_debug/multi/3316/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: e5a31538-b488-11ea-b6f5-0920853da06e:28389-28501
            Executed_Gtid_Set: e5a31538-b488-11ea-b6f5-0920853da06e:1-28501
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)
# 从库最后结果
mysql> select * from t1;
+----+------+
| c1 | c2   |
+----+------+
|  1 |    1 |
|  3 |  100 |
+----+------+
2 rows in set (0.00 sec)

疑问5分析(官方如何解决的这个bug)

# 有问题的代码(应该组内每个事物都应该调用update_max_committed)
for (THD *head= first ; head ; head = head->next_to_commit)
  {
#ifndef DBUG_OFF
    stage_manager.clear_preempt_status(head);
#endif
    if (thd->get_transaction()->sequence_number != SEQ_UNINIT)
      update_max_committed(head);  /*参见下方定义*/
    xxxxxxx
  }
# 修复后的代码
for (THD *head= first ; head ; head = head->next_to_commit)
  {
#ifndef DBUG_OFF
    stage_manager.clear_preempt_status(head);
#endif
    if (head->get_transaction()->sequence_number != SEQ_UNINIT)
      update_max_committed(head);   /*参见下方定义*/
    xxxxxxx
  }
# update_max_committed定义
void MYSQL_BIN_LOG::update_max_committed(THD *thd)
{
  Transaction_ctx *trn_ctx= thd->get_transaction();
  max_committed_transaction.set_if_greater(trn_ctx->sequence_number);
  trn_ctx->sequence_number= SEQ_UNINIT;
}

问题6分析(评估当前线上mysql5.7.17是否需要升级)

1.如果环境允许建议升级，虽然比较难出现这个bug，但是确实是一个潜在的风险。
2.在不影响业务的前提下pt-checksum校验当前环境主从是否存在不一致。

mysql5.7.17并行复制bug复现

mysql5.7.17并行复制bug复现

背景

疑问

课前知识（并行复制-COMMIT_ORDER）

疑问1分析(这个bug一定会导致数据不一致吗)

疑问2分析(如何复现这个bug)

疑问3分析(如何用这个bug复现主从不一致)

疑问4分析(有没有可能主从复制链路正常，但是数据已经不一致了)

疑问5分析(官方如何解决的这个bug)

问题6分析(评估当前线上mysql5.7.17是否需要升级)