【MySQL】部分5.6版本罕见复制报错 ERROR 1837
场景:
1、简单的一主一从,版本MySQL-5.6.20
2、master_auto_position=0
3、开启gtid
报错如下:
Last_SQL_Errno: 1837
Last_SQL_Error: Error 'When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'c44bd915-440d-11e6-8ea0-6c92bf24b8c0:71844624'.' on query. Default database: '$db'. Query: 'DELETE FROM `db2`.`tb2`'
看完报错一脸懵逼,莫非主库在做什么骚操作?
检查一下主库binlog对应的GTID点,可以发现点什么:
(已做数据脱敏,如上两张表分别用db1.tb1和db2.tb2来区分)
发现到GTID为【c44bd915-440d-11e6-8ea0-6c92bf24b8c0:71844624】和【71844625】之间做了如下操作:
-
USE db3;
-
DELETE FROM `db1`.`tb1`;
-
DELETE FROM `db2`.`tb2`;
DELETE FROM `db_1`.`t1`与DELETE FROM `db_2`.`t2`之间并没有更多的:SET @@SESSION.GTID_NEXT。
这似乎就违反了GTID的限制,一个事务应该对应一个GTID号才对。
结合报错信息,怀疑此时在执行到第二个DELETE时,因为第二个DELETE没有对应的GTID_NEXT,就报错了:
-
Last_SQL_Errno: 1837 …… Default database: '$db'. Query: 'DELETE FROM `db2`.`tb2`'
再检查一下存储引擎,发现db1.tb1和db2.tb2这两张表均为memory,即为非事务引擎。
可能与这个有关。
此处,为了修复这个复制故障,在从库上做如下操作:
-
〇 SET SESSION sql_log_bin=0;
-
〇 手动执行未执行的事务,此处为:DELETE FROM `db2`.`tb2`;
-
〇 SET SESSION sql_log_bin=1;
-
〇 STOP SLAVE sql_thread; SET @@SESSION.GTID_NEXT= 'AUTOMATIC'; START SLAVE sql_thread;
至于为什么在ENFORCE_GTID_CONSISTENCY为ON的情况下,产生这样违反GTID的events,我搜了一下bug库:
更多讨论如下:
https://bugs.mysql.com/bug.php?id=71695
该问题发生在5.6.20及以前的5.6版本。
并在5.6.21以后的版本修复了这个问题。
在文档中找到:
-
Replication: When mysqlbinlog processed multiple binary log files into a single output file, this file was not in a useful
-
state for point-in-time recovery, when it failed with the error, When @@SESSION.GTID_NEXT is set to a GTID, you must
-
explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for
-
detailed explanation. Current @@SESSION.GTID_NEXT is 'xyz'. When mysqlbinlog processes a binary log containing GTIDs,
-
it outputs SET gtid_next statements, but gtid_next is set to undefined whenever a commit occurs; this left gtid_next
-
undefined when the server had finished processing the output from mysqlbinlog. When the next binary log file started
-
with one or more anonymous statements or transactions, the combination of gtid_next being left undefined at the end
-
of the first binary log and the second binary log containing anonymous transactions to the error described previously
-
(Error 1837, ER_GTID_NEXT_TYPE_UNDEFINED_GROUP).
-
-
To fix this issue, now, whenever mysqlbinlog encounters this situation, it inserts SET gtid_next = AUTOMATIC
-
if required to avoid leaving the previous binary log with gtid_next undefined.
-
-
In addition, as a result of this fix, mysqlbinlog no longer outputs session variable information for every binary log;
-
now, this value is printed only once unless it changes. (Bug #18258933, Bug #71695)
大致原因是:
当mysqlbinlog处理包含GTID的binlog时,它会输出gtid_next,但是当提交时,gtid_next会被设置为“undefined”。
当服务器处理完来自mysqlbinlog的输出后,就留下了binlog undefined。
简单的来说:
因为gtid_next可能会被设置为undefined,导致复制出现1837。
为了修复这个问题,在MySQL5.6.21版本中,做出了如下修复:
每当mysqlbinlog遇到这种情况,会自动加入如下语句:
-
“SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;”
以避免使用gtid_next保留之前binlog undefined。
(这个可以在开启GTID时,轻易测试得出)
后来又搜了一下,在使用INSERT DELAYED语法时,也可能出现这个问题,更多可以参考2014年9月的淘宝内核月报:
http://blog.csdn.net/longxibendi/article/details/39297025
(在http://mysql.taobao.org/monthly/上已经找不到2014年9月的月报了
不知道阿里的同学是不是偷偷把之前的藏起来了。)
虽然文档描述好像和这个case不太像,但总之也是有收获的:
〇 升级到更高版本的MySQL。
〇 尽量使用事务引擎,避免在一个事务中同时操作事务表和非事务表的可能性。
(虽然enforce_gtid_consistency开启,但也有可能出现突破GTID限制的语句,尽量从业务上限制)
〇 尽量避免使用INSERT DELAYED语法。
参考:
https://dev.mysql.com/doc/relnotes/mysql/5.6/en/news-5-6-21.html
话说某云计算厂商控制台上创建的MySQL 5.6就是5.6.20……
正文到此结束