MySQL5.7版本出来有很长时间了,之前也装了玩了下,大概了解了部分功能和特性,但没有系统的学习和测试其主要新增功能,最近也特意抽出时间想去多了解了解,学习学习5.7的主要新特性,这里主要是针对其最突出的功能之一MySQL Group Replication(MGR 组复制)学习下,针对其部署过程做个简要记录。
一、环境准备
名称
|
版本
|
备注
|
操作系统
|
RHEL6.5_X86_64
|
|
数据库
|
5.7.18-15
|
Percona二进制版本
|
复制节点
|
10.26.7.129
|
node1
|
|
10.26.7.142
|
node2
|
|
10.26.7.166
|
node3
|
二、MGR详细部署步骤
1、MYSQL5.7安装
本次学习实验采用的是Percona-Server-5.7.18-15-Linux.x86_64.ssl101二进制版本,具体安装过程略
2、节点1my.cnf参数配置(主写节点)
#replicate
server-id=1001
skip-slave-start = false
read-only = false
expire_logs_days = 2
max_binlog_size = 1G
max_binlog_cache_size = 2G
log-bin = /home/mysql/mysql-bin
log-bin-index = /home/mysql/bin-index
binlog_format = row
log-slave-updates = 1
sync_binlog = 1
log-slow-slave-statements = 1
max-relay-log-size = 1G
relay-log = /home/mysql/mysql-relay
relay-log-index = /home/mysql/relay-index
gtid_mode=ON
enforce_gtid_consistency=ON
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
#group replication
transaction_write_set_extraction =XXHASH64
loose-group_replication_group_name ="2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec" #务必以uuid形式配置
loose-group_replication_start_on_boot =off
loose-group_replication_local_address ="10.26.7.129:24001" #不同节点配置不同节点本身的IP地址和端口,区分MYSQL自身的3306端口
loose-group_replication_group_seeds ="10.26.7.129:24001,10.26.7.142:24001,10.26.7.166:24001"
loose-group_replication_bootstrap_group =off
3、创建复制账户(主写节点)
set sql_log_bin=0;
create user rpl_user@'%';
grant replication slave on *.* to rpl_user@'%' identified by 'rpl_pass';
flush privileges;
set sql_log_bin=1;
change master to master_user='rpl_user',master_password='rpl_pass' for channel 'group_replication_recovery';
4、安装组复制插件并启动组复制(主写节点)
安装插件
install plugin group_replication soname 'group_replication.so';
检查插件是否正确安装
show plugins
+-----------------------------+----------+--------------------+----------------------+---------+
| Name | Status | Type | Library | License |
+-----------------------------+----------+--------------------+----------------------+---------+
| group_replication | ACTIVE | GROUP REPLICATION | group_replication.so | GPL |
+-----------------------------+----------+--------------------+----------------------+---------+
启动组复制
set global group_replication_bootstrap_group=ON;
start group_replication;
set global group_replication_bootstrap_group=OFF;
检查组复制成员及状态
select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec | node1 | 3306 | ONLINE |
5、添加组复制成员实例node2 和node3
***添加节点前,务必做DNS解析,如果没有配置DNS解析服务器,需要在每个节点配置hosts解析 /etc/hosts***
10.26.7.166 node3
10.26.7.142 node2
10.26.7.129 node1
***节点my.cnf参数文件server-id和loose-group_replication_local_address ="node2:24001需要分别更改"***
node2
set sql_log_bin=0;
create user rpl_user@'%';
grant replication slave on *.* to rpl_user@'%' identified by 'rpl_pass';
flush privileges;
set sql_log_bin=1;
change master to master_user='rpl_user',master_password='rpl_pass' for channel 'group_replication_recovery';
install plugin group_replication soname 'group_replication.so';
show plugins
set global group_replication_allow_local_disjoint_gtids_join=ON;
start group_replication;
node3同样执行上述命令
然后检查组复制信息
select * from performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| CHANNEL_NAME | MEMBER_ID | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
| group_replication_applier | 2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec | node1 | 3306 | ONLINE |
| group_replication_applier | 35e38786-66bb-11e7-bcc3-b8ca3a6a61a4 | node2 | 3306 | ONLINE |
| group_replication_applier | 3bbedb1e-66bb-11e7-8fc0-b8ca3a6a7c48 | node3 | 3306 | ONLINE |
+---------------------------+--------------------------------------+-------------+-------------+--------------+
测试组复制是否正常:
(root:localhost:Sat Jul 15 13:26:33 2017)[(none)]>create database dbtest;
Query OK, 1 row affected (0.01 sec)
(root:localhost:Sat Jul 15 13:26:40 2017)[(none)]>use dbtest;
Database changed
(root:localhost:Sat Jul 15 13:26:45 2017)[dbtest]>create table t1(id int primary key);
Query OK, 0 rows affected (0.01 sec)
(root:localhost:Sat Jul 15 13:26:54 2017)[dbtest]>insert into t1 values(1);
Query OK, 1 row affected (0.00 sec)
node2和node3执行查询
(root:localhost:Sat Jul 15 12:57:32 2017)[db01]>use dbtest;
Database changed
(root:localhost:Sat Jul 15 13:27:26 2017)[dbtest]>select * from t1;
+----+
| id |
+----+
| 1 |
+----+
1 row in set (0.00 sec)
三、错误问题以及汇总:
1、错误案例01
错误信息:2017-07-15T01:36:06.929941Z 4 [ERROR] Plugin group_replication reported: 'The group name 'group-replication-test' is not a valid UUID'
错误原因:loose-group_replication_group_name参数没有按照UUID格式指定,被认为设置该参数无效
解决方案:更改loose-group_replication_group_name参数值为,loose-group_replication_group_name ="2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec"
2、错误案例02
错误信息:
2017-07-15T01:29:27.271909Z 0 [Warning] unknown variable 'loose-group_replication_group_name=group-replication-test'
2017-07-15T01:29:27.271926Z 0 [Warning] unknown variable 'loose-group_replication_start_on_boot=off'
2017-07-15T01:29:27.271930Z 0 [Warning] unknown variable 'loose-group_replication_local_address=10.26.7.129:3306'
2017-07-15T01:29:27.271935Z 0 [Warning] unknown variable 'loose-group_replication_group_seeds=10.26.7.129:3306,10.26.7.142:3306,10.26.7.166:3306'
2017-07-15T01:29:27.271939Z 0 [Warning] unknown variable 'loose-group_replication_bootstrap_group=off'
错误原因:因为先设置了这些参数,而没有装group_replication插件,导致数据库实例无法识别这些参数
解决方案:安装group replication插件,install plugin group_replication soname 'group_replication.so'; (uninstall plugin group_replication 卸载,show plugins查看)
3、错误案例03
错误信息:
2017-07-15T01:54:54.447829Z 0 [Note] Plugin group_replication reported: 'Unable to bind to 0.0.0.0:3306 (socket=60, errno=98)!'
2017-07-15T01:54:54.447948Z 0 [ERROR] Plugin group_replication reported: 'Unable to announce tcp port 3306. Port already in use?'
2017-07-15T01:54:54.448101Z 0 [ERROR] Plugin group_replication reported: '[GCS] Error joining the group while waiting for the network layer to become ready.'
错误原因:配置的组复制监听端口和MYSQL实例端口冲突
解决方案:调整下面参数
loose-group_replication_local_address
="10.26.7.129:24001"
#不同节点配置不同节点本身的IP地址和端口,区分MYSQL自身的3306端口
loose-group_replication_group_seeds ="10.26.7.129:24001,10.26.7.142:24001,10.26.7.166:24001"
4、错误案例04
错误信息:
2017-07-15T04:20:01.249529Z 21 [ERROR] Slave I/O for channel 'group_replication_recovery': error connecting to master 'rpl_user@node2:3306' - retry-time: 60 retries: 1, Error_code: 2005
错误原因:没有配置DNS解析或者hosts解析,节点无法连接其他数据库
解决方案:配置hosts解析,每个节点/etc/hosts添加如下内容
10.26.7.166 node3
10.26.7.142 node2
10.26.7.129 node1
5、错误案例05
错误信息
2017-07-15T03:42:45.395407Z 288 [ERROR] Slave SQL for channel 'group_replication_recovery': Error 'Can't create database 'db01'; database exists' on query. Default database: 'db01'. Query: 'create database db01', Error_code: 1007
2017-07-15T03:42:45.395472Z 288 [Warning] Slave: Can't create database 'db01'; database exists Error_code: 1007
2017-07-15T03:42:45.395503Z 288 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000005' position 434
错误原因:这个错误是由于节点由于各种原因退出mgr组,后面又加入了mgr组,但之前存在的数据依旧存在
解决方案:删除要加入组复制节点存在的数据库即可,但其他节点不是主写节点,需要先调整参数set global super_read_only=0;然后执行drop database db01;再重新加入组
set global group_replication_allow_local_disjoint_gtids_join=ON;
start group_replication;
6、错误案例06
错误信息:
2017-07-15T03:44:09.982428Z 18 [ERROR] Slave SQL for channel 'group_replication_recovery': Error 'Can't create database 'db01'; database exists' on query. Default database: 'db01'. Query: 'create database db01', Error_code: 1007
2017-07-15T03:44:09.982493Z 18 [Warning] Slave: Can't create database 'db01'; database exists Error_code: 1007
2017-07-15T03:44:09.982522Z 18 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000005' position 434
错误原因:同上错误案例05
解决方案:同上错误案例05
7、错误案例07
错误信息:
2017-07-15T03:49:10.370846Z 0 [ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: 2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec:1-4,35e38786-66bb-11e7-bcc3-b8ca3a6a61a4:1 > Group transactions: 2faa74b8-66bb-11e7-ab7e-b8ca3a6a61ec:1-16'
错误原因:同上错误案例05,在从库执行了多余的事务
解决方案:同上错误案例05,直接重新加入节点即可
set global group_replication_allow_local_disjoint_gtids_join=ON;
start group_replication;
8、错误案例08
错误信息
ERROR 3098 (HY000): The table does not comply with the requirements by an external plugin.
错误原因:由于主节点创建了表t1,但没有指定主键(此时表结构可以复制到各节点,一旦插入数据DML操作即会报错)
解决方案:为表增加主键,然后做DML操作(MGR需要各表都有主键)
alter table t1 add primary key(id);
insert into t1 values(1),(2);
9、错误案例09
错误信息:
mysqldump -R -E --triggers --single-transaction --master-data=2 -B db01 >db01.sql
Warning: A partial dump from a server that has GTIDs will by default include the GTIDs of all transactions, even those that changed suppressed parts of the database. If you don't want to restore GTIDs, pass --set-gtid-purged=OFF. To make a complete dump, pass --all-databases --triggers --routines --events.
mysqldump: Couldn't execute 'SAVEPOINT sp': The MySQL server is running with the --transaction-write-set-extraction!=OFF option so it cannot execute this statement (1290)
错误原因:mgr不支持mysqldump的事务一致性备份,因为其不支持savepoint
解决方案:通过xtrabackup或者不加--single-transaction备份
10、错误案例10
错误信息:
create table t2 as select * from t1;
ERROR 1786 (HY000): Statement violates GTID consistency: CREATE TABLE ... SELECT.
错误原因:配置MGR,开启了GTID,所有GTID不支持的操作,MGR也不支持
解决方案:使用create table t2 like t1; insert into t2 select * from t;分开两个事务执行