We were setting up a Percona XtraDB Cluster (Galera) following the documentation for version 5.6+. This blog post walks through debugging, troubleshooting and fixing the different SST issues we faced while bringing up the cluster nodes.
2015-01-15 17:33:19 24618 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-01-15 17:33:19 24618 [ERROR] WSREP: SST failed: 32 (Broken pipe)
We already had a successfully bootstrapped node-1 and were adding a second node (node-2), which failed with the following error:
"2015-01-15 17:33:19 24618 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '10.10.10.1' --auth 'sstuser:xxx' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '24618' '' : 32 (Broken pipe) 2015-01-15 17:33:19 24618 [ERROR] WSREP: Failed to read uuid:seqno from joiner script. 2015-01-15 17:33:19 24618 [ERROR] WSREP: SST failed: 32 (Broken pipe) 2015-01-15 17:33:19 24618 [ERROR] Aborting"
We verified that the SST user had the proper privileges and decided to check the donor side next.
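For reference, the check amounted to confirming that the SST user has a grant roughly like the one below. This is only a sketch: the 'localhost' host and the password are placeholders, and the exact privilege list should be taken from the PXC documentation for your version.

[root@node-1 mysql]# mysql -u root -p -e "GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost' IDENTIFIED BY 's3cret';"
[root@node-1 mysql]# mysql -u root -p -e "SHOW GRANTS FOR 'sstuser'@'localhost';"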
We checked the “innobackup.backup.log” file on the donor (node-1), which captures the xtrabackup output during SST:
[root@node-1 mysql]# tail -5 innobackup.backup.log
150115 17:33:19 innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser' (using password: YES).
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2990
	main::mysql_connect('abort_on_error', 1) called at /usr//bin/innobackupex line 1530
innobackupex: Error: Failed to connect to MySQL server as DBD::mysql module is not installed at /usr//bin/innobackupex line 2990.
As innobackup.backup.log shows, the error is caused by a missing DBD::mysql Perl module. We tried installing it through yum:
[root@node-1 mysql]# yum install perl-DBD-MySQL
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.netaddicted.ca
 * epel: mirror.csclub.uwaterloo.ca
 * extras: mirror.netaddicted.ca
 * updates: mirror.netaddicted.ca
Setting up Install Process
Package perl-DBD-MySQL-4.013-3.el6.x86_64 already installed and latest version
Nothing to do
So the package was already installed! We tried running the backup command manually and hit the same error described above:
[root@node-1 mysql]# innobackupex --user=sstuser --password=xxxx --defaults-file=/etc/my.cnf --slave-info --tmpdir=/tmp --stream=tar ./ 2> /tmp/xtrabackup.log | gzip > backup.tar.gz
[root@node-1 mysql]# vi /tmp/xtrabackup.log
Checking the DBD::mysql module version with the following command pointed to missing dependencies:
[root@node-1 mysql]# perl -MDBD::mysql -e 'print $DBD::mysql::VERSION'
Can't load '/usr/lib64/perl5/auto/DBD/mysql/mysql.so' for module DBD::mysql: libmysqlclient.so.16: cannot open shared object file: No such file or directory at /usr/lib64/perl5/DynaLoader.pm line 200.
 at -e line 0
Compilation failed in require.
BEGIN failed--compilation aborted.
With the missing dependency now obvious, we set out to identify and fix it:
[root@node-1 mysql]# locate libmysqlclient.so.16
[root@node-1 mysql]# yum list installed | grep perl-DBD-MySQL
perl-DBD-MySQL.x86_64        4.013-3.el6        @base
[root@node-1 mysql]# yum deplist perl-DBD-MySQL.x86_64 | grep mysql
  dependency: libmysqlclient.so.16()(64bit)
   provider: mysql-libs.x86_64 5.1.73-3.el6_5
  dependency: libmysqlclient.so.16(libmysqlclient_16)(64bit)
   provider: mysql-libs.x86_64 5.1.73-3.el6_5
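As an additional cross-check (a sketch; the module path is taken from the DynaLoader error above), ldd can list any shared libraries the DBD driver fails to resolve:

# Unresolved libraries show up as "not found"
[root@node-1 mysql]# ldd /usr/lib64/perl5/auto/DBD/mysql/mysql.so | grep 'not found'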
Installed the missing dependency:
[root@node-1 mysql]# yum install mysql-libs.x86_64
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.netaddicted.ca
 * epel: mirror.csclub.uwaterloo.ca
 * extras: mirror.netaddicted.ca
 * updates: mirror.netaddicted.ca
Setting up Install Process
Package mysql-libs is obsoleted by Percona-Server-shared-51, trying to install Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64 instead
Resolving Dependencies
--> Running transaction check
---> Package Percona-Server-shared-51.x86_64 0:5.1.73-rel14.12.624.rhel6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                   Arch    Version                    Repository              Size
================================================================================
Installing:
 Percona-Server-shared-51  x86_64  5.1.73-rel14.12.624.rhel6  percona-release-x86_64  2.1 M

Transaction Summary
================================================================================
Install       1 Package(s)

Total download size: 2.1 M
Installed size: 5.9 M
Is this ok [y/N]: y
Downloading Packages:
Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64.rpm            | 2.1 MB     00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64        1/1
  Verifying  : Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64        1/1

Installed:
  Percona-Server-shared-51.x86_64 0:5.1.73-rel14.12.624.rhel6

Complete!
With the perl-DBD-MySQL dependency fixed, we ran the backup manually on node-1 and it completed successfully:
[root@node-1 mysql]# innobackupex --user=sstuser --password=xxx --defaults-file=/etc/my.cnf --slave-info --tmpdir=/tmp --stream=tar ./ 2> /tmp/xtrabackup.log |gzip > backup.tar.gz
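A quick way to confirm the manual run really finished is to look at the end of the log; a successful innobackupex run ends with an "innobackupex: completed OK!" line:

[root@node-1 mysql]# tail -1 /tmp/xtrabackup.log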
We hoped this would resolve the issue, but…
We cleared the datadir on the joiner (node-2) and attempted to restart the node, which failed again, complaining as follows:
2015-01-15 18:54:14 25385 [Note] WSREP: Requesting state transfer: success, donor: 0
2015-01-15 18:54:16 25385 [Note] WSREP: (ce9728c5, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015-01-15 18:54:19 25385 [Note] WSREP: 0.0 (node-1.com): State transfer to 1.0 (node-2.com) complete.
2015-01-15 18:54:19 25385 [Note] WSREP: Member 0.0 (node-1.com) synced with group.
WSREP_SST: [ERROR] xtrabackup process ended without creating '/var/lib/mysql//xtrabackup_galera_info' (20150115 18:54:19.507)
...
...
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20150115 18:54:19.543)
WSREP_SST: [INFO] Removing the sst_in_progress file (20150115 18:54:19.548)
2015-01-15 18:54:19 25385 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '10.10.10.1' --auth 'sstuser:s3cret' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '25385' '' : 32 (Broken pipe)
2015-01-15 18:54:19 25385 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-01-15 18:54:19 25385 [ERROR] WSREP: SST failed: 32 (Broken pipe)
Looking further at the configuration, we found that wsrep_sst_method differed between node-1 and node-2:
Node-1:
mysql> show global variables like '%sst%';
+---------------------------------+---------------+
| Variable_name                   | Value         |
+---------------------------------+---------------+
| wsrep_sst_auth                  | xxxxxxxx      |
| wsrep_sst_donor                 |               |
| wsrep_sst_donor_rejects_queries | OFF           |
| wsrep_sst_method                | xtrabackup-v2 |
| wsrep_sst_receive_address       | AUTO          |
+---------------------------------+---------------+
Node-2:
$] grep wsrep_sst_method /etc/my.cnf
wsrep_sst_method=xtrabackup
The wsrep_sst_method variable selects the method used for the State Snapshot Transfer (SST). We corrected node-2's configuration to wsrep_sst_method=xtrabackup-v2 and the node came up clean.
From the docs:
The xtrabackup-v2 wsrep_sst_method is the same as the xtrabackup SST except that it uses a newer protocol, hence is not backward compatible. This is the recommended option for PXC 5.5.34 and above.
There are other SST methods, such as rsync and mysqldump, depending on what you prefer (see the references below).
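For reference, after the fix the SST-related lines in /etc/my.cnf looked along these lines on every node (a sketch; the wsrep_sst_auth value is a placeholder):

# [mysqld] section, identical on all nodes
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=sstuser:s3cret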
We progressed further, but while adding a third node (node-3) we hit another issue and the node did not join the cluster. We again checked the donor's innobackup.backup.log file to trace the problem:
[root@node-1 mysql]# vi innobackup.backup.log
...
...
innobackupex: Using server version 5.6.21-70.1-56-log
innobackupex: Created backup directory /tmp/tmp.fFyzHDRH1f
xbstream: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
innobackupex: 'xbstream -c' returned with exit code 1.
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 4865
	main::backup_file_via_stream('/tmp/tmp.HE5T8DeArs', 'backup-my.cnf') called at /usr//bin/innobackupex line 4914
	main::backup_file('/tmp/tmp.HE5T8DeArs', 'backup-my.cnf', '/tmp/tmp.HE5T8DeArs/backup-my.cnf') called at /usr//bin/innobackupex line 4938
	main::write_to_backup_file('/tmp/tmp.HE5T8DeArs/backup-my.cnf', '# This MySQL options file was generated by innobackupex.\x{a}\x{a}# T...') called at /usr//bin/innobackupex line 3746
	main::write_backup_config_file('/tmp/tmp.HE5T8DeArs/backup-my.cnf') called at /usr//bin/innobackupex line 3673
	main::init() called at /usr//bin/innobackupex line 1557
innobackupex: Error: Failed to stream '/tmp/tmp.HE5T8DeArs/backup-my.cnf': 1 at /usr//bin/innobackupex line 4865.
This time we confirmed that all the issues we had faced earlier were resolved, yet the SST was still not progressing.
Running the xtrabackup command manually did not produce any error and the backup finished correctly:
[root@node-1 mysql]# innobackupex --user=root --password=$pw --defaults-file=/etc/my.cnf --slave-info --tmpdir=/tmp --stream=tar ./ 2> /tmp/xtrabackup.log | gzip > backup.tar.gz
[root@node-1 mysql]# vi /tmp/xtrabackup.log
Since the manual backup worked, it hinted that the issue was somewhere on the new node, even though innobackup.backup.log on the donor was the one reporting the errors.
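A quick way to spot such differences (a sketch, assuming /etc/my.cnf on both hosts and SSH access from the donor to the joiner) is to diff the two configuration files:

[root@node-1 mysql]# diff /etc/my.cnf <(ssh node-3 cat /etc/my.cnf)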
We rechecked the configuration files and noted an important difference: node-1 had the following InnoDB settings, while node-3 did not:
innodb_log_file_size=48M
innodb_data_file_path=ibdata1:12M:autoextend
innodb_buffer_pool_size=400M
Here, the difference in the values of innodb_log_file_size and innodb_data_file_path caused the xtrabackup restore to fail.
innodb_log_file_size was effectively the default value, but innodb_data_file_path defaults to “ibdata1:10M:autoextend”, and that difference was enough to make the SST (restore) fail. Adding these parameters to my.cnf on node-3 fixed the issue and the node joined the cluster.
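To double-check that donor and joiner now agree, the running values on the donor can be compared against the joiner's config file (a sketch; extend the variable list as needed):

[root@node-1 mysql]# mysql -u root -p -e "SHOW GLOBAL VARIABLES WHERE Variable_name IN ('innodb_log_file_size','innodb_data_file_path','innodb_buffer_pool_size');"
[root@node-3 ~]# grep -E 'innodb_(log_file_size|data_file_path|buffer_pool_size)' /etc/my.cnf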
So, we faced three issues that caused the State Snapshot Transfer to fail and thus prevented the nodes from joining the cluster:
1. The perl-DBD-MySQL package had a missing dependency, which we identified by following the logs on node-1.
2. wsrep_sst_method on node-2 was not set to the recommended value (xtrabackup-v2).
3. InnoDB-related configuration variables that differ between nodes can cause the SST to fail.
Correcting them made Galera happy…
Hope the above troubleshooting steps are helpful. A recent Percona blog post pointed out that an SST can fail for many different reasons; this post practically agrees with that :).
References:
- http://www.percona.com/doc/percona-xtradb-cluster/5.5/wsrep-system-index.html#wsrep_sst_method
- http://www.percona.com/doc/percona-xtradb-cluster/5.6/manual/state_snapshot_transfer.html
Also see:
- http://www.percona.com/blog/2014/12/30/diagnosing-sst-errors-with-percona-xtradb-cluster-for-mysql/
2 comments
Another important one – in case the server has multiple IPs and wsrep_sst_receive_address is set to AUTO, it may use the wrong one. For instance, in a test configuration with Vagrant / OVM you may have two IP addresses, one internal on the 10.x.x.x range and the other on 192.x.x.x, and PXC may use the 10.x.x.x rather than the desired 192.x.x.x address.
In this case SST will fail with a message like:
150921 14:09:40 [ERROR] WSREP: gcs/src/gcs_group.cpp:long int gcs_group_handle_join_msg(gcs_group_t*, const gcs_recv_msg_t*)():717: Will never receive state. Need to abort.
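In such a setup, pinning the addresses explicitly in my.cnf avoids the guesswork (a sketch; the 192.168.x.x value is a placeholder for the interface SST should actually use):

# [mysqld] section on each node - placeholder address
wsrep_node_address=192.168.56.12
wsrep_sst_receive_address=192.168.56.12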
Thanks for the update Nik.