21 Jan

Debugging Percona Xtradb (Galera) Cluster node startup / SST errors

We were working on setting up Percona XtraDB (Galera) Cluster following the Documentation for version 5.6+. This blog post is a use-case of a debugging / troubleshooting & fixing different SST issues we faced while bringing-up the cluster nodes.

We already have a successfully bootstrapped node-1 and now we’re adding second node (node-2) which was causing following error.

"2015-01-15 17:33:19 24618 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '10.10.10.1' --auth 'sstuser:xxx' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '24618'  '' : 32 (Broken pipe)
2015-01-15 17:33:19 24618 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-01-15 17:33:19 24618 [ERROR] WSREP: SST failed: 32 (Broken pipe)
2015-01-15 17:33:19 24618 [ERROR] Aborting"

Verified that the SST user has proper privileges and concluded to check on the donor side.
Checked “innobackup.backup.log” file on master which logs the xtrabackup output during SST:

[root@node-1 mysql]# tail -5 innobackup.backup.log

150115 17:33:19  innobackupex: Connecting to MySQL server with DSN 'dbi:mysql:;mysql_read_default_file=/etc/my.cnf;mysql_read_default_group=xtrabackup;mysql_socket=/var/lib/mysql/mysql.sock' as 'sstuser'  (using password: YES).
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 2990
        main::mysql_connect('abort_on_error', 1) called at /usr//bin/innobackupex line 1530
innobackupex: Error: Failed to connect to MySQL server as DBD::mysql module is not installed at /usr//bin/innobackupex line 2990.

As the above log, innobackup.backup.log, conveys it’s missing DBD module causing the error. Checked to install perl’s DBD module for MySQL through yum as follows:

[root@node-1 mysql]# yum install perl-DBD-MySQL
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.netaddicted.ca
 * epel: mirror.csclub.uwaterloo.ca
 * extras: mirror.netaddicted.ca
 * updates: mirror.netaddicted.ca
Setting up Install Process
Package perl-DBD-MySQL-4.013-3.el6.x86_64 already installed and latest version
Nothing to do

Well this resulted that we already have the DBD installed! Tried running the backup command manually and received the same error described above:

[root@node-1 mysql]# innobackupex --user=sstuser --password=xxxx --defaults-file=/etc/my.cnf --slave-info  --tmpdir=/tmp --stream=tar ./   2> /tmp/xtrabackup.log  |gzip > backup.tar.gz
[root@node-1 mysql]# vi /tmp/xtrabackup.log

Attempted checking mysql version using following command, pointed missing dependencies for DBD:

[root@node-1 mysql]#  perl -MDBD::mysql -e 'print $DBD::mysql::VERSION'
Can't load '/usr/lib64/perl5/auto/DBD/mysql/mysql.so' for module DBD::mysql: libmysqlclient.so.16: cannot open shared object file: No such file or directory at /usr/lib64/perl5/DynaLoader.pm line 200.
 at -e line 0
Compilation failed in require.
BEGIN failed--compilation aborted.

Now as it’s clear about the missing dependencies, tried looking, identifying & fixing the dependencies:

[root@node-1 mysql]# locate libmysqlclient.so.16
[root@node-1 mysql]# yum list installed| grep perl-DBD-MySQL
perl-DBD-MySQL.x86_64  4.013-3.el6      @base
[root@node-1 mysql]# yum deplist perl-DBD-MySQL.x86_64|grep mysql
  dependency: libmysqlclient.so.16()(64bit)
   provider: mysql-libs.x86_64 5.1.73-3.el6_5
  dependency: libmysqlclient.so.16(libmysqlclient_16)(64bit)
   provider: mysql-libs.x86_64 5.1.73-3.el6_5

Installed missing dependencies:

[root@node-1 mysql]# yum install mysql-libs.x86_64
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.netaddicted.ca
 * epel: mirror.csclub.uwaterloo.ca
 * extras: mirror.netaddicted.ca
 * updates: mirror.netaddicted.ca
Setting up Install Process
Package mysql-libs is obsoleted by Percona-Server-shared-51, trying to install Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64 instead
Resolving Dependencies
--> Running transaction check
---> Package Percona-Server-shared-51.x86_64 0:5.1.73-rel14.12.624.rhel6 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================================================================================
 Package                                      Arch                       Version                                       Repository                                  Size
========================================================================================================================================================================
Installing:
 Percona-Server-shared-51                     x86_64                     5.1.73-rel14.12.624.rhel6                     percona-release-x86_64                     2.1 M

Transaction Summary
========================================================================================================================================================================
Install       1 Package(s)

Total download size: 2.1 M
Installed size: 5.9 M
Is this ok [y/N]: y
Downloading Packages:
Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64.rpm                                                                                    | 2.1 MB     00:00
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64                                                                                            1/1
  Verifying  : Percona-Server-shared-51-5.1.73-rel14.12.624.rhel6.x86_64                                                                                            1/1

Installed:
  Percona-Server-shared-51.x86_64 0:5.1.73-rel14.12.624.rhel6

Complete!

Now as the perl-DBD-MySQL dependency is fixed, ran the backup manually on node-1 and it completed successfully.

[root@node-1 mysql]# innobackupex --user=sstuser --password=xxx --defaults-file=/etc/my.cnf --slave-info  --tmpdir=/tmp --stream=tar ./   2> /tmp/xtrabackup.log  |gzip > backup.tar.gz

I hoped this should resolve the issue but…
On node-1, cleared datadir and attempted restarting the node again failed complaining as follows:

2016-15 18:54:14 25385 [Note] WSREP: Requesting state transfer: success, donor: 0
2015-01-15 18:54:16 25385 [Note] WSREP: (ce9728c5, 'tcp://0.0.0.0:4567') turning message relay requesting off
2015-01-15 18:54:19 25385 [Note] WSREP: 0.0 (node-1.com): State transfer to 1.0 (node-2.com) complete.
2015-01-15 18:54:19 25385 [Note] WSREP: Member 0.0 (node-1.com) synced with group.
WSREP_SST: [ERROR] xtrabackup process ended without creating '/var/lib/mysql//xtrabackup_galera_info' (20150115 18:54:19.507)
...
...
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20150115 18:54:19.543)
WSREP_SST: [INFO] Removing the sst_in_progress file (20150115 18:54:19.548)
2015-01-15 18:54:19 25385 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '10.10.10.1' --auth 'sstuser:s3cret' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '25385'  '' : 32 (Broken pipe)
2015-01-15 18:54:19 25385 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
2015-01-15 18:54:19 25385 [ERROR] WSREP: SST failed: 32 (Broken pipe)

Looked further in details of configuration and found that the xtrabackup_sst_method was different on node-1 vs on node-2:

Node-1:

mysql> show global variables like '%sst%';
+---------------------------------+---------------+
| Variable_name                   | Value         |
+---------------------------------+---------------+
| wsrep_sst_auth                  | xxxxxxxx      |
| wsrep_sst_donor                 |               |
| wsrep_sst_donor_rejects_queries | OFF           |
| wsrep_sst_method                | xtrabackup-v2 |
| wsrep_sst_receive_address       | AUTO          |
+---------------------------------+---------------+

Node-2:

$] grep wsrep_sst_method /etc/my.cnf
wsrep_sst_method=xtrabackup

The wsrep_sst_method variable sets up the method for taking the State Snapshot Transfer (SST). Correced wsrep_sst_method=xtrabackup-v2 and the node came up clean.

From the docs:

The xtrabackup-v2 wsrep_sst_method is same as xtrabackup SST except that it uses newer protocol, hence is not compatible. This is the recommended option for PXC 5.5.34 and above.

There are other SST methods like rsync & mysqldump depending on the method you prefer. (Check references)

We progressed further but while adding a third node (node-3) we faced another issue and it did not join the cluster. We again checked the donor’s innobackup.backup.log file to trace the issue:

[root@node-1 mysql]# vi innobackup.backup.log
...
...
innobackupex:  Using server version 5.6.21-70.1-56-log

innobackupex: Created backup directory /tmp/tmp.fFyzHDRH1f
^Gxbstream: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
innobackupex: 'xbstream -c' returned with exit code 1.
innobackupex: got a fatal error with the following stacktrace: at /usr//bin/innobackupex line 4865
        main::backup_file_via_stream('/tmp/tmp.HE5T8DeArs', 'backup-my.cnf') called at /usr//bin/innobackupex line 4914
        main::backup_file('/tmp/tmp.HE5T8DeArs', 'backup-my.cnf', '/tmp/tmp.HE5T8DeArs/backup-my.cnf') called at /usr//bin/innobackupex line 4938
        main::write_to_backup_file('/tmp/tmp.HE5T8DeArs/backup-my.cnf', '# This MySQL options file was generated by innobackupex.\x{a}\x{a}# T...') called at /usr//bin/innobackupex line 3746
        main::write_backup_config_file('/tmp/tmp.HE5T8DeArs/backup-my.cnf') called at /usr//bin/innobackupex line 3673
        main::init() called at /usr//bin/innobackupex line 1557
innobackupex: Error: Failed to stream '/tmp/tmp.HE5T8DeArs/backup-my.cnf': 1 at /usr//bin/innobackupex line 4865.

This time we have confirmed all the issues we faced earlier are answered but still the SST was not progressing.
Running the xtrabackup command manually doesn’t cause any error and backup finished correctly:

[root@node-1 mysql]# innobackupex --user=root --password=$pw --defaults-file=/etc/my.cnf --slave-info  --tmpdir=/tmp --stream=tar ./   2> /tmp/xtrabackup.log  |gzip > backup.tar.gz
[root@node-1 mysql]# vi /tmp/xtrabackup.log

As the manual backup goes well it hints that the issue was somewhere on new node though innobackup.backup.log expressed the errors.

Rechecked the configuration file here and noted an important difference. On node-1 we had following innodb settings while on node-3 we did not:

innodb_log_file_size=48M
innodb_data_file_path=ibdata1:12M:autoextend
innodb_buffer_pool_size=400M

Here, difference in values of innodb_log_file_size & innodb_data_file_path caused the xtrabackup restore fail.
innodb_log_file_size is pretty much the default value but innodb_data_file_path has default as “ibdata1:10M:autoextend” which was a differentiating factor causing the (restore) SST fail. Adding these parameters in my.cnf on node-3 fixed the issue and node joined the cluster.

So, here we faced three issues causing the State Snapshot Transfer to fail and hence stopping the nodes to join the cluster:
1. The perl-DBD-MySQL was missing dependencies and that we could identify following the logs on node-1.
2. The configuration for xtrabackup_sst_method on node-2 was incorrectly set than what was recommended (i.e. xtrabackup-v2).
3. Differentiating InnoDB related configuration variables can cause SST to fail.

Correcting them made the Galera happy…
Hope above troubleshooting steps are helpful. Recently we saw Percona’s blog conveying that we can have multiple reasons for an SST to fail; this post practically agree to that :).

References:

  • http://www.percona.com/doc/percona-xtradb-cluster/5.5/wsrep-system-index.html#wsrep_sst_method
  • http://www.percona.com/doc/percona-xtradb-cluster/5.6/manual/state_snapshot_transfer.html

Also see:

  • http://www.percona.com/blog/2014/12/30/diagnosing-sst-errors-with-percona-xtradb-cluster-for-mysql/

31 Dec

MariaDB slave restore using GTID & xtrabackup bug

Recently I happen to work for a MySQL database restore, a pretty exciting task for DBA :P

Well, backup server was already configured with Holland (backup framework) & Xtrabackup (Percona’s backup tool) which made our lives easier with steps to restore.

Extracted backup command from backup server’s holland.log:

/usr/bin/innobackupex-1.5.1 --defaults-file=/mysql_backups/xtrabackup/20141219_013002/my.cnf --stream=tar --tmpdir=/mysql_backups/xtrabackup/20141219_013002 --slave-info --no-timestamp /mysql_backups/xtrabackup/20141219_013002 > /mysql_backups/xtrabackup/20141219_013002/backup.tar.gz 2 > /mysql_backups/xtrabackup/20141219_013002/xtrabackup.log

The task here is as simple as:
– Ship & extract the backup to destination,
– Apply-logs (innobackupex –defaults-file=/path/my.cnf –apply-log /path/datadir/)
– Start database (/etc/init.d/mysqld start),
– Setup replication (CHANGE MASTER TO) .

MySQL cameup clean after following all the steps and now it’s time to setup replication.

The option –slave-info to innobackupex script stores master’s binlog coordinates in a file, xtrabackup_slave_info, which can be used to setup the new server as slave.

The co-ordinates can also be found in the xtrabackup log shown as follows:

innobackupex-1.5.1: Backup created in directory '/mysql_backups/xtrabackup/20141219_013002'
innobackupex-1.5.1: MySQL binlog position: filename 'mysql-bin.002573', position 53014559, GTID of the last change '0-10110111-42499073'
innobackupex-1.5.1: MySQL slave binlog position: master host '10.1.101.11', gtid_slave_pos 0-10110111-42499073
141219 04:55:16  innobackupex-1.5.1: Connection to database server closed
innobackupex-1.5.1: You must use -i (--ignore-zeros) option for extraction of the tar stream.
141219 04:55:16  innobackupex-1.5.1: completed OK!

I noted that the backup xtrabackup_slave_info file binlog co-ordinate did not match that of master!! And we found that the binlog co-ordinates we had were of slave itself (show master status) and not of master (relay_Master_Log_File & exec_master_pos). This appears a bug in xtrabackup.

Luckily the xtrabackup log did note the GTID positions and that helped to restore the slave as follows.

Verified the present gtid_slave_pos matches that of log and started slave. (Check the References for MariaDB documentation page about restoring slave)

MariaDB [(none)]> show global variables like 'gtid_slave_pos';
+----------------+---------------------+
| Variable_name| Value |
+----------------+---------------------+
| gtid_slave_pos | 0-10110111-42499073 |
+----------------+---------------------+
1 row in set (0.00 sec)

MariaDB [(none)]> show global variables like 'gtid_current_pos';
+------------------+---------------------+
| Variable_name| Value |
+------------------+---------------------+
| gtid_current_pos | 0-10110111-42499073 |
+------------------+---------------------+

MariaDB [(none)]>CHANGE MASTER TO master_use_gtid=current_pos; START SLAVE;

We’ll look forward to the bug fix to that reports correct (master’s) binary log positions in xtrabackup_slave_info file.

References:

  • https://bugs.launchpad.net/bugs/1404484
  • https://mariadb.com/kb/en/mariadb/documentation/replication/standard-replication/global-transaction-id/#setting-up-from-backup
  • http://www.percona.com/doc/percona-xtrabackup/2.2/innobackupex/innobackupex_option_reference.html#cmdoption-innobackupex–slave-info

27 May

xtrabackup_51: not found & no ‘mysqld’ group in MySQL options

Recently I happen to setup a new MySQL instance with my tools – a standard MySQL 5.1+, xtrabackup setup and last-hotbackup.tar.gz.

To restore from the backup we used xtrabackup binaries and ran into issues following standard commands (assuming no changes):

To prepare the backup I used apply-log as follows:

$] innobackupex-1.5.1 --defaults-file=/usr/local/mysql/data/backup-my.cnf --apply-log  /usr/local/mysql/data --ibbackup xtrabackup_51

InnoDB Backup Utility v1.5.1-xtrabackup; Copyright 2003, 2009 Innobase Oy
and Percona Ireland Ltd 2009-2012.  All Rights Reserved.

This software is published under
the GNU GENERAL PUBLIC LICENSE Version 2, June 1991.

IMPORTANT: Please check that the apply-log run completes successfully.
At the end of a successful apply-log run innobackupex-1.5.1
prints "completed OK!".


sh: 1: xtrabackup_51: not found
innobackupex-1.5.1: fatal error: no 'mysqld' group in MySQL options

To my surprise I really didn’t find xtrabackup_51 under path – Later I verified the version for Xtrabackup is 2.1.1 & learnt that xtrabackup_51 is absent with percona-xtrabackup 2.1.1 binaries.
~/bin$ ls -lhtr
total 157M
drwxr-xr-x 4 kedar kedar 4.0K May 13 21:24 percona-xtrabackup-2.1.1
-rwxr-xr-x 1 kedar kedar 16M May 13 21:30 xtrabackup_55
-rwxr-xr-x 1 kedar kedar 78M May 13 21:36 xtrabackup_56
-rwxr-xr-x 1 kedar kedar 13M May 13 21:42 xtrabackup
-rwxr-xr-x 1 kedar kedar 2.2M May 13 21:42 xbstream
-rwxr-xr-x 1 kedar kedar 2.2M May 13 21:42 xbcrypt
lrwxrwxrwx 1 kedar kedar 12 May 13 21:42 innobackupex-1.5.1 -> innobackupex
-rwxr-xr-x 1 kedar kedar 108K May 13 21:42 innobackupex
-rw-rw-r-- 1 kedar kedar 47M May 13 21:44 percona-xtrabackup-2.1.1-600.tar.gz

That’s not cool, I thought until I read an important release note: “Support for InnoDB 5.0 and InnoDB 5.1 builtin has been removed from Percona XtraBackup.” [Release notes: http://www.percona.com/doc/percona-xtrabackup/2.1/release-notes/2.1/2.1.1.html] .

Options with me were:
1. Disable built-in Innodb and enable Innodb plugin or use Percona’s MySQL build
2. Use –ibbbackup xtarbackup (not xtrabackup_51)
3. Use XtraBackup 2.0 series for default built-in Innodb

– I upgraded the MySQL to use Percona’s build and executed –apply-logs specifying –ibbackup xtrabackup

The bug report [https://bugs.launchpad.net/percona-xtrabackup/+bug/1180905] is already in place and the issue has been resolved in the later version 2.1.2. Fixes: “innobackupex automatic version detection did not work correctly for latest Percona Server and MySQL 5.1 releases which could cause innobackupex to fail.”

For using MySQL 5.1+ with built-in support we can use Percona XtraBackup 2.0.
Support for MySQL 5.1 with builtin InnoDB (not the plugin) was removed in Percona XtraBackup 2.1. The last version to support MySQL 5.1 with builtin InnoDB was Percona XtraBackup 2.0. – conveyed @productiondba

Take out:
– Upgrade your xtrabackup to latest Percona XtraBackup Version (Yep, today it’s 2.1.3, not 2.1.2)
– Make sure we read release notes before using the latest build :)

04 Mar

Why we need a tester / QA – Software testing

“Why Testing”, “Importance of Software testing”, “Scope of testing”, “How to become a tester”, “Training for software quality assurance / software testing” …

Stop there… Above are the topics well discussed and explained at a lot many places and a lot many times, but this is not what we’re here for. I happened to face the question recently “Why’d you need a software tester as a dedicated resource?”

I did not want to get into deep technicalities of the SDL or STL Cycles to prove the roles but in plain simple words I’d try explaining the importance of the role that a tester plays and why a tester is required as a dedicated resource.

For any (software) product, testing is must. And, software tester can do better testing as compared to the developer or programmer who’s written the software.

Programmers, don’t feel bad. No camel sees his hunch!

“Why Tester”… hmmm let me make some points now:

1. A development team should avoid testing its own program (unit testing is a responsibility) – tester will check from his / her doubtful eyes :)

  • Developer may be biased with the fact that he has developed it, so may not test some areas as he’s confident enough that he has developed it bug free. (it’s like one always favours own baby) They usually have an attitude of:
    We take pride on what we developed. We defend ourselves “it is not my fault!”
  • He may not think out of the box, or all possible test scenarios that can break the system as he is used to work in the same shell.

2. A software should be tested in a neutral way to have unbiased testing results. This can be done only by a third person (not developer himself).

3. A developer is usually working under high pressures to implement requirements asap. Due which quality is compromised a lot many times. So, it’s better to have a separate tester.

4. Testing is not about just validating that system is performing what it is supposed to, but importantly it is also about assuring that the system is not performing what it is not supposed to perform. A tester delicately does that and does it better.

  • Testing needs a NEGATIVE approach, which a developer doesn’t have. TESTING IS DESTRUCTIVE PROCESS : A CREATIVE DESTRUCTION
  • Testing is not just after-programming evaluation, but it is key integral part of each phase of SDLC.
  • Testers try to find faults, developers try to take pride. (oops.. no hard feeling guys)

5. The more early a defect is found the less costly it is to fix.

I may sound aggressive at some lines; it’s not because I hate developers or there is some personal grudge against, but it’s rather to emphasize the necessity and my answer to Why Tester :).

Jointly, developers and testers fulfill the requirement successfully in favour of their employers and clients.

Anyways, I wanted to answer “Why Tester” without using technical-words and in fun way. I hope this is sufficient to explain why a software tester is required.

I’ll leave you with two lines as post-script:

If we’re perfect, there’d not be tester or QA and we know that nobody is perfect!!

Testers pursue defects, not the people behind the defects!!

10 Jan

MySQL variable innodb_flush_method – summarized

innodb_flush_method

innodb_flush_method variable specifies how InnoDB opens and flushes log and data files. In Innodb optimization, setting the variable innodb_flush_method tweaks the performance most of the times but there are cases of otherwise though. If innodb_flush_method is set to O_DIRECT it’d avoid double buffering and reduce swap pressure and helps improving performance.

innodb_flush_method has been covered at a lot of places and this is a quick reference to myself.


innodb_flush_method:


innodb_flush_method On non-Windows systems:

1. fdatasync:
– default
– InnoDB uses fsync() to flush both data and log files (though fdatasync is specified).
– fdatasync() flushes only data while fsync() also flushes metadata along with file’s data ( & thus causes more IO).
– fsync() causes double buffering. i.e. operating system buffers at least some of the data in its own cache even though InnoDB is managing it in it’s own buffers.
– When innodb_file_per_table is set, writes on multiple tables causes multiple fsync() calls as they cannot be combined in single IO.

2. O_DIRECT:
– Applies only for data files and not logs.
– O_DIRECT uses fsync() method to flush files to disk and ensures no double buffering on system.
– All reads and writes goes directly to disk.
– This setting disbles OS’s double buffering & read ahead while flushing.
– Implemented by setting O_DIRECT flag of fcntl() or directio() on Solaris.
– If innodb_file_per_table is not set, performance suffers due to serialized IO caused by inode-level mutex.

3. O_DSYNC:
– Applies only to log files and not data files.
– Sets O_SYNC flag of open() and makes all writes synchronous.
– O_SYNC doesn’t disable double buffering / caching at system level.
– O_DSYNC flushes only data while O_SYNC flushes both data and metadata.

innodb_flush_method On Windows systems:

1. async_unbuffered:
– unbuffered IO => data has been written / flushed before continue next.
– InnoDB to use unbuffered I/O for most writes.
– If innodb_flush_log_at_trx_commit = 2, InnoDB uses buffered I/O to the log files.
– This causes InnoDB to use Windows native Async IO for both Reads/Writes.
– For older version of Windows, InnoDB uses it’s own Async IO.

2. unbuffered:
– Same as async_unbuffered except here native async. IO is not used.

3. Normal:
– This option tells to use neither unbuffered IO nor native async. IO.

-- Kedar Vaijanapurkar --