10 Apr

Still have MySQL 5.6 variable binlog_error_action as IGNORE_ERROR?

Recently, we were affected by an ignored configuration option introduced in MySQL 5.6. This incident caused us to perform extended planning of downtime and the rebuilding of the slave. In this post, we’ll discuss our encounter with binlog_error_action and likely bad default.

The incident started with an alert from our monitoring platform that a check failed to verify a MySQL master’s binary log coordinates. My colleague working from the other side of the globe observed the following:

mysql> show master status;
    Empty set (0.00 sec)
    mysql> show binary logs;
    ERROR 1381 (HY000): You are not using binary logging

Interestingly, the MySQL slave was still reporting that everything was okay!

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master
                  Master_User: replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000186
          Read_Master_Log_Pos: 1069066810
               Relay_Log_File: mysqld-relay-bin.000245
                Relay_Log_Pos: 1069067004
        Relay_Master_Log_File: mysql-bin.000186
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 1069066810
              Relay_Log_Space: 1069067296
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 23294781
                  Master_UUID: 7e2e60eb-39fb-11ea-a248-005056b589fd
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 7e2e60eb-39fb-11ea-a248-005056b589fd:50961813-81101493
            Executed_Gtid_Set: 2366bbb5-39fb-11ea-a246-005056b5f82f:1-818,
7e2e60eb-39fb-11ea-a248-005056b589fd:1-81101493
                Auto_Position: 1
1 row in set (0.00 sec)

Guess what my colleague saw in the error log?

    2020-03-18 07:23:50 14694 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_3966_2.MYI'; try to repair it
    2020-03-18 07:23:50 14694 [ERROR] Got an error from unknown thread, /mnt/workspace/percona-server-5.6-redhat-binary-new/label_exp/min-centos-7-x64/test/rpmbuild/BUILD/percona-server-5.6.46-86.2/storage/myisam/mi_write.c:230
    2020-03-18 14:10:46 14694 [Warning] IP address '192.168.0.10' could not be resolved: Name or service not known
    2020-03-19 00:55:59 7f5589a9a700 InnoDB: Error: Write to file (merge) failed at offset 5678039040.
    InnoDB: 1048576 bytes should have been written, only 892928 were written.
    InnoDB: Operating system error number 28.
    InnoDB: Check that your OS and file system support files of this size.
    InnoDB: Check also that the disk is not full or a disk quota exceeded.
    InnoDB: Error number 28 means 'No space left on device'.
    InnoDB: Some operating system error numbers are described at
    InnoDB: http://dev.mysql.com/doc/refman/5.6/en/operating-system-error-codes.html
    2020-03-19 00:57:59 7f5589a9a700 InnoDB: Error: Write to file (merge) failed at offset 3308257280.
    InnoDB: 1048576 bytes should have been written, only 888832 were written.
    InnoDB: Operating system error number 28.
    InnoDB: Check that your OS and file system support files of this size.
    InnoDB: Check also that the disk is not full or a disk quota exceeded.
    InnoDB: Error number 28 means 'No space left on device'.
    InnoDB: Some operating system error numbers are described at
    InnoDB: http://dev.mysql.com/doc/refman/5.6/en/operating-system-error-codes.html
    2020-03-19 13:29:08 14694 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_3966_2.MYI'; try to repair it
    2020-03-19 13:29:08 14694 [ERROR] Got an error from unknown thread, /mnt/workspace/percona-server-5.6-redhat-binary-new/label_exp/min-centos-7-x64/test/rpmbuild/BUILD/percona-server-5.6.46-86.2/storage/myisam/mi_write.c:230
    2020-03-19 13:29:11 14694 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_3966_5.MYI'; try to repair it
    2020-03-19 13:29:11 14694 [ERROR] Got an error from unknown thread, /mnt/workspace/percona-server-5.6-redhat-binary-new/label_exp/min-centos-7-x64/test/rpmbuild/BUILD/percona-server-5.6.46-86.2/storage/myisam/mi_write.c:230
    2020-03-19 13:29:11 14694 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_3966_2.MYI'; try to repair it
    ...
    2020-03-19 15:27:33 14694 [ERROR] Got an error from unknown thread, /mnt/workspace/percona-server-5.6-redhat-binary-new/label_exp/min-centos-7-x64/test/rpmbuild/BUILD/percona-server-5.6.46-86.2/storage/myisam/mi_write.c:230
    2020-03-19 15:27:33 14694 [ERROR] /usr/sbin/mysqld: Incorrect key file for table '/tmp/#sql_3966_3.MYI'; try to repair it
    2020-03-19 15:27:33 14694 [ERROR] Got an error from unknown thread, /mnt/workspace/percona-server-5.6-redhat-binary-new/label_exp/min-centos-7-x64/test/rpmbuild/BUILD/percona-server-5.6.46-86.2/storage/myisam/mi_write.c:230
    2020-03-19 15:27:35 14694 [ERROR] An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'IGNORE_ERROR'. Hence turning logging off for the whole duration of the MySQL server process. To turn it on again: fix the cause, shutdown the MySQL server and restart it.

Further debugging of the issue with PMM, confirmed a spike on the /tmp disk.

PMM Disk Check

PMM Disk Check

The environment here is an active-passive replication with the Orchestrator looking over them and doing its thing. It also has application traffic flowing through a ProxySQL cluster directed to the current active database.

So, What’s the Situation?

  • We need to restart the master to fix this, as mentioned in the error-log.
  • The master is no longer writing binary logs and therefore the slave is broken.
  • Because the slave is broken, it’s no longer able to take over an active position. No backups and no HA.

Here’s a TL;DW (write) summary of solution offered:

  • # Disable recoveries on Orchestrator:
    • orchestrator-client -c disable-global-recoveries
  • # Change database configuration add following variable:
    • binlog_error_action=ABORT_SERVER
  • # Restart MySQL
  • # Verify error log & binary log
  • # Enable global recoveries:
    • orchestrator-client -c enable-global-recoveries
  • # Take fresh backup on master
  • # Restore passive-node (Slave) & reconfigure replication.
  • # Raise /tmp disk partition more than largest table-size.

Bad Default?!

This option was introduced in MySQL 5.6.22 as a solution to the bug report about “mysqld continues operation without logging when binlogs cannot be written!” The default value was “IGNORE_ERROR”. The alternate option is to crash the server with the value “ABORT_SERVER”.

Had it been ABORT_SERVER, we had our SWAT team of Orchestrator and ProxySQL to manage the HA. They’d have performed the failover to the passive-node. Then, the only thing left to resolve this issue is to verify the status of the aborted (crashed) MySQL database server, which, if running at all, would already be demoted to a passive position.

Thankfully, this default is changed in MySQL 5.7.7 and later versions to ABORT_SERVER.

A Task for You!

Here is something for you to do:

For the EOL of MySQL 5.6, we still have almost a year to go. It’s scheduled for February 2021. 

If you’re using MySQL 5.6, go ahead and change the binlog_error_action variable. This is a dynamic change and won’t affect anything in “live.” Make sure you persist it in the MySQL config file.

mysql> SET GLOBAL binlog_error_action=ABORT_SERVER

References

  • https://bugs.mysql.com/bug.php?id=51014
  • https://mysqlserverteam.com/new-option-to-stop-the-server-if-binlogging-fails-in-mysql-5-6/
  • https://jira.percona.com/browse/PXC-2546

p.s. Are you still using MySQL 5.6? Consider upgrading your MySQL sooner than later.

PS2: This blog was originally published on Pythian Blog: https://blog.pythian.com/still-have-binlog_error_action-as-ignore_error/

08 Oct

ProxySQL Configuration File & Startup Process Explained

When learning ProxySQL, we have seen some confusion around the configuration and especially around the usage of the configuration file. In this post, we will explain how ProxySQL treats its internal configuration and its configuration file.

The Proxysql.Cnf

For the initial startup, the configuration is loaded from the /etc/proxysql.cnf file. At this time, the configuration file will be parsed and the settings will be stored in the embedded SQLite database. From this point onwards, the settings will be accessed from the SQLite database.

However, there are four configuration settings which are always parsed from the config file, even when the SQLite database is present. This places a requirement for the configuration file to always be in the place, if missing, ProxySQL terminates.

  • datadir: The path of ProxySQL datadir which stores database file, logs.
  • restart_on_missing_heartbeats (new in 1.4.4): If MySQL threads miss restart_on_missing_heartbeats (number of) heartbeats, proxysql will raise a SIGABRT signal and restart. Default is 10.
  • execute_on_exit_failure (new in 1.4.4): If set, ProxySQL’s parent process will execute the defined script every time ProxySQL crashes. It is recommended to use this setting to generate an alert or log the event. 
  • errorlog (new in 2.0.0): The file to be used as error-log. Defaults to ${datadir}/proxysql.log
Read More
09 Apr

mysql database backup shell script with status email

This post is for the backup script for MySQL database on Linux with mail. It’s a linux shell script for taking logical backup using mysqldump and sending status email.

The backup shell script works as follows:
– The script takes backup using mysqldump and compresses it.
– Upon success, it will attempt to ship the backup to specified offsite location.
– Upon detecting failure in any of the above step, it will send out failure email.
– Upon overall success, it will sendout success email with execution time and present backups list.

You might be looking for setting up physical backups for mysql using Holland backup framework.

Following is the code for MySQL backup shell script using mysqldump with status email.

Read More

07 Sep

MySQL 8 Resource Group – introduction and dynamic allocation

MySQL 8 is GA and it has a variety of new features. Recently, we happened to work with resource groups to restrict resource utilization for a MySQL thread.

That’s why I thought of writing this blog post: to give you a quick introduction to resource groups, an idea of how to dynamically allocate them and to discuss a related bug report.

Introduction To Resource Groups:

A resource group, a new feature in MySQL 8, is the entity that defines the allowed resources consumption for threads pertaining to that group. Group attributes enable control over resources and hence the database threads get restricted with those limits.

Resource groups introduce the following associated components:
1. SQL to manage the resource groups (create, alter, assign, drop).
2. Resource groups related privileges.
3. The information_schema.resource_groups table, which holds groups details.
4. A new column was added to the performance_schema.threads table to show us the assignments of threads to respective resource groups.

Read More

13 Aug

Galera cluster to AWS Aurora migration & HA_ERR_FOUND_DUPP_KEY

In this post we will see a case study of a Galera Cluster migration to AWS Aurora and quick solution to the replication issue.

A friend received an error in a Master-Master replication as follows:

Could not execute Write_rows event on table _database._table; Duplicate entry '65eJ8RmzASppBuQD2Iz73AAy8gPKIEmP-2018-08-03 08:30:03' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin-changelog.000010, end_log_pos 1107814

We talked about it and I immediately started blabbering cons of writing on both masters, how to handle, roles of apps and so on. He intervened and revealed, it is Galera Cluster replicated to Aurora and he is not writing on Aurora.

He was actually migrating his Galera Cluster to Aurora and master-master was to support his rollback plans. Consider following diagram for simplicity of understanding:

galera to aws migration - sample architecture

galera to aws migration – sample architecture

Read More

-- Kedar Vaijanapurkar --