19 Aug

2 many dots can break your replication only once

Two or more dots in your relay log or binary log names can break replication; but worry not, it will only do it once. If you request to start again, it will work. That’s it, that’s the blog. Should you choose to spend some more time, go ahead.

A new MySQL 5.7 GTID replication chain was being constructed and fail-overs were being tested while an oddity of MySQL replication revealed itself. The Orchestrator graceful fail-over was not able to attach the replica and reporting an error.

Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

Every possibility was investigated:

  • “the master’s binary log is corrupted”: No, master binary logs were okay. The mysqlbinlog utility was able to correctly parse the binary logs.
  • “the slave’s relay log is corrupted”: No, replica’s binary logs were okay. The mysqlbinlog utility was able to correctly parse the binary logs.
  • “a network problem”: telnet on port, no packet drops, no firewalls and mainly replication was working before failover!

How the error was fixed:

START SLAVE.

Hit the start slave again! Yes, on that error if you hit “start slave”, it starts working like a good boy.

Observation

It was later noted that the relay-log name was not regular. The one with NAME-followed-by-DIGITS. Instead it was incomplete filename provided and no trailing digits. Sounds like a MySQL bug?

This issue appeared on MySQL 5.7.35 and even a lower minor-version. It hasn’t been tested in MySQL 8 yet.

Reproducing MySQL Replication Error – relay-bin

So, it was time to bring this creature to lab and have tests. Let’s go back to the setup and try to regenerate this. We have two MySQL 5.7 instances in GTID replication with following configuration:

## mysql1 - my.cnf
 [mysqld]
 server-id = 1
 binlog_format = MIXED
 log-bin
 sync_binlog = 1
 gtid-mode=ON
 enforce-gtid-consistency
 relay-log=this.is.relay.log

## mysql2 - my.cnf
 [mysqld]
 server-id = 2
 binlog_format = MIXED
 log-bin
 sync_binlog = 1
 gtid-mode=ON
 enforce-gtid-consistency
 relay-log=this.is.relay.log
# Once instances are up, setup replication
change master to master_host='mysql1', master_user='repl', master_password='X', master_auto_position=1;
start slave; 
show slave status\G

So what you observe is:

Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

(If you see no error first time, try stop slave; start slave; show slave status\G again.)

Now, if you attempt to START SLAVE again, it will work!!

My Worklog from local MySQL Setup


mysql> change master to master_host='mysql1', master_user='repl', master_password='X', master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State:
                  Master_Host: mysql1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 645d6e993851-bin.000001
          Read_Master_Log_Pos: 2333
               Relay_Log_File: this.is.relay
                Relay_Log_Pos: 2754
        Relay_Master_Log_File: 645d6e993851-bin.000001
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 2333
              Relay_Log_Space: 2754
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 054d4ac0-0023-11ec-934c-0242ac130002
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 054d4ac0-0023-11ec-934c-0242ac130002:1-8
            Executed_Gtid_Set: 054d4ac0-0023-11ec-934c-0242ac130002:1-8
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.01 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: mysql1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 645d6e993851-bin.000001
          Read_Master_Log_Pos: 3565
               Relay_Log_File: this.is.relay
                Relay_Log_Pos: 3161
        Relay_Master_Log_File: 645d6e993851-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1594
                   Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 123
              Relay_Log_Space: 4501
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1594
               Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 054d4ac0-0023-11ec-934c-0242ac130002
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 210818 17:18:04
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 054d4ac0-0023-11ec-934c-0242ac130002:1-12
            Executed_Gtid_Set: 054d4ac0-0023-11ec-934c-0242ac130002:1-8
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: mysql1
                  Master_User: repl
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: 645d6e993851-bin.000001
          Read_Master_Log_Pos: 3565
               Relay_Log_File: this.is.relay
                Relay_Log_Pos: 4501
        Relay_Master_Log_File: 645d6e993851-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 3565
              Relay_Log_Space: 4501
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 1
                  Master_UUID: 054d4ac0-0023-11ec-934c-0242ac130002
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 054d4ac0-0023-11ec-934c-0242ac130002:1-12
            Executed_Gtid_Set: 054d4ac0-0023-11ec-934c-0242ac130002:1-12
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
           Master_TLS_Version:
1 row in set (0.00 sec)

It is possibly understood why it fails for the first time, may be due to trimmed relay-log name, but then why it works with on the consecutive attempt to start slave!

The observation here is if you use DOT in the filename for relay-log, it is trimmed. The maximum dots it can tolerate is 1, though it trims the filename after that dot but it doesn’t break replication. This is really odd and I tried reviewing the code but didn’t get enough time to go through that completely, If you’re interested to investigate, please be my guest and don’t forget to comment.

Some relay-log config experiments

Configuring: "relay-log=this.is.relay.log"
Creates: "this.index" and "this.is.relay"
Replication: breaks for first start-slave and works in the next.
Configuring: "relay-log=thisis.relay.log"
Creates: "thisis.index" and "thisis.relay"
Replication: breaks for first start-slave and works in the next.
Configuring: "relay-log=thisisrelay.log"
Creates: "thisisrelay.index" and "thisisrelay.000001"
Replication: OK

Even log-bin hates DOTs

Wait, if you think this is over, it is not. This DOT fun is not only limited to relay-log but also extends to log-bin.

Configuring: "log-bin=this.is.bin.log"
Creates: "this.is.bin"!
mysql> show binary logs;
+---------------+-----------+
| Log_name | File_size |
+---------------+-----------+
| this.is.bin | 194 |

Wonder how the next binary log will progress and what will happen once the max-binlog-size is reached! May be it will overwrite the same binlog… I decided stop having these experiments and to let the capable MySQL team to figure this out.

I already have raised the MySQL bug report. Cheers.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.

-- Kedar Vaijanapurkar --