pt-table-checksum & Waiting to check replicas for differences: 0% 00:00 remain

I recently came across an issue where the task was to checksum databases using Percona’s pt-table-checksum. The command continued well for all the databases but when it came to specific DB, it kept on showing following error:

Waiting to check replicas for differences: 0% 00:00 remain

As this did not change for hours (was in screen session), I cross-checked that nothing blocks the process and other options!

I killed pt-table-checksum & re-ran the command with PTDEBUG to get detailed debug info:

[kedar@nitty-witty ~] PTDEBUG=1 /usr/bin/pt-table-checksum --config /home/kedar/pt-table-checksum-config   --recursion-method dsn=h=localhost,D=DB_A,t=dsns --replicate DB_A.checksums --no-check-replication-filters --no-check-binlog-format --chunk-time=3 --max-load Threads_running=18 --check-interval=1 --max-lag=1000 --no-check-plan --ignore-databases mysql,performance_schema,information_schema --databases DB_A,DB_B,DB_C,DB_D --resume > checksum.log 2>&1

The environment variable PTDEBUG enables verbose debugging output to STDERR.

From the logs I found that the checksum command was stuck because following query was returning NULL:

mysql> SELECT MAX(chunk) FROM `DB_A`.`checksums` WHERE db='DB_D' AND tbl='a_c' AND master_crc IS NOT NULL;
+------------+
| MAX(chunk) |
+------------+
|       NULL |
+------------+
1 row in set (0.00 sec)

I ran the query with only db=’DB_D’ condition and found that all the tables of this database returned NULL! The database DB_D was very well present on master and all the slaves.

Debugging further get me to following status output of the slave:

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: master.mysite.com
                  Master_User: slave2
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000908
          Read_Master_Log_Pos: 576546786
               Relay_Log_File: mysqld-relay-bin.002738
                Relay_Log_Pos: 576546932
        Relay_Master_Log_File: mysql-bin.000908
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: DB_A,DB_B,DB_C
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 576546786
              Relay_Log_Space: 576547132
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
1 row in set (0.00 sec)

Look at “Replicate_Do_DB” closely -> “DB_A,DB_B,DB_C”. That’s right, DB_D is not replicated !!
So, that explains us the NULL results of max-chunk finding query and our error of waiting-to-check-replicas-for-differences.

Well, so the solution here is, as DB_D does not replicate, we shall remove it from the list of databases to checksum. Add DB_D in to the list of –ignore-databases and rerun the command to get things started!

[kedar@nitty-witty ~] PTDEBUG=1 /usr/bin/pt-table-checksum --config /home/kedar/pt-table-checksum-config   --recursion-method dsn=h=localhost,D=DB_A,t=dsns --replicate DB_A.checksums --no-check-replication-filters --no-check-binlog-format --chunk-time=3 --max-load Threads_running=18 --check-interval=1 --max-lag=1000 --no-check-plan --ignore-databases DB_D,mysql,performance_schema,information_schema --databases DB_A,DB_B,DB_C --resume > checksum.log 2>&1

This worked well and killed “Waiting to check replicas for differences: 0% 00:00 remain”.

Note that we’re using –no-check-replication-filters so that our checksum will continue over the replication filters. Also review –replicate-database option which can also be used to specify databases to be USEd.

I think fix in pt-table-checksum here could be that: In a condition where max-chunk returns NULL and slave is in sync (seconds_behind_master: 0), pt-table-checksum should exit with appropriate warning.
I may be overlooking many other options but that’s something I could quickly think. Comment for corrections / if-and-buts.

Hope this helps.

Leave a Reply

Your email address will not be published. Required fields are marked *

You May Also Like

Stored procedure to Find database objects

This procedure lists available database objects under passed database name. It lists present Tables, Views, Stored Procedures, Functions and Triggers under particular database. It also lists storage engine of tables.…
View Post