{"id":2334,"date":"2015-12-04T07:40:19","date_gmt":"2015-12-04T07:40:19","guid":{"rendered":"http:\/\/kedar.nitty-witty.com\/?p=2334"},"modified":"2024-01-27T17:29:19","modified_gmt":"2024-01-27T17:29:19","slug":"fixing-inconsistency-on-mysql-slave-of-galera-cluster","status":"publish","type":"post","link":"https:\/\/kedar.nitty-witty.com\/blog\/fixing-inconsistency-on-mysql-slave-of-galera-cluster","title":{"rendered":"Fixing inconsistency on MySQL Slave of Galera Cluster"},"content":{"rendered":"\n<p>Checksumming is a standard practice among DBAs to verify data consistency across replicated nodes. In this post we&#8217;re going to review the syncing options for an inconsistent MySQL slave of a Galera cluster node.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><img decoding=\"async\" src=\"http:\/\/kedar.nitty-witty.com\/wp-content\/uploads\/2015\/12\/PXC-slave-checksum.png\" alt=\"PXC-slave-checksum\" class=\"wp-image-82503\"\/><\/figure>\n<\/div>\n\n\n<p>Here we&#8217;re assuming a setup with regular replication to a standalone MySQL instance from one of the Galera cluster nodes.<br><\/p>\n\n\n\n<!--more-->\n\n\n\n<p><br>In a usual MySQL replication setup, standard practice is to use pt-table-checksum to identify discrepancies and pt-table-sync to bring the nodes back in sync. The checksum tool, pt-table-checksum, can run across Galera cluster nodes to verify data consistency and confirm whether the MySQL slave is consistent with a chosen primary node.<\/p>\n\n\n\n<p>What happens if this Galera cluster&#8217;s regular MySQL slave develops a data inconsistency? Will pt-table-sync work there? The answer depends&#8230;<br>When used with --sync-to-master, pt-table-sync takes locks on the master, and Galera doesn&#8217;t like those lock attempts.<\/p>\n\n\n\n<p>You may ask, why locks on a master?<\/p>\n\n\n\n<p>Because pt-table-sync treats the master as the source and the slave as the destination. 
It will lock the table (--lock=1) on the master and apply the changes there; those changes then replicate to the slave, which brings it in sync.<\/p>\n\n\n\n<p>The relevant snippet from the pt-table-sync script (v2.2.15):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\tlock_server(src =&gt; $src, dst =&gt; $dst, %args);\n\t...\n\t$exit_status |= sync_a_table(\n               src   =&gt; $src,\n               dst   =&gt; $dst,\n               where =&gt; 1,  # prevents --where from being used\n               diff  =&gt; $diff,\n               %args,\n            );\n\t ...\n         unlock_server(src =&gt; $src, dst =&gt; $dst, %args);\n<\/pre>\n\n\n\n<p>Coming back to our point: pt-table-sync doesn&#8217;t work well on Galera with --sync-to-master. Let&#8217;s give it a try:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">PTDEBUG=1 .\/pt-table-sync --verbose --execute --replicate=pyth.checksum --sync-to-master --tables slave_repl.tmp h=localhost,u=root,p=$pw &gt; slave_repl.tmp.sql\n<\/pre>\n\n\n\n<p>If you run the above command (on the slave) with debug enabled, you will see the following error:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\t# TableSyncer:6114 4650 Committing DBI::db=HASH(0x177cf18)\n\tDeadlock found when trying to get lock; try restarting transaction at line 6115 while doing slave_repl.tmp on localhost\n<\/pre>\n\n\n\n<p>(Without PTDEBUG you won&#8217;t see much on the slave; the tool will simply report that nothing changed!)<\/p>\n\n\n\n<p>So why the error? Let&#8217;s check the code snippet again (pt-table-sync v2.2.15):<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\tsub lock_table {\n\t   my ( $self, $dbh, $where, $db_tbl, $mode ) = @_;\n\t   my $query = \"LOCK TABLES $db_tbl $mode\";\n\t   PTDEBUG &amp;&amp; _d($query);\n\t...\n<\/pre>\n\n\n\n<p>As you can see, it issues LOCK TABLES, and Galera, as we know, doesn&#8217;t support explicit locking because it conflicts with multi-master replication. 
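The --sync-to-master flow described above can be sketched as a toy model. This is my own illustrative Python, not pt-table-sync's actual code; the lock step stands in for lock_server()'s LOCK TABLES call, which is exactly the part Galera rejects:

```python
# Toy model of pt-table-sync's --sync-to-master flow (NOT the tool's real code):
# fixes are applied on the source (master) under a table lock, and ordinary
# replication carries them to the destination (slave).

def sync_to_master(master, slave, table):
    """Sync `table` on the slave by re-writing its rows on the master."""
    # lock_server(): issues LOCK TABLES on the master -- the step Galera rejects.
    try:
        # Rows whose values differ between master (source of truth) and slave.
        diff_keys = {k for k in master[table]
                     if slave[table].get(k) != master[table][k]}
        for k in diff_keys:
            # REPLACE the row on the *master*: a no-op there, but the statement
            # flows down the replication stream and fixes the slave.
            slave[table][k] = master[table][k]
        # Slave-only rows are handled by DELETEs replicated the same way.
        for k in set(slave[table]) - set(master[table]):
            del slave[table][k]
    finally:
        pass  # unlock_server(): UNLOCK TABLES

master = {"t": {1: "a", 2: "b"}}
slave  = {"t": {1: "a", 2: "x", 3: "ghost"}}
sync_to_master(master, slave, "t")
print(slave["t"])  # {1: 'a', 2: 'b'}
```

The key point the model captures: nothing is ever written directly on the slave, which is why the master-side lock is unavoidable in this mode.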
That&#8217;s the reason for the error above.<\/p>\n\n\n\n<p>Okay, let&#8217;s continue&#8230; Upon executing pt-table-sync on the slave, the &#8220;master&#8221; (Galera node) error log will show the following warning:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">\t2015-08-27 14:45:07 6988 [Warning] WSREP: SQL statement was ineffective, THD: 17, buf: 1399\n\tQUERY: commit\n\t =&gt; Skipping replication\n<\/pre>\n\n\n\n<p>We already have a <a href=\"https:\/\/bugs.launchpad.net\/percona-toolkit\/+bug\/1250450\" target=\"_blank\" rel=\"noopener\">bug report<\/a> in place; if it affects you, go ahead and mark it so.<\/p>\n\n\n\n<p>So how would you fix this?<\/p>\n\n\n\n<p>Easy answer: do a complete rebuild of the slave from a fresh data backup of a cluster node.<br>Desync the cluster node, take the backup, restore it on the slave machine, and set up replication.<\/p>\n\n\n\n<p>But let&#8217;s think about an alternate method other than a complete restore&#8230;<\/p>\n\n\n\n<p><em>&#8220;Fixing using pt-table-sync?&#8221;<\/em><\/p>\n\n\n\n<p><em>But pt-table-sync is &#8220;not Galera Ready&#8221;, as they say!!<\/em><\/p>\n\n\n\n<p>Even so, pt-table-sync can help us understand the differences, and that&#8217;s the long answer \ud83d\ude42<\/p>\n\n\n\n<p>It can still work and prepare the SQL for you using the --print --verbose options.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">.\/pt-table-sync --verbose --print --replicate=pyth.checksum --sync-to-master --tables slave_repl.tmp h=localhost,u=root,p=$pw &gt; slave_repl.tmp.sql\n<\/pre>\n\n\n\n<p>So, all you need to do is run that SQL against the slave to fix the discrepancies. You may choose to desync the node before running pt-table-sync to generate the differential SQL. 
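Before executing the generated file on the slave, a quick sanity pass is worthwhile. The helper below is my own sketch (not part of Percona Toolkit), assuming the file largely contains REPLACE INTO / DELETE FROM statements plus comment lines, which is what pt-table-sync --print typically emits for this scenario; the exact statement mix can vary:

```python
# Sanity-check a pt-table-sync --print output file before piping it to the
# slave: flag any statement that is not a REPLACE/DELETE, or that targets an
# unexpected table. The file and table names below match the post's example.

def check_sync_sql(path, expected_table="slave_repl.tmp"):
    suspicious = []
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            stmt = line.strip()
            if not stmt or stmt.startswith("#"):  # pt-table-sync adds comments
                continue
            if not (stmt.startswith("REPLACE INTO") or stmt.startswith("DELETE FROM")):
                suspicious.append((lineno, stmt))
            elif expected_table.split(".")[-1] not in stmt:
                suspicious.append((lineno, stmt))
    return suspicious

# Example: a well-formed file produces no warnings (contents are illustrative).
with open("slave_repl.tmp.sql", "w") as f:
    f.write("# on slave: fix rows that differ\n")
    f.write("REPLACE INTO `slave_repl`.`tmp` (`id`, `val`) VALUES (2, 'b');\n")
    f.write("DELETE FROM `slave_repl`.`tmp` WHERE `id` = 3;\n")
print(check_sync_sql("slave_repl.tmp.sql"))  # []
```

Anything flagged deserves a manual look before you run the file through the mysql client on the slave.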
You&#8217;ll still need to confirm that the slave is in sync by re-running pt-table-checksum and verifying that the discrepancies are resolved.<\/p>\n\n\n\n<p>Our steps to resync a Percona XtraDB Cluster&#8217;s slave using pt-table-sync are as follows:<\/p>\n\n\n\n<p>(Note: It&#8217;s advisable to stop writes on the cluster while fixing the discrepancies on the slave. &#8220;Why?&#8221; &#8220;Explained later.&#8221;)<\/p>\n\n\n\n<p>&#8211; Desync the master node:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">set global wsrep_desync=ON;\n<\/pre>\n\n\n\n<p>&#8211; Generate the differential SQL:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">.\/pt-table-sync --verbose --print --replicate=pyth.checksum --sync-to-master --tables slave_repl.tmp h=localhost,u=root,p=$pw &gt; slave_repl.tmp.sql\n<\/pre>\n\n\n\n<p>&#8211; Review the generated SQL and execute it on the slave.<\/p>\n\n\n\n<p>&#8211; Once the slave is synced, resync the node:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">set global wsrep_desync=OFF;\n<\/pre>\n\n\n\n<p>&#8211; Finally, rerun pt-table-checksum to verify that the discrepancies are resolved.<\/p>\n\n\n\n<p>That concludes our solution.<\/p>\n\n\n\n<p><em>&#8220;Wait, but why desync?&#8221;<\/em><br>Hmm&#8230; Well, wsrep_desync is a dynamic variable that controls whether the node participates in Flow Control.<\/p>\n\n\n\n<p><em>&#8220;Hold on!! Flow control??&#8221;<\/em><br>Galera&#8217;s replication is synchronous: each node gives feedback to the rest of the group &#8211; either &#8220;fellas, I&#8217;m late, hold on&#8221; or &#8220;okay, let&#8217;s continue the job&#8221;. This communication feedback is flow control. 
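A toy model of that feedback loop, greatly simplified from Galera's real gcs protocol (the threshold and names here are illustrative, loosely inspired by gcs.fc_limit):

```python
# Toy illustration of Galera-style flow control: each node reports whether its
# receive queue is too long, and the cluster pauses replication while any
# *synced* node is overloaded. A node with wsrep_desync=ON keeps receiving
# writesets, but its feedback is ignored, so its lag no longer stalls the group.

MAX_QUEUE = 5  # hypothetical threshold (think gcs.fc_limit)

class Node:
    def __init__(self, name, desynced=False):
        self.name = name
        self.queue = 0          # pending writesets waiting to be applied
        self.desynced = desynced

    def wants_pause(self):
        return self.queue > MAX_QUEUE

def cluster_paused(nodes):
    # Flow control only listens to nodes that participate in it.
    return any(n.wants_pause() for n in nodes if not n.desynced)

nodes = [Node("node1"), Node("node2"), Node("node3")]
nodes[2].queue = 20              # node3 falls behind
print(cluster_paused(nodes))     # True  -> the whole cluster slows down

nodes[2].desynced = True         # SET GLOBAL wsrep_desync=ON on node3
print(cluster_paused(nodes))     # False -> the others carry on; node3 catches up alone
```

This is why desyncing the node we run pt-table-sync against keeps the rest of the cluster unaffected.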
(You should read the <a href=\"http:\/\/galeracluster.com\/documentation-webpages\/nodestates.html\" target=\"_blank\" rel=\"nofollow noopener\">Galera documentation<\/a> &amp; <a href=\"https:\/\/www.percona.com\/blog\/2013\/05\/02\/galera-flow-control-in-percona-xtradb-cluster-for-mysql\/\" target=\"_blank\" rel=\"nofollow noopener\">Jay&#8217;s post<\/a>.)<\/p>\n\n\n\n<p>When we set <em>wsrep_desync=ON<\/em> on the master, it will continue to replicate writesets in and out as usual, but flow control will no longer account for the desynced node. Other nodes of the group won&#8217;t be held back by the desynced node lagging behind. Thus, by desyncing we make sure that our operations on one node don&#8217;t affect the whole cluster. This should also answer why writes need to be stopped before starting the sync.<\/p>\n\n\n\n<p>Hope this helps.<\/p>\n\n\n\n<p><b>PS:<\/b> This blog was published on Pythian Blog.<\/p>\n","protected":false},"excerpt":{"rendered":"This blog post explores the syncing options for an inconsistent MySQL slave of a Galera cluster node using 
pt-table-sync\n","protected":false},"author":1,"featured_media":2335,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[864,8,377,865],"tags":[382,855,414,427,417,415,856,100,857,416],"class_list":{"0":"post-2334","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-mysql-galera-cluster","8":"category-mysql","9":"category-mysql-articles","10":"category-percona-xtradb-cluster","11":"tag-galera-cluster","12":"tag-galera-replication-slave","13":"tag-inconsistency","14":"tag-mysql","15":"tag-pt-table-sync","16":"tag-pxc","17":"tag-pxc-to-mysql-replication","18":"tag-replication","19":"tag-replication-issues","20":"tag-slave"},"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts\/2334","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/comments?post=2334"}],"version-history":[{"count":4,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts\/2334\/revisions"}],"predecessor-version":[{"id":2871,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts\/2334\/revisions\/2871"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/media\/2335"}],"wp:attachment":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/media?parent=2334"}],"wp:t
erm":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/categories?post=2334"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/tags?post=2334"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}