{"id":3304,"date":"2024-03-01T12:17:14","date_gmt":"2024-03-01T12:17:14","guid":{"rendered":"https:\/\/kedar.nitty-witty.com\/blog\/?p=3304"},"modified":"2024-09-23T13:42:47","modified_gmt":"2024-09-23T13:42:47","slug":"how-to-avoid-data-loss-in-mysql-primary-key-change","status":"publish","type":"post","link":"https:\/\/kedar.nitty-witty.com\/blog\/how-to-avoid-data-loss-in-mysql-primary-key-change","title":{"rendered":"How to avoid data loss in MySQL Primary Key change"},"content":{"rendered":"\n<p><span>Primary keys are the backbone of efficient data access and maintaining data consistency in your MySQL databases. However, altering them requires careful planning and execution, as incorrect procedures can lead to data loss. Recently I came across a question where user attempted to change Primary Key using pt-online-schema-change and ended up loosing data.<\/span><\/p>\n\n\n\n<p>This blog post highlights how an improperly executed Primary Key change in MySQL can cause data loss and explain the reasoning behind the same. We will perform online Primary Key change using industry standard Percona&#8217;s pt-online-schema-change tool.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Preparing for Primary Key Change in MySQL<\/h2>\n\n\n\n<p>Let us create a hypothetical scenario for changing primary key and see to generate the data loss situation. We will create a table primary_change and aim here is to use new_id column as primary key instead of id column. <\/p>\n\n\n\n<p>(I am sure you already know <a href=\"https:\/\/kedar.nitty-witty.com\/blog\/generate-random-test-data-for-mysql-using-routines\" target=\"_blank\" rel=\"noopener\" title=\"\">MySQL random data generator<\/a> script to generate dummy data.)<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql&gt; create table primary_change ( id int not null primary key, new_id int not null);\nQuery OK, 0 rows affected (0.03 sec)\n\nmysql&gt; show create table primary_change\\G\n*************************** 1. row ***************************\n       Table: primary_change\nCreate Table: CREATE TABLE `primary_change` (\n  `id` int NOT NULL,\n  `new_id` int NOT NULL,\n  PRIMARY KEY (`id`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci\n1 row in set (0.00 sec)\n\nmysql&gt; insert into primary_change values (1,1), (2,2), (3,3), (4,4), (5,4);\nQuery OK, 5 rows affected (0.01 sec)\nRecords: 5  Duplicates: 0  Warnings: 0\n\nmysql&gt; select * from primary_change;\n+----+--------+\n| id | new_id |\n+----+--------+\n|  1 |      1 |\n|  2 |      2 |\n|  3 |      3 |\n|  4 |      4 |\n|  5 |      4 |\n+----+--------+\n5 rows in set (0.00 sec)<\/code><\/pre>\n\n\n\n<p>Note that the new_id is our candidate primary key and it is having duplicate values. <\/p>\n\n\n\n<p>Aim here is to switch the primary key from &#8220;id&#8221; column to &#8220;new_id&#8221; column.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Changing Primary Key with pt-online-schema-change<\/h2>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;root@centos_1 ~]# pt-online-schema-change    \\\n&gt; --print --progress time,10 \\\n&gt; --pause-file=\/tmp\/pt-primary-change.pause \\\n&gt; --no-swap-tables --no-drop-old-table --no-drop-new-table --no-drop-triggers \\\n&gt; --alter=\"DROP PRIMARY KEY, ADD PRIMARY KEY (new_id)\" \\\n&gt; --set-vars transaction_isolation=\\'READ-COMMITTED\\',lock_wait_timeout=30 \\\n&gt; h=localhost,D=test,t=primary_change  \\\n&gt; --no-check-unique-key-change \\\n&gt; --chunk-time=1 \\\n&gt; --new-table-name=_%T_primary_change_new \\\n&gt; --execute\nNo slaves found.  See --recursion-method if host centos_1.localdomain has slaves.\nNot checking slave lag because no slaves were found and --check-slave-lag was not specified.\nOperation, tries, wait:\n  analyze_table, 10, 1\n  copy_rows, 10, 0.25\n  create_triggers, 10, 1\n  drop_triggers, 10, 1\n  swap_tables, 10, 1\n  update_foreign_keys, 10, 1\nAltering `test`.`primary_change`...\n--alter contains 'DROP PRIMARY KEY'.  Dropping and altering the primary key can be dangerous, especially if the original table does not have other unique indexes.\nThe tool should handle this correctly, but you should test it first and carefully examine the triggers which rely on the PRIMARY KEY or a unique index.  Specify --no-check-alter to disable this check and perform the --alter.\n--check-alter failed.\n`test`.`primary_change` was not altered.\n&#91;root@centos_1 ~]#<\/code><\/pre>\n\n\n\n<p>Pay careful attention here, out tool, pt-online-schema-change warns about the dangers of altering Primary Key without having a careful examination. That&#8217;s why the tool employs a security check and asks us to use &#8211;no-check-alter flag to confirm that we have done the due diligence for executing the change.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Data Loss by Primary Key change in MySQL<\/h2>\n\n\n\n<p>Let us not do the due-diligence and use &#8211;no-check-alter to run the command anyhow.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;root@centos_1 ~]# pt-online-schema-change    --print --progress time,10 --pause-file=\/tmp\/pt-primary-change.pause --no-swap-tables --no-drop-old-table --no-drop-new-table --no-drop-triggers --alter=\"DROP PRIMARY KEY, ADD PRIMARY KEY (new_id)\" --set-vars transaction_isolation=\\'READ-COMMITTED\\',lock_wait_timeout=30 h=localhost,D=test,t=primary_change  --no-check-unique-key-change --chunk-time=1 --new-table-name=_%T_primary_change_new --execute --no-check-alter\nNo slaves found.  See --recursion-method if host centos_1.localdomain has slaves.\nNot checking slave lag because no slaves were found and --check-slave-lag was not specified.\nOperation, tries, wait:\n  analyze_table, 10, 1\n  copy_rows, 10, 0.25\n  create_triggers, 10, 1\n  drop_triggers, 10, 1\n  swap_tables, 10, 1\n  update_foreign_keys, 10, 1\nAltering `test`.`primary_change`...\nCreating new table...\nCREATE TABLE `test`.`_primary_change_primary_change_new` (\n  `id` int NOT NULL,\n  `new_id` int NOT NULL,\n  PRIMARY KEY (`id`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci\nCreated new table test._primary_change_primary_change_new OK.\nAltering new table...\nALTER TABLE `test`.`_primary_change_primary_change_new` DROP PRIMARY KEY, ADD PRIMARY KEY (new_id)\nAltered `test`.`_primary_change_primary_change_new` OK.\n2024-02-26T05:43:53 Creating triggers...\n-----------------------------------------------------------\nEvent : DELETE\nName  : pt_osc_test_primary_change_del\nSQL   : CREATE TRIGGER `pt_osc_test_primary_change_del` AFTER DELETE ON `test`.`primary_change` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1146 begin end; DELETE IGNORE FROM `test`.`_primary_change_primary_change_new` WHERE `test`.`_primary_change_primary_change_new`.`new_id` &lt;=&gt; OLD.`new_id`; END\nSuffix: del\nTime  : AFTER\n-----------------------------------------------------------\n-----------------------------------------------------------\nEvent : UPDATE\nName  : pt_osc_test_primary_change_upd\nSQL   : CREATE TRIGGER `pt_osc_test_primary_change_upd` AFTER UPDATE ON `test`.`primary_change` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1146 begin end; DELETE IGNORE FROM `test`.`_primary_change_primary_change_new` WHERE !(OLD.`new_id` &lt;=&gt; NEW.`new_id`) AND `test`.`_primary_change_primary_change_new`.`new_id` &lt;=&gt; OLD.`new_id`; REPLACE INTO `test`.`_primary_change_primary_change_new` (`id`, `new_id`) VALUES (NEW.`id`, NEW.`new_id`); END\nSuffix: upd\nTime  : AFTER\n-----------------------------------------------------------\n-----------------------------------------------------------\nEvent : INSERT\nName  : pt_osc_test_primary_change_ins\nSQL   : CREATE TRIGGER `pt_osc_test_primary_change_ins` AFTER INSERT ON `test`.`primary_change` FOR EACH ROW BEGIN DECLARE CONTINUE HANDLER FOR 1146 begin end; REPLACE INTO `test`.`_primary_change_primary_change_new` (`id`, `new_id`) VALUES (NEW.`id`, NEW.`new_id`);END\nSuffix: ins\nTime  : AFTER\n-----------------------------------------------------------\n2024-02-26T05:43:53 Created triggers OK.\n2024-02-26T05:43:53 Copying approximately 5 rows...\nINSERT LOW_PRIORITY IGNORE INTO `test`.`_primary_change_primary_change_new` (`id`, `new_id`) SELECT `id`, `new_id` FROM `test`.`primary_change` LOCK IN SHARE MODE \/*pt-online-schema-change 1282 copy table*\/\n2024-02-26T05:43:53 Copied rows OK.\nNot dropping old table because --no-drop-triggers was specified.\nNot dropping triggers because --no-drop-triggers was specified.  To drop the triggers, execute:\nDROP TRIGGER IF EXISTS `test`.`pt_osc_test_primary_change_del`\nDROP TRIGGER IF EXISTS `test`.`pt_osc_test_primary_change_upd`\nDROP TRIGGER IF EXISTS `test`.`pt_osc_test_primary_change_ins`\nNot dropping the new table `test`.`_primary_change_primary_change_new` because --no-drop-new-table was specified.  To drop the new table, execute:\nDROP TABLE IF EXISTS `test`.`_primary_change_primary_change_new`;\nSuccessfully altered `test`.`primary_change`.\n&#91;root@centos_1 ~]#<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql&gt; select * from _primary_change_primary_change_new;\n+----+--------+\n| id | new_id |\n+----+--------+\n|  1 |      1 |\n|  2 |      2 |\n|  3 |      3 |\n|  4 |      4 |\n+----+--------+\n4 rows in set (0.00 sec)<\/code><\/pre>\n\n\n\n<p>Note that the &#8220;new&#8221; table created by pt-online-schema-change has missing record for id=5. This happened because the new_id column had duplicate records &#8220;4&#8221; for both id &#8220;4&#8221; and &#8220;5&#8221;.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why This Kolaveri Di?<\/h2>\n\n\n\n<p>I mean to ask, so why did we see data loss in previous Primary Key change using pt-online-schema-change?<\/p>\n\n\n\n<p>During the online schema change, <code class=\"\">pt-online-schema-change<\/code> creates a temporary table with the desired modifications. It then copies data from the original table to the new one, using <code class=\"\">INSERT IGNORE<\/code> statements to bypass potential duplicate key errors in the new primary key column. Additionally, the tool utilizes <code class=\"\">REPLACE INTO<\/code> statements within the created triggers, which can also lead to data loss if duplicates exist.<\/p>\n\n\n\n<p>I invite you to explore the code of the pt-online-schema-change, isn&#8217;t that the beauty of open-source?<br>This section of the nibbeling process uses INSERT IGNORE to avoid duplicate errors:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>my $dml = \"<strong>INSERT LOW_PRIORITY IGNORE INTO<\/strong> $new_tbl-&gt;{name} \"\n. \"(\" . join(', ', map { $q-&gt;quote($_-&gt;{new}) } @common_cols) . \") \"\n. \"SELECT\";\nmy $select = join(', ', map { $q-&gt;quote($_-&gt;{old}) } @common_cols);<\/code><\/pre>\n\n\n\n<p>Source: https:\/\/github.com\/percona\/percona-toolkit\/blob\/3.x\/bin\/pt-online-schema-change.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Ensure uniqueness of Primary Key in MySQL<\/h2>\n\n\n\n<p>A very easy query for identifying if the data of a column has duplicates is using GROUP BY .. HAVING clauses. Find a sample MySQL query below:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT col_name, COUNT(col_name)<br>FROM table_name<br>GROUP BY col_name<br>HAVING COUNT(col_name) &gt; 1;<\/code><\/pre>\n\n\n\n<p>If you&#8217;re candidate primary key has duplicate values, consider taking appropriate actions.<\/p>\n\n\n\n<p>Consider testing the online schema change in a non-production environment before applying it to your main database. Regularly back up your databases to ensure you have a reliable recovery point in case of unforeseen issues.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusion<\/h2>\n\n\n\n<p>With power comes responsibility &#8211; pt-online-schema-change is every DBA&#8217;s power tool but should be used responsibly. Keep keen eyes to the warnings presented by scripts. When you plan your primary key change with a proactive approach and attention to detail, you can navigate the complexities of primary key changes with confidence and safeguard your valuable data assets.<\/p>\n","protected":false},"excerpt":{"rendered":"Learn how to safely change primary keys in MySQL using pt-online-schema-change. This blog highlights tips to avoid data loss during online alter of primary key.\n","protected":false},"author":1,"featured_media":3312,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[869,8,870],"tags":[965,967,963,966],"class_list":{"0":"post-3304","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-mariadb","8":"category-mysql","9":"category-mysql-tools","10":"tag-avoid-data-loss-in-mysql","11":"tag-mysql-online-alter-primary-key","12":"tag-primary-key-change","13":"tag-use-pt-online-schema-change-to-alter-primary-key"},"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts\/3304","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/comments?post=3304"}],"version-history":[{"count":5,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts\/3304\/revisions"}],"predecessor-version":[{"id":3313,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts\/3304\/revisions\/3313"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/media\/3312"}],"wp:attachment":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/media?parent=3304"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/categories?post=3304"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/tags?post=3304"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}