{"id":3397,"date":"2024-11-18T07:45:08","date_gmt":"2024-11-18T07:45:08","guid":{"rendered":"https:\/\/kedar.nitty-witty.com\/blog\/?p=3397"},"modified":"2024-11-19T03:40:22","modified_gmt":"2024-11-19T03:40:22","slug":"mysql-8-utf8mb4_0900_ai_ci-collation-confusion","status":"publish","type":"post","link":"https:\/\/kedar.nitty-witty.com\/blog\/mysql-8-utf8mb4_0900_ai_ci-collation-confusion","title":{"rendered":"MySQL 8 utf8mb4_0900_ai_ci collation confusion"},"content":{"rendered":"\n<p>Recently I was asked a question: Why am I getting utf8mb4_0900_ai_ci as the default collation in MySQL 8, despite setting the server to use utf8mb4_general_ci?<\/p>\n\n\n\n<p>With the upgrade to <strong>MySQL 8<\/strong>, many users, like my friend, face confusion around <strong>character sets<\/strong> and <strong>collations<\/strong>, especially when the MySQL 5.7 default collation (e.g., <strong>utf8mb4_general_ci<\/strong>) is replaced by <strong>utf8mb4_0900_ai_ci<\/strong>. This post will explain why this happens and guide you through resolving such issues when upgrading to <strong>MySQL 8<\/strong>, focusing on <strong>utf8mb4<\/strong> and its default collation behavior.<\/p>\n\n\n\n<p>First, let&#8217;s check the current character set and collation settings in the MySQL server:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql &#91;localhost:8032] {msandbox} (test) &gt; show global variables like '%character%';\n+--------------------------+--------------------------------------------+\n| Variable_name | Value |\n+--------------------------+--------------------------------------------+\n| character_set_client | utf8mb4 |\n| character_set_connection | utf8mb4 |\n| character_set_database | utf8mb4 |\n| character_set_filesystem | binary |\n| character_set_results | utf8mb4 |\n| character_set_server | utf8mb4 |\n| character_set_system | utf8mb3 |\n| character_sets_dir | \/opt\/percona_server\/8.0.32\/share\/charsets\/ |\n+--------------------------+--------------------------------------------+\n8 rows in set (0.01 sec)\nmysql &#91;localhost:8032] {msandbox} (test) &gt; show global variables like '%collat%';\n+-------------------------------+--------------------+\n| Variable_name | Value |\n+-------------------------------+--------------------+\n| collation_connection | utf8mb4_general_ci |\n| collation_database | utf8mb4_general_ci |\n| collation_server | utf8mb4_general_ci |\n| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |\n+-------------------------------+--------------------+\n4 rows in set (0.01 sec)\n<\/code><\/pre>\n\n\n\n<p>Let&#8217;s create a table without specifying a collation explicitly:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql &#91;localhost:8032] {msandbox} (test) &gt; create table cs (id int not null auto_increment primary key, val varchar(40)) ;\nQuery OK, 0 rows affected (0.05 sec)\n\nmysql &#91;localhost:8032] {msandbox} (test) &gt; show create table cs\\G\n* 1. row *\nTable: cs\nCreate Table: CREATE TABLE cs (\nid int NOT NULL AUTO_INCREMENT,val varchar(40) DEFAULT NULL,PRIMARY KEY (id)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci\n1 row in set (0.00 sec)\n<\/code><\/pre>\n\n\n\n<p>We have utf8mb4_900_ai_ci and not utf8mb4_general_ci as a coalition. Why?<br>Is it because default_collation_for_utf8mb4 is still configured as utf8mb4_0900_ai_ci?<br><\/p>\n\n\n\n<p>Let&#8217;s change default_collation_for_utf8mb4 to utf8mb4_general_ci. Ensured that this also changed at the current session.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql &#91;localhost:8032] {msandbox} (test) &gt; SET PERSIST default_collation_for_utf8mb4='utf8mb4_general_ci';\nQuery OK, 0 rows affected, 1 warning (0.00 sec)\n\nmysql &#91;localhost:8032] {msandbox} (test) &gt; show warnings;\n+---------+------+--------------------------------------------------------------------------------------------------------+\n| Level | Code | Message |\n+---------+------+--------------------------------------------------------------------------------------------------------+\n| Warning | 1681 | <strong>Updating 'default_collation_for_utf8mb4' is deprecated.<\/strong> It will be made read-only in a future release. |\n+---------+------+--------------------------------------------------------------------------------------------------------+\n1 row in set (0.00 sec)\n\nmysql &#91;localhost:8032] {msandbox} (test) &gt; show global variables like '%collat%';\n+-------------------------------+--------------------+\n| Variable_name | Value |\n+-------------------------------+--------------------+\n| collation_connection | utf8mb4_general_ci |\n| collation_database | utf8mb4_general_ci |\n| collation_server | utf8mb4_general_ci |\n| default_collation_for_utf8mb4 | utf8mb4_general_ci |\n+-------------------------------+--------------------+\n4 rows in set (0.00 sec)\n\nmysql &#91;localhost:8032] {msandbox} (test) &gt; create table csagain (id int not null auto_increment primary key, val varchar(40)) ;\nQuery OK, 0 rows affected (0.03 sec)\n\nmysql &#91;localhost:8032] {msandbox} (test) &gt; show create table csagain\\G\n* 1. row *\nTable: csagain\nCreate Table: CREATE TABLE csagain (\nid int NOT NULL AUTO_INCREMENT,val varchar(40) DEFAULT NULL,PRIMARY KEY (id)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=<strong>utf8mb4_0900_ai_ci<\/strong>\n1 row in set (0.00 sec)\n<\/code><\/pre>\n\n\n\n<p>Well the table collation is still shown as <strong>utf8mb4_0900_ai_ci<\/strong>! <strong>Pon Suresh<\/strong> explains about SET NAMES in comments section.<\/p>\n\n\n\n<p>Now, I hope you noted the warning above &#8211; what we did is deprecated! There\u2019s more to it. The <strong>default_collation_for_utf8mb4<\/strong> system variable is for internal use by MySQL Replication only. What we did is not correct. It is there to support the replication from older versions of MySQL (5.7) to MySQL 8 and assist in avoiding replication errors.<\/p>\n\n\n\n<p>Refer: https:\/\/dev.mysql.com\/doc\/refman\/8.4\/en\/server-system-variables.html#sysvar_default_collation_for_utf8mb4<\/p>\n\n\n\n<p>Move on, do you think this could be fixed with SET NAMES? Let&#8217;s try<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql &#91;localhost:8032] {msandbox} (test) &gt; SET NAMES 'utf8mb4' COLLATE 'utf8mb4_general_ci';\nQuery OK, 0 rows affected (0.00 sec)\n\nmysql &#91;localhost:8032] {msandbox} (test) &gt; drop table csagain;\nQuery OK, 0 rows affected (0.02 sec)\n\nmysql &#91;localhost:8032] {msandbox} (test) &gt; create table csagain (id int not null auto_increment primary key, val varchar(40)) ;\nQuery OK, 0 rows affected (0.03 sec)\n\nmysql &#91;localhost:8032] {msandbox} (test) &gt; show create table csagain\\G\n* 1. row *\nTable: csagain\nCreate Table: CREATE TABLE csagain (\nid int NOT NULL AUTO_INCREMENT,val varchar(40) DEFAULT NULL,PRIMARY KEY (id)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci\n1 row in set (0.01 sec)\n<\/code><\/pre>\n\n\n\n<p>No luck! So why are we having collation as utf8mb4_0900_ai_ci and not utf8mb4_general_ci even after adjusting and ensuring the configuration remains utf8mb4_general_ci.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p style=\"font-size:14px\"><strong><em>I\u2019d like to express my gratitude to two individuals I deeply admire and learn from &#8211; Alok (The PoP) and Guptaji. Their relentless dedication to expanding their technical expertise, while also helping fellow consultants grow along the way, has been truly inspiring. Thank you both for your continuous efforts and for always being a source of knowledge and support.<\/em><\/strong><\/p>\n<cite>promised-to-be-paid promotion<\/cite><\/blockquote>\n\n\n\n<h3 class=\"wp-block-heading\">The Collation Realization<\/h3>\n\n\n\n<p>Every database has a database character set and a database collation. When a database is created without explicitly mentioning the character set and collation, those values get inherited by the server character set and server collation default values.<\/p>\n\n\n\n<p>Thus when we do examine the character set for current database we realize the reasoning:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql &#91;localhost:8032] {msandbox} (test) &gt; show create database test;\n+----------+--------------------------------------------------------------------------------------------------------------------------------+\n| Database | Create Database                                                                                                            \t|\n+----------+--------------------------------------------------------------------------------------------------------------------------------+\n| test \t| CREATE DATABASE `test` \/*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci *\/ \/*!80016 DEFAULT ENCRYPTION='N' *\/ |\n+----------+--------------------------------------------------------------------------------------------------------------------------------+\n1 row in set (0.00 sec)\n<\/code><\/pre>\n\n\n\n<p>If we create a new database, it will inherit the current settings, which is utf8mb4_general_ci.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql &#91;localhost:8032] {msandbox} (test) &gt; show create database guptaji;\n+----------+-----------------------------------------------------------------------------------------------------------------------------------+\n| Database | Create Database                                                                                                               \t|\n+----------+-----------------------------------------------------------------------------------------------------------------------------------+\n| guptaji  | CREATE DATABASE `guptaji` \/*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci *\/ \/*!80016 DEFAULT ENCRYPTION='N' *\/ |\n+----------+-----------------------------------------------------------------------------------------------------------------------------------+\n1 row in set (0.00 sec)\n<\/code><\/pre>\n\n\n\n<p>The answer to our riddle was to ALTER the old database to have an updated character set<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql &#91;localhost:8032] {msandbox} (guptaji) &gt; alter database test COLLATE utf8mb4_general_ci;\nQuery OK, 1 row affected (0.02 sec)\n\nmysql &#91;localhost:8032] {msandbox} (guptaji) &gt; show create database test;\n+----------+--------------------------------------------------------------------------------------------------------------------------------+\n| Database | Create Database                                                                                                            \t|\n+----------+--------------------------------------------------------------------------------------------------------------------------------+\n| test \t| CREATE DATABASE `test` \/*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci *\/ \/*!80016 DEFAULT ENCRYPTION='N' *\/ |\n+----------+--------------------------------------------------------------------------------------------------------------------------------+<\/code><\/pre>\n\n\n\n<p>Now if you create a table without explicitly mentioning the CHARSET or COLLATION, it will be inherited from here. That said, do not forget that you can always explicitly mention collate utf8mb4_general_ci.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>mysql &#91;localhost:8032] {msandbox} (test) &gt; drop table csagain;\nQuery OK, 0 rows affected (0.04 sec)\n\ncreate table csagain (id int not null auto_increment primary key, val varchar(40)) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;\nQuery OK, 0 rows affected (0.05 sec)\n\nmysql &#91;localhost:8032] {msandbox} (test) &gt; show create table csagain\\G\n* 1. row *\nTable: csagain\nCreate Table: CREATE TABLE csagain (\nid int NOT NULL AUTO_INCREMENT,val varchar(40) COLLATE utf8mb4_general_ci DEFAULT NULL,PRIMARY KEY (id)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci\n1 row in set (0.01 sec)\n<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>utf8mb4_0900_ai_ci vs utf8mb4_general_ci<\/strong><\/h3>\n\n\n\n<p>Here&#8217;s a quick comparison between <strong>utf8mb4_general_ci<\/strong> and <strong>utf8mb4_0900_ai_ci<\/strong> collations in MySQL 8, and when you might use each one<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td><strong>Feature<\/strong><\/td><td><strong>utf8mb4_general_ci<\/strong><\/td><td><strong>utf8mb4_0900_ai_ci<\/strong><\/td><\/tr><tr><td>Compatibility<\/td><td>Compatible with earlier MySQL versions (5.7 and earlier)<\/td><td>Designed for MySQL 8.0 and later<\/td><\/tr><tr><td>Character Comparison<\/td><td>Basic comparison, ignoring accents<\/td><td>More advanced comparison, considers accents and some linguistic rules<\/td><\/tr><tr><td>Performance<\/td><td>Faster for basic comparisons<\/td><td>Slightly slower due to more complex rules<\/td><\/tr><tr><td>Language Support<\/td><td>Good for general use, but limited for specific languages<\/td><td>Improved support for multiple languages and scripts<\/td><\/tr><tr><td>Unicode Support<\/td><td>Supports most Unicode characters<\/td><td>Enhanced Unicode support, including newer characters<\/td><\/tr><tr><td>Handling of Special Characters<\/td><td>Basic handling<\/td><td>More sophisticated handling of special characters and emojis<\/td><\/tr><tr><td>Case Sensitivity<\/td><td>Case-insensitive<\/td><td>Case-insensitive<\/td><\/tr><tr><td>Use Case<\/td><td>Suitable for general applications and legacy systems<\/td><td>Recommended for modern applications requiring precise sorting and comparison<\/td><\/tr><tr><td>Implementation<\/td><td>Simple and straightforward<\/td><td>More complex, but adheres to modern Unicode standards<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>utf8mb4_general_ci is a simpler, faster collation suitable for general use but may not handle certain linguistic nuances.<\/p>\n\n\n\n<p>utf8mb4_0900_ai_ci offers better support for internationalization and modern Unicode standards, making it preferable for applications requiring precise sorting and character handling.<\/p>\n\n\n\n<p>Choosing between the two depends on the specific needs of your application, particularly regarding language support and performance requirements. If asked my opinion in general, I\u2019d go with default and latest.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>This appeared to be a simple question from a friend but I think it had more into it than a simple answer. I hope my tests will help someone. I am going to conclude with quick actionable items for when you\u2019re dealing with such confusion follow these steps to investigate.<\/p>\n\n\n\n<p><strong>Check your current collation settings<\/strong> using SHOW VARIABLES LIKE &#8216;%collation%&#8217;.<\/p>\n\n\n\n<p><strong>Alter the database collation<\/strong> with ALTER DATABASE &lt;dbname&gt; COLLATE &lt;desired_collation&gt;.<\/p>\n\n\n\n<p><strong>Set default collation for new databases<\/strong> by configuring default_collation_for_utf8mb4 in the MySQL server settings (although note that this is deprecated in future MySQL versions).<\/p>\n\n\n\n<p><strong>Explicitly specify collation<\/strong> when creating tables if needed.<\/p>\n","protected":false},"excerpt":{"rendered":"Recently I was asked a question: Why am I getting utf8mb4_0900_ai_ci as the default collation in MySQL 8, despite setting the server to use utf8mb4_general_ci? With the upgrade to MySQL&hellip;\n","protected":false},"author":1,"featured_media":3399,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[869,8,867,6],"tags":[1035,1045,1031,1033,1044,1037,469,1040,1032,1046,1042,1014,1039,1034,1043,1038,1047,684,1041,1013,1036,1012],"class_list":{"0":"post-3397","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-mariadb","8":"category-mysql","9":"category-mysql-upgrade","10":"category-technical","11":"tag-alter-database-collation","12":"tag-change-collation-mysql","13":"tag-character-set-mysql","14":"tag-character-set-utf8mb4","15":"tag-database-collation-settings","16":"tag-default-collation-mysql","17":"tag-mysql-8","18":"tag-mysql-8-collation-default","19":"tag-mysql-8-upgrade-collation","20":"tag-mysql-8-upgrade-issues","21":"tag-mysql-character-set-and-collation","22":"tag-mysql-collation","23":"tag-mysql-collation-behavior","24":"tag-mysql-collation-comparison","25":"tag-mysql-database-collation","26":"tag-mysql-utf8mb4","27":"tag-mysql-utf8mb4-default-collation","28":"tag-upgrade-mysql-8","29":"tag-utf8mb4-collation","30":"tag-utf8mb4_0900_ai_ci","31":"tag-utf8mb4_0900_ai_ci-vs-utf8mb4_general_ci","32":"tag-utf8mb4_general_ci"},"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts\/3397","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/comments?post=3397"}],"version-history":[{"count":6,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts\/3397\/revisions"}],"predecessor-version":[{"id":3414,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/posts\/3397\/revisions\/3414"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/media\/3399"}],"wp:attachment":[{"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/media?parent=3397"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/categories?post=3397"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/kedar.nitty-witty.com\/blog\/wp-json\/wp\/v2\/tags?post=3397"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}