MySQL fragmentation and how to defragment

In a MySQL database, fragmentation can occur over time as data is inserted, updated, and deleted. Fragmentation refers to the inefficient storage and organization of data within database objects, such as tables and indexes. This can lead to decreased performance, increased disk space usage, and slower query execution.

In this blog post, we will see how to identify fragmentation in MySQL and effective techniques to defragment the database for optimal performance.

Understanding MySQL Fragmentation

InnoDB is the default storage engine for MySQL. It utilizes a clustered index structure known as the B-tree, where data is physically sorted based on the primary key. InnoDB also employs secondary indexes for efficient data retrieval.
Fragmentation can occur due to various reasons, including data insertion, modifications, or record deletions.

This fragmentation manifests in two main forms:

Data Fragmentation: Data fragmentation occurs when data pages within a table are scattered across the disk. This leads to slower disk access times and increased I/O operations for retrieving the required data.

Index Fragmentation: Index fragmentation happens when the logical order of index entries becomes disorganized due to data modifications or deletions. This can result in slower index scans and increased disk space usage.

How to defragment a table in MySQL

OPTIMIZE TABLE

This command reorganizes the physical storage of table data and associated index data, to reduce storage space and improve I/O efficiency when accessing the table. For InnoDB table, OPTIMIZE table is mapped directly to ALTER TABLE command which is also noted in the output.

mysql> OPTIMIZE TABLE table_name;
+-----------------+----------+----------+-------------------------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+-----------------+----------+----------+-------------------------------------------------------------------+
| test.table_name | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| test.table_name | optimize | status | OK |
+-----------------+----------+----------+-------------------------------------------------------------------+
ALTER TABLE

The ALTER TABLE command in MySQL allows for the reorganization of tables, indexes, and data storage. It can be used to rebuild indexes, defragment data, and optimize table structures. The syntax for rebuilding a table is as follows:

ALTER TABLE table_name ENGINE=engine_type;

This command rebuilds the table using the specified storage engine, effectively defragmenting the data and indexes. However, it may require substantial downtime for large tables and can impact database availability during the defragmentation process. For a large table this is not ideal and that’s the time when Percona’s pt-online-schema-change tool comes handy.

pt-online-schema-change

pt-online-schema-change, part of the Percona Toolkit, provides an online method for altering tables while minimizing downtime. It uses a trigger-based mechanism to create a temporary copy of the original table, applies changes to it, and synchronizes the changes with the original table. This allows for concurrent read and write operations during the defragmentation process.

How to identify MySQL fragmentation

Here we will use following query to identify top 10 fragmented tables and try to reclaim the space.

mysql> SELECT table_schema, table_name , (DATA_FREE)/(102410241024) as FragGB, (DATA_LENGTH+INDEX_LENGTH)/(102410241024) as totalSizeGB, (data_free/(index_length+data_length)) frag_ratio FROM information_schema.tables where data_free>0 order by 3 desc limit 10;
+--------------+------------+--------+-------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME | FragGB | totalSizeGB | frag_ratio         |
+--------------+------------+--------+-------------+--------------------+
| employees    | table_name | 0.3936 | 0.2776      | 1.4176             |
+--------------+------------+--------+-------------+--------------------+

The table status in MySQL will show the column data_free indicating the fragmented space. The said table appears to be of size 284M with fragmentation of 403M.

mysql> show table status like 'table_name'\G
* 1. row *
Name: table_name
Engine: InnoDB
Version: 10
Row_format: Dynamic
Rows: 753417
Avg_row_length: 395
Data_length: 298090496
Max_data_length: 0
Index_length: 0
Data_free: 422576128
Auto_increment: 11455175
Create_time: 2023-07-19 14:04:11
Update_time: 2023-07-22 15:27:37
Check_time: NULL
Collation: utf8mb3_general_ci
Checksum: NULL
Create_options:
Comment:
1 row in set (0.00 sec)

The file size is longer:

mysql> ! ls -lhtr /var/lib/mysql/employees/table_name*
-rw-r-----. 1 mysql mysql 440M Jul 22 12:55 /var/lib/mysql/employees/table_name.ibd

Let’s run a dummy alter using pt-online-schema-change

pt-online-schema-change --execute \
h=localhost,D=employees,t=table_name \
--alter=" ENGINE=InnoDB" \
--pause-file=/tmp/pt-osc_dummy_alter.pause \
--print --progress time,10 \
--no-swap-tables --no-drop-old-table --no-drop-new-table --no-drop-triggers \
--tries create_triggers:10000:1,drop_triggers:10000:1,copy_rows:10000:1 \
--set-vars transaction_isolation='READ-COMMITTED',lock_wait_timeout=60 \
--critical-load threads_running=100 \
--max-load Threads_running=50 \
--chunk-time=1 \
--max-lag 50

...

2023-07-22T13:14:01 Created triggers OK.
2023-07-22T13:14:01 Copying approximately 753417 rows...
INSERT LOW_PRIORITY IGNORE INTO `employees`.`_table_name_new` (`emp_no`, `salary`, `from_date`, `to_date`, `id`) SELECT `emp_no`, `salary`, `from_date`, `to_date`, `id` FROM `employees`.`table_name` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) LOCK IN SHARE MODE /*pt-online-schema-change 11168 copy nibble*/
SELECT /*!40001 SQL_NO_CACHE */ `id` FROM `employees`.`table_name` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= ?)) ORDER BY `id` LIMIT ?, 2 /*next chunk boundary*/
Copying `employees`.`table_name`:  95% 00:00 remain

2023-07-22T13:14:11 Copied rows OK.
Not dropping old table because --no-drop-triggers was specified.
Not dropping triggers because --no-drop-triggers was specified.  To drop the triggers, execute:
DROP TRIGGER IF EXISTS `employees`.`pt_osc_employees_table_name_del`
DROP TRIGGER IF EXISTS `employees`.`pt_osc_employees_table_name_upd`
DROP TRIGGER IF EXISTS `employees`.`pt_osc_employees_table_name_ins`
Not dropping the new table `employees`.`_table_name_new` because --no-drop-new-table was specified.  To drop the new table, execute:
DROP TABLE IF EXISTS `employees`.`_table_name_new`;
Successfully altered `employees`.`table_name`.
[root@ip-172-31-89-193 ~]#

After the dummy alter, the table size has reduced significantly:

[root@ip-172-31-89-193 ~]# ls -lhtr /var/lib/mysql/employees/ | grep table_name
-rw-r-----. 1 mysql mysql 440M Jul 22 12:55 table_name.ibd
-rw-r-----. 1 mysql mysql 36M Jul 22 13:14 _table_name_new.ibd
[root@ip-172-31-89-193 ~]#

All we need to do now is swap the table as follows:

RENAME TABLE employees.table_name TO employees.table_name_old, employees._table_name_new TO employees.table_name ;
DROP TRIGGER IF EXISTS `employees`.`pt_osc_employees_table_name_del`;
DROP TRIGGER IF EXISTS `employees`.`pt_osc_employees_table_name_upd`;
DROP TRIGGER IF EXISTS `employees`.`pt_osc_employees_table_name_ins`;

In case you don’t want to take this control and let pt-online-schema-change handles this, you may consider removing following options from the command: “–no-swap-tables –no-drop-old-table –no-drop-new-table –no-drop-triggers”

As you can see the table is now only of 36M. You may choose to drop the “old” table in low traffic time.

mysql> ! ls -lhtr /var/lib/mysql/employees/table_name*
-rw-r-----. 1 mysql mysql 440M Jul 22 13:18 /var/lib/mysql/employees/table_name_old.ibd
-rw-r-----. 1 mysql mysql 36M Jul 22 13:18 /var/lib/mysql/employees/table_name.ibd
mysql>

By using pt-online-schema-change, you can defragment large tables without affecting database availability. The tool intelligently manages the replication of changes, preserves data integrity, and provides options for monitoring and controlling the defragmentation process.

Conclusion

Fragmentation in MySQL can negatively impact database performance, disk space utilization, and query execution speed. Understanding the underlying architecture and employing effective defragmentation techniques are vital for maintaining optimal database operations. The ALTER TABLE command and pt-online-schema-change offer powerful options for defragmentation, allowing you to rebuild tables, optimize indexes, and enhance overall performance. By incorporating these techniques into your database maintenance routine, you can ensure a well-optimized and efficient MySQL environment.

Remember, regular monitoring, analysis, and defragmentation of your MySQL database are essential for maintaining optimal performance and ensuring a smooth user experience.

Leave a Reply

Your email address will not be published. Required fields are marked *