MySQL Data Archiving basics – considerations, mistakes, best practices

This blog is dedicated to MySQL data archiving; we’ll break down from the basics of what data archival is, why it matters, best practices and scenarios. We will also see pre-archiving considerations and mistakes to avoid during archiving. Towards the end of the post, we also have podcast discussion over the subject of MySQL Archiving Basics.

What is data archival in MySQL?

In simple terms, data archival is the process of moving historical or infrequently accessed data out of your active database and into long-term storage. Think of it like cleaning out your closet—you don’t throw away your winter clothes in the summer, but you store them somewhere else until you need them again.

In the context of MySQL, this means identifying data that’s no longer actively used—like old logs, transaction records, or customer orders—and moving it to an archive table, a separate database, or even an external storage system.

But why is this important? Well, as your database grows, it can become bloated with data that’s rarely accessed. This can slow down queries, increase backup times, and eat up valuable disk space. Archiving helps keep your database lean, fast, and efficient.

Why Archive Data in MySQL?

Performance Improvement: Smaller databases mean faster queries. By archiving old data, you reduce the size of your active tables, which can significantly speed up read and write operations.

Cost Savings: Storing data isn’t free. Whether you’re using local disks or cloud storage, archiving old data to cheaper storage solutions can save you money. This also reduces the backup size and further storage requirements for them.

Compliance and Retention: Many industries have regulations that require you to keep data for a certain period. Archiving ensures you meet these requirements without cluttering your active database.

Easier Maintenance: A smaller database is easier to back up, restore, and maintain. It also reduces the risk of running out of disk space unexpectedly.

When to Archive Data?

A good rule of thumb is to archive data that’s no longer actively used. But it also depends on the company policy or the compliance requirements.

For example:
– Logs older than 6 months or even 1 month depending on your applications. (It is better not to store application logs in database to start with.)
– Transaction records older than 1 year.
– Customer orders that have been fulfilled and are no longer needed for daily operations.

You can also set up automated archival processes based on time or data size. For instance, you might archive data every quarter or when your database reaches a certain size.

How to Archive Data in MySQL?

Export and Delete

This is very basic approach is to export old data to an archive table or file, and then delete it from the main table. You can use MySQL’s INSERT INTO … SELECT statement to move data to an archive table, and then delete it from the original table manually. This is not performant or efficient for large datasets.

Using pt-online-schema-change for archiving

It is possible to use pt-online-schema-change with --where option to only store required data discarding everything else. This is particularly useful when the data to be archived is much larger than the data that needs to remain in the table. An example: Purge data order than one month for user_logs table using log_date column.

pt-online-schema-change \
--alter "ENGINE=InnoDB" \
--where="log_date >= date_sub(now(), interval 1 MONTH)" \
…
h=localhost,D=your_db,t=user_logs \
--dry-run

pt-archiver for MySQL data archiving

pt-archiver is a versatile tool from the Percona Toolkit that allows you to efficiently manage data by archiving, deleting, or copying it. One of its standout features is its flexibility in handling different data transfer scenarios.

Typical use cases for pt-archiver would be
– Archive data without taking backup to a file / table in chunks
– Copy data to the same machine but a different table
– Copy data to a different machine
– Export data to a file

This flexibility, combined with its automation capabilities, makes pt-archiver an excellent choice for database maintenance tasks. You can further streamline these operations by scheduling them using cron jobs or task schedulers, ensuring your database remains clean and efficient without manual intervention.

Partitioning and MySQL archiving

MySQL supports table partitioning, which allows you to split a large table into smaller, more manageable pieces. For example, you can partition a table by date, with monthly partitions.

Partitioning also helps improving queries and archiving specific partitioning is comparatively better managed. The partitioned then can be moved to an history or archive table while main table holds only latest partitions as required.

Using MySQL native commands for partitioned tables It is possible to archive table partitions to a backup using EXCHANGE PARTITION command with minimal impact. I’m sharing basic commands to give you an idea to develop further on this.

CREATE TABLE new_table LIKE source_table;
ALTER TABLE new_table REMOVE PARTITIONING;
ALTER TABLE source_table EXCHANGE PARTITION partition_name WITH TABLE;
ALTER TABLE target_table EXCHANGE PARTITION partition_name WITH TABLE;

Archival Doesn’t Free Up Disk Space

Now, here’s something important to keep in mind: archiving data won’t free up disk space.

When you delete data from a MySQL table, the space isn’t immediately released back to the operating system. Instead, MySQL marks that space as available for reuse. This means your disk usage might not decrease even after archiving and deleting data. Archiving process will create the holes in the database pages, known as fragmentation.

To actually reclaim that disk space, you’ll need to defragment the table. Here’s how you can do that:
1. Running OPTIMIZE TABLE on a table will defragment it and reclaim unused space. For example:
OPTIMIZE TABLE your_table_name;
This command rebuilds the table and releases the freed space back to the operating system.

2. Another way to defragment a table is by using ALTER TABLE.
ALTER TABLE your_table_name ENGINE=InnoDB;
This essentially rebuilds the table, which has the same effect as OPTIMIZE TABLE.

Keep in mind that both of these operations can be resource-intensive, so it’s best to run them during off-peak hours.

3. The recommended approach is to use pt-online-schema-change with NULL ALTER (ENGINE=INNODB), which allows rebuilding the table with minimal locking while ensuring the existing table remains operational.

Before you start archiving

Plan Ahead: Define clear archival policies. Decide what data to archive, how often, and where to store it.
What data to archive, based on dates, IDs, numbers and validate the data being archived.

Test Your Processes: Always test your archival scripts or tools on a staging environment before running them in production. This ensures you don’t accidentally lose important data.

Backup First: Before deleting any data, make sure you have a backup. This is your safety net in case something goes wrong.

Monitor Performance: After archiving, monitor your database performance to ensure everything is running smoothly.
If you have replicas used for read, consider the lag isn’t impacted with the archiving process. Note that pt-archiver comes with options –check-replica-lag which dynamically throttles the archiving process based on identified lag on the replicas.

Monitor Archiving process: Ensure that you monitor the archival process configured / cronned. Any failures should be identified sooner than later.

Document Everything: Keep a record of your archival processes, including what data was archived, when, and where it’s stored. This is especially important for compliance and auditing purposes.
Ensure widespread awareness of your archiving strategy across your team. A well-defined plan is critical to avoid unintended data loss.

For instance, I’ve encountered situations where clients lost archived data on replicas because their strategy failed to account for the fact that replicas often hold more data than the primary database. This oversight can lead to inconsistencies and data gaps.

Common Mistakes in MySQL Archiving

Archiving Without Backups:
Always back up your data before archiving. You don’t want to realize too late that you’ve deleted something important.

Ignoring Relationships:
If your data has relationships (like foreign keys), make sure you archive related records together. Otherwise, you could end up with broken links in your database.

Over-Archiving:
Don’t archive data too aggressively. Make sure you’re not archiving data that might still be needed for reporting or analysis.

Modifying tables:
Ensure that the archiving process doesn’t fail due to table modifications. If data is being copied to a destination table, implement the definition change on both source and target archival table as well to avoid possible data loss.

Failing to Automate:
Manual archival processes are time-consuming and error-prone. Automate wherever possible to save time and reduce risks.

Forgetting to Defragment:
As we discussed earlier, don’t forget to run OPTIMIZE TABLE or ALTER TABLE after archiving to reclaim disk space.

Podcast MySQL Archiving Basics

Conclusion

To wrap up, data archival is a powerful way to keep your MySQL database running smoothly. By moving old data out of your active database, you can improve performance, save on storage costs, and ensure compliance with data retention policies.

But remember, archival alone won’t free up disk space—you’ll need to defragment your tables using OPTIMIZE TABLE or ALTER TABLE to reclaim that space.

If you haven’t already, I encourage you to take a look at your database and see where archival might help. Start small, test your processes, and don’t forget to back up your data!