How to fix Percona XtraDB Cluster 8.0 Upgrade issues

Are you planning an upgrade for your Percona XtraDB Cluster (PXC)? Upgrading to PXC 8.0 can be a smooth process, but sometimes challenges arise that require careful troubleshooting. In this blog post, we will talk about two specific issues that you might face during an upgrade of Percona XtraDB Cluster or Galera.

You might already know of these, as they have been blogged about before, but consider this a refresher.

Upgrading from PXC 5.7 to PXC 8.0

Upgrading from PXC 5.7 to PXC 8.0 or Galera MySQL 8 typically follows a straightforward plan:

  • Backup Cluster Data: Begin by backing up your cluster data from one node to ensure data safety.
  • Shutdown Cluster Nodes: Temporarily shut down all cluster nodes to prepare for the upgrade.
  • Perform MySQL Upgrade: Upgrade MySQL on each node one at a time to the target version.
  • Bootstrap the Cluster: After upgrading, it’s time to bootstrap the cluster.

Note that you should prepare well and complete proper upgrade testing before you migrate to the newer version. Percona's pt-upgrade and MySQL’s upgrade checker utility are two amazing tools to help you with upgrading to MySQL 8. This blog is not a how-to for the upgrade itself; it covers the issues you may face during the upgrade process.

Issue 1 – Bootstrapping PXC doesn’t bring up the cluster node

One common issue encountered during the upgrade is related to bootstrapping the Percona XtraDB Cluster. After upgrading the PXC binaries and executing the bootstrap command, the cluster failed to start:

root@production: mysql # systemctl start mysql@bootstrap
Job for mysql@bootstrap.service failed because a timeout was exceeded. See "systemctl status mysql@bootstrap.service" and "journalctl -xe" for details.

On a Galera Cluster node, a similar error may also appear while joining the cluster:

Job for mysql.service failed because a timeout was exceeded

MySQL error log
2023-07-13T09:24:04.598299Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/lib/mysql/mysqlx.sock
2023-07-13T09:24:04.598348Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.32-24.2' socket: '/var/lib/mysql/mysql.sock' port: 3306 Percona XtraDB Cluster (GPL), Release rel24, Revision 2119e75, WSREP version 26.1.4.3.
2023-07-13T09:24:04.598439Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user . Shutting down mysqld (Version: 8.0.32-24.2).
2023-07-13T09:24:04.598506Z 0 [Note] [MY-000000] [WSREP] Received shutdown signal. Will sleep for 10 secs before initiating shutdown. pxc_maint_mode switched to SHUTDOWN
…
2023-07-13T10:14:43.389622Z 0 [Note] [MY-000000] [Galera] announce period timed out (pc.announce_timeout)
…
2023-07-13T10:15:13.402826Z 0 [ERROR] [MY-000000] [Galera] failed to open gcomm backend connection: 110: failed to reach primary view (pc.wait_prim_timeout): 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():161
2023-07-13T10:15:13.402836Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)
2023-07-13T10:15:14.403073Z 0 [Note] [MY-000000] [Galera] gcomm: terminating thread
2023-07-13T10:15:14.403130Z 0 [Note] [MY-000000] [Galera] gcomm: joining thread
2023-07-13T10:15:14.403354Z 0 [ERROR] [MY-000000] [Galera] gcs/src/gcs.cpp:gcs_open():1811: Failed to open channel 'prod-db' at 'gcomm://10.10.10.10,10.10.10.11,10.10.10.12': -110 (Connection timed out)
2023-07-13T10:15:14.403381Z 0 [ERROR] [MY-000000] [Galera] gcs connect failed: Connection timed out
2023-07-13T10:15:14.403401Z 0 [ERROR] [MY-000000] [WSREP] Provider/Node (gcomm://10.10.10.10,10.10.10.11,10.10.10.12) failed to establish connection with cluster (reason: 7)
2023-07-13T10:15:14.403417Z 0 [ERROR] [MY-010119] [Server] Aborting

The system log (/var/log/messages)

[prod] percona@production: mysql@bootstrap.service.d $ sudo grep bootstrap /var/log/messages | grep -v pmm
...
Jul 13 10:37:15 production systemd: Starting Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap...
Jul 13 10:38:45 production systemd: mysql@bootstrap.service start-pre operation timed out. Terminating.
Jul 13 10:38:51 production systemd: Failed to start Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap.
Jul 13 10:38:51 production systemd: Unit mysql@bootstrap.service entered failed state.

Looking carefully, all of these messages point to a timeout. The Percona XtraDB Cluster node couldn’t start within TimeoutStartSec (90s), so systemd terminated it.
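To confirm that the failure lines up with the start timeout, you can compare the "Starting" and "timed out" timestamps from the /var/log/messages excerpt above. Below is a minimal sketch, assuming GNU date is available; the helper name elapsed_seconds is hypothetical, and the year is assumed to be 2023 because syslog lines omit it.

```shell
#!/bin/sh
# Hypothetical helper: seconds elapsed between two timestamps.
# Assumes GNU date (the -d option); both timestamps are read as UTC.
elapsed_seconds() {
    start_epoch=$(date -u -d "$1" +%s) || return 1
    end_epoch=$(date -u -d "$2" +%s) || return 1
    echo $((end_epoch - start_epoch))
}

# "Starting ..." at 10:37:15 and "start-pre operation timed out" at 10:38:45
elapsed_seconds "2023-07-13 10:37:15" "2023-07-13 10:38:45"   # prints 90
```

A gap of exactly 90 seconds matches the TimeoutStartSec value, which suggests systemd stopped the start-up rather than mysqld failing on its own.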

Fixing Percona XtraDB Cluster Timeout Error

$ systemctl edit mysql@bootstrap
(or: systemctl edit mysql.service, as per your service definition)
[Service]
TimeoutStartSec=6000

$ systemctl daemon-reload

$ systemctl show mysql.service -p TimeoutStartUSec
TimeoutStartUSec=1h 40min

After the change in service config, MySQL (Percona XtraDB Cluster node) started cleanly with the bootstrap command.
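If you prefer to script the override instead of using the interactive editor, the same drop-in can be written directly. This is only a sketch: the function name write_timeout_dropin is hypothetical, and it assumes the override lives in /etc/systemd/system/<unit>.d/override.conf, which is where systemctl edit normally places it.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that overrides TimeoutStartSec.
#   $1 = unit name, $2 = timeout in seconds,
#   $3 = base directory (assumed default: /etc/systemd/system)
write_timeout_dropin() {
    unit=$1
    seconds=$2
    base=${3:-/etc/systemd/system}
    mkdir -p "$base/$unit.d" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$seconds" > "$base/$unit.d/override.conf"
}

# Example (run as root), then reload systemd and verify:
#   write_timeout_dropin mysql@bootstrap.service 6000
#   systemctl daemon-reload
#   systemctl show mysql@bootstrap.service -p TimeoutStartUSec
```

The value 6000 is in seconds, which is why systemctl show reports it as 1h 40min.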

Issue 2 – PXC Cluster node doesn’t join cluster

The next step is to start the other nodes so that they join the bootstrapped Percona XtraDB Cluster. This time the start failed with a different error:

[13/07/2023, 4:58:24,911 PM] [prod] root@production: mysql # systemctl start mysql
[13/07/2023, 4:58:24,912 PM] Job for mysql.service failed because the control process exited with error code. See "systemctl status mysql.service" and "journalctl -xe" for details.

The error output has the following hints:

[Galera] Handshake failed: tlsv1 alert unknown ca

[Galera] handshake with remote endpoint ssl://10.0.0.1:4567 failed: asio.ssl:336031996: 'unknown protocol' ( 336031996: 'error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol')
2023-07-13T13:04:13.369141Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.32-24.2) starting as process 8662
2023-07-13T13:04:13.381273Z 0 [ERROR] [MY-000059] [Server] SSL error: Unable to get private key from 'server-key.pem'.
2023-07-13T13:04:13.381316Z 0 [Warning] [MY-013595] [Server] Failed to initialize TLS for channel: mysql_main. See below for the description of exact issue.
2023-07-13T13:04:13.381333Z 0 [Warning] [MY-010069] [Server] Failed to set up SSL because of the following SSL library error: Unable to get private key
2023-07-13T13:04:13.381340Z 0 [Note] [MY-000000] [WSREP] New joining cluster node configured to use specified SSL artifacts
2023-07-13T13:04:13.381379Z 0 [Note] [MY-000000] [Galera] Loading provider /usr/lib64/galera4/libgalera_smm.so initial position: f0bc6f23-11bd-11ec-a9f8-8f101bd9a1be:2783648882
2023-07-13T13:04:13.381394Z 0 [Note] [MY-000000] [Galera] wsrep_load(): loading provider library '/usr/lib64/galera4/libgalera_smm.so'

This is a reminder that Percona XtraDB Cluster 8.0 enables pxc-encrypt-cluster-traffic by default, which encrypts SST, IST, and replication traffic.

pxc-encrypt-cluster-traffic is not a dynamic setting, so changing it requires a MySQL restart. It is important to verify this setting while preparing for the upgrade for a seamless experience.
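As part of upgrade preparation, it can help to check on every node whether the option is set explicitly in the configuration. The sketch below is an assumption-laden example: the function name show_encrypt_setting is hypothetical, it inspects a single file only (it does not follow !include or !includedir directives), and the config path in the usage comment is a guess that may differ on your system.

```shell
#!/bin/sh
# Hypothetical helper: print any explicit pxc-encrypt-cluster-traffic line
# from one config file, or a note that the option is not set there.
# Accepts both dash and underscore spellings of the option name.
show_encrypt_setting() {
    cnf=$1
    if ! grep -Ei '^[[:space:]]*pxc[-_]encrypt[-_]cluster[-_]traffic' "$cnf"; then
        echo "pxc-encrypt-cluster-traffic not set in $cnf"
    fi
}

# Example (path is an assumption):
#   show_encrypt_setting /etc/my.cnf
```

If the option is not set anywhere, PXC 8.0 falls back to its default, which is to encrypt cluster traffic.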

Fixing SSL Handshake Failures in PXC 8.0

You may either disable pxc-encrypt-cluster-traffic (joking) or, preferably, create the pem files or reuse existing ones and ship them across the cluster.

  • Bootstrap the Percona XtraDB Cluster node with SSL keys.
  • Transfer key files from the bootstrapped node to the remaining cluster nodes.
  • Start the remaining nodes to have them join the cluster successfully.
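After copying the key files, one way to confirm that every node holds identical copies is to compare checksums. The following is a rough sketch, not an official procedure: the function name pem_fingerprints is hypothetical, and the three file names (ca.pem, server-cert.pem, server-key.pem) and the data directory path are assumptions based on the defaults MySQL generates, so adjust them to your ssl-ca, ssl-cert, and ssl-key settings.

```shell
#!/bin/sh
# Hypothetical helper: print one SHA-256 line per certificate file in a
# directory, so the output can be compared between nodes.
# The file names below are assumed defaults, not taken from the cluster config.
pem_fingerprints() {
    dir=$1
    for f in ca.pem server-cert.pem server-key.pem; do
        if [ -r "$dir/$f" ]; then
            printf '%s  %s\n' "$(sha256sum < "$dir/$f" | cut -d' ' -f1)" "$f"
        else
            printf 'MISSING-OR-UNREADABLE  %s\n' "$f"
        fi
    done
}

# Example: run on each node and compare the output (path is an assumption):
#   pem_fingerprints /var/lib/mysql
```

A missing or unreadable server-key.pem in this output would be consistent with the "Unable to get private key" error shown earlier, so it is also worth checking that the files are readable by the user that runs mysqld.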

There’s an extensive blog post by Marco on the Percona Blog that I recommend reading.

Alright, that ends this reminder blog on upgrading to Percona XtraDB Cluster for MySQL 8.0. But before you go, there is one other MySQL 8 upgrade issue that you must not forget: a corner case involving the changed default for timestamp columns in MySQL 8. It is an interesting use-case of handling an issue after the upgrade is done.
