MaxScale, an open-source database-centric router for MySQL and MariaDB, makes high availability possible by hiding the complexity of the backends and masking failures. MaxScale itself, however, is a single application running on a Linux box between the client application and the databases, so how do we make MaxScale itself highly available? This blog post shows how to quickly set up a Pacemaker/Corosync environment and configure MaxScale as a managed cluster resource.
By following the instructions detailed here, modifying configuration files, and running the system and software checks, you can build a complete setup with three CentOS 6.5 Linux servers using unicast heartbeat mode.
In a few steps MaxScale will be ready for basic HA operation, and one simple failure test, manually killing the running process, is shown as an example.
We make the following assumptions here:
- The solution is a quick setup example that may not be suitable for all production environments.
- Basic working knowledge of Pacemaker/Corosync and the crmsh command-line tools is assumed.
- A Virtual IP is set up to provide access to the MaxScale process.
- MaxScale is already configured and working with a MariaDB/MySQL replication setup or a MariaDB Galera Cluster.
- The MaxScale process is started/stopped and monitored via the LSB-compatible script /etc/init.d/maxscale, which is available in the RPM package from version 1.0. The script can also be found in the GitHub repository, including a version for Ubuntu.
) S7 ^* _" t# Y6 aStep 1 - Clustering Software installation1 c& `6 f. u- d8 w* C3 y) l
On each cluster node do the following operations:
7 Z- N9 [& u% l" o4 b fLet’s start enabling a new repo
5 @9 R V# N) {" j7 ?# vi /etc/yum.repos.d/ha-clustering.repo
! a( Q# I% t9 ?7 kand add the following lines to the file6 {( K! c8 x g3 Y9 s1 Q! M0 v& h6 y
[haclustering]
name=HA Clustering
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
enabled=1
gpgcheck=0
Now install the software:
# yum install pacemaker corosync crmsh
. t) @. ?. g" }& L. k z) b# g/ _Please note the packages versions used in this setup are:% O0 t) Z5 w( q# h
pacemaker-1.1.10-14.el6_5.3.x86_64
corosync-1.4.5-2.4.x86_64
crmsh-2.0+git46-1.1.x86_64
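To confirm what was actually installed from the repository (versions may differ by the time you run this), you can query the RPM database:
# rpm -qa | grep -E 'pacemaker|corosync|crmsh'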
Step 2 - Configuring the system
1 m" k7 F( [$ B! F2 yLet’s begin assigning the hostname to each node:
The node names are: node1, node2, node3.
# hostname node1
& w2 B# {( {$ v7 ^: p...4 G+ E: \: N8 g! l
# hostname nodeN- c' a7 D- v3 T2 |" @1 j
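Note that the hostname command only sets the name for the running session; on CentOS 6 you would typically also persist it across reboots, a sketch (adjust the name per node):
# vi /etc/sysconfig/network
HOSTNAME=node1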
Next, write the entries in /etc/hosts. On each node add all the server names, plus current-node as an alias for the current server.
# vi /etc/hosts
10.74.14.39 node1
10.228.103.72 node2
10.35.15.26 node3 current-node
...
# vi /etc/hosts
10.74.14.39 node1 current-node
10.228.103.72 node2
10.35.15.26 node3
! ^: w& j. K7 w9 ]Prepare authkey for optional cryptographic use:: h( W D3 g( g e% K& Q
On one of the nodes, say node2, run the corosync-keygen utility and follow the instructions.
[root@node2 ~]# corosync-keygen

Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
After completion the key will be found in /etc/corosync/authkey.
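Before distributing the key, you can verify it was created with the expected restrictive permissions; the output will look roughly like this:
[root@node2 ~]# ls -l /etc/corosync/authkey
-r-------- 1 root root 128 Jun 30 12:00 /etc/corosync/authkey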
Now let's create the corosync configuration file:
[root@node2 ~]# vi /etc/corosync/corosync.conf
Add the following content to the file:
& A, Y/ Q! p$ T2 y8 i- v, M" X; m# Please read the corosync.conf.5 manual page
! a3 y! W/ T& Xcompatibility: whitetank
2 p6 u8 [' c# R& ^6 g7 f% X, U9 z
, q0 V. u/ O5 m. N# Q2 m, Z2 M- v& o& Ttotem {+ @, x4 \0 _4 O1 R% |
version: 28 u7 Y7 j/ @6 W: `+ I1 M
secauth: off+ U7 q* f3 L7 B
interface {
( Q$ M' h% K0 j3 y member {# s" \8 r! T6 g6 Z5 P% H! I- D
memberaddr: node1& S9 m6 P- x3 U/ b* x6 _. q
}
`3 {; H9 x0 U* m! g9 U. d member {
$ J4 D4 h8 z' D! v0 f/ { memberaddr: node2) E; ?; \3 f7 q+ G! }3 g
}4 V6 W8 F6 l9 Q ^. r1 K
member {7 L1 M/ U6 u) Z. U! i; A- a- I
memberaddr: node3 Y2 F1 \- d7 }
}7 ?. g3 e& u8 Z
ringnumber: 0
/ @0 N# u5 h/ }) C0 w' l0 P/ K5 l/ u bindnetaddr: current-node
6 d7 F( Z8 W5 [6 W: V6 }1 u7 k: O* _ mcastport: 5405+ ?/ `; B) L1 Z$ {3 ^* H, D
ttl: 1
1 Z6 o [* G X: U$ M }' r9 }6 p0 H0 m' I3 l
transport: udpu, I5 Z# X+ O# P. G
}
- t* d2 ? M: D: v* B7 g( q! d+ i
# B. a8 T/ m+ S z* s2 i3 Clogging {% c7 y z( A5 t! s$ C3 [
fileline: off: w0 S6 R7 L4 O9 q5 K' H5 S
to_logfile: yes
$ [2 V+ F- Q) \ to_syslog: yes
2 f. O- N `2 m. v- d! _ logfile: /var/log/cluster/corosync.log
8 ^/ B0 `" Y* J, | debug: off
" g$ o7 O9 A: L8 {1 z) t! C+ r3 W timestamp: on
3 Z9 f# w& g" k4 H. o; c logger_subsys {
) K; p* \; O2 b subsys: AMF
9 L& A3 K# v0 N debug: off
6 L$ z# d" a+ F4 h2 A1 X
7 x! p5 H! E& ~+ a' ^+ [ }
& k4 p$ w' I/ h. E}
- w [9 {) N" {% }% f5 K3 w4 S! {, v8 S( d; G. w
# this will start Pacemaker processes
, J$ v7 B+ Q+ g1 Uservice {
; Z6 X& t! h& s& }! Q- Wver: 0
& }( ? M; V% u4 {" a9 h3 J( Ename: pacemaker: D8 h. S, O" {; C
}6 `3 ^ b- }7 {1 k0 f [* X
A few notes here:
- Unicast UDP is used (transport: udpu).
- bindnetaddr for the Corosync process is current-node, which resolves to the right address on each node thanks to the alias added in /etc/hosts above.
- The Pacemaker processes are started by the Corosync daemon, so there is no need to launch them via /etc/init.d/pacemaker start.
We can now copy the configuration file and the auth key to each of the other nodes:
[root@node2 ~]# scp /etc/corosync/* root@node1:/etc/corosync/
...
[root@node2 ~]# scp /etc/corosync/* root@nodeN:/etc/corosync/
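A quick sanity check is to compare checksums on every node and make sure they match:
[root@node2 ~]# md5sum /etc/corosync/corosync.conf /etc/corosync/authkey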
Step 3 - Start the Cluster
The cluster can be started now, but let's do a few additional checks before proceeding. Corosync needs UDP port 5405 to be open, so any firewall or iptables rules must be configured accordingly.
For a quick start, just disable iptables on each node:
[root@node2 ~]# service iptables stop
...
[root@nodeN ~]# service iptables stop
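If you would rather keep the firewall enabled, opening just the Corosync port should be enough; a sketch for iptables on CentOS 6:
[root@node2 ~]# iptables -I INPUT -p udp --dport 5405 -j ACCEPT
[root@node2 ~]# service iptables save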
( R# ?7 R& F5 [Let’s start Corosync on each node:
[root@node2 ~]# /etc/init.d/corosync start
...
[root@nodeN ~]# /etc/init.d/corosync start
and check that the corosync daemon is successfully bound to port 5405:
[root@node2 ~]# netstat -na | grep 5405

udp        0      0 10.228.103.72:5405          0.0.0.0:*
Also check that the other nodes are reachable with the nc utility in UDP mode (-u):
[root@node2 ~]# echo "check ..." | nc -u node1 5405
[root@node2 ~]# echo "check ..." | nc -u node3 5405
...
[root@node1 ~]# echo "check ..." | nc -u node2 5405
[root@node1 ~]# echo "check ..." | nc -u node3 5405
" T( j6 R7 O0 G- K4 uIf the following message is displayed:
9 {9 D* b5 C0 e& p9 d" D# [nc: Write error: Connection refused
there is a communication issue between the nodes, most likely caused by the firewall configuration. Please check and resolve any firewall issues before proceeding.
We can check the cluster status, from any node, with this command:
[root@node3 ~]# crm status
After a while the output will look like:
[root@node3 ~]# crm status
Last updated: Mon Jun 30 12:47:53 2014
Last change: Mon Jun 30 12:47:39 2014 via crmd on node2
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
3 Nodes configured, 3 expected votes
0 Resources configured

Online: [ node1 node2 node3 ]
The cluster has been started successfully; that's the first achievement!
Please note that in this basic setup we will disable the following properties:
- stonith
- quorum policy
[root@node3 ~]# crm configure property 'stonith-enabled'='false'
[root@node3 ~]# crm configure property 'no-quorum-policy'='ignore'
After these commands the configuration is automatically updated on every node; we can verify it from another node, say node1:
[root@node1 ~]# crm configure show
node node1
node node2
node node3
property cib-bootstrap-options: \
        dc-version=1.1.10-14.el6_5.3-368c726 \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=3 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        placement-strategy=balanced \
        default-resource-stickiness=infinity
Well done: the Corosync/Pacemaker cluster is now ready to manage resources. In the next steps we'll add MaxScale.
Step 4 - Check MaxScale init script
The new MaxScale /etc/init.d/maxscale script allows you to start, stop, restart, and monitor the MaxScale process running on the system.
The script found in the RPM package works out of the box with the default installation path /usr/local/skysql/maxscale.
It might be necessary to modify some variables, such as MAXSCALE_HOME, MAXSCALE_PIDFILE, or LD_LIBRARY_PATH, to match the installation directory you chose when you installed MaxScale.
We assume here that MaxScale is configured with a MariaDB/MySQL replication setup or a MariaDB Galera Cluster; those database servers might be located on the three Linux boxes we are using, or anywhere else.
The following commands should be issued on each node to verify that the application can be run and managed:
[root@node1 ~]# /etc/init.d/maxscale
Usage: /etc/init.d/maxscale {start|stop|status|restart|condrestart|reload}

Start
[root@node1 ~]# /etc/init.d/maxscale start
Starting MaxScale: maxscale (pid 25892) is running...      [  OK  ]

Start again
[root@node1 ~]# /etc/init.d/maxscale start
Starting MaxScale: found maxscale (pid 25892) is running.  [  OK  ]

Stop
[root@node1 ~]# /etc/init.d/maxscale stop
Stopping MaxScale:                                         [  OK  ]

Stop again
[root@node1 ~]# /etc/init.d/maxscale stop
Stopping MaxScale:                                         [FAILED]

Status (MaxScale not running)
[root@node1 ~]# /etc/init.d/maxscale status
MaxScale is stopped                                        [FAILED]

Status (MaxScale is running)
[root@node1 ~]# /etc/init.d/maxscale status
Checking MaxScale status: MaxScale (pid 25953) is running. [  OK  ]
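Behind the [ OK ] and [FAILED] labels, an LSB-compliant status action must also return the conventional exit codes (0 when the service is running, 3 when it is stopped); you can check them directly:
[root@node1 ~]# /etc/init.d/maxscale status; echo $?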
As the MaxScale script is LSB compatible and returns the proper exit code for each action, it's now possible to configure the application as a resource in Pacemaker; the next step shows how to do it.
Step 5 - Configure MaxScale as a cluster resource
We assume here that MaxScale can run on each node with the same configuration file.
[root@node2 ~]# crm configure primitive MaxScale lsb:maxscale \
op monitor interval="10s" timeout="15s" \
op start interval="0" timeout="15s" \
op stop interval="0" timeout="30s"
The command above configures MaxScale as an LSB resource; note the lsb:maxscale syntax.
In Pacemaker there are two different ways of managing applications (see the example after this list):
- resource agents (VIP, MySQL, filesystem, etc.)
- LSB scripts, for applications that don't require the complexity of a resource agent, and for custom applications in general.
MaxScale itself manages the backend servers we configured in the /etc/MaxScale.cnf service sections, such as:
[RW Split Router]
type=service
router=readwritesplit
servers=server1,server2,server3,server4,server5,server6,server7
user=maxuser
passwd=maxpwd
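For completeness: a service like this is normally paired with a listener section that defines the port clients connect to; the port 4006 used later in this post would come from something along these lines (MaxScale 1.0 syntax, names assumed):
[RW Split Listener]
type=listener
service=RW Split Router
protocol=MySQLClient
port=4006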
So we only need Pacemaker to manage the MaxScale process itself, and the LSB approach is well suited here.
If everything is fine we should see the resource running:
) \6 z7 n' x& Q1 m0 T2 e( W[root@node2 ~]# crm status1 M4 ~, J. d- I. p' x1 W3 m
Last updated: Mon Jun 30 13:15:34 2014" }0 c. \$ l) B
Last change: Mon Jun 30 13:15:28 2014 via cibadmin on node2
8 e+ X! }0 e" S. F1 `' TStack: classic openais (with plugin)
( {- Z# k! t `+ _" c, N! B% `Current DC: node2 - partition with quorum
+ A. w' |5 C8 q- |6 _Version: 1.1.10-14.el6_5.3-368c726
?7 f7 _7 t" ]; x3 Nodes configured, 3 expected votes
* {7 d% Y% B" r4 x6 Y$ g& g0 |6 b/ e1 Resources configured
6 t" U* y2 n& D) r K. D% q8 e4 `( {5 [$ l: D$ ^
Online: [ node1 node2 node3 ]
, B1 h( d _* g5 K+ r( E
/ v& ]$ a% ^6 T3 H; u4 fMaxScale (lsb:maxscale): Started node1$ g; w& X5 f) A9 M" J/ d
Well done, another achievement here!
We now have MaxScale running via Pacemaker, and we no longer need it started via /etc/init.d at boot time. Pacemaker will do all the work, but it needs to be started at boot; with a CentOS 6.5 setup we need at least:
# chkconfig maxscale off
# chkconfig corosync on
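You can double-check the boot-time settings afterwards:
# chkconfig --list corosync
# chkconfig --list maxscale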
Step 6 - Does the HA software work? Let's see a resource restarted after a failure
The MaxScale application is now managed by the HA clustering software, but what does that mean? Will the application be restarted after a failure? It should be! Let's kill the MaxScale process and see what happens...
As we know, the MaxScale PID can easily be found in $MAXSCALE_HOME/log/maxscale.pid.
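A quick way to grab it (assuming MAXSCALE_HOME is set in your shell environment):
[root@node2 ~]# cat $MAXSCALE_HOME/log/maxscale.pid
26114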
& {, D& l' x- L" h1 X( V: BIn this example the PID is 26114, and we kill the process with brute force:
[root@node2 ~]# kill -9 26114

[root@node2 ~]# crm status
Last updated: Mon Jun 30 13:16:11 2014
Last change: Mon Jun 30 13:15:28 2014 via cibadmin on node2
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

Failed actions:
    MaxScale_monitor_15000 on node1 'not running' (7): call=19, status=complete, last-rc-change='Mon Jun 30 13:16:14 2014', queued=0ms, exec=0ms
Note the MaxScale_monitor failed action above; after a few seconds the resource will be started again:
[root@node2 ~]# crm status
Last updated: Mon Jun 30 13:16:22 2014
Last change: Mon Jun 30 13:15:28 2014 via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

MaxScale        (lsb:maxscale): Started node1
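Pacemaker keeps the failure in the resource history even after the restart; once you have inspected it, the standard crmsh way to clear the failed action from crm status is:
[root@node2 ~]# crm resource cleanup MaxScale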
The HA clustering software will keep MaxScale running on one of our three Linux boxes, but on which node? And how can we connect to MaxScale from our client application if we don't know where it runs?
# mysql -h $MAXSCALE_IP -P 4006 -utest -p test
" b9 Q. A0 c+ p+ j0 Y% ^3 w9 pWhat is the $MAXSCALE_IP then? Let’s Follow the last step ...) V& N8 r) R4 ]$ L) W5 ^
Step 7 - Add a Virtual IP (VIP) to the cluster
The solution for $MAXSCALE_IP is that the MaxScale process should be contacted via one well-known IP address that moves across the nodes together with MaxScale.
The setup is very easy: assuming an additional IP address is available and can be assigned to any one of the nodes, this is the new configuration to add:
[root@node2 ~]# crm configure primitive maxscale_vip ocf:heartbeat:IPaddr2 params ip=192.168.122.125 op monitor interval=10s
There is of course one more thing to do: the MaxScale process and the VIP must run on the same node, so it's mandatory to add the group maxscale_service to the configuration:
[root@node2 ~]# crm configure group maxscale_service maxscale_vip MaxScale
Here is the final configuration:
[root@node3 ~]# crm configure show
node node1
node node2
node node3
primitive MaxScale lsb:maxscale \
        op monitor interval=15s timeout=10s \
        op start interval=0 timeout=15s \
        op stop interval=0 timeout=30s
primitive maxscale_vip IPaddr2 \
        params ip=192.168.122.125 \
        op monitor interval=10s
group maxscale_service maxscale_vip MaxScale \
        meta target-role=Started
property cib-bootstrap-options: \
        dc-version=1.1.10-14.el6_5.3-368c726 \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=3 \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        placement-strategy=balanced \
        last-lrm-refresh=1404125486
Check the resource status:
[root@node1 ~]# crm status
Last updated: Mon Jun 30 13:51:29 2014
Last change: Mon Jun 30 13:51:27 2014 via crmd on node1
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
3 Nodes configured, 3 expected votes
2 Resources configured

Online: [ node1 node2 node3 ]

Resource Group: maxscale_service
     maxscale_vip       (ocf::heartbeat:IPaddr2):       Started node2
     MaxScale   (lsb:maxscale): Started node2
) g" S6 F2 e5 p; H7 L8 s/ }With both resources on node2, now MaxScale service will be reachable via the configured VIP address 192.168.122.125:
) S& b8 C+ ?5 Y1 Q+ y1 W/ Y# mysql -h 192.168.122.125 -P 4006 -utest -p test# o; U4 k: Y" P& \+ ~& }4 c8 ]" @
Please note that our three-box setup now requires four IP addresses: one for each node, plus the floating IP address assigned to MaxScale.
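To see the group actually move, you can put the node currently hosting it into standby, watch both resources (VIP included) migrate to another node, and then bring the node back online:
[root@node2 ~]# crm node standby node2
[root@node2 ~]# crm status
[root@node2 ~]# crm node online node2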
Summary
5 N8 b; g# P& H+ z( dThe goal of this post was to present a quick HA solution for a running MaxScale setup, using a widely adopted open-source clustering solution.% o) [- q. T3 W$ ?- Z5 w* `
Even though the main content can be read as a basic Corosync/Pacemaker setup guide, I encourage you to explore other failure scenarios and the cluster administration commands, such as moving resources and adding constraints, which can be found through the links below.
The reader might find the LSB script tutorials interesting too, as a way of bringing yet another application under HA.