Linux HA Cluster
Various notes on building a Linux High Availability (HA) cluster.
Corosync/Pacemaker
Corosync and Pacemaker make it easy to build an HA cluster that shares a Virtual IP (VIP) between nodes.
Sync time between servers
- Install NTP and set the timezone
dpkg-reconfigure tzdata
apt-get update
apt-get -y install ntp
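- Optionally, verify that time is being synchronized (a quick check, assuming the ntp daemon installed above is running):
ntpq -p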
Configure Firewall
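Corosync needs to reach its peers on the cluster ports (UDP 5404/5405 by default). A minimal sketch using ufw, run on each server and allowing the other node's private address (adjust to whatever firewall tooling you actually use; the placeholder is the other server's private IP):
ufw allow from <other_server_private_IP_address> to any port 5404:5405 proto udp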
Install Corosync/Pacemaker
apt-get install pacemaker corosync
Note: you can install the optional pcs tool for controlling Pacemaker with
apt-get install pcs
Configure Authorization Key for two servers
- Generate the key on server 1 (main server)
apt-get install haveged
corosync-keygen
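corosync-keygen writes the key to /etc/corosync/authkey by default; you can confirm it was created with:
ls -l /etc/corosync/authkey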
- Copy key to server 2
scp /etc/corosync/authkey <username>@<server2>:/etc/corosync
- Change permissions for file on server 2
chown root: /etc/corosync/authkey
chmod 400 /etc/corosync/authkey
Configure Corosync cluster
- On server 1, add the following to /etc/corosync/corosync.conf:
totem {
version: 2
cluster_name: lbcluster
transport: udpu
interface {
ringnumber: 0
bindnetaddr: <private_binding_IP_address>
broadcast: yes
mcastport: 5405
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
nodelist {
node {
ring0_addr: <server_1_private_IP_address>
name: primary
nodeid: 1
}
node {
ring0_addr: <server_2_private_IP_address>
name: secondary
nodeid: 2
}
}
logging {
to_logfile: yes
logfile: /var/log/corosync/corosync.log
to_syslog: yes
timestamp: on
}
Make sure to replace the <private_binding_IP_address>, <server_1_private_IP_address>, and <server_2_private_IP_address> placeholders in the configuration above.
- Copy config to server 2
scp /etc/corosync/corosync.conf <username>@<server2>:/etc/corosync
Enable and Run Corosync
Do all of the following on server 1 and server 2:
- Make directory and config file
mkdir -p /etc/corosync/service.d
vim /etc/corosync/service.d/pcmk
service {
name: pacemaker
ver: 1
}
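Alternatively, the file can be created non-interactively with a here-document (a sketch assuming a root shell):
cat > /etc/corosync/service.d/pcmk <<EOF
service {
  name: pacemaker
  ver: 1
}
EOF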
- Edit /etc/default/corosync and set START=yes (add the line if it is not there)
- Start corosync on both servers with
systemctl enable corosync
systemctl restart corosync
- Check to make sure everything worked
corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(server_1_private_IP_address)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(server_2_private_IP_address)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
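You can also check the ring status as a quick sanity check (output varies by setup):
corosync-cfgtool -s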
Enable and Start Pacemaker
- Enable with
systemctl enable pacemaker
- Start
systemctl start pacemaker
- Check status
crm status
Last updated: Sun Sep 17 15:49:24 2017
Last change: Tue Sep 12 09:04:23 2017 by root via crm_attribute on secondary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 0 resources configured

Online: [ primary secondary ]
Configure Pacemaker & add Virtual IP
- Run on server 1:
crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore
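These two properties disable fencing (STONITH) and quorum enforcement, which is only sensible for a simple two-node cluster like this one. You can review the resulting configuration with:
crm configure show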
- Add Virtual IP
crm configure primitive virtual_public_ip \
  ocf:heartbeat:IPaddr2 params ip="1.0.0.3" \
  cidr_netmask="32" op monitor interval="10s" \
  meta migration-threshold="2" failure-timeout="60s" \
  resource-stickiness="100"
Notes:
- Change ip="10.0.0.3" to the IP address you will use. Both servers must have IP addresses already on the same network assigned to interfaces
- If adding an additional address, you will need to change the primitive's name from virtual_public_ip to something like virtual_public_ip2 (see the sketch after these notes)
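For example, a second address could be added like this (a sketch; virtual_public_ip2 and ip="1.0.0.4" are placeholders):
crm configure primitive virtual_public_ip2 \
  ocf:heartbeat:IPaddr2 params ip="1.0.0.4" \
  cidr_netmask="32" op monitor interval="10s" \
  meta migration-threshold="2" failure-timeout="60s" \
  resource-stickiness="100"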
- Check status
crm status
Last updated: Sun Sep 17 15:49:24 2017
Last change: Tue Sep 12 09:04:23 2017 by root via crm_attribute on secondary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 1 resource configured

Online: [ primary secondary ]

Full list of resources:

 virtual_public_ip   (ocf::heartbeat:IPaddr2):   Started primary
- Verify Virtual IP is running on server 1:
ip -4 addr ls
You should see something similar to:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 1.0.0.1/24 brd 1.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 1.0.0.3/32 brd 1.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
Testing
- Simulate server 1 going down. Run the following on server 1:
crm node standby primary
- Login to server 2 and check the status:
crm status
Last updated: Sun Sep 17 15:49:24 2017
Last change: Tue Sep 12 09:04:23 2017 by root via crm_attribute on secondary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 1 resource configured

Node primary: standby
Online: [ secondary ]

Full list of resources:

 virtual_public_ip   (ocf::heartbeat:IPaddr2):   Started secondary
- Check server 2's IP to make sure Virtual IP is present:
ip -4 addr ls
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    inet 1.0.0.2/24 brd 1.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 1.0.0.3/32 brd 1.0.0.255 scope global eth0
       valid_lft forever preferred_lft forever
- Bring server 1 online again
crm node online primary
- You can force a fail-over back to server 1 by running the following on server 2:
crm node standby secondary
crm node online secondary
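You can also move just the resource instead of putting a whole node in standby (a sketch using crmsh's migrate/unmigrate commands; migrate creates a location constraint that unmigrate removes afterwards):
crm resource migrate virtual_public_ip primary
crm resource unmigrate virtual_public_ip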