Chapter 64: Pacemaker High Availability Clustering

Linux HA Clustering with Pacemaker and Corosync

Pacemaker is the industry-standard HA clustering solution for Linux:

  • Critical Services: Ensures services stay up during hardware/software failures
  • Production Systems: Database clusters, web servers, and storage need HA
  • On-Call: You’ll manage cluster failures and node events
  • Career: HA clustering is essential for senior sysadmin roles
  • Certification: RHCE and Red Hat’s High Availability exam (EX436) cover clustering

Cluster downtime can cost $100K+ per hour for critical applications.


64.1 Introduction to High Availability Clustering

High Availability (HA) clusters are designed to provide continuous service by eliminating single points of failure. When one node fails, another takes over seamlessly.

HIGH AVAILABILITY CLUSTER ARCHITECTURE
+------------------------------------------------------------------+
| |
| ┌──────────────────────────────────────────────────────────┐ │
| │ HA CLUSTER OVERVIEW │ │
| │ │ │
| │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
| │ │ Node 1 │ │ Node 2 │ │ Node 3 │ │ │
| │ │ │ │ │ │ │ │ │
| │ │ Primary │◄──→│ Secondary│◄──→│ Witness │ │ │
| │ │ Active │ │ Standby │ │ (optional)│ │ │
| │ └────┬────┘ └────┬────┘ └────┬────┘ │ │
| │ │ │ │ │ │
| │ └──────────────┼──────────────┘ │ │
| │ │ │ │
| │ ┌───────┴───────┐ │ │
| │ │ │ │ │
| │ ▼ ▼ │ │
| │ ┌──────────────┐ ┌──────────────┐ │ │
| │ │ Corosync │ │ Pacemaker │ │ │
| │ │ (Cluster │ │ (CRM) │ │ │
| │ │ Messaging)│ │ (Resource │ │ │
| │ │ │ │ Manager) │ │ │
| │ └──────────────┘ └──────────────┘ │ │
| │ │ │ │
| │ ▼ │ │
| │ ┌──────────────┐ │ │
| │ │ Shared │ │ │
| │ │ Storage │ │ │
| │ │ (SAN/NAS) │ │ │
| │ └──────────────┘ │ │
| │ │ │
| └──────────────────────────────────────────────────────────┘ │
| |
+------------------------------------------------------------------+
CLUSTER TYPES
+------------------------------------------------------------------+
| |
| ACTIVE-PASSIVE (PRIMARY-BACKUP) |
| ┌─────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Node 1 (Primary) Node 2 (Standby) │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ ✓ Active │ │ Standby │ │ │
│ │ │ Service A │ │ │ │ │
│ │ │ │ │ Ready to │ │ │
│ │ │ │ │ take over │ │ │
│ │ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ When Node 1 fails: │ │
│ │ - Node 2 takes over │ │
| │ - Service A starts on Node 2 │ │
| │ │ │
│ └─────────────────────────────────────────────────────────────┘ │
| |
| ACTIVE-ACTIVE (MULTI-PRIMARY) │
| ┌─────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Node 1 Node 2 │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ ✓ Active │ │ ✓ Active │ │ │
│ │ │ Service A │ │ Service B │ │ │
│ │ │ │ │ │ │ │
│ │ │ Load Balancer │◄───→│ Load Balancer│ │ │
│ │ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ When Node 1 fails: │ │
│ │ - Service A starts on Node 2 │ │
│ │ - Both services now on Node 2 │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────┘ │
| |
+------------------------------------------------------------------+

PACEMAKER ARCHITECTURE
+------------------------------------------------------------------+
| |
| ┌──────────────────────────────────────────────────────────┐ │
| │ PACEMAKER STACK │ │
| │ │ │
| │ ┌──────────────────────────────────────────────────┐ │ │
| │ │ CRM (Cluster Resource Manager) │ │ │
| │ │ ┌──────────────┐ ┌─────────────────────────┐ │ │ │
| │ │ │ Policy │ │ Resource │ │ │ │
| │ │ │ Engine │ │ Manager │ │ │ │
| │ │ │ (PE/TE) │ │ (LRM) │ │ │ │
| │ │ └──────────────┘ └─────────────────────────┘ │ │ │
| │ │ │ │ │
| │ └──────────────────────────────────────────────────┘ │ │
| │ │ │ │
| │ ▼ │ │
| │ ┌──────────────────────────────────────────────────┐ │ │
| │ │ COROSYNC (Messaging Layer) │ │ │
| │ │ │ │ │
| │ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │ │
| │ │ │ Node │ │ Node │ │ Node │ │ │ │
| │ │ │ 1 │◄─→│ 2 │◄─→│ 3 │ │ │ │
| │ │ └─────────┘ └─────────┘ └─────────┘ │ │ │
| │ │ │ │ │
| │ │ - Membership │ │ │
| │ │ - Quorum │ │ │
| │ │ - Messaging │ │ │
| │ │ - Fencing │ │ │
| │ └──────────────────────────────────────────────────┘ │ │
| │ │ │
| └──────────────────────────────────────────────────────────┘ │
| |
+------------------------------------------------------------------+
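Corosync’s quorum rule is simple majority arithmetic: a partition may keep running resources only if it holds a strict majority of the expected votes. A quick sketch of that calculation (plain shell, node counts illustrative):

```shell
# Quorum = strict majority of expected votes: floor(n/2) + 1.
# A partition with fewer votes must stop its resources (or be fenced).
for nodes in 2 3 4 5; do
  quorum=$(( nodes / 2 + 1 ))
  echo "${nodes}-node cluster: quorum = ${quorum} votes"
done
```

Note that a 2-node cluster needs both votes by this formula, which is why two-node setups rely on Corosync’s two_node option or a quorum/witness device - the optional third node shown in the diagram above.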

64.2 Pacemaker Core Concepts

Concept          Description
───────          ───────────
Resource         A service the cluster can manage (IP, filesystem, application)
Resource Agent   Script/daemon that controls a specific resource type
Primitive        Single resource instance
Clone            Resource running on multiple nodes
Master/Slave     Clone with primary/secondary roles (a "promotable clone" in Pacemaker 2.x)
Constraint       Rule defining resource relationships and locations
STONITH          Shoot The Other Node In The Head - node fencing

64.3 Pacemaker Installation and Configuration

Terminal window
# =============================================================================
# INSTALLATION (Ubuntu/Debian)
# =============================================================================
# Install pacemaker and dependencies
apt-get update
apt-get install pacemaker corosync fence-agents resource-agents
# Install additional tools
apt-get install crmsh pcs
# Enable services
systemctl enable pacemaker
systemctl enable corosync
# =============================================================================
# INSTALLATION (RHEL/CentOS)
# =============================================================================
# Install packages
yum install pacemaker corosync fence-agents resource-agents pcs
# Enable and start pcsd
systemctl enable pcsd
systemctl start pcsd
# Set hacluster password
passwd hacluster
# =============================================================================
# CLUSTER AUTHENTICATION
# =============================================================================
# Authenticate nodes (on all nodes)
pcs host auth node1 node2 node3 -u hacluster -p password
# Create cluster
pcs cluster setup my_cluster node1 node2 node3
# Start cluster
pcs cluster start --all
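
`pcs cluster setup` generates /etc/corosync/corosync.conf on every node. A minimal sketch of what that file contains for the three-node cluster above (values illustrative, not a drop-in file):

```
totem {
    version: 2
    cluster_name: my_cluster
    transport: knet
}

nodelist {
    node {
        ring0_addr: node1
        nodeid: 1
    }
    node {
        ring0_addr: node2
        nodeid: 2
    }
    node {
        ring0_addr: node3
        nodeid: 3
    }
}

quorum {
    provider: corosync_votequorum
}

logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
}
```

If you edit this file by hand instead of through pcs, it must be identical on all nodes before restarting corosync.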
Terminal window
# =============================================================================
# CHECK CLUSTER STATUS
# =============================================================================
# View cluster status
crm status
# Or with pcs
pcs status
# View configuration
crm configure show
# =============================================================================
# CREATE BASIC CLUSTER
# =============================================================================
# Disable STONITH and quorum enforcement (for testing only - never in production!)
crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore
# Add a floating IP address resource
crm configure primitive virtual_ip ocf:heartbeat:IPaddr2 \
params ip=192.168.1.100 cidr_netmask=24 \
op monitor interval=30s
# Add a service resource (nginx)
crm configure primitive nginx_service systemd:nginx \
op monitor interval=30s
# Create a group (members run on the same node and start in listed order)
crm configure group web_group virtual_ip nginx_service
# Note: a group already implies colocation and ordering among its members,
# so no separate colocation or order constraints are needed for this pair
Terminal window
# =============================================================================
# COMMON RESOURCE AGENTS
# =============================================================================
# IPaddr2 - Floating IP
crm configure primitive virtual_ip ocf:heartbeat:IPaddr2 \
params ip=192.168.1.100 cidr_netmask=24 \
op monitor interval=30s
# Filesystem - Mount shared storage
crm configure primitive shared_fs ocf:heartbeat:Filesystem \
params device=/dev/sdb1 directory=/mnt/shared fstype=ext4 \
op monitor interval=30s
# MySQL - Database service
crm configure primitive mysql ocf:heartbeat:mysql \
params binary=/usr/sbin/mysqld \
config=/etc/mysql/my.cnf \
datadir=/var/lib/mysql \
op monitor interval=30s timeout=10s
# NFS - Mount an NFS export (use the Filesystem agent with fstype=nfs)
crm configure primitive nfs_mount ocf:heartbeat:Filesystem \
params device=192.168.1.10:/data directory=/mnt/nfs fstype=nfs \
op monitor interval=30s
# Apache - Web server
crm configure primitive apache ocf:heartbeat:apache \
params configfile=/etc/apache2/apache2.conf \
op monitor interval=30s
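
Under the hood, each `ocf:heartbeat:*` agent is a script implementing a small contract: start, stop, and monitor actions that report status through OCF exit codes (0 = success, 7 = not running). A toy sketch of that dispatch logic, using a hypothetical flag-file "service" (not a real agent):

```shell
# Toy OCF-style agent dispatch (hypothetical flag-file "service").
# Real agents live in /usr/lib/ocf/resource.d/<provider>/ and must also
# implement meta-data and validate-all actions.
flagfile_agent() {
    state="${OCF_RESKEY_state:-/tmp/flagfile.state}"
    case "$1" in
        start)   touch "$state"; return 0 ;;    # 0 = OCF_SUCCESS
        stop)    rm -f "$state"; return 0 ;;    # stop must be idempotent
        monitor) [ -f "$state" ] && return 0 || return 7 ;;  # 7 = OCF_NOT_RUNNING
        *)       return 1 ;;                    # 1 = OCF_ERR_GENERIC
    esac
}

flagfile_agent start
flagfile_agent monitor; echo "after start: rc=$?"
flagfile_agent stop
flagfile_agent monitor; echo "after stop:  rc=$?"
```

Pacemaker invokes monitor on the interval given by `op monitor interval=30s`; a non-zero monitor exit is what triggers recovery.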

RESOURCE CONSTRAINTS
+------------------------------------------------------------------+
| |
| COLOCATION CONSTRAINT │
| ──────────────────── │
| │
│ Ensures resources run on the same node │
| │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Service A ──┬──→ Node 1 (both run together) │ │
| │ │ │ │
│ │ Service B ──┘ │ │
│ │ │ │
│ │ Command: │ │
│ │ crm configure colocation A_with_B INFINITY: A B │ │
│ │ │ │
| └───────────────────────────────────────────────────────────┘ │
| |
| ORDERING CONSTRAINT │
| ───────────────── │
| │
| Defines start/stop order │
| │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Step 1: Start Service A ──→ Step 2: Start Service B │ │
│ │ │ │
│ │ Command: │ │
│ │ crm configure order A_before_B mandatory: A B │ │
│ │ │ │
| └───────────────────────────────────────────────────────────┘ │
| |
| LOCATION CONSTRAINT │
| ─────────────────── │
| │
| Prefers or prohibits running on specific nodes │
| │
| ┌───────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Prefer Node 1: │ │
│ │ crm configure location prefer_node1 serviceA \ │ │
│ │ rule 100: #uname eq node1 │ │
| │ │ │
│ │ Avoid Node 2: │ │
│ │ crm configure location avoid_node2 serviceA \ │ │
│ │ rule -inf: #uname eq node2 │ │
│ │ │ │
| └───────────────────────────────────────────────────────────┘ │
| |
+------------------------------------------------------------------+
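
Constraint scores are additive: Pacemaker sums location scores, resource stickiness, and colocation scores per node and places the resource on the highest-scoring node, with INFINITY/-INFINITY acting as hard overrides. A simplified two-node illustration (assumed values, not real crm output):

```shell
# Simplified score comparison: does a resource fail back to its
# preferred node after that node recovers from a failure?
prefer_node1=100        # location constraint score for node1
stickiness=200          # resource-stickiness, counted for the current node

node1_score=$prefer_node1          # candidate (recovered) node
node2_score=$stickiness            # current node: stickiness applies here
if [ "$node1_score" -gt "$node2_score" ]; then
    echo "fails back to node1"
else
    echo "stays on node2"          # stickiness outweighs the preference
fi
```

This is why setting `resource-stickiness` higher than your location scores prevents disruptive automatic fail-back after a node recovers.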
Terminal window
# =============================================================================
# CONSTRAINT COMMANDS
# =============================================================================
# Colocation - Run web_app on same node as db
crm configure colocation web_with_db INFINITY: web_app db
# Ordering - Start db before web_app
crm configure order db_before_web mandatory: db web_app
# Order with role actions - promote the multi-state resource before
# starting a dependent resource (e.g. a filesystem on top of DRBD)
crm configure order promote_then_start Mandatory: \
ms_drbd:promote shared_fs:start
# Location - Prefer node1 for web
crm configure location prefer_node1 web_app \
rule 100: #uname eq node1
# Location - Avoid node2 for web
crm configure location avoid_node2 web_app \
rule -inf: #uname eq node2
# Show all constraints
crm constraint show
# Remove a constraint by its name
crm configure delete web_with_db

STONITH (FENCING)
+------------------------------------------------------------------+
| |
| ┌──────────────────────────────────────────────────────────┐ │
| │ FENCING PROCESS │ │
| │ │ │
| │ Node A fails (unresponsive) │ │
│ │ │ │
| │ ▼ │ │
| │ ┌─────────────────────┐ │ │
| │ │ Cluster detects │ │ │
| │ │ node failure │ │ │
| │ │ (no heartbeat) │ │ │
| │ └──────────┬──────────┘ │ │
│ │ │ │
| │ ▼ │ │
| │ ┌─────────────────────┐ │ │
| │ │ Execute fence │ │ │
| │ │ agent on failed │ │ │
│ │ node │ │ │
│ │ └──────────┬──────────┘ │ │
| │ │ │ │
| │ ▼ │ │
| │ ┌─────────────────────┐ │ │
| │ │ Node powered off │ │ │
| │ │ or reset │ │ │
| │ └──────────┬──────────┘ │ │
| │ │ │ │
| │ ▼ │ │
| │ ┌─────────────────────┐ │ │
| │ │ Resources moved │ │ │
| │ │ to surviving │ │ │
| │ │ nodes │ │ │
| │ └─────────────────────┘ │ │
| │ │ │
| └──────────────────────────────────────────────────────────┘ │
| |
| FENCING DEVICES │
| ────────────── │
| - IPMI (Intelligent Platform Management Interface) │
| - iLO (Integrated Lights-Out) │
| - iDRAC (Integrated Dell Remote Access Controller) │
| - APC PDU (Power Distribution Unit) │
| - VM Fence (for virtual machines) │
| |
+------------------------------------------------------------------+
Terminal window
# =============================================================================
# CONFIGURE STONITH
# =============================================================================
# Enable STONITH (REQUIRED for production!)
crm configure property stonith-enabled=true
# IPMI Fencing - Node 1
crm configure primitive stonith_node1 stonith:external/ipmi \
params hostname=node1 \
ipaddr=192.168.1.101 \
userid=admin \
passwd=password \
interface=lanplus \
op monitor interval=60s
# IPMI Fencing - Node 2
crm configure primitive stonith_node2 stonith:external/ipmi \
params hostname=node2 \
ipaddr=192.168.1.102 \
userid=admin \
passwd=password \
interface=lanplus \
op monitor interval=60s
# APC PDU Fencing
crm configure primitive stonith_apc stonith:external/apc \
params ipaddr=192.168.1.50 \
login=apc \
passwd=apc_password \
action=reboot \
port=1 \
op monitor interval=60s
# VM fencing (for virtual machines, via libvirt)
crm configure primitive stonith_vm1 stonith:external/libvirt \
params hostlist="node1:vm1" \
hypervisor_uri="qemu+tcp://192.168.1.10/system" \
op monitor interval=60s

Terminal window
# =============================================================================
# CLUSTER MANAGEMENT
# =============================================================================
# View status
crm status
pcs status
# View configuration
crm configure show
pcs config show
# Start cluster
pcs cluster start node1
# Stop cluster
pcs cluster stop node1
# Enable cluster on boot
pcs cluster enable
# =============================================================================
# RESOURCE MANAGEMENT
# =============================================================================
# Start resource
crm resource start my_resource
# Stop resource
crm resource stop my_resource
# Move resource to a different node (creates a location constraint)
crm resource move my_resource node2
# Clear that constraint afterwards so the resource can move back
crm resource unmove my_resource
# Clean up failed resource (clears fail counts and failed actions)
crm resource cleanup my_resource
# Show resource status
crm resource status my_resource
# =============================================================================
# NODE MANAGEMENT
# =============================================================================
# Standby node (prevent resources from running)
crm node standby node2
# Online node (allow resources)
crm node online node2
# Remove node from cluster
pcs cluster node remove node3
# Add node to cluster
pcs cluster node add node3
Terminal window
# =============================================================================
# TROUBLESHOOTING
# =============================================================================
# Check logs
journalctl -u pacemaker -f
journalctl -u corosync -f
# View detailed resource status (one-shot snapshot)
crm_mon -1
# Verify the live cluster configuration
crm_verify -L
# Check for configuration errors
crm configure verify
# Force cleanup of a resource
crm resource cleanup <resource_name>
# Simulate cluster transitions and show scores (includes failed actions)
crm_simulate -sL
# Verify with verbose output for debugging
crm_verify -L -VVV
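
When the logs are noisy, it often helps to count recurring errors rather than read them linearly. A small helper sketch (hypothetical, not part of any Pacemaker tooling) that works on a saved journal extract:

```shell
# Count recurring error/warning lines in a saved log extract, most
# frequent first (feed it e.g. `journalctl -u pacemaker > pm.log`).
triage_log() {
    grep -E 'error|warning|Failed' "$1" | sort | uniq -c | sort -rn
}

# Demo on a fabricated extract (messages are made up for illustration):
cat > /tmp/pm.log <<'EOF'
pacemaker-controld: error: Result of start operation for nginx_service: Timed Out
pacemaker-controld: error: Result of start operation for nginx_service: Timed Out
pacemaker-schedulerd: warning: Cluster node node2 will be fenced
EOF
triage_log /tmp/pm.log
```

The repeated start timeout surfaces at the top with its count, which is usually the fault to chase first.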

WRONG:

Terminal window
# Disable STONITH for "convenience"
crm configure property stonith-enabled=false

CORRECT:

Terminal window
# Always enable STONITH
crm configure property stonith-enabled=true
# Configure proper fencing device
crm configure primitive stonith_fence stonith:fence_xvm \
params pcmk_host_map="node1:vm1;node2:vm2"

Why: Without STONITH, split-brain scenarios can cause data corruption.


WRONG:

Terminal window
# Just set up cluster and walk away
crm configure primitive web_svc ocf:heartbeat:apache \
params configfile=/etc/httpd/conf/httpd.conf

CORRECT:

Terminal window
# Test failover regularly
# Simulate node failure
crm node standby node1
# Verify services move
# Check logs
crm_mon
# Bring node back
crm node online node1

Why: Untested clusters fail in production.


WRONG:

Terminal window
# Put all services on same node
crm configure colocation web_db inf: web_svc db_svc

CORRECT:

Terminal window
# Distribute resources across nodes
crm configure location prefer-node1 web_svc 100: node1
crm configure location prefer-node2 db_svc 100: node2

Why: Colocated resources fail together during node failure.


PACEMAKER BEST PRACTICES CHECKLIST
+------------------------------------------------------------------+
| |
| ┌─────────────────────────────────────────────────────────────┐ │
| │ CONFIGURATION │ │
| ├─────────────────────────────────────────────────────────────┤ │
| │ □ Always enable STONITH (never disable in production) │ │
| │ □ Configure quorum policy appropriately │ │
| │ □ Use proper resource agents │ │
| │ □ Test resource failover │ │
| │ □ Document resource dependencies │ │
| └─────────────────────────────────────────────────────────────┘ │
| |
| ┌─────────────────────────────────────────────────────────────┐ │
| │ MONITORING │ │
| ├─────────────────────────────────────────────────────────────┤ │
| │ □ Monitor cluster status continuously │ │
| │ □ Set up alerts for cluster changes │ │
| │ □ Monitor fencing devices │ │
| │ □ Monitor resource health │ │
| └─────────────────────────────────────────────────────────────┘ │
| |
| ┌─────────────────────────────────────────────────────────────┐ │
| │ TESTING │ │
| ├─────────────────────────────────────────────────────────────┤ │
| │ □ Regular failover testing (monthly) │ │
| │ □ Test STONITH/fencing │ │
| │ □ Test network partitions │ │
| │ □ Document failover procedures │ │
| └─────────────────────────────────────────────────────────────┘ │
| |
| ┌─────────────────────────────────────────────────────────────┐ │
| │ DOCUMENTATION │ │
| ├─────────────────────────────────────────────────────────────┤ │
| │ □ Document cluster configuration │ │
| │ □ Document failover procedures │ │
| │ □ Keep configuration backups │ │
| │ □ Document IPMI credentials │ │
| └─────────────────────────────────────────────────────────────┘ │
| |
+------------------------------------------------------------------+

  1. Q: What is STONITH and why is it important?

    • A: STONITH (Shoot The Other Node In The Head) is fencing - cutting off a failed node from cluster resources. Prevents split-brain scenarios where both nodes think they’re primary and corrupt data.
  2. Q: Explain the difference between Pacemaker and Corosync.

    • A: Corosync provides cluster messaging and membership (who’s in the cluster). Pacemaker provides resource management (what runs where). They work together - Corosync tells Pacemaker who exists, Pacemaker decides where to run resources.
  3. Q: What are Pacemaker resource types?

    • A: Primitives (single resources), Groups (related resources), Clones (resources on all nodes), Multi-state (primary/secondary).
  4. Q: A resource keeps failing over between nodes. How would you troubleshoot?
    • A: Check crm_mon -1 for failure details, review resource agent logs, check health monitoring intervals, verify network connectivity, check for resource constraints causing issues, look at migration thresholds.

End of Chapter 64: Pacemaker High Availability Clustering