# Pacemaker

## Chapter 64: Pacemaker High Availability Clustering

Linux HA Clustering with Pacemaker and Corosync

### Why This Matters in DevOps/SRE

Pacemaker is the industry-standard HA clustering solution for Linux:
- Critical Services: Ensures services stay up during hardware/software failures
- Production Systems: Database clusters, web servers, and storage need HA
- On-Call: You’ll manage cluster failures and node events
- Career: HA clustering is essential for senior sysadmin roles
- Certification: Red Hat clustering exams (e.g., EX436, High Availability Clustering) cover Pacemaker
Cluster downtime can cost $100K+ per hour for critical applications.
## 64.1 Introduction to High Availability Clustering

### Understanding HA Clusters

High Availability (HA) clusters are designed to provide continuous service by eliminating single points of failure. When one node fails, another takes over seamlessly.
```
HIGH AVAILABILITY CLUSTER ARCHITECTURE

  +---------+       +-----------+       +------------+
  | Node 1  |<----->|  Node 2   |<----->|   Node 3   |
  | Primary |       | Secondary |       |  Witness   |
  | Active  |       |  Standby  |       | (optional) |
  +---------+       +-----------+       +------------+
       |                  |                   |
       +---------+--------+-------------------+
                 |
   +-------------------------+   +-------------------------+
   |        Corosync         |   |        Pacemaker        |
   |   (cluster messaging)   |   |  (cluster resource mgr) |
   +-------------------------+   +-------------------------+
                 |
        +------------------+
        |  Shared Storage  |
        |    (SAN/NAS)     |
        +------------------+
```
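The optional witness node in the diagram exists to preserve quorum: a partition of the cluster may run resources only if it holds a strict majority of votes. A minimal sketch of the standard majority rule, assuming one vote per node (corosync votequorum's default):

```shell
# Majority quorum: a partition needs floor(total_votes / 2) + 1 votes.
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}

quorum_votes 2   # → 2: a 2-node cluster cannot lose a node and keep quorum
quorum_votes 3   # → 2: a 3-node cluster survives one node failure
quorum_votes 5   # → 3
```

This is why two-node clusters need special handling (corosync's `two_node` option or an external quorum/witness device): under the plain majority rule, losing either node loses quorum.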
### Cluster Types

```
ACTIVE-PASSIVE (PRIMARY-BACKUP)

  Node 1 (Primary)        Node 2 (Standby)
  +--------------+        +--------------+
  |  ✓ Active    |        |   Standby    |
  |  Service A   |        |  (ready to   |
  |              |        |  take over)  |
  +--------------+        +--------------+

  When Node 1 fails:
  - Node 2 takes over
  - Service A starts on Node 2

ACTIVE-ACTIVE (MULTI-PRIMARY)

  Node 1                    Node 2
  +----------------+        +----------------+
  |  ✓ Active      |<------>|  ✓ Active      |
  |  Service A     |        |  Service B     |
  |  Load Balancer |        |  Load Balancer |
  +----------------+        +----------------+

  When Node 1 fails:
  - Service A starts on Node 2
  - Both services now run on Node 2
```
## 64.2 Pacemaker Architecture

### Components Overview
```
PACEMAKER STACK

  +------------------------------------------------------+
  |          CRM (Cluster Resource Manager)              |
  |                                                      |
  |  +-------------------+  +------------------------+   |
  |  |   Policy Engine   |  | Local Resource Manager |   |
  |  |      (PE/TE)      |  |         (LRM)          |   |
  |  +-------------------+  +------------------------+   |
  +--------------------------+---------------------------+
                             |
                             v
  +------------------------------------------------------+
  |            COROSYNC (Messaging Layer)                |
  |                                                      |
  |   +--------+      +--------+      +--------+         |
  |   | Node 1 |<---->| Node 2 |<---->| Node 3 |         |
  |   +--------+      +--------+      +--------+         |
  |                                                      |
  |   - Membership                                       |
  |   - Quorum                                           |
  |   - Messaging                                        |
  +------------------------------------------------------+
```
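For concreteness, the Corosync layer is configured in `/etc/corosync/corosync.conf`. A minimal sketch for a two-node cluster follows; the cluster name, node names, addresses, and the corosync 3.x `knet` transport are illustrative assumptions:

```
totem {
    version: 2
    cluster_name: my_cluster
    transport: knet          # corosync 3.x default transport
}

nodelist {
    node {
        ring0_addr: 192.168.1.11
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.1.12
        name: node2
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1              # special-case quorum handling for two-node clusters
}
```

Tools like `pcs cluster setup` generate this file for you; editing it by hand is mainly useful for understanding what the stack is doing.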
### Key Concepts

| Concept | Description |
|---|---|
| Resource | A service that can be managed (IP, filesystem, application) |
| Resource Agent | Script/daemon that controls a specific resource type |
| Primitive | Single resource instance |
| Clone | Resource running on multiple nodes |
| Master/Slave | Clone with primary/secondary roles (a "promotable clone" in current Pacemaker) |
| Constraint | Rule defining resource relationships and locations |
| STONITH | Shoot The Other Node In The Head - node fencing |
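Under the hood, every concept in the table is stored as XML in the CIB (Cluster Information Base), which `crm configure` and `pcs` edit on your behalf. An illustrative sketch of a primitive wrapped in a clone (the IDs and values here are assumptions, not output from a real cluster):

```xml
<!-- Sketch of CIB resource XML: a primitive wrapped in a clone.
     Inspect the real CIB with: cibadmin --query -->
<clone id="ping_clone">
  <primitive id="ping" class="ocf" provider="pacemaker" type="ping">
    <instance_attributes id="ping-attrs">
      <nvpair id="ping-host" name="host_list" value="192.168.1.1"/>
    </instance_attributes>
    <operations>
      <op id="ping-monitor-30s" name="monitor" interval="30s"/>
    </operations>
  </primitive>
</clone>
```

You rarely edit this XML directly, but reading it is invaluable when debugging: the CIB is the single source of truth the Policy Engine works from.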
## 64.3 Pacemaker Installation and Configuration
### Installation
```bash
# =============================================================================
# INSTALLATION (Ubuntu/Debian)
# =============================================================================

# Install Pacemaker and dependencies
apt-get update
apt-get install pacemaker corosync fence-agents resource-agents

# Install additional command-line tools
apt-get install crmsh pcs

# Enable services
systemctl enable pacemaker
systemctl enable corosync

# =============================================================================
# INSTALLATION (RHEL/CentOS)
# =============================================================================

# Install packages
yum install pacemaker corosync fence-agents resource-agents pcs

# Enable and start pcsd
systemctl enable pcsd
systemctl start pcsd

# Set the hacluster user's password
passwd hacluster

# =============================================================================
# CLUSTER AUTHENTICATION
# =============================================================================

# Authenticate nodes (after setting the hacluster password on all nodes)
pcs host auth node1 node2 node3 -u hacluster -p password

# Create the cluster
pcs cluster setup my_cluster node1 node2 node3

# Start the cluster on all nodes
pcs cluster start --all
```
### Basic Configuration

```bash
# =============================================================================
# CHECK CLUSTER STATUS
# =============================================================================

# View cluster status
crm status

# Or with pcs
pcs status

# View the configuration
crm configure show

# =============================================================================
# CREATE A BASIC CLUSTER
# =============================================================================

# Disable STONITH (for testing only -- never do this in production!)
crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore

# Add a floating IP address resource
crm configure primitive virtual_ip ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.100 cidr_netmask=24 \
    op monitor interval=30s

# Add a service resource (nginx)
crm configure primitive nginx_service systemd:nginx \
    op monitor interval=30s

# Create a group: members run on the same node and start in listed order
crm configure group web_group virtual_ip nginx_service

# Note: the group above already implies colocation and ordering between its
# members. If you manage the resources individually instead of grouping them,
# add explicit constraints:
crm configure colocation web_on_ip INFINITY: nginx_service virtual_ip
crm configure order ip_then_web Mandatory: virtual_ip nginx_service
```
### Resource Agents

```bash
# =============================================================================
# COMMON RESOURCE AGENTS
# =============================================================================

# IPaddr2 - floating IP
crm configure primitive virtual_ip ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.100 cidr_netmask=24 \
    op monitor interval=30s

# Filesystem - mount shared storage
crm configure primitive shared_fs ocf:heartbeat:Filesystem \
    params device=/dev/sdb1 directory=/mnt/shared fstype=ext4 \
    op monitor interval=30s

# mysql - database service
crm configure primitive mysql ocf:heartbeat:mysql \
    params binary=/usr/sbin/mysqld \
           config=/etc/mysql/my.cnf \
           datadir=/var/lib/mysql \
    op monitor interval=30s timeout=30s

# NFS mount - use the Filesystem agent with fstype=nfs
crm configure primitive nfs_mount ocf:heartbeat:Filesystem \
    params device=192.168.1.10:/data directory=/mnt/nfs fstype=nfs \
    op monitor interval=30s

# apache - web server
crm configure primitive apache ocf:heartbeat:apache \
    params configfile=/etc/apache2/apache2.conf \
    op monitor interval=30s
```
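All of the agents above implement the same OCF contract: the cluster invokes the agent with an action (`start`, `stop`, `monitor`, `meta-data`, ...) and interprets its exit code (0 = OCF_SUCCESS, 7 = OCF_NOT_RUNNING, and so on). A minimal sketch of that contract, written here as a shell function for illustration; a real agent is a standalone script under `/usr/lib/ocf/resource.d/<provider>/`, and the state file stands in for a real service:

```shell
minimal_agent() {
    action=$1
    statefile="${OCF_RESKEY_state:-/tmp/minimal_agent.state}"
    case "$action" in
        start)     touch "$statefile" ;;                 # OCF_SUCCESS (0)
        stop)      rm -f "$statefile" ;;                 # OCF_SUCCESS (0)
        monitor)   [ -f "$statefile" ] || return 7 ;;    # OCF_NOT_RUNNING (7)
        meta-data) echo '<resource-agent name="minimal"/>' ;;
        *)         return 3 ;;                           # OCF_ERR_UNIMPLEMENTED
    esac
}

minimal_agent start
rc=0; minimal_agent monitor || rc=$?
echo "after start: rc=$rc"    # rc=0

minimal_agent stop
rc=0; minimal_agent monitor || rc=$?
echo "after stop: rc=$rc"     # rc=7
```

The monitor action's exit code is what drives failover: when it stops returning 0, Pacemaker recovers or relocates the resource.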
## 64.4 Resource Constraints

### Constraint Types
```
RESOURCE CONSTRAINTS

COLOCATION CONSTRAINT
  Ensures resources run on the same node.

    Service A --+--> Node 1 (both run together)
    Service B --+

  Command:
    crm configure colocation A_with_B INFINITY: A B

ORDERING CONSTRAINT
  Defines start/stop order.

    Step 1: Start Service A --> Step 2: Start Service B

  Command:
    crm configure order A_before_B Mandatory: A B

LOCATION CONSTRAINT
  Prefers or prohibits running on specific nodes.

  Prefer Node 1:
    crm configure location prefer_node1 serviceA \
        rule 100: #uname eq node1

  Avoid Node 2:
    crm configure location avoid_node2 serviceA \
        rule -inf: #uname eq node2
```
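Constraint scores combine by addition, with `INFINITY` (internally 1,000,000) saturating: any score plus INFINITY is INFINITY, and per Pacemaker's documented rules `-INFINITY` dominates when both appear (INFINITY - INFINITY = -INFINITY). A minimal sketch of that arithmetic, with an illustrative helper name:

```shell
INF=1000000

# Combine two constraint scores the way Pacemaker's documentation describes:
# -INFINITY dominates, +INFINITY saturates, otherwise plain addition.
score_add() {
    a=$1; b=$2
    if [ "$a" -le "-$INF" ] || [ "$b" -le "-$INF" ]; then
        echo "-$INF"
    elif [ "$a" -ge "$INF" ] || [ "$b" -ge "$INF" ]; then
        echo "$INF"
    else
        echo $(( a + b ))
    fi
}

score_add 100 50            # → 150
score_add 1000000 -500      # → 1000000  (INFINITY saturates)
score_add 1000000 -1000000  # → -1000000 (INFINITY - INFINITY = -INFINITY)
```

This is why a `-inf` location rule is an absolute ban: no accumulation of positive preferences can outweigh it.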
### Constraint Examples

```bash
# =============================================================================
# CONSTRAINT COMMANDS
# =============================================================================

# Colocation - run web_app on the same node as db
crm configure colocation web_with_db INFINITY: web_app db

# Ordering - start db before web_app
crm configure order db_before_web Mandatory: db web_app

# Ordering with actions - promote the multi-state resource before starting
# the dependent resource
crm configure order promote_then_start Mandatory: \
    ms_Master_Slave:promote web_app:start

# Location - prefer node1 for web_app
crm configure location prefer_node1 web_app \
    rule 100: #uname eq node1

# Location - avoid node2 for web_app
crm configure location avoid_node2 web_app \
    rule -inf: #uname eq node2

# Show all constraints (pcs)
pcs constraint

# Remove a constraint by its ID
crm configure delete web_with_db
```
## 64.5 Fencing and STONITH

### Understanding Fencing
```
STONITH (FENCING) PROCESS

  Node A fails (unresponsive)
              |
              v
  +-------------------------+
  |  Cluster detects node   |
  |  failure (no heartbeat) |
  +------------+------------+
               |
               v
  +-------------------------+
  |  Execute fence agent    |
  |  against failed node    |
  +------------+------------+
               |
               v
  +-------------------------+
  |  Node powered off       |
  |  or reset               |
  +------------+------------+
               |
               v
  +-------------------------+
  |  Resources moved to     |
  |  surviving nodes        |
  +-------------------------+

FENCING DEVICES
  - IPMI (Intelligent Platform Management Interface)
  - iLO (HP Integrated Lights-Out)
  - iDRAC (Integrated Dell Remote Access Controller)
  - APC PDU (Power Distribution Unit)
  - VM fencing (for virtual machines)
```
### STONITH Configuration

```bash
# =============================================================================
# CONFIGURE STONITH
# =============================================================================

# Enable STONITH (REQUIRED for production!)
crm configure property stonith-enabled=true

# IPMI fencing - node 1
crm configure primitive stonith_node1 stonith:external/ipmi \
    params hostname=node1 \
           ipaddr=192.168.1.101 \
           userid=admin \
           passwd=password \
           interface=lanplus \
    op monitor interval=60s

# IPMI fencing - node 2
crm configure primitive stonith_node2 stonith:external/ipmi \
    params hostname=node2 \
           ipaddr=192.168.1.102 \
           userid=admin \
           passwd=password \
           interface=lanplus \
    op monitor interval=60s

# APC PDU fencing
crm configure primitive stonith_apc stonith:external/apc \
    params ipaddr=192.168.1.50 \
           login=apc \
           passwd=apc_password \
    op monitor interval=60s

# VM fencing (libvirt-managed virtual machines)
crm configure primitive stonith_vm1 stonith:external/libvirt \
    params hostlist="vm1" \
           hypervisor_uri="qemu+tcp://192.168.1.10/system" \
    op monitor interval=60s
```
## 64.6 Cluster Management

### Common Operations
```bash
# =============================================================================
# CLUSTER MANAGEMENT
# =============================================================================

# View status
crm status
pcs status

# View configuration
crm configure show
pcs config

# Start the cluster on a node
pcs cluster start node1

# Stop the cluster on a node
pcs cluster stop node1

# Enable cluster services on boot
pcs cluster enable

# =============================================================================
# RESOURCE MANAGEMENT
# =============================================================================

# Start a resource
crm resource start my_resource

# Stop a resource
crm resource stop my_resource

# Move a resource to a different node ("migrate" is an alias for "move")
crm resource move my_resource node2

# Show resource status
crm resource status my_resource

# Clean up a failed resource (clears failed actions and fail counts)
crm resource cleanup my_resource

# =============================================================================
# NODE MANAGEMENT
# =============================================================================

# Put a node in standby (prevents resources from running on it)
crm node standby node2

# Bring a node back online (allows resources again)
crm node online node2

# Remove a node from the cluster
pcs cluster node remove node3

# Add a node to the cluster
pcs cluster node add node3
```
### Troubleshooting

```bash
# =============================================================================
# TROUBLESHOOTING
# =============================================================================

# Follow the logs
journalctl -u pacemaker -f
journalctl -u corosync -f

# One-shot detailed cluster and resource status
crm_mon -1

# Validate the live cluster configuration
crm_verify -L

# Check for configuration errors (crmsh)
crm configure verify

# Force cleanup of a resource
crm resource cleanup <resource_name>

# Simulate the next transition and show allocation scores
crm_simulate -sL

# Test a resource agent outside the cluster
ocf-tester -n test_ip -o ip=192.168.1.100 \
    /usr/lib/ocf/resource.d/heartbeat/IPaddr2
```
## Common Mistakes & Anti-Patterns

### 1. Disabling STONITH
Section titled “1. Disabling STONITH”WRONG:
```bash
# Disable STONITH for "convenience"
crm configure property stonith-enabled=false
```

CORRECT:
```bash
# Always enable STONITH
crm configure property stonith-enabled=true

# Configure a proper fencing device
crm configure primitive stonith_fence stonith:fence_xvm \
    params pcmk_host_map="node1:vm1;node2:vm2"
```

Why: Without STONITH, split-brain scenarios can cause data corruption.
### 2. Not Testing Failover
WRONG:
```bash
# Just set up the cluster and walk away
crm configure primitive web_svc ocf:heartbeat:apache \
    params configfile=/etc/httpd/conf/httpd.conf
```

CORRECT:
```bash
# Test failover regularly

# Simulate a node failure
crm node standby node1

# Verify services move; watch cluster status
crm_mon

# Bring the node back
crm node online node1
```

Why: Untested clusters fail in production.
### 3. Colocating All Services on One Node
WRONG:
```bash
# Put all services on the same node
crm configure colocation web_db inf: web_svc db_svc
```

CORRECT:
```bash
# Distribute resources across nodes
crm configure location prefer-node1 web_svc 100: node1
crm configure location prefer-node2 db_svc 100: node2
```

Why: Colocated resources fail together during node failure.
## 64.7 HA Best Practices
### Summary Checklist
**Configuration**

- [ ] Always enable STONITH (never disable in production)
- [ ] Configure quorum policy appropriately
- [ ] Use proper resource agents
- [ ] Test resource failover
- [ ] Document resource dependencies

**Monitoring**

- [ ] Monitor cluster status continuously
- [ ] Set up alerts for cluster changes
- [ ] Monitor fencing devices
- [ ] Monitor resource health

**Testing**

- [ ] Regular failover testing (monthly)
- [ ] Test STONITH/fencing
- [ ] Test network partitions
- [ ] Document failover procedures

**Documentation**

- [ ] Document cluster configuration
- [ ] Document failover procedures
- [ ] Keep configuration backups
- [ ] Store fencing/IPMI credentials securely
## Interview Questions

### Conceptual Questions
Section titled “Conceptual Questions”-
Q: What is STONITH and why is it important?
- A: STONITH (Shoot The Other Node In The Head) is fencing - cutting off a failed node from cluster resources. Prevents split-brain scenarios where both nodes think they’re primary and corrupt data.
-
Q: Explain the difference between Pacemaker and Corosync.
- A: Corosync provides cluster messaging and membership (who’s in the cluster). Pacemaker provides resource management (what runs where). They work together - Corosync tells Pacemaker who exists, Pacemaker decides where to run resources.
-
Q: What are Pacemaker resource types?
- A: Primitives (single resources), Groups (related resources), Clones (resources on all nodes), Multi-state (primary/secondary).
### Scenario-Based Questions
Section titled “Scenario-Based Questions”- Q: A resource keeps failing over between nodes. How would you troubleshoot?
- A: Check
crm_mon -1for failure details, review resource agent logs, check health monitoring intervals, verify network connectivity, check for resource constraints causing issues, look at migration thresholds.
- A: Check
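The ping-pong behavior in this scenario is usually governed by `migration-threshold`: once a resource's fail count on a node reaches the threshold, Pacemaker bans the resource from that node until the count is cleared (`crm resource cleanup`) or `failure-timeout` expires. A minimal sketch of that decision; the function name is illustrative:

```shell
# Returns "ban" when failcount has reached migration-threshold, else "stay".
node_allowed() {
    failcount=$1; threshold=$2
    if [ "$failcount" -ge "$threshold" ]; then
        echo "ban"
    else
        echo "stay"
    fi
}

node_allowed 1 3   # → stay (1 failure, threshold 3)
node_allowed 3 3   # → ban  (threshold reached; resource moves away)
```

If both nodes accumulate bans, the resource ends up stopped everywhere, which is why cleaning up fail counts is part of recovery.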
End of Chapter 64: Pacemaker High Availability Clustering