
Keepalived & HAProxy

High Availability (HA) and Load Balancing are critical for production environments. This chapter covers HA concepts, keepalived, HAProxy, load balancing strategies, and building resilient infrastructure.


Here is why they matter in practice:

  • Uptime: HA ensures services remain available during failures
  • Scalability: Load balancers distribute traffic across multiple servers
  • On-Call: You’ll respond to HA cluster failures and load-related issues
  • Design: Understanding HA patterns is essential for infrastructure design
  • Cost: Proper load balancing optimizes resource utilization

Downtime costs can be $100,000+ per hour for critical systems.


High Availability Architecture
+------------------------------------------------------------------+
|                                                                  |
|  Single Point of Failure           High Availability             |
|       +--------+                       +--------+                |
|       | Server |                       |   LB   |                |
|       |    |   |                       +--------+                |
|       |    v   |                        /  |  \                  |
|       |   App  |                       v   v   v                 |
|       +--------+                    +----+----+----+             |
|                                     | S1 | S2 | S3 |             |
|                                     +----+----+----+             |
|                                      Server Cluster              |
+------------------------------------------------------------------+
# SLA (Service Level Agreement)
# 99.9% = 8.76 hours downtime/year
# 99.99% = 52.6 minutes downtime/year
# 99.999% = 5.26 minutes downtime/year
# HA Architecture Components
# - Redundancy: Multiple instances of everything
# - Failover: Automatic switching on failure
# - Load balancing: Distribute traffic
# - Monitoring: Detect failures quickly
# - Health checks: Verify service availability
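The SLA percentages above convert to allowed downtime via the minutes in a year (365 × 24 × 60 = 525,600). A quick sketch to reproduce the figures; the `sla_downtime` helper is illustrative, not part of any tool:

```shell
# Allowed downtime per year for a given SLA percentage:
# (1 - SLA/100) * 525600 minutes in a non-leap year
sla_downtime() {
    awk -v sla="$1" 'BEGIN { printf "%.2f minutes/year\n", (100 - sla) / 100 * 525600 }'
}

sla_downtime 99.9     # ~525.6 minutes (~8.76 hours)
sla_downtime 99.99    # ~52.6 minutes
sla_downtime 99.999   # ~5.3 minutes
```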
# Active-Passive
# - Primary server handles traffic
# - Secondary server waits
# - Failover on primary failure
# Active-Active
# - All servers handle traffic
# - Better resource utilization
# - More complex setup
# N+1 Redundancy
# - N servers needed for load
# - 1 extra server for failover
# Geographic Redundancy
# - Multiple data centers
# - DNS failover
# - Data replication

# Install keepalived
sudo pacman -S keepalived
# Enable and start
sudo systemctl enable --now keepalived
/etc/keepalived/keepalived.conf
# VRRP for IP failover
vrrp_instance VI_1 {
    state MASTER              # BACKUP on other servers
    interface eth0
    virtual_router_id 51
    priority 100              # 100 on master, 90 on backup
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass secret123
    }

    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }

    # Notification scripts (quote the command when passing arguments)
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
/etc/keepalived/notify.sh
#!/bin/bash
case "$1" in
    master)
        echo "Became MASTER" | logger
        # Start services
        systemctl start nginx
        ;;
    backup)
        echo "Became BACKUP" | logger
        ;;
    fault)
        echo "FAULT state" | logger
        ;;
esac
/etc/keepalived/keepalived.conf
vrrp_script check_nginx {
    script "/usr/bin/pgrep nginx"
    interval 2
    timeout 2
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    # ... other config ...
    track_script {
        check_nginx
    }
}
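To see which node currently holds the VIP, check the interface addresses. A minimal sketch, assuming the VIP 192.168.1.100 from the config above; the `vip_state` helper is illustrative:

```shell
# Decide MASTER/BACKUP from `ip addr` output by looking for the VIP.
vip_state() {
    if grep -q '192\.168\.1\.100'; then echo "MASTER"; else echo "BACKUP"; fi
}

# On a live node you would pipe the real interface state in:
#   ip -4 addr show dev eth0 | vip_state
echo "inet 192.168.1.100/24 scope global eth0" | vip_state   # prints MASTER
```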

# Install HAProxy
sudo pacman -S haproxy
# Start service
sudo systemctl enable --now haproxy
/etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    # Default SSL settings
    ssl-default-bind-ciphers PROFILE=SYSTEM
    ssl-default-server-ciphers PROFILE=SYSTEM

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/server.pem
    default_backend web_servers

backend web_servers
    balance roundrobin
    option httpchk GET /health
    server web1 192.168.1.10:80 check inter 2000 rise 2 fall 3
    server web2 192.168.1.11:80 check inter 2000 rise 2 fall 3
    server web3 192.168.1.12:80 check inter 2000 rise 2 fall 3
# Round Robin (default)
balance roundrobin

# Least Connections
balance leastconn

# Source IP (persistence)
balance source

# URI hash
balance uri

# URL parameter (requires a parameter name)
balance url_param userid

# Header hash
balance hdr(User-Agent)
# Basic TCP check
option tcp-check

# HTTP check
option httpchk
http-check expect status 200

# Custom health check (request line and Host header in one directive)
option httpchk GET /api/health HTTP/1.1\r\nHost:\ example.com
# Enable stats
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 30s
    stats auth admin:password
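Besides the stats page, the admin socket configured in `global` serves the same data as CSV, typically fetched with `echo "show stat" | socat stdio /run/haproxy/admin.sock`. A sketch of filtering that output for DOWN servers; the `down_servers` helper and the canned sample rows are illustrative:

```shell
# HAProxy "show stat" CSV: field 1 = proxy name, 2 = server name, 18 = status.
down_servers() {
    awk -F, '$1 !~ /^#/ && $18 == "DOWN" { print $1 "/" $2 }'
}

# Canned two-row sample (18 fields per row) standing in for real socket output;
# prints web_servers/web2:
printf 'web_servers,web1,0,0,0,0,,0,0,0,0,0,0,0,0,0,0,UP\nweb_servers,web2,0,0,0,0,,0,0,0,0,0,0,0,0,0,0,DOWN\n' | down_servers
```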
frontend https_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/server.pem crt /etc/ssl/certs/
    # Redirect plain-HTTP requests to HTTPS
    http-request redirect scheme https unless { ssl_fc }
    default_backend web_servers

# Backend with SSL
backend web_servers
    balance roundrobin
    option ssl-hello-chk
    server web1 192.168.1.10:443 check ssl verify none
    server web2 192.168.1.11:443 check ssl verify none

/etc/nginx/nginx.conf
http {
    upstream backend {
        least_conn;
        server 192.168.1.10:80 weight=3;
        server 192.168.1.11:80;
        server 192.168.1.12:80 backup;
        keepalive 32;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_connect_timeout 5s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;
        }

        # Health check endpoint
        location /health {
            access_log off;
            add_header Content-Type text/plain;
            return 200 "healthy\n";
        }
    }
}
# SSL configuration
server {
listen 443 ssl http2;
ssl_certificate /etc/ssl/certs/server.crt;
ssl_certificate_key /etc/ssl/certs/server.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
location / {
proxy_pass http://backend;
}
}

/etc/named.conf
zone "example.com" {
    type master;
    file "db.example.com";
};

# Zone file (round-robin A records)
@   IN A 192.168.1.10
@   IN A 192.168.1.11
@   IN A 192.168.1.12
www IN A 192.168.1.10
www IN A 192.168.1.11
www IN A 192.168.1.12
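DNS round robin spreads load because the order of the returned A records varies between responses (many DNS servers rotate or randomize the record set), so successive clients land on different servers. A toy sketch of that rotation, using the IPs from the zone above; the `rotate` helper is illustrative:

```shell
# Rotate a list of A records by one position, as a server effectively does
# between successive answers.
rotate() {
    awk 'NR == 1 { first = $0; next } { print } END { print first }'
}

printf '192.168.1.10\n192.168.1.11\n192.168.1.12\n' | rotate
# prints .11, .12, .10 - the next query starts at a different server
```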
# AWS CLI examples
aws route53 create-hosted-zone --name example.com --caller-reference "unique-id"
# Create record set with health check
aws route53 change-resource-record-sets \
--hosted-zone-id Z1234567890 \
--change-batch '{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "example.com",
"Type": "A",
"SetIdentifier": "primary",
"HealthCheckId": "abc123",
"AliasTarget": {
"HostedZoneId": "Z2FDTNDATAQYW2",
"DNSName": "dualstack.elb-123456789.us-east-1.elb.amazonaws.com",
"EvaluateTargetHealth": true
}
}
}]
}'

Database HA Pattern
+------------------------------------------------------------------+
|                                                                  |
|     Primary DB                          Standby DB               |
|    +----------+                        +----------+              |
|    |          |                        |          |              |
|    | Primary  |----- WAL shipping ---->| Standby  |              |
|    |          |                        |          |              |
|    +----------+                        +----------+              |
|                                                                  |
+------------------------------------------------------------------+
patroni.yml
scope: postgres
name: postgresql0

restapi:
  listen: 127.0.0.1:8008
  connect_address: 127.0.0.1:8008

postgresql:
  listen: 127.0.0.1:5432
  data_dir: /data/postgresql0
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: password

etcd:
  hosts: 127.0.0.1:2379
# sentinel.conf
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1

check_ha.sh
#!/bin/bash

# Check if VIP is assigned
vip=$(ip addr show | grep '192.168.1.100')
if [ -z "$vip" ]; then
    echo "CRITICAL: VIP not assigned"
    exit 2
fi

# Check backend servers
backend_status=$(curl -s http://192.168.1.10:80/health)
if [ "$backend_status" != "healthy" ]; then
    echo "WARNING: Backend 1 unhealthy"
fi

# Check HAProxy
haproxy_check=$(systemctl is-active haproxy)
if [ "$haproxy_check" != "active" ]; then
    echo "CRITICAL: HAProxy not running"
    exit 2
fi

echo "OK: HA setup healthy"
exit 0
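One way to run `check_ha.sh` on a schedule is a systemd timer; a sketch, where the unit names and the script's install path are assumptions:

```
# /etc/systemd/system/check-ha.service
[Unit]
Description=HA health check

[Service]
Type=oneshot
ExecStart=/usr/local/bin/check_ha.sh

# /etc/systemd/system/check-ha.timer
[Unit]
Description=Run HA health check every minute

[Timer]
OnCalendar=minutely
Persistent=true

[Install]
WantedBy=timers.target
```

Enable with `sudo systemctl enable --now check-ha.timer`.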

Complete HA Architecture
+------------------------------------------------------------------+
|                                                                  |
|                             Client                               |
|                                |                                 |
|                                v                                 |
|                               DNS                                |
|                           +----+----+                            |
|                           |         |                            |
|                           v         v                            |
|                        +-----+   +-----+                         |
|                        | LB1 |   | LB2 |                         |
|                        +-----+   +-----+                         |
|                           |         |                            |
|                           +----+----+                            |
|                                |                                 |
|                         +------+------+                          |
|                         |             |                          |
|                         v             v                          |
|                    +---------+   +---------+                     |
|                    | App Srv |   | App Srv |                     |
|                    |    1    |   |    2    |                     |
|                    +---------+   +---------+                     |
|                         |             |                          |
|                         +------+------+                          |
|                                |                                 |
|                                v                                 |
|                         +------------+                           |
|                         | Primary DB |                           |
|                         +------------+                           |
|                                |                                 |
|                                v                                 |
|                         +------------+                           |
|                         | Standby DB |                           |
|                         +------------+                           |
|                                                                  |
+------------------------------------------------------------------+
# /etc/keepalived/keepalived.conf (on both lb1 and lb2)
vrrp_script haproxy_check {
    script "systemctl is-active haproxy"
    interval 2
    timeout 2
    fall 3
    rise 2
}

vrrp_instance HA_VIP {
    state BACKUP
    interface eth0
    virtual_router_id 50
    priority 100              # 100 on lb1, 90 on lb2
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass haproxy_secret
    }

    virtual_ipaddress {
        192.168.1.100/24
    }

    track_script {
        haproxy_check
    }
}

1. Single Point of Failure in Load Balancer


WRONG:

# Only one HAProxy instance
frontend http_front
    bind *:80
    default_backend app_servers

CORRECT:

keepalived.conf
# Multiple HAProxy instances with keepalived holding a VIP
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        10.0.0.100
    }
}

Why: Single load balancer = single point of failure.


2. Missing Backend Health Checks

WRONG:

backend app_servers
    server app1 10.0.1.10:80
    server app2 10.0.1.11:80

CORRECT:

backend app_servers
    option httpchk GET /health
    server app1 10.0.1.10:80 check inter 3s fall 2 rise 2
    server app2 10.0.1.11:80 check inter 3s fall 2 rise 2

Why: Without health checks, traffic goes to failed servers.


3. Session Persistence Without Consideration


WRONG:

# Round robin without considering sessions
balance roundrobin

CORRECT:

# Use source IP or cookies for sticky sessions if needed
balance source
# OR
cookie SERVERID insert indirect nocache

Why: Without session affinity, users lose their session.


  1. Q: Explain the difference between Layer 4 and Layer 7 load balancing.

    • A: Layer 4 (TCP) routes based on IP/port, faster, no content inspection. Layer 7 (HTTP) can route based on URL, headers, cookies, more intelligent but slightly slower.
  2. Q: What is keepalived and how does it provide HA?

    • A: Keepalived uses VRRP (Virtual Router Redundancy Protocol) to share a virtual IP between multiple servers. One is MASTER, others are BACKUP. If MASTER fails, BACKUP takes the VIP.
  3. Q: What are the different HAProxy load balancing algorithms?

    • A: roundrobin (weighted), static-rr, leastconn (fewest connections), source (IP hash), uri (URL hash), url_param, hdr, rdp-cookie.
  4. Q: Your web application is slow. How would you diagnose load balancer issues?
    • A: Check HAProxy stats, verify backend server health, check connection counts, look for backend saturation, verify health checks are working, check for SSL termination issues.
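The Layer 4 vs Layer 7 distinction in the first answer maps directly onto HAProxy's `mode` setting; a sketch, where the frontend and backend names are illustrative:

```
# Layer 4: TCP mode - routes on IP/port only, no content inspection
frontend pg_front
    mode tcp
    bind *:5432
    default_backend pg_servers

# Layer 7: HTTP mode - can route on path, headers, cookies
frontend web_front
    mode http
    bind *:80
    acl is_api path_beg /api
    use_backend api_servers if is_api
    default_backend web_servers
```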

In this chapter, you learned:

  • ✅ High availability concepts and architecture
  • ✅ SLA and uptime calculations
  • ✅ keepalived for IP failover
  • ✅ HAProxy load balancing
  • ✅ nginx as load balancer
  • ✅ DNS load balancing
  • ✅ Database HA patterns
  • ✅ HA monitoring and health checks
  • ✅ Complete HA architecture examples

Chapter 20: Troubleshooting Methodology


Last Updated: February 2026