
Keepalived & HAProxy

High Availability (HA) and Load Balancing are critical for production environments. This chapter covers HA concepts, keepalived, HAProxy, load balancing strategies, and building resilient infrastructure.


Here is why they matter in practice:

  • Uptime: HA ensures services remain available during failures
  • Scalability: Load balancers distribute traffic across multiple servers
  • On-Call: You’ll respond to HA cluster failures and load-related issues
  • Design: Understanding HA patterns is essential for infrastructure design
  • Cost: Proper load balancing optimizes resource utilization

Downtime costs can be $100,000+ per hour for critical systems.


High Availability Architecture
+------------------------------------------------------------------+
|                                                                  |
|  Single Point of Failure           High Availability             |
|       +--------+                       +--------+                |
|       | Server |                       |   LB   |                |
|       |    |   |                       +--------+                |
|       |    v   |                        /  |  \                  |
|       |   App  |                       v   v   v                 |
|       +--------+                    +----+----+----+             |
|                                     | S1 | S2 | S3 |             |
|                                     +----+----+----+             |
|                                      Server Cluster              |
+------------------------------------------------------------------+
# SLA (Service Level Agreement)
# 99.9% = 8.76 hours downtime/year
# 99.99% = 52.6 minutes downtime/year
# 99.999% = 5.26 minutes downtime/year
# HA Architecture Components
# - Redundancy: Multiple instances of everything
# - Failover: Automatic switching on failure
# - Load balancing: Distribute traffic
# - Monitoring: Detect failures quickly
# - Health checks: Verify service availability
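The SLA percentages above convert to allowed downtime via the minutes in a year (365 × 24 × 60 = 525,600). A quick sketch to reproduce the figures; the `sla_downtime` helper is illustrative, not part of any tool:

```shell
# Allowed downtime per year for a given SLA percentage:
# (1 - SLA/100) * 525600 minutes in a non-leap year
sla_downtime() {
    awk -v sla="$1" 'BEGIN { printf "%.2f minutes/year\n", (100 - sla) / 100 * 525600 }'
}

sla_downtime 99.9     # ~525.6 minutes (~8.76 hours)
sla_downtime 99.99    # ~52.6 minutes
sla_downtime 99.999   # ~5.3 minutes
```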
# Active-Passive
# - Primary server handles traffic
# - Secondary server waits
# - Failover on primary failure
# Active-Active
# - All servers handle traffic
# - Better resource utilization
# - More complex setup
# N+1 Redundancy
# - N servers needed for load
# - 1 extra server for failover
# Geographic Redundancy
# - Multiple data centers
# - DNS failover
# - Data replication

# Install keepalived
sudo pacman -S keepalived
# Enable and start
sudo systemctl enable --now keepalived
/etc/keepalived/keepalived.conf
# VRRP for IP failover
vrrp_instance VI_1 {
    state MASTER              # BACKUP on other servers
    interface eth0
    virtual_router_id 51
    priority 100              # 100 on master, 90 on backup
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass secret123
    }

    virtual_ipaddress {
        192.168.1.100/24 dev eth0
    }

    # Notification scripts (quote the command when passing arguments)
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}
/etc/keepalived/notify.sh
#!/bin/bash
case "$1" in
    master)
        echo "Became MASTER" | logger
        # Start services
        systemctl start nginx
        ;;
    backup)
        echo "Became BACKUP" | logger
        ;;
    fault)
        echo "FAULT state" | logger
        ;;
esac
/etc/keepalived/keepalived.conf
vrrp_script check_nginx {
    script "/usr/bin/pgrep nginx"
    interval 2
    timeout 2
    fall 3
    rise 2
}

vrrp_instance VI_1 {
    # ... other config ...
    track_script {
        check_nginx
    }
}
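To see which node currently holds the VIP, check the interface addresses. A minimal sketch, assuming the VIP 192.168.1.100 from the config above; the `vip_state` helper is illustrative:

```shell
# Decide MASTER/BACKUP from `ip addr` output by looking for the VIP.
vip_state() {
    if grep -q '192\.168\.1\.100'; then echo "MASTER"; else echo "BACKUP"; fi
}

# On a live node you would pipe the real interface state in:
#   ip -4 addr show dev eth0 | vip_state
echo "inet 192.168.1.100/24 scope global eth0" | vip_state   # prints MASTER
```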

# Install HAProxy
sudo pacman -S haproxy
# Start service
sudo systemctl enable --now haproxy
/etc/haproxy/haproxy.cfg
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    # Default SSL settings
    ssl-default-bind-ciphers PROFILE=SYSTEM
    ssl-default-server-ciphers PROFILE=SYSTEM

defaults
    log global
    mode http
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/server.pem
    default_backend web_servers

backend web_servers
    balance roundrobin
    option httpchk GET /health
    server web1 192.168.1.10:80 check inter 2000 rise 2 fall 3
    server web2 192.168.1.11:80 check inter 2000 rise 2 fall 3
    server web3 192.168.1.12:80 check inter 2000 rise 2 fall 3
# Round Robin (default)
balance roundrobin

# Least Connections
balance leastconn

# Source IP (persistence)
balance source

# URI hash
balance uri

# URL parameter (requires a parameter name)
balance url_param userid

# Header hash
balance hdr(User-Agent)
# Basic TCP check
option tcp-check

# HTTP check
option httpchk
http-check expect status 200

# Custom health check (request line and Host header in one directive)
option httpchk GET /api/health HTTP/1.1\r\nHost:\ example.com
# Enable stats
listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 30s
    stats auth admin:password
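Besides the stats page, the admin socket configured in `global` serves the same data as CSV, typically fetched with `echo "show stat" | socat stdio /run/haproxy/admin.sock`. A sketch of filtering that output for DOWN servers; the `down_servers` helper and the canned sample rows are illustrative:

```shell
# HAProxy "show stat" CSV: field 1 = proxy name, 2 = server name, 18 = status.
down_servers() {
    awk -F, '$1 !~ /^#/ && $18 == "DOWN" { print $1 "/" $2 }'
}

# Canned two-row sample (18 fields per row) standing in for real socket output;
# prints web_servers/web2:
printf 'web_servers,web1,0,0,0,0,,0,0,0,0,0,0,0,0,0,0,UP\nweb_servers,web2,0,0,0,0,,0,0,0,0,0,0,0,0,0,0,DOWN\n' | down_servers
```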
frontend https_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/server.pem crt /etc/ssl/certs/
    # Redirect plain-HTTP requests to HTTPS
    http-request redirect scheme https unless { ssl_fc }
    default_backend web_servers

# Backend with SSL
backend web_servers
    balance roundrobin
    option ssl-hello-chk
    server web1 192.168.1.10:443 check ssl verify none
    server web2 192.168.1.11:443 check ssl verify none

/etc/nginx/nginx.conf
http {
    upstream backend {
        least_conn;
        server 192.168.1.10:80 weight=3;
        server 192.168.1.11:80;
        server 192.168.1.12:80 backup;
        keepalive 32;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_connect_timeout 5s;
            proxy_send_timeout 60s;
            proxy_read_timeout 60s;
        }

        # Health check endpoint
        location /health {
            access_log off;
            add_header Content-Type text/plain;
            return 200 "healthy\n";
        }
    }
}
# SSL configuration
server {
listen 443 ssl http2;
ssl_certificate /etc/ssl/certs/server.crt;
ssl_certificate_key /etc/ssl/certs/server.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
location / {
proxy_pass http://backend;
}
}

/etc/named.conf
zone "example.com" {
    type master;
    file "db.example.com";
};

# Zone file (round-robin A records)
@   IN A 192.168.1.10
@   IN A 192.168.1.11
@   IN A 192.168.1.12
www IN A 192.168.1.10
www IN A 192.168.1.11
www IN A 192.168.1.12
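DNS round robin spreads load because the order of the returned A records varies between responses (many DNS servers rotate or randomize the record set), so successive clients land on different servers. A toy sketch of that rotation, using the IPs from the zone above; the `rotate` helper is illustrative:

```shell
# Rotate a list of A records by one position, as a server effectively does
# between successive answers.
rotate() {
    awk 'NR == 1 { first = $0; next } { print } END { print first }'
}

printf '192.168.1.10\n192.168.1.11\n192.168.1.12\n' | rotate
# prints .11, .12, .10 - the next query starts at a different server
```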
# AWS CLI examples
aws route53 create-hosted-zone --name example.com --caller-reference "unique-id"
# Create record set with health check
aws route53 change-resource-record-sets \
--hosted-zone-id Z1234567890 \
--change-batch '{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "example.com",
"Type": "A",
"SetIdentifier": "primary",
"HealthCheckId": "abc123",
"AliasTarget": {
"HostedZoneId": "Z2FDTNDATAQYW2",
"DNSName": "dualstack.elb-123456789.us-east-1.elb.amazonaws.com",
"EvaluateTargetHealth": true
}
}
}]
}'

Database HA Pattern
+------------------------------------------------------------------+
|                                                                  |
|     Primary DB                          Standby DB               |
|    +----------+                        +----------+              |
|    |          |                        |          |              |
|    | Primary  |----- WAL shipping ---->| Standby  |              |
|    |          |                        |          |              |
|    +----------+                        +----------+              |
|                                                                  |
+------------------------------------------------------------------+
patroni.yml
scope: postgres
name: postgresql0

restapi:
  listen: 127.0.0.1:8008
  connect_address: 127.0.0.1:8008

postgresql:
  listen: 127.0.0.1:5432
  data_dir: /data/postgresql0
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: password

etcd:
  hosts: 127.0.0.1:2379
# sentinel.conf
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1

check_ha.sh
#!/bin/bash

# Check if VIP is assigned
vip=$(ip addr show | grep '192.168.1.100')
if [ -z "$vip" ]; then
    echo "CRITICAL: VIP not assigned"
    exit 2
fi

# Check backend servers
backend_status=$(curl -s http://192.168.1.10:80/health)
if [ "$backend_status" != "healthy" ]; then
    echo "WARNING: Backend 1 unhealthy"
fi

# Check HAProxy
haproxy_check=$(systemctl is-active haproxy)
if [ "$haproxy_check" != "active" ]; then
    echo "CRITICAL: HAProxy not running"
    exit 2
fi

echo "OK: HA setup healthy"
exit 0
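One way to run `check_ha.sh` on a schedule is a systemd timer; a sketch, where the unit names and the script's install path are assumptions:

```
# /etc/systemd/system/check-ha.service
[Unit]
Description=HA health check

[Service]
Type=oneshot
ExecStart=/usr/local/bin/check_ha.sh

# /etc/systemd/system/check-ha.timer
[Unit]
Description=Run HA health check every minute

[Timer]
OnCalendar=minutely
Persistent=true

[Install]
WantedBy=timers.target
```

Enable with `sudo systemctl enable --now check-ha.timer`.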

Complete HA Architecture
+------------------------------------------------------------------+
|                                                                  |
|                             Client                               |
|                                |                                 |
|                                v                                 |
|                               DNS                                |
|                           +----+----+                            |
|                           |         |                            |
|                           v         v                            |
|                        +-----+   +-----+                         |
|                        | LB1 |   | LB2 |                         |
|                        +-----+   +-----+                         |
|                           |         |                            |
|                           +----+----+                            |
|                                |                                 |
|                         +------+------+                          |
|                         |             |                          |
|                         v             v                          |
|                    +---------+   +---------+                     |
|                    | App Srv |   | App Srv |                     |
|                    |    1    |   |    2    |                     |
|                    +---------+   +---------+                     |
|                         |             |                          |
|                         +------+------+                          |
|                                |                                 |
|                                v                                 |
|                         +------------+                           |
|                         | Primary DB |                           |
|                         +------------+                           |
|                                |                                 |
|                                v                                 |
|                         +------------+                           |
|                         | Standby DB |                           |
|                         +------------+                           |
|                                                                  |
+------------------------------------------------------------------+
# /etc/keepalived/keepalived.conf (on both lb1 and lb2)
vrrp_script haproxy_check {
    script "systemctl is-active haproxy"
    interval 2
    timeout 2
    fall 3
    rise 2
}

vrrp_instance HA_VIP {
    state BACKUP
    interface eth0
    virtual_router_id 50
    priority 100              # 100 on lb1, 90 on lb2
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass haproxy_secret
    }

    virtual_ipaddress {
        192.168.1.100/24
    }

    track_script {
        haproxy_check
    }
}

1. Single Point of Failure in Load Balancer


WRONG:

# Only one HAProxy instance
frontend http_front
    bind *:80
    default_backend app_servers

CORRECT:

keepalived.conf
# Multiple HAProxy instances with keepalived holding a VIP
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        10.0.0.100
    }
}

Why: Single load balancer = single point of failure.


2. Missing Backend Health Checks

WRONG:

backend app_servers
    server app1 10.0.1.10:80
    server app2 10.0.1.11:80

CORRECT:

backend app_servers
    option httpchk GET /health
    server app1 10.0.1.10:80 check inter 3s fall 2 rise 2
    server app2 10.0.1.11:80 check inter 3s fall 2 rise 2

Why: Without health checks, traffic goes to failed servers.


3. Session Persistence Without Consideration


WRONG:

# Round robin without considering sessions
balance roundrobin

CORRECT:

# Use source IP or cookies for sticky sessions if needed
balance source
# OR
cookie SERVERID insert indirect nocache

Why: Without session affinity, users lose their session.


  1. Q: Explain the difference between Layer 4 and Layer 7 load balancing.

    • A: Layer 4 (TCP) routes based on IP/port, faster, no content inspection. Layer 7 (HTTP) can route based on URL, headers, cookies, more intelligent but slightly slower.
  2. Q: What is keepalived and how does it provide HA?

    • A: Keepalived uses VRRP (Virtual Router Redundancy Protocol) to share a virtual IP between multiple servers. One is MASTER, others are BACKUP. If MASTER fails, BACKUP takes the VIP.
  3. Q: What are the different HAProxy load balancing algorithms?

    • A: roundrobin (weighted), static-rr, leastconn (fewest connections), source (IP hash), uri (URL hash), url_param, hdr, rdp-cookie.
  4. Q: Your web application is slow. How would you diagnose load balancer issues?
    • A: Check HAProxy stats, verify backend server health, check connection counts, look for backend saturation, verify health checks are working, check for SSL termination issues.
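The Layer 4 vs Layer 7 distinction in the first answer maps directly onto HAProxy's `mode` setting; a sketch, where the frontend and backend names are illustrative:

```
# Layer 4: TCP mode - routes on IP/port only, no content inspection
frontend pg_front
    mode tcp
    bind *:5432
    default_backend pg_servers

# Layer 7: HTTP mode - can route on path, headers, cookies
frontend web_front
    mode http
    bind *:80
    acl is_api path_beg /api
    use_backend api_servers if is_api
    default_backend web_servers
```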

In this chapter, you learned:

  • ✅ High availability concepts and architecture
  • ✅ SLA and uptime calculations
  • ✅ keepalived for IP failover
  • ✅ HAProxy load balancing
  • ✅ nginx as load balancer
  • ✅ DNS load balancing
  • ✅ Database HA patterns
  • ✅ HA monitoring and health checks
  • ✅ Complete HA architecture examples

Chapter 20: Troubleshooting Methodology


Last Updated: February 2026