Recursive DNS server failover with keepalived --vrrp

2015-01-09 - Progress

I have got keepalived working on my recursive DNS servers, handling failover for testdns0.csi.cam.ac.uk and testdns1.csi.cam.ac.uk. I am quite pleased with the way it works.

Documentation

It was difficult to get started because keepalived's documentation is TERRIBLE. More effort has been spent explaining how it is put together than explaining how to get it to work. The keepalived.conf man page is a barely-commented example configuration file which does not describe all the options. Some of the options are only mentioned in the examples in /usr/share/doc/keepalived/samples. Bah!

Edited to add: Oh good grief. I have found the keepalived configuration documentation hidden in keepalived.conf.SYNOPSIS.

The vital clue came from Graeme Fowler, who told me about keepalived's vrrp_script feature. It is "documented" in keepalived.conf.vrrp.localcheck, which I never would have found without Graeme's help.

Overview

Keepalived is designed to run on a pair of load-balancing routers in front of a cluster of servers. It has two main parts. Its Linux Virtual Server daemon runs health checks on the back-end servers and configures the kernel's load balancing router as appropriate. The LVS stuff handles failover of the back-end servers. The other part of keepalived is its VRRP daemon which handles failover of the load-balancing routers themselves.

My DNS servers do not need the LVS load-balancing stuff, but they do need some kind of health check for named. I am running keepalived in VRRP-only mode and using its vrrp_script feature for health checks.

There is an SMTP client in keepalived which can notify you of state changes. It is too noisy for me, because I get messages from every server when anything changes. You can also tell keepalived to run scripts on state changes, so I am using that for notifications.

VRRP configuration

All my servers are configured as VRRP BACKUPs, and there is no MASTER. According to the VRRP RFC, the master is supposed to be the machine which owns the IP addresses. In my setup, no particular machine owns the service addresses.

I am using authentication mainly for additional protection against screwups (e.g. VRID collisions). VRRP password authentication doesn't provide any security: any attacker has to be on the local link so they can just sniff the password off the wire.

I am slightly surprised that it works when I set both IPv4 and IPv6 addresses on the same VRRP instance. The VRRP spec says you have to have separate vrouters for IPv4 and IPv6. Perhaps it works because keepalived doesn't implement real VRRP by default: it does not use a virtual MAC address but instead it just moves the virtual IP addresses and sends gratuitous ARPs to update the ARP caches of the other machines on the link. Keepalived has a use_vmac option but it seems rather fiddly to get working, so I am sticking with the default.

vrrp_instance testdns0 {
    virtual_router_id 210
    interface em1
    state BACKUP
    priority 50
    notify /etc/keepalived/notify
    authentication {
        auth_type PASS
        auth_pass XXXXXXXX
    }
    virtual_ipaddress {
        131.111.8.119/23
        2001:630:212:8::d:fff0
    }
    track_script {
        named_check_testdns0_1
        named_check_testdns0_2
        named_check_testdns0_3
        named_check_testdns0_4
    }
}

State change notifications

My notification script sends email when a server enters the MASTER state and takes over the IP addresses. It also sends email if the server dropped into the BACKUP state because named crashed.

#!/bin/sh
# this is /etc/keepalived/notify
instance=$2
state=$3
case $state in
(BACKUP)
    # do not notify if this server is working
    if /etc/keepalived/named_ok
    then exit 0
    else state=DEAD
    fi
esac
exim -t <<EOF
To: hostmaster@cam.ac.uk
Subject: $instance $state on $(hostname)
EOF
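
A minimal sketch for trying out the notification decision by hand, assuming keepalived passes the instance name and new state as the second and third arguments (which is what the script above reads). The notify_subject function is a hypothetical helper: it prints the Subject line instead of calling exim, and it takes the health check command as an argument so that no real named is needed.

```shell
#!/bin/sh
# Sketch only: notify_subject is a hypothetical helper, not part of keepalived.
# It repeats the decision made in /etc/keepalived/notify but prints the
# Subject line instead of sending mail.
# $1 = instance name, $2 = new state, $3 = health check command to run.
notify_subject() {
    instance=$1 state=$2 check=$3
    case $state in
    (BACKUP)
        # A healthy server dropping to BACKUP is routine: print nothing.
        if $check
        then return 0
        else state=DEAD
        fi
        ;;
    esac
    echo "Subject: $instance $state on $(hostname)"
}

notify_subject testdns0 MASTER true    # prints a MASTER subject
notify_subject testdns0 BACKUP true    # prints nothing
notify_subject testdns0 BACKUP false   # prints a DEAD subject
```

Passing true or false as the check command stands in for a working or crashed named.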

DNS server health checks and dynamic VRRP priorities

In the vrrp_instance snippet above, you can see that it specifies four vrrp_scripts to track. There is one vrrp_script for each possible priority, so that the four servers can have four different priorities for each vrrp_instance.

Each vrrp_script is specified using the Jinja macro below. (Four different vrrp_scripts for each of four different vrrp_instances is a lot of repetition!) The type argument is "recdns" or "testdns", the num is 0 or 1, and the prio is a number from 1 to 4.

Each script is run every "interval" seconds, and is allowed to run for up to "timeout" seconds. (My checking script should take at most 1 second.)

A positive "weight" setting is added to the vrrp_instance's priority to increase it when the script succeeds. (If the weight is negative it is added to the priority to decrease it when the script fails.)

{%- macro named_check(type,num,prio) -%}
vrrp_script named_check_{{type}}{{num}}_{{prio}} {
    script "/etc/keepalived/named_check {{type}} {{num}} {{prio}}"
    interval 1
    timeout 2
    weight {{ prio * 50 }}
}
{%- endmacro -%}
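
To see what the macro expands to, here is a rough shell sketch that prints the four vrrp_script blocks for one service address. It is only an illustration of the output, not a replacement for the Jinja template; emit_named_checks is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: print the vrrp_script blocks that the Jinja macro would
# produce for one service address. emit_named_checks is a hypothetical name.
# $1 = type (recdns or testdns), $2 = num (0 or 1).
emit_named_checks() {
    type=$1 num=$2
    for prio in 1 2 3 4
    do
        cat <<EOF
vrrp_script named_check_${type}${num}_${prio} {
    script "/etc/keepalived/named_check ${type} ${num} ${prio}"
    interval 1
    timeout 2
    weight $((prio * 50))
}
EOF
    done
}

emit_named_checks testdns 0
```

The four blocks printed for testdns 0 have the names listed in the track_script section of the vrrp_instance above, with weights 50, 100, 150 and 200.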

When keepalived runs the four tracking scripts for a vrrp_instance on one of my servers, at most one of the scripts will succeed. The successful script's weight is added to the base priority of 50, so the priority is adjusted to 250 for the server that should be live, 200 for its main backup, 150 and 100 on the other servers, and stays at 50 on any server which is broken or out of service.
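
The arithmetic can be checked with a few lines of shell. This is a sketch under the assumptions already stated in this article (base priority 50, weights of prio * 50, at most one successful script); effective_priority is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: effective_priority is a hypothetical helper that mirrors the
# arithmetic described above. The base priority (50) and the weights
# (prio * 50) come from the configuration shown in this article.
# $1 = the position (1 to 4) whose check succeeds, or 0 if none succeeds.
effective_priority() {
    ok=$1
    total=50
    for prio in 1 2 3 4
    do
        # Only the script whose position matches succeeds and adds its weight.
        if [ "$prio" = "$ok" ]
        then total=$((total + prio * 50))
        fi
    done
    echo "$total"
}

for ok in 4 3 2 1 0
do
    echo "position $ok: priority $(effective_priority "$ok")"
done
```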

The checking script finds the position of the host on which it is running in a configuration file which lists the servers in priority order. A server can be commented out to remove it from service. The priority order for testdns1 is the opposite of the order for testdns0. So the following contents of /etc/keepalived/priority.testdns specify that testdns1 is running on recdns-cnh, testdns0 is on recdns-wcdc, recdns-rnb is disabled, and recdns-sby is a backup.

recdns-cnh
#recdns-rnb
recdns-sby
recdns-wcdc
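
As a rough illustration of how that listing turns into positions, the sketch below applies the same line-number rule as the health check script to every active line of a priority file. show_priorities is a hypothetical helper, and the demonstration writes the listing to a temporary file rather than touching /etc/keepalived.

```shell
#!/bin/sh
# Sketch only: show_priorities is a hypothetical helper that applies the same
# rules as the health check script to every active line of a priority file.
# $1 = path to a priority file such as /etc/keepalived/priority.testdns
show_priorities() {
    # grep -n numbers every line, so commented-out hosts still occupy a slot.
    grep -n '' "$1" | while IFS=: read -r line host
    do
        case $host in
        ('#'*|'') continue ;;
        esac
        # num=0 uses the line number directly; num=1 reverses the order.
        echo "$host num0=$line num1=$((5 - line))"
    done
}

tmp=$(mktemp)
printf '%s\n' recdns-cnh '#recdns-rnb' recdns-sby recdns-wcdc > "$tmp"
show_priorities "$tmp"
rm -f "$tmp"
```

For the listing above, recdns-wcdc gets position 4 for testdns0 and recdns-cnh gets position 4 for testdns1, which matches the description.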

I can update this priority configuration file to change which machines are in service, without having to restart or reconfigure keepalived.
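
One way such an update might be scripted is sketched below. disable_host is a hypothetical helper, not tooling described in this article. It comments out a host while leaving its line in place, so the other hosts keep their positions, and it replaces the file with a rename so that a health check running at the same moment never sees a half-written file.

```shell
#!/bin/sh
# Sketch only: disable_host is a hypothetical helper, not the author's tooling.
# It comments out one host in a priority file, keeping its line in place so
# the other hosts keep their positions.
# $1 = priority file, $2 = short hostname to take out of service.
disable_host() {
    file=$1 host=$2
    # Note: mktemp creates the new file with mode 0600.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    # Compare whole lines as plain strings, like grep -Fx in the check script.
    awk -v host="$host" '$0 == host { print "#" $0; next } { print }' \
        "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, disable_host /etc/keepalived/priority.testdns recdns-sby would take recdns-sby out of service for both test addresses.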

The health check script is:

#!/bin/sh

set -e

type=$1 num=$2 check=$3

# Look for the position of our hostname in the priority listing

name=$(hostname --short)

# -F = fixed string not regex
# -x = match whole line
# -n = print line number

# A commented-out line will not match, so grep will fail
# and set -e will make the whole script fail.

grepout=$(grep -Fxn "$name" "/etc/keepalived/priority.$type")

# Strip off everything but the line number. Do this separately
# so that grep's exit status is not lost in the pipeline.

prio=$(echo $grepout | sed 's/:.*//')

# for num=0 later is higher priority
# for num=1 later is lower priority

if [ $num = 1 ]
then
    prio=$((5 - $prio))
fi

# If our priority matches what keepalived is asking about, then our
# exit status depends on whether named is running, otherwise tell
# keepalived we are not running at the priority it is checking.

[ "$check" = "$prio" ] && /etc/keepalived/named_ok

The named_ok script just uses dig to verify that the server seems to be working OK. I originally queried for version.bind, but there are very strict rate limits on the server info view so it did not work very well! So now the script checks that this command produces the expected output:

dig @localhost +time=1 +tries=1 +short cam.ac.uk in txt
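
The named_ok script itself is not shown here, so the following is only a guess at its general shape. named_ok_sketch is a hypothetical function, and the expected answer is passed in as an argument because the real TXT record contents are not given in this article.

```shell
#!/bin/sh
# Sketch only: the real /etc/keepalived/named_ok is not shown in this article.
# named_ok_sketch is a hypothetical function; the expected answer is passed
# in as $1 because the actual TXT record contents are not given here.
named_ok_sketch() {
    expected=$1
    # One try with a one-second timeout keeps the check within its budget.
    actual=$(dig @localhost +time=1 +tries=1 +short cam.ac.uk in txt) || return 1
    # Succeed only if the answer is exactly what we expect.
    [ "$actual" = "$expected" ]
}
```

The function fails if dig exits with a non-zero status or if the answer differs from the expected text, so a wedged or misconfigured named is treated the same as a dead one.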