IPv6 DAD-die issues

2018-03-26 - Progress - Tony Finch

Here's a somewhat obscure network debugging tale...

Context: recursive DNS server networking

Our central server network spans four sites across Cambridge, so it has a decent amount of resilience against power and cooling failures, and although it is a single layer two network, it is using some pretty fancy Cisco Nexus switches to provide plenty of redundant connectivity.

We have four recursive DNS servers, one at each site, usually two live and two hot spare. They are bare metal machines, which are intended to be able to boot up and provide service even if everything else is broken, provided they have power and cooling and network in at least one site.

The server network has several VLANs, and our resolver service addresses are on two of them: 131.111.8.42 is on VLAN 808, and 131.111.12.20 is on VLAN 812. So that any of the servers can provide service on either address, their switch ports are configured to deliver VLAN 808 untagged (so the servers can be provisioned using PXE booting without any special config) and VLAN 812 tagged.

Context: complying with reverse path filtering

There is strict reverse path filtering on the server network routers, so I have to make sure my resolvers use the correct VLAN depending on the source address. The trick is to use policy routing to match source addresses, since the normal routing table only looks at destination addresses.

The servers run Ubuntu, so this is configured in /etc/network/interfaces by adding a couple of up and down commands. Here's an example; there are four similar blocks in the config, for VLAN 808 and VLAN 812, and for IPv4 and IPv6.

    iface em1.812 inet static
        address 131.111.12.{{ ifnum }}
        netmask 24

        up   ip -4 rule  add from 131.111.12.0/24 table 12
        down ip -4 rule  del from 131.111.12.0/24 table 12
        up   ip -4 route add default table 12 via 131.111.12.62
        down ip -4 route del default table 12 via 131.111.12.62
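The IPv6 blocks follow the same pattern. Below is a sketch of what the VLAN 812 one might look like. The 2001:db8:812::/64 prefix and the 2001:db8:812::1 gateway are placeholders from the documentation range, not the real addresses. IPv4 and IPv6 keep separate rule lists, so each family needs its own rules. Naming the device on the route is a precaution, and it is required if the gateway is a link-local fe80:: address.

```
# Sketch only: placeholder 2001:db8:812::/64 addresses, not the real ones.
iface em1.812 inet6 static
    address 2001:db8:812::{{ ifnum }}
    netmask 64

    up   ip -6 rule  add from 2001:db8:812::/64 table 12
    down ip -6 rule  del from 2001:db8:812::/64 table 12
    up   ip -6 route add default table 12 via 2001:db8:812::1 dev em1.812
    down ip -6 route del default table 12 via 2001:db8:812::1 dev em1.812
```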

The bug: missing IPv6 policy routing

On Sunday we had some scheduled power work in one of our machine rooms. On Monday I found that the server in that room was not answering correctly over IPv6.

The machine had mostly booted OK, but it had partially failed to configure its network interfaces: everything was there except for the IPv6 policy routing, which meant that answers over IPv6 were being sent out of the wrong interfaces and dropped by the routers.

The logs were not completely clear, but it looked like the server had booted faster than the switch that it was connected to, so it had tried to configure its network interfaces when there was no network.
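One way to catch this failure mode after boot is to check that the expected source rule is actually present. This is a minimal sketch that assumes the line format printed by ip rule list (a priority such as "32765:", then "from PREFIX lookup TABLE"). The function name has_source_rule and the example prefix are hypothetical.

```shell
#!/bin/sh
# has_source_rule PREFIX TABLE
# Reads "ip rule list" output on stdin and exits 0 only if it contains a
# rule of the form "from PREFIX lookup TABLE". Whole fields are compared,
# so that, for example, table 120 is not mistaken for table 12.
has_source_rule() {
    awk -v src="$1" -v tbl="$2" '
        $2 == "from" && $3 == src && $4 == "lookup" && $5 == tbl { found = 1 }
        END { exit !found }
    '
}
```

For example, a monitoring check could run ip -6 rule list | has_source_rule 2001:db8:812::/64 12 || echo "IPv6 policy routing missing". If table 12 has a name in /etc/iproute2/rt_tables, ip prints the name instead of the number, so pass that instead.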

Two possible fixes

One approach might have been to add a script in /etc/network/if-pre-up.d that waits for the network to come up. But this is likely to be unreliable in bad situations where it is extra important that the server boots predictably.

The other approach, suggested by David McBride, was to try disabling IPv6 duplicate address detection. He found the dad-attempts option in the interfaces(5) man page, which looked very promising.

Edited to add: Chris Share pointed out a third option: DAD can be disabled with sysctl net.ipv6.conf.default.accept_dad=0, which is probably simpler than individually nobbling each network interface.
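As a config fragment, that could look like the following. The file name is hypothetical; any .conf file in /etc/sysctl.d is read at boot. The default value is copied into each interface when it is created, so it only affects interfaces that appear after it has been applied, such as VLAN interfaces brought up later by ifupdown.

```
# /etc/sysctl.d/60-ipv6-no-dad.conf (hypothetical file name)
# Do not perform IPv6 duplicate address detection on newly created interfaces.
net.ipv6.conf.default.accept_dad = 0
```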

Debugging

I went downstairs to the machine room in our office building to try booting a server with its ethernet cable unplugged. This nicely reproduced the problem.

I then tried adding dad-attempts 0 to the inet6 stanzas, and booting again. The server booted successfully!

No need for a horrible pre-up script, yay!
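The option goes in the inet6 stanza. Here is a minimal sketch, with a placeholder 2001:db8 documentation address standing in for the real one. Setting dad-attempts to 0 turns DAD off, so ifupdown does not wait for the address to stop being tentative.

```
# Sketch with a placeholder address; only the dad-attempts line matters here.
# dad-attempts 0 disables DAD, so the address is usable immediately and
# ifupdown carries on with the rest of the stanza even when the link is down.
iface em1.812 inet6 static
    address 2001:db8:812::{{ ifnum }}
    netmask 64
    dad-attempts 0
```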

Moans

The ifupdown man pages are not very good at explaining how the program works: they don't explain the /etc/network/if-*.d hook scripts, nor how the dad-attempts option works.

I dug around in the source code, and found that ifupdown's DAD logic is implemented by the script /lib/ifupdown/settle-dad.sh, which polls the output of ip -6 address list. If it times out while the address is still marked "tentative" (because the network is down), the script declares failure and ifupdown breaks, skipping the rest of that interface's configuration, including the IPv6 policy routing.
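The polling loop can be sketched like this. It is a hypothetical re-implementation for illustration, not the real settle-dad.sh. The function name wait_for_dad, its arguments, treating 0 attempts as "don't wait", and the IP variable (which lets the address-listing command be replaced by a stub) are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of a DAD "settle" loop in the spirit of
# /lib/ifupdown/settle-dad.sh; this is not that script's actual code.

# Command used to list addresses; overridable so it can be stubbed in tests.
IP=${IP:-ip}

# wait_for_dad IFACE ATTEMPTS INTERVAL
# Polls IFACE's IPv6 addresses until none is flagged "tentative".
# Returns 0 once DAD has settled, or 1 if an address is still tentative
# after ATTEMPTS polls (for example because the link is down).
wait_for_dad() {
    iface=$1 attempts=$2 interval=$3
    # In this sketch, 0 attempts means DAD is disabled: nothing to wait for.
    if [ "$attempts" -eq 0 ]; then
        return 0
    fi
    n=0
    while [ "$n" -lt "$attempts" ]; do
        if ! "$IP" -6 address list dev "$iface" | grep -q tentative; then
            return 0    # no tentative addresses left: DAD has finished
        fi
        n=$((n + 1))
        sleep "$interval"
    done
    echo "DAD did not settle on $iface after $attempts attempts" >&2
    return 1
}
```

Called as wait_for_dad em1.812 60 0.1, this sketch gives up after about six seconds if the link never comes up. Fractional intervals work with GNU sleep but are not guaranteed by POSIX.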

The other key part is the undocumented nodad option to ip -6 addr add, which tells the kernel not to perform DAD for that address.

This made it somewhat harder to find the fix and understand it. Bah.

Risks

I've now disabled duplicate address detection on my DNS servers, though I might have gone a bit far by disabling it on my VMs as well as the recursive servers. The point of DAD is to avoid accidentally breaking the network, so it's a bit arrogant to turn it off. On the other hand, if I have misconfigured duplicate IPv6 addresses, I have almost certainly done the same for IPv4, so I have still accidentally broken the network...