DNS server upgrade

2015-02-09 - News - Tony Finch

This week we will replace the DNS servers with an entirely new setup. This change should be almost invisible: the new servers will behave the same as the old ones, and each switchover from an old server to a new one should take well under a minute, so it should not cause disruption.

The rollout will switch over the four service addresses on three occasions this week. We are avoiding changes during the working day, and rolling out in stages so that we can monitor each change separately.

Tuesday 10 February, 18:00 -

  • Recursive server recdns1, 131.111.12.20 (expected outage 15s)

Wednesday 11 February, 08:00 -

  • Recursive server recdns0, 131.111.8.42 (expected outage 15s)
  • Authoritative server authdns1, 131.111.12.37 (expected outage 40s)

Thursday 12 February, 18:00 -

  • Authoritative server authdns0, 131.111.8.37 (expected outage 40s)
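If you want to confirm for yourself that a service address is answering after its switchover, something like the following minimal sketch may help. It uses only the Python standard library to send one DNS query over UDP and checks that a matching response comes back. The server names and addresses are taken from the schedule above; the function names, the query name cam.ac.uk, and the 2 second timeout are illustrative assumptions, not part of the rollout tooling.

```python
import socket
import struct

# Service addresses copied from the schedule above.
SERVERS = {
    "recdns0": "131.111.8.42",
    "recdns1": "131.111.12.20",
    "authdns0": "131.111.8.37",
    "authdns1": "131.111.12.37",
}


def build_query(name, query_id=0x1234, recursion_desired=True):
    """Build a minimal DNS query packet asking for an A record."""
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # Each label is encoded as a length byte followed by the label itself,
    # and the name is terminated by a zero-length root label.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def server_answers(address, name="cam.ac.uk", timeout=2.0):
    """Return True if the server sends back any response to our query.

    The query name is an assumption for illustration. Any response counts,
    including an error code: this only checks that the server is reachable.
    """
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (address, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return False
    # A valid reply echoes our query ID and has the QR (response) bit set.
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)


if __name__ == "__main__":
    for host, address in SERVERS.items():
        status = "ok" if server_answers(address) else "no response"
        print(f"{host} ({address}): {status}")
```

A single lost UDP packet will show up as "no response", so it is worth repeating the check before concluding that a server is down.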

There will be a couple of immediate improvements to the DNS service, with more to follow:

  • Automatic failover for recursive DNS servers. The servers are in three different locations: two live and one backup. When the West Cambridge Data Centre comes online, there will be a second backup location.

  • DNSSEC signing moved off authdns0 onto a hidden master server, with support for signing Managed Zone Service domains.

There are extensive improvements to the DNS server management and administration infrastructure:

  • Configuration management and upgrade orchestration moved from ad-hoc procedures to Ansible. The expected switchover timings above are based on test runs of the Ansible rollout and backout playbooks.

  • Revision control moved from SCCS to git, including a history of over 20,000 changes dating back to 1990.

  • Operating system moved from Solaris to Linux, to make better use of our local knowledge and supporting infrastructure.