All articles

How about using 1Password with Ansible?

2021-11-23 - Progress - Tony Finch

I have been looking at how to use the 1Password op command-line tool with Ansible. It works fairly nicely.

You need to install the 1Password command-line tool.

You need a recent enough Ansible with the community.general collection installed, so that it includes the onepassword lookup plugin.

To try out an example, create an op.yml file containing:

---
- hosts: localhost
  tasks:
  - name: test 1password
    debug:
      msg: <{{ lookup("onepassword",
                      "Mythic Beasts",
                      field="username") }}>

You might need to choose an item other than Mythic Beasts if you don't have a login with them.

Initialize op and start a login session by typing:

eval $(op signin)

Then see if Ansible works:

ansible-playbook op.yml

Amongst the Ansible verbiage, I get the output:

ok: [localhost] => {
    "msg": "<hostmaster@cam.ac.uk>"
}

Some more detailed notes follow...

aims

I want it to be easy to keep secrets encrypted when they are not in use. Things like ssh private keys, static API credentials, etc. "Not in use" means when not installed on the system that needs them.

In particular, secrets should normally be encrypted on any systems on which we run Ansible, and decrypted only when they need to be deployed.

And it should be easy enough that everyone on the team is able to use it.

what about regpg?

I wrote regpg to tackle this problem in a way I consider to be safe. It is modestly successful: people other than me use it, in more places than just Cambridge University Information Services.

But it was not the right tool for the Network Systems team in which I work. It isn't possible for a simple wrapper like regpg to fix gpg's usability issues: in particular, it's horrible if you don't have a unix desktop, and it's horrible over ssh.

1password

Since I wrote regpg we have got 1Password set up for the team. I have used 1Password for my personal webby login things for years, and I'm happy to use it at work too.

There are a couple of ways to use 1Password for ops automation...

secrets automation and 1password connect

First I looked at the relatively new support for "secrets automation" with 1Password. It is based around a 1Password Connect server, which we would install on site. This can provide an application with short-term access to credentials on demand via a REST API. (Sounds similar to other cloudy credential servers such as Hashicorp Vault or AWS IAM.)

However, the 1Password Connect server needs credentials to get access to our vaults, and our applications that use 1Password Connect need API access tokens. And we need some way to deploy these secrets safely. So we're back to square 1.

1password command line tool

The op command has basically the same functionality as 1Password's GUIs. It has a similar login model, in that you type in your passphrase to unlock the vault, and it automatically re-locks after an inactivity timeout. (This is also similar to the way regpg relies on the gpg agent to cache credentials so that an Ansible run can deploy lots of secrets with only one password prompt.)

So op is clearly the way to go, though there are a few niggles:

  • The op configuration file contains details of the vaults it has been told about, including your 1Password account secret key in cleartext. So the configuration file is sensitive and should be kept safe. (It would be better if op stored the account secret key encrypted using the user's password.)

  • op signin uses an environment variable to store the session key, which is not ideal because it is easy to accidentally leak the contents of environment variables. It isn't obvious that a collection of complicated Ansible playbooks can be trusted to handle environment variables carefully.

  • It sometimes requires passing secrets on the command line, which exposes them to all users on the system. For instance, the documented way to find out whether a session has timed out is with a command line like:

    $ op signin --session $OP_SESSION_example example
    

I have reported these issues to the 1Password developers.

Ansible and op

Ansible's community.general collection includes some handy wrappers around the op command, in particular the onepassword lookup plugin. (I am not so keen on the others because the documentation suggests to me that they do potentially unsafe things with Ansible variables.)

One of the problems I had with regpg was bad behaviour that occurred when an Ansible playbook was started before the gpg agent was ready; the fix was to add a task to the start of the Ansible playbook which polls the gpg agent in a more controlled manner.

I think a similar preflight task might be helpful for op:

  • check if there is an existing op session; if not, prompt for a passphrase to start a session

  • set up a wrapper command for op that gets the session key from a more sensible place than the environment

To refresh a session safely, and work around the safety issue with op signin mentioned above, we can test the session using a benign command such as op list vaults or op get account, and run op signin if that fails.
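For example, the preflight check could be a small script along these lines (a sketch; example is a hypothetical account shorthand):

#!/bin/sh
# probe the session with a benign command; if it has expired,
# prompt for a passphrase to start a new one
if ! op list vaults >/dev/null 2>&1
then
    eval "$(op signin example)"
fi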

The wrapper script can be as simple as:

#!/bin/sh
OP_SESSION_example=SQUEAMISHOSSIFRAGE /usr/local/bin/op "$@"

Assuming there is somewhere sensible and writable on $PATH...

Managed Zone Service CNAME relaxation

2020-06-03 - News - Tony Finch

The MZS is our service for registering non-cam.ac.uk domains.

Underscores are now allowed in the names and targets of CNAME records so that they can be used for non-hostname purposes.

Stealth secondaries and Cisco Jabber

2020-04-30 - News - Tony Finch

The news part of this item is that I've updated the stealth secondary documentation with a warning about configuring servers (or not configuring them) with secondary zones that aren't mentioned in the sample configuration files.

One exception to that is the special Cisco Jabber zones supported by the phone service. There is now a link from our stealth secondary DNS documentation to the Cisco Jabber documentation, but there are tricky requirements and caveats, so you need to take care.

The rest of this item is the story of how we discovered the need for these warnings.

Read more ...

Release announcement: nsdiff-1.79

2020-04-27 - News - Tony Finch

I have released a new version of nsdiff.

This release removes TYPE65534 records from the list of DNSSEC-related types that nsdiff ignores.

TYPE65534 is the private type that BIND uses to keep track of incremental signing. These records usually end up hanging around after signing is complete, cluttering up the zone. It would be neater if they were removed automatically.

In fact, it's safe to try to remove them using DNS UPDATE: if the records can be removed (because signing is complete), they will be; if they can't be removed then they are quietly left in place, and the rest of the update is applied.

After this change you can clean away TYPE65534 records using nsdiff or nsvi. In our deployment, nspatch runs hourly and will now automatically clean TYPE65534 records when they are not needed.
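In DNS UPDATE terms, the cleanup amounts to something like this sketch (using nsupdate with a local session key; example.com stands in for the zone):

# ask the server to delete the whole TYPE65534 RRset; if signing
# is still in progress the deletion is quietly ignored
nsupdate -l <<'EOF'
update delete example.com TYPE65534
send
EOF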

Firefox and DNS-over-HTTPS

2020-02-27 - News - Tony Finch

The latest release of Firefox enables DoH (encrypted DNS-over-HTTPS) by default for users in the USA, with DNS provided by Cloudflare. This has triggered some discussion and questions, so here's a reminder of what we have done with DoH.

Read more ...

DNS server resilience and network outages

2020-02-18 - News - Tony Finch

Our recursive DNS servers are set up to be fairly resilient. Each server is physical hardware, so that they only need power and networking in order for the DNS to work, avoiding hidden circular dependencies.

We use keepalived to determine which of the physical servers is in live service. It does two things for us:

  • We can move live service from one server to another with minimal disruption, so we can patch and upgrade servers without downtime.

  • The live DNS service can recover automatically from things like server hardware failure or power failure.

This note is about coping with network outages, which are more difficult.

Read more ...

SHA-1 and DNSSEC validation

2020-02-14 - News - Tony Finch

This is a follow-up to my article last month about SHA-1 chosen-prefix collisions and DNSSEC.

Summary

DNSSEC validators should continue to treat SHA-1 signatures as secure until DNSSEC signers have had enough time to perform algorithm rollovers and eliminate SHA-1 from the vast majority of signed zones.

Read more ...

Review of 2019

2020-01-29 - Progress - Tony Finch

Some notes looking back on what happened last year...

Read more ...

Managed Zone Service improvements

2020-01-24 - News - Tony Finch

The MZS is our service for registering non-cam.ac.uk domains.

The web user interface for the MZS has moved to https://mzs.dns.cam.ac.uk/; the old names redirect to the new place.

You can now manage TXT records in your zones in the MZS.

The expiry date of each zone (when its 5 year billing period is up) is now tracked in the MZS database and is visible in the web user interface.

DNSSEC algorithm rollover HOWTO

2020-01-15 - Progress - Tony Finch

Here are some notes on how to upgrade a zone's DNSSEC algorithm using BIND. These are mainly written for colleagues in the Faculty of Maths and the Computer Lab, but they may be of interest to others.

I'll use botolph.cam.ac.uk as the example zone. I'll assume the rollover is from algorithm 5 (RSASHA1) to algorithm 13 (ECDSA-P256-SHA-256).

Read more ...

SHA-1 chosen prefix collisions and DNSSEC

2020-01-09 - News - Tony Finch

Thanks to Viktor Dukhovni for helpful discussions about some of the details that went in to this post.

On the 7th January, a new, more flexible and efficient collision attack against SHA-1 was announced: SHA-1 is a shambles. SHA-1 is deprecated but still used in DNSSEC, and this collision attack means that some attacks against DNSSEC are now merely logistically challenging rather than cryptographically infeasible.

As a consequence, anyone who is using a SHA-1 DNSKEY algorithm (algorithm numbers 7 or less) should upgrade. The recommended algorithms are 13 (ECDSAP256SHA256) or 8 (RSASHA256, with 2048 bit keys).

Update: I have written a follow-up note about SHA-1 and DNSSEC validation

Read more ...

SHA-1 is a shambles

2020-01-07 - News - Tony Finch

Happy new (calendar) year!

Our previous news item on DNS delegation updates explained that we are changing the DNSSEC signature algorithm on all UIS zones from RSA-SHA-1 to ECDSA-P256-SHA-256. Among the reasons I gave was that SHA-1 is rather broken.

SHAmbles

Today I learned that SHA-1 is a shambles: a second SHA-1 collision has been constructed, so it is now more accurate to say that SHA-1 is extremely broken.

The new "SHAmbles" collision is vastly more affordable than the 2017 "SHAttered" collision and makes it easier to construct practical attacks.

DNSSEC implications

As well as the UIS zones (which are now mostly off RSA-SHA-1), Maths and the Computer Lab have a number of zones signed with RSA-SHA-1. These should also be upgraded to a safer algorithm. I will be contacting the relevant people directly to co-ordinate this change.

I have written some more detailed notes on the wider implications of SHA-1 chosen prefix collisions and DNSSEC.

DNS delegation updates

2019-12-18 - News - Tony Finch

Season's greetings! I bring tidings of great joy! A number of long term DNS projects have reached a point where some big items can be struck off the to-do list.

This note starts with two action items for those for whom we provide secondary DNS. Then, a warning for those who secondary our zones, including stealth secondaries.

Read more ...

A WebDriver tutorial

2019-12-12 - Progress - Tony Finch

As part of my work on superglue I have resumed work on the WebDriver scripts I started in January. And, predictably because they were a barely working mess, it took me a while to remember how to get them working again.

So I thought it might be worth writing a little tutorial describing how I am using WebDriver. These notes have nothing to do with my scripts or the DNS; it's just about the logistics of scripting a web site.

Read more ...

Jackdaw Apache upgrade

2019-11-21 - News - Tony Finch

This afternoon our colleagues who run Jackdaw will upgrade the web server software. (The IP Register database is an application hosted on Jackdaw, alongside the user-admin database and a few others.) This will entail a brief outage for the IP Register web user interface. The DNS will not be affected.

Make before break

2019-11-18 - Progress - Tony Finch

This afternoon I did a tricky series of reconfigurations. The immediate need was to do some prep work for improving our DNS blocks; I also wanted to make some progress towards completing the renaming/renumbering project that has been on the back burner for most of this year; and I wanted to fix a bad quick-and-dirty hack I made in the past.

Along the way I think I became convinced there's an opportunity for a significant improvement.

Read more ...

YAML and Markdown

2019-11-13 - Progress - Tony Finch

This web site is built with a static site generator. Each page on the site has a source file written in Markdown. Various bits of metadata (sidebar links, title variations, blog tags) are set in a bit of YAML front-matter in each file.

Both YAML and Markdown are terrible in several ways.

YAML is ridiculously over-complicated and its minimal syntax can hide minor syntax errors, turning them into semantic errors. (A classic example is a list of two-letter country codes, in which Norway (NO) is transmogrified into False.)
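You can watch Norway disappear with any YAML 1.1 parser; for instance, with PyYAML (assuming python3 and PyYAML are installed):

$ python3 -c 'import yaml; print(yaml.safe_load("[GB, SE, NO]"))'
['GB', 'SE', False]

Quoting the strings ("NO") is the usual workaround.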

Markdown is poorly defined, and has a number of awkward edge cases where its vagueness causes gotchas. It has spawned several dialects to fill in some of its inadequacies, which causes compatibility problems.

However, they are both extremely popular and relatively pleasant to write and read.

For this web site, I have found that a couple of simple sanity checks are really helpful for avoiding cockups.

YAML documents

One of YAML's peculiarities is its idea of storing multiple documents in a stream.

A YAML document consists of a --- followed by a YAML value. You can have multiple documents in a file, like these two:

---
document: one
---
document: two

YAML values don't have to be key/value maps: they can also be simple strings. So you can also have a two-document file like:

--- one
--- two

YAML has a complicated variety of multiline string syntaxes. For the simple case of a preformatted string, you can use the | sigil. This document is like the previous one, except that the strings have newlines:

--- |
one
--- |
two

YAML frontmatter

The source files for this web site each start with something like this (using this page as an example, and cutting off after the title):

---
tags: [ progress ]
authors: [ fanf2 ]
--- |
YAML and Markdown
=================

This is a YAML stream consisting of two documents, the front matter (a key/value map) and the Markdown page body (a preformatted string).

There's a fun gotcha. I like to use underline for headings because it helps to make them stand out in my editor. If I ever have a three-letter heading underlined with hyphens, the underline is a bare --- line, which splits the source file into a third YAML document. Oops!

So my static site generator's first sanity check is to verify there are exactly two YAML documents in the file.
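A minimal version of that check might look like the following sketch (using PyYAML; the real generator's code differs):

$ python3 -c 'import sys, yaml; docs = list(yaml.safe_load_all(open(sys.argv[1]))); sys.exit(len(docs) != 2)' page.md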

Aside: There is also a YAML document end marker, ..., but I have not had problems with accidentally truncated pages because of it!

Tabs and indentation

Practically everything (terminals, editors, pagers, browsers...) by default has tab stops every 8 columns. It's a colossal pain in the arse to have to reconfigure everything for different tab stops, and even more of a pain in the arse if you have to work on projects that expect different tab stop settings. (PostgreSQL is the main offender of the projects I have worked with, bah.)

I don't mind different coding styles, or different amounts of indentation, so long as the code I am working on has a consistent style. I tend to default to KNF (the Linux / BSD kernel normal form) if I'm working on my own stuff, which uses one tab = one indent.

The only firm opinion I have is that if you are not using 8 column tab stops and tabs for indents, then you should use spaces for indents.

Indents in Markdown

Markdown uses indentation for structure, either a 4-space indent or a tab indent. This is a terrible footgun if tabs are displayed in the default way and you accidentally have a mixture of spaces and tabs: an 8 column indent might be one indent level or two, depending on whether it is a tab or spaces, and the difference is mostly invisible.

So my static site generator's second sanity check is to ensure there are no tabs in the Markdown.

This is a backup check, in case my editor configuration is wrong and unintentionally leaks tabs.
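The check itself can be tiny; something like this sketch:

#!/bin/sh
# fail if the Markdown source contains any tab characters
if grep -q "$(printf '\t')" page.md
then
    echo 'page.md: tab characters found' >&2
    exit 1
fi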

BIND security release

2019-10-17 - News - Tony Finch

Last night ISC.org published security releases of BIND.

For full details, please see the announcement messages: https://lists.isc.org/pipermail/bind-announce/2019-October/thread.html

The vulnerabilities affect two features that are new in BIND 9.14, mirror zones and QNAME minimization, and we are not affected because we are not using either feature.

Read more ...

Upcoming change to off-site secondary DNS

2019-10-01 - News - Tony Finch

This notice is mainly for the attention of those who run DNS zones for which the UIS provides secondary servers.

Since 2015 we have used the Internet Systems Consortium secondary name service (ISC SNS) to provide off-site DNS service for University domains.

ISC announced yesterday that the SNS is closing down in January 2020, so we need alternative arrangements.

We have not yet started to make any specific plans, so this is just to let you know that there will be some changes in the next few months. We will let you know when we have more details.

Metadata for login credentials

2019-09-28 - Progress - Tony Finch

This month I have been ambushed by domain registration faff of multiple kinds, so I have picked up a few tasks that have been sitting on the back burner for several months. This includes finishing the server renaming that I started last year, solidifying support for updating DS records to support automated DNSSEC key rollovers, and generally making sure our domain registration contact information is correct and consistent.

I have a collection of domain registration management scripts called superglue, which have always been an appalling barely-working mess that I fettle enough to get some task done then put aside in a slightly different barely-working mess.

I have reduced the mess a lot by coming up with a very simple convention for storing login credentials. It is much more consistent and safe than what I had before.

The login problem

One of the things superglue always lacked is a coherent way to handle login credentials for registr* APIs. It predates regpg by a few years, but regpg only deals with how to store the secret parts of the credentials. The part that was awkward was how to store the non-secret parts: the username, the login URL, commentary about what the credentials are for, and so on. The IP Register system also has this problem, for things like secondary DNS configuration APIs and database access credentials.

There were actually two aspects to this problem.

Ad-hoc data formats

My typical thoughtless design process for the superglue code that loaded credentials was like, we need a username and a password, so we'll bung them in a file separated by a colon. Oh, this service needs more than that, so we'll have a multi-line file with fieldname colon value on each line. Just terrible.

I decided that the best way to correct the sins of the past would be to use an off-the-shelf format, so I can delete half a dozen ad-hoc parsers from my codebase. I chose YAML not because it is good (it's not) but because it is well-known, and I'm already using it for Ansible playbooks and page metadata for this web server's static site generator.

Secret hygiene

When designing regpg I formulated some guidelines for looking after secrets safely.

From our high-level perspective, secrets are basically blobs of random data: we can't usefully look at them or edit them by hand. So there is very little reason to expose them, provided we have tools (such as regpg) that make it easy to avoid doing so.

Although regpg isn't very dogmatic, it works best when we put each secret in its own file. This allows us to use the filename as the name of the secret, which is available without decrypting anything, and often all the metadata we need.

That weasel word "often" tries to hide the issue that when I wrote it two years ago I did not have an answer to the question, what if the filename is not all the metadata we need?

I have found that my ad-hoc credential storage formats are very bad for secret hygiene. They encourage me to use the sinful regpg edit command, and decrypt secrets just to look at the non-secret parts, and generally expose secrets more than I should.

If the metadata is kept in a separate cleartext YAML file, then the comments in the YAML can explain what is going on. If we strictly follow the rule that there's exactly one secret in an encrypted file and nothing else, then there's no reason to decrypt secrets unnecessarily: everything we need to know is in the cleartext YAML file.

Implementation

I have released regpg-1.10 which includes ReGPG::Login, a Perl library for loading credentials stored in my new layout convention. It's about 20 simple lines of code.

Each YAML file example-login.yml typically looks like:

# commentary explaining the purpose of this login
---
url: https://example.com/login
username: alice
gpg_d:
  password: example-login.asc

The secret is in the file example-login.asc alongside. The library loads the YAML and inserts into the top-level object the decrypted contents of the secrets listed in the gpg_d sub-object.

For cases where the credentials need to be available without someone present to decrypt them, the library looks for a decrypted secret file example-login (without the .asc extension) and loads that instead.

The code loading the file can also list the fields that it needs, to provide some protection against cockups. The result looks something like:

use LWP::UserAgent;
use MIME::Base64;
use ReGPG::Login;

my $login = read_login $login_file, qw(username password url);
my $auth = $login->{username}.':'.$login->{password};
my $authorization = 'Basic ' . encode_base64 $auth, '';
my $r = LWP::UserAgent->new->post($login->{url},
          Authorization => $authorization,
          Content_Type => 'form-data',
          Content => [ hello => 'world' ]
      );

Deployment

Secret storage in the IP Register system is now a lot more coherent, consistent, better documented, safer, ... so much nicer than it was. And I got to delete some bad code.

I only wish I had thought of this sooner!

Firefox and DNS-over-HTTPS

2019-09-19 - News - Tony Finch

We are configuring our DNS to tell Firefox to continue to use the University's DNS servers, and not to switch to using Cloudflare's DNS servers instead.

Most of this article is background information explaining the rationale for this change. The last section below gives an outline of the implementation details.
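The short version: Firefox checks Mozilla's "canary domain", use-application-dns.net, before enabling DoH, and a resolver that answers NXDOMAIN for that name tells Firefox to keep using its normal DNS settings. You can see what a resolver says with a query like:

$ dig use-application-dns.net @131.111.8.42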

Read more ...

Migrating a website with Let's Encrypt

2019-09-03 - Progress - Tony Finch

A few months ago I wrote about Let's Encrypt on clustered Apache web servers. This note describes how to use a similar trick for migrating a web site to a new server.

The situation

You have an existing web site, say www.botolph.cam.ac.uk, which is set up with good TLS security.

It has permanent redirects from http://… to https://… and from bare botolph.cam.ac.uk to www.botolph.cam.ac.uk. Permanent redirects are cached very aggressively by browsers, which take "permanent" literally!

The web site has strict-transport-security with a long lifetime.

You want to migrate it to a new server.

The problem

If you want to avoid an outage, the new server must have similarly good TLS security, with a working certificate, before the DNS is changed from the old server to the new server.

But you can't easily get a Let's Encrypt certificate for a server until after the DNS is pointing at it.

A solution

As in my previous note, we can use the fact that Let's Encrypt will follow redirects, so we can provision a certificate on the new server before changing the DNS.

on the old server

In the http virtual hosts for all the sites that are being migrated (both botolph.cam.ac.uk and www.botolph.cam.ac.uk in our example), we need to add redirects like

Redirect /.well-known/acme-challenge/ \
        http://{{newserver}}/.well-known/acme-challenge/

where {{newserver}} is the new server's host name (or IP address).

This redirect needs to match more specifically than the existing http -> https redirect, so that Let's Encrypt is sent to the new server, while other requests are bounced to TLS.

on the new server

Run the ACME client to get a certificate for the web sites that are migrating. The new server needs to serve ACME challenges for the web site names botolph.cam.ac.uk and www.botolph.cam.ac.uk from the {{newserver}} default virtual host. This is straightforward with the ACME client I use, dehydrated.

migrate

It should now be safe to update the DNS to move the web sites from the old server to the new one. To make sure, there are various tricks you can use to test the new server before updating the DNS [1] [2].
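For example, curl's --resolve option lets you aim a request at the new server without touching the DNS (in this sketch, 192.0.2.1 is a placeholder for the new server's address):

$ curl --resolve www.botolph.cam.ac.uk:443:192.0.2.1 \
      https://www.botolph.cam.ac.uk/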

Work planning

2019-08-29 - Future - Tony Finch

I'm back from a summer holiday and it is "Back to School" season, so now seems like a good time to take stock and write down some plans.

This is roughly in order of priority.

Read more ...

Brexit and .eu domain names (update)

2019-07-22 - News - Tony Finch

There has been a change to the eligibility criteria for .eu domain names related to Brexit.

Previously, eligibility was restricted to (basically) being located in the EU. The change extends eligibility for domains owned by individuals to EU citizens everywhere.

Organizations in the UK that have .eu domain names will still need to give them up at Brexit.

Thanks to Alban Milroy for bringing this to our attention.

More complicated ops

2019-07-18 - Progress - Tony Finch

This week I am back to porting ops pages from v2 to v3.

I'm super keen to hear any complaints you have about the existing user interface. Please let ip-register@uis.cam.ac.uk know of anything you find confusing or awkward! Not everything will be addressed in this round of changes but we'll keep them in mind for future work.

API cookies

Jackdaw has separate pages to download an API cookie and manage API cookies. The latter is modal and switches between an overview list and a per-cookie page.

In v3 they have been combined into a single page (screenshot below) with less modality, and I have moved the verbiage to a separate API cookie documentation page.

While I was making this work I got terribly confused that my v3 cookie page did not see the same list of cookies as Jackdaw's manage-cookies page, until I realised that I should have been looking at the dev database on Ruff. The silliest bugs take the longest to fix...

Single ops

Today I have started mocking up a v3 "single ops" page. This is a bit of a challenge, because the existing page is rather cluttered and confusing, and it's hard to improve within the constraint that I'm not changing its functionality.

I have re-ordered the page to be a closer match to the v3 box ops page. The main difference is that the address field is near the top because it is frequently used as a primary search key.

There is a downside to this placement, because it separates the address from the other address-related fields which are now at the bottom: the address's mzone, lan, subnet, and the mac and dhcp group that are properties of the address rather than properties of the box.

On the other hand, I wanted to put the address-related fields near the register and search buttons to hint that they are kind of related: you can use the address-related fields to get the database to automatically pick an address for registration following those constraints, or you can search for boxes within the constraints.

Did you know that (like table ops but unlike most other pages) you can use SQL-style wildcards to search on the single ops page?

Finally, a number of people said that the mzone / lan boxes are super awkward, and they explicitly asked for a drop-down list. This breaks the rule against new functionality, but I think it will be simple enough that I can get away with it. (Privileged users still get the boxes rather than a drop-down with thousands of entries!)

Refactored error handling

2019-07-12 - Progress - Tony Finch

This week I have been refactoring the error handling of the "v3" IP Register web interface that I posted screenshots of last week. There have not been any significant visible changes, but I have reduced the code size by nearly 200 lines compared to last week, and fixed a number of bugs in the process.

On top of the previous refactorings, the new code is quite a lot smaller than the existing web interface on Jackdaw.

page    v2 lines   v3 lines   change
box        283        151        53%
vbox       327        182        55%
aname      244        115        47%
cname      173         56        32%
mx         253        115        45%
srv        272        116        42%
motd        51         22        43%
totp        66         20        30%

Eight ops pages ported

2019-07-04 - Progress - Tony Finch

This week I passed a halfway mark in porting web pages from the old IP Register web interface on Jackdaw to the "v3" web interface. The previous note on this topic was in May when the first ops page was ported to v3, back before the Beer Festival and the server patching work.

Read more ...

DNS update confirmations

2019-06-27 - News - Tony Finch

You might have noticed that the IP Register ops pages on Jackdaw now have a note in the title stating when the last DNS update completed. (Updates start at 53 minutes past each hour and usually take a couple of minutes.)

Occasionally the update process breaks. It is written fairly conservatively so that if anything unexpected happens it stops and waits for someone to take a look. Some parts of the build process are slightly unreliable, typically parts that push data to other systems. Many of these push actions are not absolutely required to work, and it is OK to retry when the build job runs again in an hour.

Over time we have made the DNS build process less likely to fail-stop, as we have refined the distinction between actions that must work and actions that can be retried in an hour. But the build process changes, and sometimes the new parts fail-stop when they don't need to. That happened earlier this week, which prompted us to add the last update time stamp, so you have a little more visibility into how the system is working (or not).

DNS-over-HTTPS and encrypted SNI

2019-06-24 - News - Tony Finch

Recent versions of Firefox make it easier to set up encrypted DNS-over-HTTPS. If you use Firefox on a fixed desktop, go to Preferences -> General -> scroll to Network Settings at the bottom -> Enable DNS over HTTPS, Custom: https://rec.dns.cam.ac.uk/. (Our DNS servers are only available on the CUDN so this setting isn't suitable for mobile devices.)

Very recent versions of Firefox also support encrypted server name indication. When connecting to a web server the browser needs to tell the web server which site it is looking for. HTTPS does this using Server Name Indication, which is normally not encrypted, unlike the rest of the connection. ESNI fixes this privacy leak.

To enable ESNI, go to about:config and verify that network.security.esni.enabled is true.

MZS web site upgrade failed

2019-06-20 - News - Tony Finch

[ This problem has been resolved, with help from Peter Heiner. Thanks! ]

Unfortunately the Managed Zone Service operating system upgrade this evening failed when we attempted to swap the old and new servers. As a result the MZS admin web site is unavailable until tomorrow.

DNS service for MZS domains is unaffected.

We apologise for any inconvenience this may cause.

BIND 9.14.3 and CVE-2019-6471

2019-06-20 - News - Tony Finch

Last night, isc.org announced patch releases of BIND.

This vulnerability affects all supported versions of BIND.

Hot on the heels of our upgrade to 9.14.2 earlier this week, I will be patching our central DNS servers to 9.14.3 today. There should be no visible interruption to service.

SACK panic

2019-06-18 - News - Tony Finch

I have applied a workaround for the Linux SACK panic bug (CVE-2019-11477) to the DNS and other servers, as a temporary measure until they can be patched properly later today.

Clustering Let's Encrypt with Apache

2019-06-17 - Progress - Tony Finch

A few months ago I wrote about bootstrapping Let's Encrypt on Debian. I am now using Let's Encrypt certificates on the live DNS web servers.

Clustering

I have a smallish number of web servers (currently 3) and a smallish number of web sites (also about 3). I would like any web server to be able to serve any site, and dynamically change which site is on which server for failover, deployment canaries, etc.

If server 1 asks Let's Encrypt for a certificate for site A, but site A is currently hosted on server 0, the validation request will not go to server 1 so it won't get the correct response. It will fail unless server 0 helps server 1 to validate certificate requests from Let's Encrypt.

Validation servers

I considered various ways that my servers could co-operate to get certificates, but they all required extra machinery for authentication and access control that I don't currently have, and which would be tricky and important to get right.

However, there is a simpler option based on HTTP redirects. Thanks to Malcolm Scott for reminding me that ACME http-01 validation requests follow redirects! The Let's Encrypt integration guide mentions this under "picking a challenge type" and "central validation servers".

Decentralized validation

Instead of redirecting to a central validation server, a small web server cluster can co-operate to validate certificates. It goes like this:

  • server 1 requests a cert for site A

  • Let's Encrypt asks site A for the validation response, but this request goes to server 0

  • server 0 discovers it has no response, so it speculatively replies with a 302 redirect to one of the other servers

  • Let's Encrypt asks the other server for the validation response; after one or two redirects it will hit server 1 which does have the response

This is kind of gross, because it turns 404 "not found" errors into 302 redirect loops. But that should not happen in practice.

Apache mod_rewrite

My configuration to do this is a few lines of mod_rewrite. Yes, this doesn't help with the "kind of gross" aspect of this setup, sorry!

The rewrite runes live in a catch-all port 80 <VirtualHost> which redirects everything (except for Let's Encrypt) to https. I am not using the dehydrated-apache2 package any more; instead I have copied its <Directory> section that tells Apache it is OK to serve dehydrated's challenge responses.

I use Ansible's Jinja2 template module to install the configuration and fill in a couple of variables: as usual, {{inventory_hostname}} is the server the file is installed on, and in each server's host_vars file I set {{next_acme_host}} to the next server in the loop. The last server redirects to the first one, like web0 -> web1 -> web2 -> web0. These are all server host names, not virtual hosts or web site names.

Code

<VirtualHost *:80>
 ServerName {{inventory_hostname}}

 RewriteEngine on
 # https everything except acme-challenges
 RewriteCond %{REQUEST_URI} !^/.well-known/acme-challenge/
 RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [L,R=301]
 # serve files that exist
 RewriteCond /var/lib/dehydrated/acme-challenges/$1 -f
 RewriteRule ^/.well-known/acme-challenge/(.*) \
             /var/lib/dehydrated/acme-challenges/$1 [L]
 # otherwise, try alternate server
 RewriteRule ^ http://{{next_acme_host}}%{REQUEST_URI} [R=302]

</VirtualHost>

<Directory /var/lib/dehydrated/acme-challenges/>
 Options FollowSymlinks
 Options -Indexes
 AllowOverride None
 Require all granted
</Directory>

MZS web site upgrade, Thurs 20 June

2019-06-13 - News - Tony Finch

On Thursday 20th June between 17:00 and 19:00, the Managed Zone Service web site will be unavailable for a few minutes while its operating system is upgraded. (This will not affect the DNS for MZS domains.)

DNS server upgrades, Tues 18 June

2019-06-12 - News - Tony Finch

On Tuesday 18th June our central DNS resolvers will be upgraded from BIND 9.12.4-P1 to BIND 9.14.2.

Read more ...

First ops page ported

2019-05-15 - Progress - Tony Finch

Yesterday I reached a milestone: I have ported the first "ops" page from the old IP Register web user interface on Jackdaw to the new one that will live on the DNS web servers. It's a trivial admin page for setting the message of the day, but it demonstrates that the infrastructure is (mostly) done.

Security checks

I have spent the last week or so trying to get from a proof of concept to something workable. Much of this work has been on the security checks. The old UI has:

  • Cookie validation (for Oracle sessions)

  • Raven authentication

  • TOTP authentication for superusers

  • Second cookie validation for TOTP

  • CSRF checks

There was an awkward split between the Jackdaw framework and the ipreg-specific parts which meant I needed to add a second cookie when I added TOTP authentication.

In the new setup I have upgraded the cookie to modern security levels, and it handles both Oracle and TOTP session state.

    my @cookie_attr = (
            -name     => '__Host-Session',
            -path     => '/',
            -secure   => 1,
            -httponly => 1,
            -samesite => 'strict',
        );

The various "middleware" authentication components have been split out of the main HTTP request handler so that the overall flow is much easier to see.

State objects

There is some fairly tricky juggling in the old code between:

  • CGI request object

  • WebIPDB HTTP request handler object

  • IPDB database handle wrapper

  • Raw DBI handle

The CGI object is gone. The mod_perl Apache2 APIs are sufficient replacements, and the HTML generation functions are being replaced by mustache templates. (Though there is some programmatic form generation in table_ops that might be awkward!)

I have used Moo roles to mixin the authentication middleware bits to the main request handler object, which works nicely. I might do the same for the IPDB object, though that will require some refactoring of some very old skool OO perl code.

Next

The plan is to port the rest of the ops pages as directly as possible. There is going to be a lot of refactoring, but it will all be quite superficial. The overall workflow is going to remain the same, just more purple.

mobile message of the day form with error

Oracle connection timeouts

2019-05-07 - Progress - Tony Finch

Last week while I was beating mod_perl code into shape, I happily deleted a lot of database connection management code that I had inherited from Jackdaw's web server. Today I had to put it all back again.

Apache::DBI

There is a neat module called Apache::DBI which hooks mod_perl and DBI together to provide a transparent connection cache: just throw in a use statement, throw out dozens of lines of old code, and you are pretty much done.

Connection hangs

Today the clone of Jackdaw that I am testing against was not available (test run for some maintenance work tomorrow, I think) and I found that my dev web server was no longer responding. It started OK but would not answer any requests. I soon worked out that it was trying to establish a database connection and waiting at least 5 minutes (!) before giving up.

DBI(3pm) timeouts

There is a long discussion about timeouts in the DBI documentation which specifically mentions DBD::Oracle as a problem case, with some lengthy example code for implementing a timeout wrapper around DBI::connect.

This is a terrible documentation anti-pattern. Whenever I find myself giving lengthy examples of how to solve a problem I take it as a whacking great clue that the code should be fixed so the examples can be made a lot easier.

In this case, DBI should have connection timeouts as standard.

Sys::SigAction

If you read past the examples in DBI(3pm) there's a reference to a more convenient module which provides a timeout wrapper that can be used like this:

use Sys::SigAction qw(timeout_call);

if (timeout_call($connect_timeout, sub {
    $dbh = DBI->connect(@connect_args);
    moan $DBI::errstr unless $dbh;
})) {
    moan "database connection timed out";
}

Undelete

The problem is that there isn't a convenient place to put this timeout code where it should be, so that Apache::DBI can use it transparently.

So I resurrected Jackdaw's database connection cache. But not exactly - I looked through it again and I could not see any extra timeout handling code. My guess is that hung connections can't happen if the database is on the same machine as the web server.

Reskinning IP Register

2019-05-01 - Progress - Tony Finch

At the end of the item about Jackdaw and Raven I mentioned that when the web user interface moves off Jackdaw it will get a reskin.

The existing code uses Perl CGI functions for rendering the HTML, with no styling at all. I'm replacing this with mustache templates using the www.dns.cam.ac.uk Project Light framework. So far I have got the overall navigation structure working OK, and it's time to start putting forms into the pages.

I fear this reskin is going to be disappointing, because although it's superficially quite a lot prettier, the workflow is going to be the same - for example, the various box_ops etc. links in the existing user interface become Project Light local navigation tabs in the new skin. And there are still going to be horrible Oracle errors.

BIND 9.12.4-P1 and Ed25519

2019-04-25 - News - Tony Finch

Last night, isc.org announced patch releases of BIND.

I have upgraded our DNS servers to 9.12.4-P1 to address the TCP socket exhaustion vulnerability.

At the same time I have also relinked BIND with a more recent version of OpenSSL, so it is now able to validate the small number of domains that use the new Ed25519 DNSSEC algorithm.

Jackdaw and Raven

2019-04-16 - Progress - Tony Finch

I've previously written about authentication and access control in the IP Register database. The last couple of weeks I have been reimplementing some of it in a dev version of this DNS web server.

Read more ...

Bootstrapping Let's Encrypt on Debian

2019-03-15 - Progress - Tony Finch

I've done some initial work to get the Ansible playbooks for our DNS systems working with the development VM cluster on my workstation. At this point it is just for web-related experimentation, not actual DNS servers.

Of course, even a dev server needs a TLS certificate, especially because these experiments will be about authentication. Until now I have obtained certs from the UIS / Jisc / QuoVadis, but my dev server is using Let's Encrypt instead.

Chicken / egg

In order to get a certificate from Let's Encrypt using the http-01 challenge, I need a working web server. In order to start the web server with its normal config, I need a certificate. This poses a bit of a problem!

Snakeoil

My solution is to install Debian's ssl-cert package, which creates a self-signed certificate. When the web server does not yet have a certificate (if the QuoVadis cert isn't installed, or dehydrated has not been initialized), Ansible temporarily symlinks the self-signed cert for use by Apache, like this:

- name: check TLS certificate exists
  stat:
    path: /etc/apache2/keys/tls-web.crt
  register: tls_cert
- when: not tls_cert.stat.exists
  name: fake TLS certificates
  file:
    state: link
    src: /etc/ssl/{{ item.src }}
    dest: /etc/apache2/keys/{{ item.dest }}
  with_items:
    - src: certs/ssl-cert-snakeoil.pem
      dest: tls-web.crt
    - src: certs/ssl-cert-snakeoil.pem
      dest: tls-chain.crt
    - src: private/ssl-cert-snakeoil.key
      dest: tls.pem

ACME dehydrated boulders

The dehydrated and dehydrated-apache2 packages need a little configuration. I needed to add a cron job to renew the certificate, a hook script to reload apache when the cert is renewed, and tell it which domains should be in the cert. (See below for details of these bits.)

After installing the config, Ansible initializes dehydrated if necessary - the creates check stops Ansible from running dehydrated again after it has created a cert.

- name: initialize dehydrated
  command: dehydrated -c
  args:
    creates: /var/lib/dehydrated/certs/{{inventory_hostname}}/cert.pem

Having obtained a cert, the temporary symlinks get overwritten with links to the Let's Encrypt cert. This is very similar to the snakeoil links, but without the existence check.

- name: certificate links
  file:
    state: link
    src: /var/lib/dehydrated/certs/{{inventory_hostname}}/{{item.src}}
    dest: /etc/apache2/keys/{{item.dest}}
  with_items:
    - src: cert.pem
      dest: tls-web.crt
    - src: chain.pem
      dest: tls-chain.crt
    - src: privkey.pem
      dest: tls.pem
  notify:
    - restart apache

After that, Apache is working with a proper certificate!

Boring config details

The cron script chatters into syslog, but if something goes wrong it should trigger an email (tho not a very informative one).

#!/bin/bash
set -eu -o pipefail
( dehydrated --cron
  dehydrated --cleanup
) | logger --tag dehydrated --priority cron.info

The hook script only needs to handle one of the cases:

#!/bin/bash
set -eu -o pipefail
case "$1" in
(deploy_cert)
    apache2ctl configtest &&
    apache2ctl graceful
    ;;
esac

The configuration needs a couple of options added:

- copy:
    dest: /etc/dehydrated/conf.d/dns.sh
    content: |
      EMAIL="hostmaster@cam.ac.uk"
      HOOK="/etc/dehydrated/hook.sh"

The final part is to tell dehydrated the certificate's domain name:

- copy:
    content: "{{inventory_hostname}}\n"
    dest: /etc/dehydrated/domains.txt

For production, domains.txt needs to be a bit more complicated. I have a template like the one below. I have not yet deployed it; that will probably wait until the cert needs updating.

{{hostname}} {% if i_am_www %} www.dns.cam.ac.uk dns.cam.ac.uk {% endif %}

BIND 9.12.3-P4 and other patch releases

2019-02-27 - News - Tony Finch

Last week, isc.org announced patch releases of BIND.

I have upgraded our DNS servers to 9.12.3-P4 to address the memory leak vulnerability.

KSK rollover project status

2019-02-07 - Progress - Future - Tony Finch

I have spent the last week working on DNSSEC key rollover automation in BIND. Or rather, I have been doing some cleanup and prep work. With reference to the work I listed in the previous article...

Done

  • Stop BIND from generating SHA-1 DS and CDS records by default, per RFC 8624

  • Teach dnssec-checkds about CDS and CDNSKEY

Started

  • Teach superglue to use CDS/CDNSKEY records, with similar logic to dnssec-checkds

The "similar logic" is implemented in dnssec-dsfromkey, so I don't actually have to write the code more than once. I hope this will also be useful for other people writing similar tools!

Some of my small cleanup patches have been merged into BIND. We are currently near the end of the 9.13 development cycle, so this work is going to remain out of tree for a while until after the 9.14 stable branch is created and the 9.15 development cycle starts.

Next

So now I need to get to grips with dnssec-coverage and dnssec-keymgr.

Simple safety interlocks

The purpose of the dnssec-checkds improvements is so that it can be used as a safety check.

During a KSK rollover, there are one or two points when the DS records in the parent need to be updated. The rollover must not continue until this update has been confirmed, or the delegation can be broken.

I am using CDS and CDNSKEY records as the signal from the key management and zone signing machinery for when DS records need to change. (There's a shell-style API in dnssec-dsfromkey -p, but that is implemented by just reading these sync records, not by looking into the guts of the key management data.) I am going to call them "sync records" so I don't have to keep writing "CDS/CDNSKEY"; "sync" is also the keyword used by dnssec-settime for controlling these records.

Key timing in BIND

The dnssec-keygen and dnssec-settime commands (which are used by dnssec-keymgr) schedule when changes to a key will happen.

There are parameters related to adding a key: when it is published in the zone, when it becomes actively used for signing, etc. And there are parameters related to removing a key: when it becomes inactive for signing, when it is deleted from the zone.

There are also timing parameters for publishing and deleting sync records. These sync times are the only timing parameters that say when we must update the delegation.
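For example, the sync times on a key can be adjusted like this (a sketch; the key file name is made up):

# publish the matching CDS/CDNSKEY records now, and schedule
# their deletion in 30 days
dnssec-settime -P sync now -D sync +30d Kbotolph.cam.ac.uk.+013+12345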

What can break?

The point of the safety interlock is to prevent any breaking key changes from being scheduled until after a delegation change has been confirmed. So what key timing events need to be forbidden from being scheduled after a sync timing event?

Events related to removing a key are particularly dangerous. There are some cases where it is OK to remove a key prematurely, if the DS record change is also about removing that key, and there is another working key and DS record throughout. But it seems simpler and safer to forbid all removal-related events from being scheduled after a sync event.

However, events related to adding a key can also lead to nonsense. If we blindly schedule creation of new keys in advance, without verifying that they are also being properly removed, then the zone can accumulate a ridiculous number of DNSKEY records. This has been observed in the wild surprisingly frequently.

A simple rule

There must be no KSK changes of any kind scheduled after the next sync event.

This rule applies regardless of the flavour of rollover (double DS, double KSK, algorithm rollover, etc.)

Applying this rule to BIND

Whereas for ZSKs, dnssec-coverage ensures rollovers are planned for some fixed period into the future, for KSKs, it must check correctness up to the next sync event, then ensure nothing will occur after that point.

In dnssec-keymgr, the logic should be:

  • If the current time is before the next sync event, ensure there is key coverage until that time and no further.

  • If the current time is after all KSK events, use dnssec-checkds to verify the delegation is in sync.

  • If dnssec-checkds reports an inconsistency and we are within some sync interval dictated by the rollover policy, do nothing while we wait for the delegation update automation to work.

  • If dnssec-checkds reports an inconsistency and the sync interval has passed, report an error because operator intervention is required to fix the failed automation.

  • If dnssec-checkds reports everything is in sync, schedule keys up to the next sync event. The timing needs to be relative to this point in time, since any delegation update delays can make it unsafe to schedule relative to the last sync event.

Caveat

At the moment I am still not familiar with the internals of dnssec-coverage and dnssec-keymgr so there's a risk that I might have to re-think these plans. But I expect this simple safety rule will be a solid anchor that can be applied to most DNSSEC key management scenarios. (However I have not thought hard enough about recovery from breakage or compromise.)

DNSSEC key rollover automation with BIND

2019-01-30 - Future - Tony Finch

I'm currently working on filling in the missing functionality in BIND that is needed for automatic KSK rollovers. (ZSK rollovers are already automated.) All these parts exist; but they have gaps and don't yet work together.

The basic setup that will be necessary on the child is:

  • Write a policy configuration for dnssec-keymgr.

  • Write a cron job to run dnssec-keymgr at a suitable interval. If the parent does not run dnssec-cds then this cron job should also run superglue or some other program to push updates to the parent.

The KSK rollover process will be driven by dnssec-keymgr, but it will not talk directly to superglue or dnssec-cds, which make the necessary changes. In fact it can't talk to dnssec-cds because that is outside the child's control.

So, as specified in RFC 7344, the child will advertise the desired state of its delegation using CDS and CDNSKEY records. These are read by dnssec-cds or superglue to update the parent. superglue will be loosely coupled, and able to work with any DNSSEC key management software that publishes CDS records.

The state of the keys in the child is controlled by the timing parameters in the key files, which are updated by dnssec-keymgr as determined by the policy configuration. At the moment it generates keys to cover some period into the future. For KSKs, I think it will make more sense to generate keys up to the next DS change, then stop until dnssec-checkds confirms the parent has implemented the change, before continuing. This is a bit different from the ZSK coverage model, but future coverage for KSKs can't be guaranteed because coverage depends on future interactions with an external system which cannot be assumed to work as planned.

Required work

  • Teach dnssec-checkds about CDS and CDNSKEY

  • Teach dnssec-keymgr to set "sync" timers in key files, and to invoke dnssec-checkds to avoid breaking delegations.

  • Teach dnssec-coverage to agree with dnssec-keymgr about sensible key configuration.

  • Teach superglue to use CDS/CDNSKEY records, with similar logic to dnssec-checkds

  • Stop BIND from generating SHA-1 DS and CDS records by default, per draft-ietf-dnsop-algorithm-update

Release announcement: nsdiff-1.77

2019-01-29 - News - Tony Finch

I have released a new version of nsdiff.

This release adds CDS and CDNSKEY records to the list of DNSSEC-related types that are ignored, since by default nsdiff expects them to be managed by the name server, not as part of the zone file. There is now a -C option to revert to the previous behaviour.

Old recdns.csx names have been abolished

2019-01-28 - News - Tony Finch

As previously announced, the old recursive DNS server names have been removed from the DNS, so the new names are now canonical.

131.111.8.42  recdns0.csx.cam.ac.uk -> rec0.dns.cam.ac.uk
131.111.12.20 recdns1.csx.cam.ac.uk -> rec1.dns.cam.ac.uk

A digression for the historically curious: the authdns and recdns names date from May 2006, when they were introduced to prepare for separating authoritative and recursive DNS service.

Until 2006, 131.111.8.42 was known as chimaera.csx.cam.ac.uk. It had been our primary DNS server since September/October 1995. Before then, our DNS was hosted on CUS, the Central Unix Service.

And 131.111.12.20 had been known as c01.csi.cam.ac.uk (or comms01) since before my earliest records in October 1991.

Superglue with WebDriver

2019-01-25 - Progress - Tony Finch

Earlier this month I wrote notes on some initial experiments in browser automation with WebDriver. The aim is to fix my superglue DNS delegation update scripts to work with currently-supported tools.

In the end I decided to rewrite the superglue-janet script in Perl, since most of superglue is already Perl and I would like to avoid rewriting all of it. This is still work in progress; superglue is currently an unusable mess, so I don't recommend looking at it right now :-)

My WebDriver library

Rather than using an off-the-shelf library, I have a very thin layer (300 lines of code, 200 lines of docs) that wraps WebDriver HTTP+JSON calls in Perl subroutines. It's designed for script-style usage, so I can write things like this (quoted verbatim):

# Find the domain's details page.

click '#commonActionsMenuLogin_ListDomains';

fill '#MainContent_tbDomainNames' => $domain,
    '#MainContent_ShowReverseDelegatedDomains' => 'selected';

click '#MainContent_btnFilter';

This has considerably less clutter than the old PhantomJS / CasperJS code!

Asynchrony

I don't really understand the concurrency model between the WebDriver server and the activity in the browser. It appears to be remarkably similar to the way CasperJS behaved, so I guess it is related to the way JavaScript's event loop works (and I don't really understand that either).

The upshot is that in most cases I can click on a link, and the WebDriver response comes back after the new page has loaded. I can immediately interact with the new page, as in the code above.

However there are some exceptions.

On the JISC domain registry web site there are a few cases where selecting from a drop-down list triggers some JavaScript that causes a page reload. The WebDriver request returns immediately, so I have to manually poll for the page load to complete. (This also happened with CasperJS.) I don't know if there's a better way to deal with this than polling...
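
Failing a better mechanism, the poll can be as crude as asking the browser for its document.readyState until the new page settles. A sketch using the raw WebDriver HTTP API, where $WD stands for a placeholder session URL:

    until curl -s "$WD/execute/sync" \
            -H 'Content-Type: application/json' \
            -d '{"script": "return document.readyState", "args": []}' \
          | grep -q complete
    do
        sleep 0.2
    done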

The WebDriver spec

I am not a fan of the WebDriver protocol specification. It describes how the code in the WebDriver server / browser behaves, written out as spaghetti pseudocode.

It does not have any abstract syntax for JSON requests and responses - no JSON schema or anything like that. Instead, the details of parsing requests and constructing responses are interleaved with details of implementing the semantics of the request. It is a very unsafe style.

And why does the WebDriver spec include details of how to HTTP?

Next steps

This work is part of two ongoing projects:

  • I need to update all our domain delegations to complete the server renaming.

  • I need automated delegation updates to support automated DNSSEC key rollovers.

So I'm aiming to get superglue into a usable state, and hook it up to BIND's dnssec-keymgr.

Happenings in DNS

2019-01-18 - News - Tony Finch

A couple of items worth noting:

DNS flag day

The major DNS resolver providers have declared February 1st to be DNS Flag Day. (See also the ISC blog item on the DNS flag day.)

DNS resolvers will stop working around broken authoritative DNS servers that do not implement EDNS correctly. The effect will be that DNS resolution may fail in some cases where it used to be slow.

The flag day will take effect immediately on some large public resolvers. In Cambridge, it will take effect on our central resolvers after they are upgraded to BIND 9.14, the next stable branch, which is due to be released in Q1 this year.

I'm running the development branch 9.13, which already includes the Flag Day changes, on my workstation, and I haven't noticed any additional breakage - but then my personal usage is neither particularly heavy nor particularly diverse.

Old DNSSEC root key revoked

Last week the old DNSSEC root key was revoked, so DNSSEC validators that implement RFC 5011 trust anchor updates should have deleted the old key (tag 19036) from their list of trusted keys.

For example, on one of my resolvers the output of rndc managed-keys now includes the following. (The tag of the old key changed from 19036 to 19164 when the revoke flag was added.)

name: .
keyid: 20326
    algorithm: RSASHA256
    flags: SEP
    next refresh: Fri, 18 Jan 2019 14:28:17 GMT
    trusted since: Tue, 11 Jul 2017 15:03:52 GMT
keyid: 19164
    algorithm: RSASHA256
    flags: REVOKE SEP
    next refresh: Fri, 18 Jan 2019 14:28:17 GMT
    remove at: Sun, 10 Feb 2019 14:20:18 GMT
    trust revoked

This is the penultimate step of the root key rollover; the final step is to delete the revoked key from the root zone.

Old recdns.csx names to be abolished

2019-01-17 - News - Tony Finch

On Monday 28th January after the 13:53 DNS update, the old recursive DNS server names will be removed from the DNS. They have been renamed like this:

131.111.8.42  recdns0.csx.cam.ac.uk -> rec0.dns.cam.ac.uk
131.111.12.20 recdns1.csx.cam.ac.uk -> rec1.dns.cam.ac.uk

Although there should not be much that depends on the old names, we are giving you a warning in case things like monitoring systems need reconfiguration.

This is part of the ongoing DNS server reshuffle project.

Preserving dhcpd leases across reinstalls

2019-01-14 - Progress - Tony Finch

(This is an addendum to December's upgrade notes.)

I have upgraded the IP Register DHCP servers twice in the past year. In February they were upgraded from Ubuntu 12.04 LTS to 14.04 LTS, to cope with 12.04's end of life, and to merge their setup into the main ipreg git repository (which is why the target version was so old). So their setup was fairly tidy before the Debian 9 upgrade.

Statefulness

Unlike most of the IP Register systems, the dhcp servers are stateful. Their dhcpd.leases files must be preserved across reinstalls. The leases file is a database (in the form of a flat text file in ISC dhcp config file format) which closely matches the state of the network.

If it is lost, the server no longer knows about IP addresses in use by existing clients, so it can issue duplicate addresses to new clients, and hilarity will ensue!

So, just before rebuilding a server, I have to stop the dhcpd and take a copy of the leases file. And before the dhcpd is restarted, I have to copy the leases file back into place.

This isn't something that happens very often, so I have not automated it yet.
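
Done by hand, the procedure is roughly the following, assuming the stock Debian packaging of ISC dhcpd and a made-up safe location for the copy:

    # just before the rebuild: stop dhcpd and save the leases
    systemctl stop isc-dhcp-server
    cp /var/lib/dhcp/dhcpd.leases /safe/dhcpd.leases

    # on the reinstalled server, before dhcpd first starts
    cp /safe/dhcpd.leases /var/lib/dhcp/dhcpd.leases
    systemctl start isc-dhcp-server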

Bad solutions

In February, I hacked around with the Ansible playbook to ensure the dhcpd was not started before I copied the leases file into place. This is an appallingly error-prone approach.

Yesterday, I turned that basic idea into an Ansible variable that controls whether the dhcpd is enabled. This avoids mistakes when fiddling with the playbook, but it is easy to forget.

Better solution

This morning I realised a much neater way is to disable the entire dhcpd role if the leases file doesn't exist. This prevents the role from starting the dhcpd on a newly reinstalled server before the old leases file is in place. After the server is up, the check is a no-op.

This is a lot less error-prone. The only requirement for the admin is knowledge about the importance of preserving dhcpd.leases...

Further improvements

The other pitfall in my setup is that monit will restart dhcpd if it is missing, so it isn't easy to properly stop it.

My dhcpd_enabled Ansible variable takes care of this, but I think it would be better to make a special shutdown playbook, which can also take a copy of the leases file.

Review of 2018

2019-01-11 - Progress - Tony Finch

Some notes looking back on what happened last year...

Stats

1457 commits

4035 IP Register / MZS support messages

5734 cronspam messages

Projects

  • New DNS web site (Feb, Mar, Jun, Sep, Oct, Nov)

    This was a rather long struggle with a lot of false starts, e.g. February / March finding that Perl Template Toolkit was not very satisfactory; realising after June that the server naming and vhost setup was unhelpful.

    End result is quite pleasing

  • IP Register API extensions (Aug)

    API access to xlist_ops

    MWS3 API generalized for other UIS services

    Now in active use by MWS, Drupal Falcon, and to a lesser extent by the HPC OpenStack cluster and the new web Traffic Managers. When old Falcon is wound down we will be able to eliminate Gossamer!

  • Server upgrade / rename (Dec)

    Lots of Ansible review / cleanup. Satisfying.

Future of IP Register

  • Prototype setup for PostgreSQL replication using repmgr (Jan)

  • Prototype infrastructure for JSON-RPC API in Typescript (April, May)

Maintenance

  • DHCP servers upgraded to match rest of IP Register servers (Feb)

  • DNS servers upgraded to BIND 9.12, with some serve-stale related problems. (March)

    Local patches all now incorporated upstream :-)

  • git.uis continues, hopefully not for much longer

IETF

  • Took over as the main author of draft-ietf-dnsop-aname. This work is ongoing.

  • Received thanks in RFC 8198 (DNSSEC negative answer synthesis), RFC 8324 (DNS privacy), RFC 8482 (minimal ANY responses), RFC 8484 (DNS-over-HTTPS).

Open Source

  • Ongoing maintenance of regpg. This has stabilized and reached a comfortable feature plateau.

  • Created doh101, a DNS-over-TLS and DNS-over-HTTPS proxy.

    Initial prototype in March at the IETF hackathon.

    Revamped in August to match final IETF draft.

    Deployed in production in September.

  • Fifteen patches committed to BIND9.

    CVE-2018-5737; extensive debugging work on the serve-stale feature.

    Thanked by ISC.org in their annual review.

  • Significant clean-up and enhancement of my qp trie data structure, used by Knot DNS. This enabled much smaller memory usage during incremental zone updates.

    https://gitlab.labs.nic.cz/knot/knot-dns/issues/591

What's next?

  • Update superglue delegation maintenance script to match the current state of the world. Hook it in to dnssec-keymgr and get automatic rollovers working.

  • Rewrite draft-ietf-dnsop-aname again, in time for IETF104 in March.

  • Server renumbering, and xfer/auth server split, and anycast. When?

  • Port existing ipreg web interface off Jackdaw.

  • Port database from Oracle on Jackdaw to PostgreSQL on my servers.

  • Develop new API / UI.

  • Re-do provisioning system for streaming replication from database to DNS.

  • Move MZS into IP Register database.

Brexit and .eu domain names

2019-01-09 - News - Tony Finch

This message is for the attention of anyone who has used a third-party DNS provider to register a .eu domain name, or a domain name in another European two-letter country-code top-level domain.

Last year, EURID (the registry for .eu domain names) sent out a notice about the effect of Brexit on .eu domain names registered in the UK. The summary is that .eu domains may only be registered by organizations or individuals in the EU, and unless any special arrangements are made (which has not happened) this will not include the UK after Brexit, so UK .eu domain registrations will be cancelled.

https://eurid.eu/en/register-a-eu-domain/brexit-notice/

Other European country-code TLDs may have similar restrictions (for instance, Italy's .it).

Sadly we cannot expect our government to behave sensibly, so you have to make your own arrangements for continuity of your .eu domain.

The best option is for you to find one of your collaborators in another EU country who is able to take over ownership of the domain.

We have contacted the owners of .eu domains registered through our Managed Zone Service. Those who registered a .eu domain elsewhere should contact their DNS provider for detailed support.

Edited to add: Thanks to Elliot Page for pointing out that this problem may apply to other TLDs as well as .eu

Edited to add (2019-07-22): There has been an update to the .eu eligibility criteria.

Notes on web browser automation

2019-01-08 - Progress - Tony Finch

I spent a few hours on Friday looking in to web browser automation. Here are some notes on what I learned.

Context

I have some old code called superglue-janet which drives the JISC / JANET / UKERNA domain registry web site. The web site has some dynamic JavaScript behaviour, and it looks to me like the browser front-end is relatively tightly coupled to the server back-end in a way that I expected would make reverse engineering unwise. So I decided to drive the web site using browser automation tools. My code is written in JavaScript, using PhantomJS (a headless browser based on QtWebKit) and CasperJS (convenience utilities for PhantomJS).

Rewrite needed

PhantomJS is now deprecated, so the code needs a re-work. I also want to use TypeScript instead, where I would previously have used JavaScript.

Current landscape

The modern way to do things is to use a full-fat browser in headless mode and control it using the standard WebDriver protocol.

For Firefox this means using the geckodriver proxy which is a Rust program that converts the WebDriver JSON-over-HTTP protocol to Firefox's native Marionette protocol.

[Aside: Marionette is a full-duplex protocol that exchanges JSON messages prefixed by a message length. It fits into a similar design space to Microsoft's Language Server Protocol, but LSP uses somewhat more elaborate HTTP-style framing and JSON-RPC message format. It's kind of a pity that Marionette doesn't use JSON-RPC.]

The WebDriver protocol came out of the Selenium browser automation project where earlier (incompatible) versions were known as the JSON Wire Protocol.

What I tried out

I thought it would make sense to write the WebDriver client in TypeScript. The options seemed to be:

  • selenium-webdriver, which has Selenium's bindings for node.js. This involves a second proxy written in Java which goes between node and geckodriver. I did not like the idea of a huge wobbly pile of proxies.

  • webdriver.io aka wdio, a native node.js WebDriver client. I chose to try this, and got it going fairly rapidly.

What didn't work

I had enormous difficulty getting anything to work with wdio and TypeScript. It turns out that the wdio typing was only committed a couple of days before my attempt, so I had accidentally found myself on the bleeding edge. I can't tell whether my failure was due to lack of documentation or brokenness in the type declarations...

What next

I need to find a better WebDriver client library. The wdio framework is very geared towards testing rather than general automation (see the wdio "getting started" guide for example) so if I use it I'll be talking to its guts rather than the usual public interface. And it won't be very stable.

I could write it in Perl but that wouldn't really help to reduce the amount of untyped code I'm writing :-)

The missing checklist

2019-01-07 - Progress - Tony Finch

Before I rename/upgrade any more servers, this is the checklist I should have written last month...

For rename

  • Ensure both new and old names are in the DNS

  • Rename the host in ipreg/ansible/bin/make-inventory and run the script

  • Run ipreg/ansible/bin/ssh-knowhosts to update ~/.ssh/known_hosts

  • Rename host_vars/$SERVER and adjust the contents to match a previously renamed server (mutatis mutandis)

  • For recursive servers, rename the host in ipreg/ansible/roles/keepalived/files/vrrp-script and ipreg/ansible/inventory/dynamic

For both

  • Ask infra-sas@uis to do the root privilege parts of the netboot configuration - rename and/or new OS version as required

For upgrade

  • For DHCP servers, save a copy of the leases file by running:

    ansible-playbook dhcpd-shutdown-save-leases.yml \
        --limit $SERVER
    
  • Run the preseed.yml playbook to update the unprivileged parts of the netboot config

  • Reboot the server, tell it to netboot and do a preseed install

  • Wait for that to complete

  • For DHCP servers, copy the saved leases file to the server.

  • Then run:

    ANSIBLE_SSH_ARGS=-4 ANSIBLE_HOST_KEY_CHECKING=False \
        ansible-playbook -e all=1 --limit $SERVER main.yml
    

For rename

  • Update the rest of the cluster's view of the name

    git push
    ansible-playbook --limit new main.yml
    

Notes on recent DNS server upgrades

2019-01-02 - Progress - Tony Finch

I'm now most of the way through the server upgrade part of the rename / renumbering project. This includes moving the servers from Ubuntu 14.04 "Trusty" to Debian 9 "Stretch", and renaming them according to the new plan.

Done:

  • Live and test web servers, which were always Stretch, so they served as a first pass at getting the shared parts of the Ansible playbooks working

  • Live and test primary DNS servers

  • Live x 2 and test x 2 authoritative DNS servers

  • One recursive server

To do:

  • Three other recursive servers

  • Live x 2 and test x 1 DHCP servers

Here are a few notes on how the project has gone so far.

Read more ...

More DNS server rebuilds

2018-12-19 - News - Tony Finch

As announced last week the remaining authoritative DNS servers will be upgraded this afternoon. There will be an outage of authdns1.csx.cam.ac.uk for several minutes, during which our other authoritative servers will be available to provide DNS service.

The primary server ipreg.csi.cam.ac.uk will also be rebuilt, which will involve reconstructing all our DNS zone files from scratch. (This is less scary than it sounds, because the software we use for the hourly DNS updates makes it easy to verify that DNS zones are the same.)

These upgrades will cause secondary servers to perform full zone transfers of our zones, since the incremental transfer journals will be lost.

Authoritative DNS server rebuilds

2018-12-12 - News - Tony Finch

Tomorrow (13 December) we will reinstall the authoritative DNS server authdns0.csx.cam.ac.uk and upgrade its operating system from Ubuntu 14.04 "Trusty" to Debian 9 "Stretch".

During the upgrade our other authoritative servers will be available to provide DNS service. After the upgrade, secondary servers are likely to perform full zone transfers from authdns0 since it will have lost its incremental zone transfer journal.

Next week, we will do the same for authdns1.csx.cam.ac.uk and for ipreg.csi.cam.ac.uk (the primary server).

During these upgrades the servers will have their hostnames changed to auth0.dns.cam.ac.uk, auth1.dns.cam.ac.uk, and pri0.dns.cam.ac.uk, at least from the sysadmin point of view. There are lots of references to the old names which will continue to work until all the NS and SOA DNS records have been updated. This is an early step in the DNS server renaming / renumbering project.

IPv6 prefixes and LAN names

2018-12-06 - Future - Tony Finch

I have added a note to the ipreg schema wishlist that it should be possible for COs to change LAN names associated with IPv6 prefixes.

Postcronspam

2018-11-30 - Progress - Tony Finch

This is a postmortem of an incident that caused a large amount of cronspam, but not an outage. However, the incident exposed a lot of latent problems that need addressing.

Description of the incident

I arrived at work late on Tuesday morning to find that the DHCP servers were sending cronspam every minute from monit. monit thought dhcpd was not working, although it was.

A few minutes before I arrived, a colleague had run our Ansible playbook to update the DHCP server configuration. This was the trigger for the cronspam.

Cause of the cronspam

We are using monit as a basic daemon supervisor for our critical services. The monit configuration doesn't have an "include" facility (or at least it didn't when we originally set it up) so we are using Ansible's "assemble" feature to concatenate configuration file fragments into a complete monit config.
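
In effect the assemble step boils down to something like this (the paths are illustrative):

    # concatenate the fragments into the complete monit config ...
    cat /etc/monit/conf-fragments/* > /etc/monit/monitrc

    # ... and tell monit to pick up the new config
    monit reload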

The problem was that our Ansible setup didn't have any explicit dependencies between installing monit config fragments and reassembling the complete config and restarting monit.

Running the complete playbook caused the monit config to be reassembled, so an incorrect but previously inactive config fragment was activated, causing the cronspam.

Origin of the problem

How was there an inactive monit config fragment on the DHCP servers?

The DHCP servers had an OS upgrade and reinstall in February. This was when the spammy broken monit config fragment was written.

What were the mistakes at that time?

  • The config fragment was not properly tested. A good monit config is normally silent, but in this case we didn't check that it sent cronspam when things were broken, which would have revealed that the config fragment was not actually installed properly.

  • The Ansible playbook was not verified to be properly idempotent. It should be possible to wipe a machine and reinstall it with one run of Ansible, and a second run should be all green. We didn't check the second run properly. Check mode isn't enough to verify idempotency of "assemble".

  • During routine config changes in the nine months since the servers were reinstalled, the usual practice was to run the DHCP-specific subset of the Ansible playbook (because that is much faster) so the bug was not revealed.

Deeper issues

There was a lot more anxiety than there should have been when debugging this problem, because at the time the Ansible playbooks were going through a lot of churn for upgrading and reinstalling other servers, and it wasn't clear whether or not this had caused some unexpected change.

This gets close to the heart of the matter:

  • It should always be safe to check out and run the Ansible playbook against the production systems, and expect that nothing will change.

There are other issues related to being a (nearly) solo developer, which makes it easier to get into bad habits. The DHCP server config has the most contributions from colleagues at the moment, so it is not really surprising that this is where we find out the consequences of the bad habits of soloists.

Resolutions

It turns out that monit and dhcpd do not really get along. The monit UDP health checker doesn't work with DHCP (which was the cause of the cronspam) and monit's process checker gets upset by dhcpd being restarted when it needs to be reconfigured.

The monit DHCP UDP checker has been disabled; the process checker needs review to see if it can be useful without sending cronspam on every reconfig.

There should be routine testing to ensure the Ansible playbooks committed to the git server run green, at least in check mode. Unfortunately it's risky to automate this because it requires root access to all the servers; at the moment root access is restricted to admins in person.

We should be in the habit of running the complete playbook on all the servers (e.g. before pushing to the git server), to detect any differences between check mode and normal (active) mode. This is necessary for Ansible tasks that are skipped in check mode.
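
In other words, the habit should be something like this before every push:

    # check mode: expect an all-green run with no changes
    ansible-playbook --check main.yml

    # active mode: a converged system should also report no changes
    ansible-playbook main.yml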

Future work

This incident also highlights longstanding problems with our low bus protection factor and lack of automated testing. The resolutions listed above will make some small steps to improve these weaknesses.

New servers in Maths, and other sample.named.conf changes

2018-11-26 - News - Tony Finch

The Faculty of Mathematics have a revamped DNS setup, with new authoritative DNS servers, authdns0.maths (131.111.20.101) and authdns1.maths (131.111.20.202).

I have updated sample.named.conf and catz.arpa.cam.ac.uk to refer to these new servers for the 11 Maths zones. Also, I have belatedly added the Computer Lab's new reverse DNS range for 2a05:b400:110::/48.

The stealth secondary server documentation now includes separate, simpler configuration files for forwarding BIND resolvers, and for stealth secondaries using catz.arpa.cam.ac.uk. (As far as I can tell I am still the only one using catalog zones at the moment! They are pretty neat, though.)

New DNS web site

2018-11-20 - News - Tony Finch

There is a new web site for DNS in Cambridge at https://www.dns.cam.ac.uk/

The new site is mostly the old (sometimes very old) documentation that was hosted under https://jackdaw.cam.ac.uk/ipreg/. It has been reorganized and reformatted to make it easier to navigate; for example some pages have been rescued from the obscurity of the news archives. There are a few new pages that fill in some of the gaps.

The old pages (apart from the IP Register database interface) will shortly be replaced by redirects to their new homes on the new site.

News feeds

Our DNS news mailing list has been renamed to uis-dns-announce; those who were subscribed to the old cs-nameservers-announce list have been added to the new list. This mailing list is for items of interest to those running DNS servers on the CUDN, but which aren't of broad enough relevance to bother the whole of ucam-itsupport.

There are now Atom feeds for DNS news available from https://www.dns.cam.ac.uk/news/.

This news item is also posted at https://www.dns.cam.ac.uk/news/2018-11-20-web-site.html

Infrastructure

The new site is part of the project to move the IP Register database off Jackdaw. The plan is:

  • New web server; evict documentation. (done)

  • Replicate IP Register web user interface on new server. (This work will mostly be about porting Jackdaw's bespoke "WebDBI" mod_perl / Oracle application framework.)

  • Move the IP Register database off Jackdaw onto a new PostgreSQL database, without altering the external appearance. (This will involve porting the schema and stored procedures, and writing a test suite.)

After that point we should have more tractable infrastructure, making it easier to provide better user interface and APIs.

The new site is written in Markdown. The Project Light templates use Mustache, because it is programming-language-agnostic, so it will work with the existing mod_perl scripts, and with TypeScript in the future.

DNS-OARC and RIPE

2018-10-23 - Progress - Tony Finch

Last week I visited Amsterdam for a bunch of conferences. The 13th and 14th were the joint DNS-OARC and CENTR workshop, and the 15th - 19th was the RIPE77 meeting.

I have a number of long-term projects that can have much greater success within the University, and much greater impact outside it, through in-person collaboration with people from other organizations. Last week was a great example of that, with significant progress on CDS (which I did not anticipate!), ANAME, and DNS privacy, all of which I will unpack below.

Read more ...

DNSSEC root key rollover this Thursday

2018-10-11 - News - Tony Finch

The rollover has occurred, and everything is OK, at least as far as we can tell after one hour.

Before: http://dnsviz.net/d/root/W79zYQ/dnssec/
After: http://dnsviz.net/d/root/W790GQ/dnssec/

There's a lot of measurement happening, e.g. graphs of the view from the RIPE Atlas distributed Internet measurement system at: https://nlnetlabs.nl/

I have gently prodded our resolvers with rndc flushname . so they start using the 2017 key immediately, rather than waiting for the TTL to expire, since I am travelling tomorrow. I expect there will be a fair amount of discussion about the rollover at the DNS-OARC meeting this weekend...

DNS-over-TLS snapshot

2018-10-10 - Progress - Tony Finch

Some quick stats on how much the new DNS-over-TLS service is being used:

At the moment (Wednesday mid-afternoon) we have about

  • 29,000 - 31,000 devices on the wireless network

  • 3900 qps total on both recursive servers

  • about 15 concurrent DoT clients (s.d. 4)

  • about 7qps DoT (s.d. 5qps)

  • 5s TCP idle timeout

  • 6.3s mean DoT connection time (s.d. 4s - most connections are just over 5s, they occasionally last as long as 30s; mean and s.d. are not a great model for this distribution)

  • DoT connections very unbalanced, 10x fewer on 131.111.8.42 than on 131.111.12.20

The rule of thumb that number of users is about 10x qps suggests that we have about 70 Android Pie users, i.e. about 0.2% of our userbase.

DNSSEC root key rollover this Thursday

2018-10-08 - News - Tony Finch

This Thursday at 16:00 UTC (17:00 local time), the 2010 DNSSEC root key (tag 19036) will stop being used for signing, leaving only the 2017 root key (tag 20326). The root key TTL is 2 days so the change might not be visible until the weekend.

If you run a DNSSEC validating resolver, you should double check that it trusts the 2017 root key. ICANN have some instructions at the link below; if in doubt you can ask ip-register at uis.cam.ac.uk for advice.

ICANN's DNSSEC trust anchor telemetry data does not indicate any problems for us; however the awkward cases are likely to be older validators that predate RFC 8145.

I am away for the DNS-OARC and RIPE meetings starting on Friday, but I will be keeping an eye on email. This ought to be a non-event but there hasn't been a DNSSEC root key rollover before so there's a chance that lurking horrors will be uncovered.

BIND 9.12.2-P2 and other patch releases

2018-09-20 - News - Tony Finch

Yesterday, isc.org announced patch releases of BIND

I have upgraded our DNS servers to 9.12.2-P2, mainly to address the referral interoperability problem, though we have not received any reports that this was causing noticeable difficulties.

There is a security issue related to Kerberos authenticated DNS updates; I'll be interested to hear if anyone in the University is using this feature!

Those interested in DNSSEC may have spotted the inline-signing bug that is fixed by these patches. We do not use inline-signing but instead use nsdiff to apply changes to signed zones, and I believe this is also true for the signed zones run by Maths and the Computer Lab.

DNS-over-TLS and DNS-over-HTTPS

2018-09-05 - News - Tony Finch

The University's central recursive DNS servers now support encrypted queries. This is part of widespread efforts to improve DNS privacy. You can make DNS queries using:

  • Traditional unencrypted DNS using UDP or TCP on port 53 ("Do53")

  • DNS-over-TLS on port 853 - RFC 7858

  • DNS-over-HTTPS on port 443 - RFC 8484

Amongst other software, Android 9 "Pie" uses DoT when possible and you can configure Firefox to use DoH.
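
You can also poke the encrypted transports by hand; for example, a DNS-over-TLS query with Knot's kdig utility (a sketch, using one of the resolver addresses):

    kdig +tls @131.111.8.42 www.cam.ac.uk A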

There is more detailed information about Cambridge's DNS-over-TLS and DNS-over-HTTPS setup on a separate page.

Renaming and renumbering the DNS servers

2018-09-04 - News - Tony Finch

There is now a draft / outline plan for renaming and renumbering the central DNS servers. This is mainly driven by the need to abolish the old Computing Service domains and by the new IPv6 prefix, among other things. See the reshuffle notes for more details.

Non-interactive access to the xlist_ops page

2018-08-13 - News - Tony Finch

Automated access to the IP Register database has so far been limited to the list_ops page. In order to allow automated registration of systems with IPv6 addresses, it is now possible to use long-term downloaded cookies for the xlist_ops page as well.

BIND 9.12.2 and serve-stale

2018-08-03 - News - Tony Finch

Earlier this year, we had an abortive attempt to turn on BIND 9.12's new serve-stale feature. This helps to make the DNS more resilient when there are local network problems or when DNS servers out on the Internet are temporarily unreachable. After many trials and tribulations we have at last successfully enabled serve-stale.

Popular websites tend to have very short DNS TTLs, which means the DNS stops working quite soon when there are network problems. As a result, network problems look more like DNS problems, so they get reported to the wrong people. We hope that serve-stale will reduce this kind of misattribution.

Read more ...

New IPv6 prefix and reverse zone

2018-06-21 - News - Tony Finch

Our new IPv6 prefix is 2a05:b400::/32

As part of our planning for more eagerly rolling out IPv6, we concluded that our existing allocation from JISC (2001:630:210::/44) would not be large enough. There are a number of issues:

  • A typical allocation to a department might be a /56, allowing for 256 subnets within the department - the next smaller allocation of /60 is too small to allow for future growth. We only had space for 2048 × /56 allocations, or many fewer if we needed to make any /52 allocations for large institutions.

  • There is nowhere near enough room for ISP-style end-user allocations, such as a /64 per college bedroom or a /64 per device on eduroam.

As a result, we have asked RIPE NCC (the European regional IP address registry) to become an LIR (local internet registry) in our own right. This entitles us to get our own provider-independent ISP-scale IPv6 allocations, amongst other things.

We have now been allocated 2a05:b400::/32 and we will start planning to roll out this new address range and deprecate the old one.

We do not currently have any detailed plans for this process; we will make further announcements when we have more news to share. Any institutions that are planning to request IPv6 allocations might want to wait until the new prefix is available, or talk to networks@uis.cam.ac.uk if you have questions.

The first bit of technical setup for the new address space is to create the reverse DNS zone, 0.0.4.b.5.0.a.2.ip6.arpa. This is now present and working on our DNS servers, though it does not yet contain anything interesting! We have updated the sample stealth secondary nameserver configuration to include this new zone. If you are using the catalog zone configuration your nameserver will already have the new zone.
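
A quick way to check that your server has picked up the new zone (@your-nameserver is a placeholder):

    # should print the SOA record of the new reverse zone
    dig +short SOA 0.0.4.b.5.0.a.2.ip6.arpa @your-nameserver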

Edited to add: Those interested in DNSSEC might like to know that this new reverse DNS zone is signed with ECDSA P256 SHA256, whereas our other zones are signed with RSA SHA1. As part of our background project to improve DNSSEC key management, we are going to migrate our other zones to ECDSA as well, which will reduce the size of our zones and provide some improvement in cryptographic security.

DNSSEC validation and the root key rollover

2018-06-14 - News - Tony Finch

Those running DNSSEC validating resolvers should be aware that ICANN is preparing to replace the root key later this year, after last year's planned rollover was delayed.

Some of you need to take action to ensure your validating resolvers are properly configured.

There is more information at https://www.icann.org/resources/pages/ksk-rollover

ICANN have started publishing IP addresses of resolvers which are providing RFC 8145 trust anchor telemetry information that indicates they do not yet trust the new KSK. The announcement is at https://mm.icann.org/pipermail/ksk-rollover/2018-June/000418.html

IP addresses belonging to our central DNS resolvers appear on this list: 2001:630:212:8::d:2 and 2001:630:212:12::d:3

ICANN's data says that they are getting inconsistent trust anchor telemetry from our servers. Our resolvers trust both the old and new keys, so their TAT signals are OK; however our resolvers are also relaying TAT signals from other validating resolvers on the CUDN that only trust the old key.

I am going to run some packet captures on our resolvers to see if I can track down where the problem trust anchor telemetry signals are coming from, so that I can help you to fix your resolvers before the rollover.

External references to IP addresses

2018-06-13 - News - Tony Finch

After some experience with the relaxed rules for references to off-site servers we have changed our process slightly. Instead of putting the IP addresses in the ucam.biz zone, we are going to enter them into the IP Register database, so that these non-CUDN IP addresses appear directly in the cam.ac.uk zone.

There are a few reasons for this:

  • Both the ucam.biz and IP Register setups are a bit fiddly, but the database is more easily scripted;

  • It reduces the need for us to set up separate HTTP redirections on the web traffic managers;

  • It reduces problems with ACME TLS certificate authorization at off-site web hosting providers;

  • It is closer to what we have in mind for the future.

The new setup registers off-site IP addresses in an OFF-SITE mzone, attached to an off-site vbox. The addresses are associated with web site hostnames using aname objects. This slightly round-about arrangement allows for IP addresses that are used by multiple web sites.

Long-form domain aliases

2018-05-31 - News - Tony Finch

Our documentation on domain names now includes a page on long-form aliases for top-level domains under cam.ac.uk.

Web servers on bare domains

2018-05-30 - News - Tony Finch

The DNS does not allow CNAME records to exist at the same name as other records. This restriction causes some friction for bare domains which you want to use for both mail and web. The IP Register database does not make it particularly easy to work around this restriction in the DNS, but we now have some tips for setting up web sites on bare domain names.

BIND security release

2018-05-21 - News - Tony Finch

On Friday, ISC.org released a security patch version of BIND 9.12.

The serve-stale vulnerability (CVE-2018-5737) is the one that we encountered on our live servers on the 27th March.

There are still some minor problems with serve-stale which will be addressed by the 9.12.2 release, so I plan to enable it after the next release.

A note on prepared transactions

2018-04-24 - Future - Tony Finch

Some further refinements of the API behind shopping-cart style prepared transactions:

On the server side, the prepared transaction is a JSON-RPC request blob which can be updated with HTTP PUT or PATCH. Ideally the server should be able to verify that the result of the PATCH is a valid JSON-RPC blob so that it doesn't later try to perform an invalid request. I am planning to do API validity checks using JSON schema.

This design allows the prepared transaction storage to be just a simple JSON blob store, ignorant of what the blob is for except that it has to match a given schema. (I'm not super keen on nanoservices so I'll just use a table in the ipreg database to store it, but in principle there can be some nice decoupling here.)

It also suggests a more principled API design: An immediate transaction (typically requested by an API client) might look like the following (based on JSON-RPC version 1.0 system.multicall syntax):

    { jsonrpc: "2.0", id: 0,
      method: "rpc.transaction",
      params: [ { jsonrpc: "2.0", id: 1, method: ... },
                { jsonrpc: "2.0", id: 2, method: ... }, ... ] }

When a prepared transaction is requested (typically by the browser UI) it will look like:

    { jsonrpc: "2.0", id: 0,
      method: "rpc.transaction",
      params: { prepared: "#" } }

The "#" is a relative URI referring to the blob stored on the JSON-RPC endpoint (managed by the HTTP methods other than POST) - but it could in principle be any URI. (Tho this needs some thinking about SSRF security!) And I haven't yet decided if I should allow an arbitrary JSON pointer in the fragment identifier :-)

If we bring back rpc.multicall (JSON-RPC changed the reserved prefix from system. to rpc.) we gain support for prepared non-transactional batches. The native batch request format becomes a special case abbreviation of an in-line rpc.multicall request.

DNS server QA traffic

2018-03-28 - Future - Tony Finch

Yesterday I enabled serve-stale on our recursive DNS servers, and after a few hours one of them crashed messily. The automatic failover setup handled the crash reasonably well, and I disabled serve-stale to avoid any more crashes.

How did this crash slip through our QA processes?

Test server

My test server is the recursive resolver for my workstations, and the primary master for my personal zones. It runs a recent development snapshot of BIND. I use it to try out new features, often months before they are included in a release, and I help to shake out the bugs.

In this case I was relatively late enabling serve-stale, so I had only been running it for five weeks before enabling it in production.

It's hard to tell whether a longer test at this stage would have exposed the bug, because there are relatively few junk queries on my test server.

Pre-heat

Usually when I roll out a new version of BIND, I will pre-heat the cache of an upgraded standby server before bringing it into production. This involves making about a million queries against the server based on a cache dump from a live server. This also serves as a basic smoke test that the upgrade is OK.
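
In outline, a pre-heat looks something like this (a sketch; the awk field numbers assume the usual master-file-style cache dump format):

    # on the live server: dump the cache (writes named_dump.db)
    rndc dumpdb -cache

    # replay the dumped names and types against the upgraded standby
    awk '!/^;/ { print $1, $4 }' named_dump.db |
    while read name type
    do
        dig +tries=1 +time=1 "$name" "$type" @standby >/dev/null
    done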

I didn't do a pre-heat before enabling serve-stale because it was just a config change that can be done without affecting service.

But it isn't clear that a pre-heat would have exposed this bug because the crash required a particular pattern of failing queries, and the cache dump did not contain the exact problem query (though it does contain some closely related ones).

Possible improvements?

An alternative might be to use live traffic as test data, instead of a static dump. A bit of code could read a dnstap feed on a live server, and replay the queries against another server. There are two useful modes:

  • test traffic: replay incoming (recursive client-facing) queries; this reproduces the current live full query load on another server for testing, in a way that is likely to have reproduced yesterday's crash.

  • continuous warming: replay outgoing (iterative Internet-facing) queries; these are queries used to refill the cache, so they are relatively low volume, and suitable for keeping a standby server's cache populated.

There are a few cases where researchers have expressed interest in DNS query data, of either of the above types. In order to satisfy them we would need to be able to split a full dnstap feed so that recipients only get the data they want.

This live DNS replay idea needs a similar dnstap splitter.

More upgrades

2018-03-27 - News - Tony Finch

Edited to add:

A few hours after the item below, we disabled the new serve-stale feature following problems on one of our recursive DNS servers. We are working with ISC.org to get serve-stale working better.

Original item follows:

The DNS servers are now running BIND 9.12.1. This version fixes an interoperability regression that affected resolution of bad domains with a forbidden CNAME at the zone apex.

We have also enabled the new serve-stale feature, so that when a remote DNS server is not available, our resolvers will return old answers instead of a failure. The max-stale-ttl is set to one hour, which should be long enough to cover short network problems, but not so long that malicious domains hang around after they are taken down.

In other news, the DNS rebuild scripts (that run at 53 minutes past each hour) have been amended to handle power outages and server maintenance more gracefully. This should avoid most of the cases where the DNS build has stopped running due to excessive caution.

IPv6 DAD-die issues

2018-03-26 - Progress - Tony Finch

Here's a somewhat obscure network debugging tale...

Read more ...

Transactions and JSON-RPC

2018-03-02 - Future - Tony Finch

The /update API endpoint that I outlined turns out to be basically JSON-RPC 2.0, so it seems to be worth making the new IP Register API follow that spec exactly.

However, there are a couple of difficulties wrt transactions.

The current not-an-API list_ops page runs each requested action in a separate transaction. It should be possible to make similar multi-transaction batch requests with the new API, but my previous API outline did not support this.

A JSON-RPC batch request is a JSON array of request objects, i.e. the same syntax as I previously described for /update transactions, except that JSON-RPC batches are not transactional. This is good for preserving list_ops functionality but it loses one of the key points of the new API.

There is a simple way to fix this problem, based on a fairly well-known idea. XML-RPC doesn't have batch requests like JSON-RPC, but they were retro-fitted by defining a system.multicall method which takes an array of requests and returns an array of responses.

We can define transactional JSON-RPC requests in the same style, like this:

    { "jsonrpc": "2.0",
      "id": 0,
      "method": "transaction",
      "params": [ {
          "jsonrpc": "2.0",
          "id": 1,
          "method": "foo",
          "params": { ... }
        }, {
          "jsonrpc": "2.0",
          "id": 2,
          "method": "bar",
          "params": { ... }
        } ]
    }

If the transaction succeeds, the outer response contains a "result" array of successful response objects, exactly one for each member of the request params array, in any order.

If the transaction fails, the outer response contains an "error" object, which has "code" and "message" members indicating a transaction failure, and an "error" member which is an array of response objects. This will contain at least one failure response; it may contain success responses (for actions which were rolled back); some responses may be missing.

Edited to add: I've described some more refinements to this idea

Upgraded to BIND 9.12.0

2018-02-20 - News - Tony Finch

The DNS servers are now running BIND 9.12.0. This version includes official versions of all the patches we needed for production, so we can now run servers built from unpatched upstream source.

First, a really nice DNSSEC-related performance enhancement is RFC 8198 negative answer synthesis: BIND can use NSEC records to generate negative responses, rather than re-querying authoritative servers. Our current configuration includes a lot of verbiage to suppress junk queries, all of which can be removed because of this new feature.

Second, a nice robustness improvement: when upstream authoritative DNS servers become unreachable, BIND will serve stale records from its cache after their time-to-live has expired. This should improve your ability to reach off-site servers when there are partial connectivity problems, such as DDoS attacks against their DNS servers.

Third, an operational simplifier: by default BIND will limit journal files to twice the zone file size, rather than letting them grow without bound. This is a patch I submitted to ISC.org about three years ago, so it has taken a very long time to get included in a release! This feature means I no longer need to run a patched BIND on our servers.

Fourth, a DNSSEC automation tool, dnssec-cds. (I mentioned this in a message I sent to this list back in October.) This is I think my largest single contribution to BIND, and (in contrast to the previous patch) it was one of the fastest to get committed! There's still some more work needed before we can put it into production, but we're a lot closer.

There are numerous other improvements, but those are the ones I am particularly pleased by. Now, what needs doing next ...

Deprocrastinating

2018-02-16 - Progress - Tony Finch

I'm currently getting several important/urgent jobs out of the way so that I can concentrate on the IP Register database project.

Read more ...

User interface sketch

2018-02-12 - Future - Tony Finch

The current IP Register user interface closely follows the database schema: you choose an object type (i.e. a table) and then you can perform whatever search/create/update/delete operations you want. This is annoying when I am looking for an object and I don't know its type, so I often end up grepping the DNS or the textual database dumps instead.

I want the new user interface to be search-oriented. The best existing example within the UIS is Lookup. The home page is mostly a search box, which takes you to a search results page, which in turn has links to per-object pages, which in turn are thoroughly hyperlinked.

Read more ...

ANAME vs aname

2018-02-01 - Future - Tony Finch

The IETF dnsop working group are currently discussing a draft specification for an ANAME RR type. The basic idea is that an ANAME is like a CNAME, except it only works for A and AAAA IP address queries, and it can coexist with other records such as SOA (at a zone apex) or MX.

I'm following the ANAME work with great interest because it will make certain configuration problems much simpler for us. I have made some extensive ANAME review comments.

An ANAME is rather different from what the IP Register database calls an aname object. An aname is a name for a set of existing IP addresses, which can be an arbitrary subset of the combined addresses of multiple boxes or vboxes, whereas an ANAME copies all the addresses from exactly one target name.

There is more about the general problem of aliases in the IP Register database in one of the items I posted in December. I am still unsure how the new aliasing model might work; perhaps it will become clearer when I have a better idea of how the existing aname implementation works and what its limitations are.

Support for Ed25519 SSHFP records

2018-01-23 - News - Tony Finch

The IP Register database now allows SSHFP records with algorithm 4 (Ed25519). See our previous announcement for details about SSHFP records.
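
The records can be generated on the host itself with OpenSSH's ssh-keygen (the host name is made up):

    # prints SSHFP records for the host's keys, including
    # algorithm 4 (Ed25519) on reasonably recent OpenSSH
    ssh-keygen -r host.example.cam.ac.uk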

An interesting bug in BIND

2018-01-12 - Progress - Tony Finch

(This item isn't really related to progress towards a bright shiny future, but since I'm blogging here I might as well include other work-related articles.)

This week I have been helping Mark Andrews and Evan Hunt to track down a bug in BIND9. The problem manifested as named occasionally failing to re-sign a DNSSEC zone; the underlying cause was access to uninitialized memory.

It was difficult to pin down, partly because there is naturally a lot of nondeterminism in uninitialized memory bugs, but also because there is a lot of nondeterminism in the DNSSEC signing process; it is time-dependent, so it is hard to re-run a failure case; and normally the DNSSEC signing process is very slow - three weeks to process a zone, by default.

Timeline

  • Oct 9 - latent bug exposed

  • Nov 12 - first signing failure

    I rebuild and restart my test DNS server quite frequently, and the bug is quite rare, which explains why it took so long to appear.

  • Nov 18 - Dec 6 - Mark fixes several signing-related bugs

  • Dec 28 - another signing failure

  • Jan 2 - I try adding some debugging diagnostics, without success

  • Jan 9 - more signing failures

  • Jan 10 - I make the bug easier to reproduce

    Mark and Evan identify a likely cause

  • Jan 11 - I confirm the cause and fix

The debugging process

The incremental re-signing code in named is tied into BIND's core rbtdb data structure (the red-black tree database). This is tricky code that I don't understand, so I mostly took a black-box approach to try to reproduce the bug.

I started off by trying to exercise the signing code harder. I set up a test zone with the following options:

    # signatures valid for 1 day (default is 30 days)
    # re-sign 23 hours before expiry
    # (whole zone is re-signed every hour)
    sig-validity-interval 1 23;
    # restrict the size of a batch of signing to examine
    # at most 10 names and generate at most 2 signatures
    sig-signing-nodes 10;
    sig-signing-signatures 2;

I also populated the zone with about 500 records (not counting DNSSEC records) so that several records would get re-signed each minute.

This helped a bit, but I often had to wait a long time before it went wrong. I wrote a script to monitor the zone using rndc zonestatus, so I could see if the "next resign time" matches the zone's earliest expiring signature.

There was quite a lot of flailing around trying to exercise the code harder, by making the zone bigger and changing the configuration options, but I was not successful at making the bug appear on demand.

To make it churn faster, I used dnssec-signzone to construct a version of the zone in which all the signatures expire in the next few minutes:

    rndc freeze test.example
    dig axfr test.example | grep -v RRSIG |
    dnssec-signzone -e now+$((86400 - 3600 - 200)) \
            -i 3600 -j 200 \
            -f signed -o test.example /dev/stdin
    rm -f test.example test.example.jnl
    mv signed test.example
    # re-load the zone
    rndc thaw test.example
    # re-start signing
    rndc sign test.example

I also modified BIND's re-signing co-ordination code; normally each batch will re-sign any records that are due in the next 5 seconds; I reduced that to 1 second to keep batch sizes small, on the assumption that more churn would help - which it did, a little bit.

But the bug still took a random amount of time to appear, sometimes within a few minutes, sometimes it would take ages.

Finding the bug

Mark (who knows the code very well) took a bottom-up approach; he ran named under valgrind which identified an access to uninitialized memory. (I don't know what led Mark to try valgrind - whether he does it routinely or whether he tried it just for this bug.)

Evan had not been able to reproduce the bug, but once the cause was identified it became clear where it came from.

The commit on the 9th October that exposed the bug was a change to BIND's memory management code, to stop it from deliberately filling newly-allocated memory with garbage.

Before this commit, the missing initialization was hidden by the memory fill, and the byte used to fill new allocations (0xbe) happened to have the right value (zero in the bottom bit) so the signer worked correctly.

Evan builds BIND in developer mode, which enables memory filling, and that is what stopped him from being able to reproduce the bug.

Verifying the fix

I changed BIND to fill memory with 0xff which (if we were right) should provoke signing failures much sooner. And it did!

Then applying the one-character fix to remove the access to uninitialized memory made the signer work properly again.

Lessons learned

BIND has a lot of infrastructure that tries to make C safer to use, for instance:

  • Run-time assertions to ensure that internal APIs are used correctly;

  • Canary elements at the start of most objects to detect memory overruns;

  • buffer and region types to prevent memory overruns;

  • A memory management system that keeps statistics on memory usage, and helps to debug memory leaks and other mistakes.

The bug was caused by failing to use buffers well, and hidden by the memory management system.

The bug occurred when initializing an rdataslab data structure, which is an in-memory serialization of a set of DNS records. The records are copied into the rdataslab in traditional C style, without using a buffer. (This is most blatant when the code manually serializes a 16 bit number instead of using isc_buffer_putuint16.) This code is particularly ancient which might explain the poor style; I think it needs refactoring for safety.

It's ironic that the bug was hidden by the memory management code - it's supposed to help expose these kinds of bug, not hide them! Nowadays, the right approach would be to link to jemalloc or some other advanced allocator, rather than writing a complicated wrapper around standard malloc. However that wasn't an option when BIND9 development started.

Conclusion

Memory bugs are painful.

High-level API design

2018-01-09 - Future - Tony Finch

This is just to record my thoughts about the overall shape of the IP Register API; the details are still to be determined, but see my previous notes on the data model and look at the old user interface for an idea of the actions that need to be available.

Read more ...

The first Oracle to PostgreSQL trial

2017-12-24 - Progress - Tony Finch

I have used ora2pg to do a quick export of the IP Register database from Oracle to PostgreSQL. This export included an automatic conversion of the table structure, and the contents of the tables. It did not include the more interesting parts of the schema such as the views, triggers, and stored procedures.

Oracle Instant Client

Before installing ora2pg, I had to install the Oracle client libraries. These are not available in Debian, but Debian's ora2pg package is set up to work with the following installation process.

  • Get the Oracle Instant Client RPMs from Oracle's web site. This is a free download, but you will need to create an Oracle account.

    I got the basiclite RPM - it's about half the size of the basic RPM and I didn't need full i18n. I also got the sqlplus RPM so I can talk to Jackdaw directly from my dev VMs.

    The libdbd-oracle-perl package in Debian 9 (Stretch) requires Oracle Instant Client 12.1. I matched the version installed on Jackdaw, which is 12.1.0.2.0.

  • Convert the RPMs to debs (I did this on my workstation)

    $ fakeroot alien oracle-instantclient12.1-basiclite-12.1.0.2.0-1.x86_64.rpm
    $ fakeroot alien oracle-instantclient12.1-sqlplus-12.1.0.2.0-1.x86_64.rpm
    
  • Those packages can be installed on the dev VM, with libaio1 (which is required by Oracle Instant Client but does not appear in the package dependencies), and libdbd-oracle-perl and ora2pg.

  • sqlplus needs a wrapper script that sets environment variables so that it can find its libraries and configuration files; a sketch of such a wrapper follows this list. After some debugging I found that, although the documentation claims glogin.sql is loaded from $ORACLE_HOME/sqlplus/admin/, it is in fact loaded from $SQLPATH.

    To configure connections to Jackdaw, I copied tnsnames.ora and sqlnet.ora from ent.
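
A minimal version of the wrapper might look like this (the install paths are whatever the alien-converted packages put on your machine):

    #!/bin/sh
    # hypothetical sqlplus wrapper; adjust the paths to your install
    export ORACLE_HOME=/usr/lib/oracle/12.1/client64
    export LD_LIBRARY_PATH=$ORACLE_HOME/lib
    export SQLPATH=$ORACLE_HOME/sqlplus/admin  # glogin.sql lives here
    export TNS_ADMIN=$HOME/oracle              # tnsnames.ora, sqlnet.ora
    exec $ORACLE_HOME/bin/sqlplus "$@"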

Running ora2pg

By default, ora2pg exports the table definitions of the schema we are interested in (i.e. ipreg). For the real conversion I intend to port the schema manually, but ora2pg's automatic conversion is handy for a quick trial, and it will probably be a useful guide to translating the data type names.

The commands I ran were:

$ ora2pg --debug
$ mv output.sql tables.sql
$ ora2pg --debug --type copy
$ mv output.sql rows.sql

$ table-fixup.pl <tables.sql >fixed.sql
$ psql -1 -f functions.sql
$ psql -1 -f fixed.sql
$ psql -1 -f rows.sql

The fixup script and SQL functions were necessary to fill in some gaps in ora2pg's conversion, detailed below.

Compatibility problems

  • Oracle treats the empty string as equivalent to NULL but PostgreSQL does not.

    This affects constraints on the lan and mzone tables. (A one-line demonstration follows this list.)

  • The Oracle substr function supports negative offsets which index from the right end of the string, but PostgreSQL does not.

    This affects subdomain constraints on the unique_name, maildom, and service tables. These constraints should be replaced by function calls rather than copies.

  • The ipreg schema uses raw columns for IP addresses and prefixes; ora2pg converted these to bytea.

    The v6_prefix table has a constraint that relies on implicit conversion from raw to a hex string. PostgreSQL is stricter about types, so this expression needs to work on bytea directly.

  • There are a number of cases where ora2pg represented named unique constraints as unnamed constraints with named indexes. This unnecessarily exposes an implementation detail.

  • There were a number of Oracle functions which PostgreSQL doesn't support (even with orafce), so I implemented them in the functions.sql file. A sketch of their general shape follows this list.

    • regexp_instr()
    • regexp_like()
    • vsize()
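
    These are sketches of the general shape, not the actual contents of functions.sql; ora_substr() is a hypothetical name for the substr() replacement mentioned above, and regexp_instr() is fiddlier so it is omitted here:

    -- Oracle-style regexp_like() as a thin wrapper over PostgreSQL's ~ operator
    create function regexp_like(str text, pattern text)
        returns boolean language sql immutable
        as $$ select str ~ pattern $$;

    -- Oracle-style substr() with negative offsets indexing from the right
    create function ora_substr(str text, pos integer, len integer)
        returns text language sql immutable
        as $$ select case when pos < 0
                          then substr(str, length(str) + pos + 1, len)
                          else substr(str, pos, len) end $$;

    -- vsize() approximated with pg_column_size()
    create function vsize(v anyelement)
        returns integer language sql stable
        as $$ select pg_column_size(v) $$;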

Other gotchas

  • The mzone_co, areader, and registrar tables reference the pers table in the jdawadm schema. These foreign key constraints need to be removed.

  • There is a weird bug in ora2pg which mangles the regex [[:cntrl:]] into [[cntrl:]]

    This is used several times in the ipreg schema to ensure that various fields are plain text. The regex is correct in the schema source and in the ALL_CONSTRAINTS table on Jackdaw, which is why I think it is an ora2pg bug.

  • There's another weird bug where a regexp_like(string,regex,flags) expression is converted to string ~ regex, flags which is nonsense.

    There are other calls to regexp_like() in the schema which do not get mangled in this way, but they have non-trivial string expressions whereas the broken one just has a column name.

Performance

The export of the data from Oracle and the import to PostgreSQL took an uncomfortably long time. The SQL dump file is only 2GB so it should be possible to speed up the import considerably.

IP Register schema wishlist

2017-12-19 - Future - Tony Finch

Here are some criticisms of the IP Register database schema and some thoughts on how we might change it.

There is a lot of infrastructure work to do before I am in a position to make changes - principally, porting from Oracle to PostgreSQL, and developing a test suite so I can make changes with confidence.

Still, it's worth writing down my thoughts so far, so colleagues can see what I have in mind, and so we have some concrete ideas to discuss.

I expect to add to this list as thoughts arise.

Read more ...

How to get a preseed file into a Debian install ISO

2017-12-12 - Progress - Tony Finch

Goal: install a Debian VM from scratch, without interaction, and with a minimum of external dependencies (no PXE etc.) by putting a preseed file on the install media.

Sadly the documentation for how to do this is utterly appalling, so here's a rant.

Starting point

The Debian installer documentation, appendix B.

https://www.debian.org/releases/stable/amd64/apbs02.html.en

Some relevant quotes:

Putting it in the correct location is fairly straightforward for network preseeding or if you want to read the file off a floppy or usb-stick. If you want to include the file on a CD or DVD, you will have to remaster the ISO image. How to get the preconfiguration file included in the initrd is outside the scope of this document; please consult the developers' documentation for debian-installer.

Note there is no link to the developers' documentation.

If you are using initrd preseeding, you only have to make sure a file named preseed.cfg is included in the root directory of the initrd. The installer will automatically check if this file is present and load it.

For the other preseeding methods you need to tell the installer what file to use when you boot it. This is normally done by passing the kernel a boot parameter, either manually at boot time or by editing the bootloader configuration file (e.g. syslinux.cfg) and adding the parameter to the end of the append line(s) for the kernel.

Note that we'll need to change the installer boot process in any case, in order to skip the interactive boot menu. But these quotes suggest that we'll have to remaster the ISO, to edit the boot parameters and maybe alter the initrd.

So we need to guess where else to find out how to do this.

Wiki spelunking

https://wiki.debian.org/DebianInstaller

This suggests we should follow https://wiki.debian.org/DebianCustomCD or use simple-cdd.

simple-cdd

I tried simple-cdd but it failed messily.

It needs parameters to select the correct version (it defaults to Jessie) and a local mirror (MUCH faster).

$ time simple-cdd --dist stretch \
        --debian-mirror http://ftp.uk.debian.org/debian
[...]
ERROR: missing required packages from profile default:  less
ERROR: missing required packages from profile default:  simple-cdd-profiles
WARNING: missing optional packages from profile default:  grub-pc grub-efi popularity-contest console-tools console-setup usbutils acpi acpid eject lvm2 mdadm cryptsetup reiserfsprogs jfsutils xfsprogs debootstrap busybox syslinux-common syslinux isolinux
real    1m1.528s
user    0m34.748s
sys     0m1.900s

Sigh, looks like we'll have to do it the hard way.

Modifying the ISO image

Eventually I realised that the hard way of making a CD image without simple-cdd is mostly about custom package selections, which is not something I need.

This article is a bit more helpful...

https://wiki.debian.org/DebianInstaller/Preseed

It contains a link to...

https://wiki.debian.org/DebianInstaller/Preseed/EditIso

That requires root privilege and is a fair amount of faff.

That page in turn links to...

https://wiki.debian.org/DebianInstaller/Modify

And then...

https://wiki.debian.org/DebianInstaller/Modify/CD

This has a much easier way of unpacking the ISO using bsdtar, and instructions on rebuilding a hybrid USB/CD ISO using xorriso. Nice.
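
The unpack/repack cycle looks roughly like this; the ISO file name and volume ID are illustrative, and the xorriso options are the ones the wiki page recommends for rebuilding a hybrid BIOS/EFI image:

$ mkdir iso
$ bsdtar -C iso -xf debian-9.3.0-amd64-netinst.iso
$ chmod -R u+w iso
  [edit the boot configuration, add preseed.cfg]
$ xorriso -as mkisofs -r -V 'Debian preseed' -o preseed.iso \
    -isohybrid-mbr /usr/lib/ISOLINUX/isohdpfx.bin \
    -b isolinux/isolinux.bin -c isolinux/boot.cat \
    -boot-load-size 4 -boot-info-table -no-emul-boot \
    -eltorito-alt-boot -e boot/grub/efi.img -no-emul-boot \
    -isohybrid-gpt-basdat \
    iso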

Most of the rest of the page is about changing package selections which we already determined we don't need.

Boot configuration

OK, so we have used bsdtar to unpack the ISO, and we can see various boot-related files. We need to find the right ones to eliminate the boot menu and add the preseed arguments.

There is no syslinux.cfg in the ISO so the D-I documentation's example is distressingly unhelpful.

I first tried editing boot/grub/grub.cfg but that had no effect.

There are two boot mechanisms on the ISO, one for USB and one for CD/DVD. The latter is in isolinux/isolinux.cfg.

Both must be edited (in similar but not identical ways) to get the effect I want regardless of the way the VM boots off the ISO.
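
For the isolinux side, a minimal sketch (assuming the amd64 netinst layout, with preseed.cfg placed in the root of the ISO) looks something like:

# isolinux/isolinux.cfg - skip the menu and boot the installer at once
default install
prompt 0
timeout 1
label install
    kernel /install.amd/vmlinuz
    append initrd=/install.amd/initrd.gz auto=true priority=critical preseed/file=/cdrom/preseed.cfg ---

The boot/grub/grub.cfg change for USB boot is analogous: add the same parameters to the linux command line of the default menu entry and set the timeout to zero.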

Unpacking and rebuilding the ISO takes less than 3 seconds on my workstation, which is acceptably fast.

Authentication and access control

2017-12-06 - Future - Tony Finch

The IP Register database is an application hosted on Jackdaw, which is a platform based on Oracle and Apache mod_perl.

IP Register access control

Jackdaw and Raven handle authentication, so the IP Register database only needs to concern itself with access control. It does this using views defined with check option, as is briefly described in the database overview and visible in the SQL view DDL.

There are three levels of access to the database:

  • the registrar table contains privileged users (i.e. the UIS network systems team) who have read/write access to everything via the views with the all_ prefix.

  • the areader table contains semi-privileged users (i.e. certain other UIS staff) who have read-only access to everything via the views with the ra_ prefix.

  • the mzone_co table contains normal users (i.e. computer officers in other institutions) who have read-write access to their mzone(s) via the views with the my_ prefix.

Apart from a few special cases, all the underlying tables in the database are available in all three sets of views.

IP Register user identification

The first part of the view definitions is where the IP Register database schema is tied to the authenticated user. There are two kinds of connection: either a web connection authenticated via Raven, or a direct sqlplus connection authenticated with an Oracle password.

SQL users are identified by Oracle's user function; Raven users are obtained from the sys_context() function, which we will now examine more closely.
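
The general shape of such a view is something like the following sketch, with hypothetical table and column names and an invented context namespace, not the real DDL:

create view my_box as
    select b.*
      from box b
      join mzone_co co on co.mzone_id = b.mzone_id
     where co.crsid = coalesce(sys_context('raven', 'user'), user)
      with check option;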

Porting to PostgreSQL

We are fortunate that support for create view with check option was added to PostgreSQL by our colleague Dean Rasheed.

The sys_context() function is a bit more interesting.

The Jackdaw API

Jackdaw's mod_perl-based API is called WebDBI, documented at https://jackdaw.cam.ac.uk/webdbi/

There's some discussion of authentication and database connections at https://jackdaw.cam.ac.uk/webdbi/webdbi.html#authentication and https://jackdaw.cam.ac.uk/webdbi/webdbi.html#sessions but it is incomplete or out of date; in particular it doesn't mention Raven (and I think basic auth support has been removed).

The interesting part is the description of sessions. Each web server process makes one persistent connection to Oracle which is re-used for many HTTP requests. How is one database connection securely shared between different authenticated users, without giving the web server enormously privileged access to the database?

Jackdaw authentication - perl

Instead of mod_ucam_webauth, WebDBI has its own implementation of the Raven protocol - see jackdaw:/usr/local/src/httpd/Database.pm.

This mod_perl code does not do all of the work; instead it calls stored procedures to complete the authentication. On initial login it calls raven_auth.create_raven_session() and for a returning user with a cookie it calls raven_auth.use_raven_session().

Jackdaw authentication - SQL

These raven_auth stored procedures set the authenticated user that is retrieved by the sys_context() call in the IP Register views - see jackdaw:/usr/local/src/httpd/raven_auth/.

Most of the logic is written in PL/SQL, but there is also an external procedure written in C which does the core cryptography - see jackdaw:/usr/local/oracle/extproc/RavenExtproc.c.

Porting to PostgreSQL - reprise

On the whole I like Jackdaw's approach to preventing the web server from having too much privilege, so I would like to keep it, though in a simplified form.

As far as I know, PostgreSQL doesn't have anything quite like sys_context() with its security properties, though you can get similar functionality using PL/Perl.

However, in the future I want more heavy-weight sessions that have more server-side context, in particular the "shopping cart" pending transaction.

So I think a better way might be to have a privileged session table, keyed by the user's cookie and containing their username and jsonb session data, etc. This table is accessed via security definer functions, with something like Jackdaw's create_raven_session(), plus functions for getting the logged-in user (to replace sys_context()) and for manipulating the jsonb session data.

We can provide ambient access to the cookie using the set session command at the start of each web request, so the auth functions can retrieve it using the current_setting() function.
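
A rough sketch of the idea, with hypothetical names throughout:

create table session (
    cookie  text primary key,
    crsid   text not null,
    data    jsonb not null default '{}'
);

-- replaces sys_context(): security definer, so the web server's
-- unprivileged role cannot read the session table directly
create function session_crsid() returns text
    language sql stable security definer
    as $$ select crsid from session
           where cookie = current_setting('webdbi.cookie') $$;

The web server would issue something like set webdbi.cookie = '...' at the start of each request, so session_crsid() and friends can pick the cookie up without it being passed as an argument.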

Relaxed rules for external references from the cam.ac.uk domain

2017-10-13 - News - Tony Finch

We have relaxed the rules for external references from the cam.ac.uk domain so that CNAMEs are no longer required; external references can refer to IP addresses when a hostname isn't available.

One of the reasons for the old policy was that the IP Register database only knows about IP addresses on the CUDN. However, an old caveat says, "CUDN policy is not defined by this database, rather the reverse." The old policy proved to be inconvenient both for the Hostmaster team and for our colleagues around the University who requested external references. We didn't see any benefit to compensate for this inconvenience, so we have relaxed the policy.

At the moment we aren't easily able to change the structure of the IP Register database. In order to work around the technical limitations, when we need to make an external reference to an IP address, the Hostmaster team will create the address records in the domain ucam.biz and set up a CNAME in the database from cam.ac.uk to ucam.biz. This is slightly more fiddly for the Hostmaster team but we expect that it will make the overall process easier.

Ongoing DNSSEC work

2017-10-05 - Progress - Tony Finch

We reached a nice milestone today which I'm pretty chuffed about, so I wanted to share the good news. This is mostly of practical interest to the Computer Lab and Mathematics, since they have delegated DNSSEC signed zones, but I hope it is of interest to others as well.

I have a long-term background project to improve the way we manage our DNSSEC keys. We need to improve secure storage and backups of private keys, and to automate updating public key digests in parent zones. As things currently stand it requires tricky and tedious manual work to replace keys, but it ought to be zero-touch automation.

We now have most of the pieces we need to support automatic key management.

regpg

For secure key storage and backup, we have a wrapper around GPG called regpg which makes it easier to repeatably encrypt files to a managed set of "recipients" (in GPG terminology). In this case the recipients are the sysadmins and they are able to decrypt the DNS keys (and other secrets) for deployment on new servers. With regpg the key management system will be able to encrypt newly generated keys but not able to decrypt any other secrets.

At the moment regpg is in use and sort-of available (at the link below) but this is a temporary home until I have released it properly.

Edited to link to the regpg home page

dnssec-cds

There are a couple of aspects to DNSKEY management: scheduling the rollovers, and keeping delegations in sync.

BIND 9.11 has a tool called dnssec-keymgr which makes rollovers a lot easier to manage. It needs a little bit of work to give it proper support for delegation updates, but it's definitely the way of the future. (I don't wholeheartedly recommend it in its current state.)

For synchronizing delegations, RFC 7344 describes special CDS and CDNSKEY records which a child zone can publish to instruct its parent to update the delegation. There's some support for the child side of this protocol in BIND 9.11, but it will be much more complete in BIND 9.12.

I've written dnssec-cds, an implementation of the parent side, which was committed to BIND this morning. (Yay!) My plan is to use this tool for managing our delegations to the CL and Maths. BIND isn't an easy codebase to work with; the reason for implementing dnssec-cds this way is (I hope) to encourage more organizations to deploy RFC 7344 support than I could achieve with a standalone tool.

https://gitlab.isc.org/isc-projects/bind9/commit/ba37674d038cd34d0204bba105c98059f141e31e

Until our parent zones (RIPE, JANET, etc.) become enlightened to the ways of RFC 7344, I have a half-baked framework that wraps various registry/registrar APIs so that we can manage delegations for all our domains in a consistent manner. It needs some work to bring it up to scratch, probably including a rewrite in Python to make it more appealing.

Conclusion

All these pieces need to be glued together, and I'm not sure how long that will take. Some of this glue work needs to be done anyway for non-DNSSEC reasons, so I'm feeling moderately optimistic.

DNSSEC lookaside validation decommissioned

2017-10-02 - News - Tony Finch

In the bumper July news item there is a note about DNSSEC lookaside validation (DLV) being deprecated.

During the DNS OARC27 meeting at the end of last week, DLV was decommissioned by emptying the dlv.isc.org zone. The item on the agenda was titled "Deprecating RFC5074" - there are no slides because the configuration change was made live in front of the meeting.

If you have not done so already, you should remove any dnssec-lookaside (BIND) or dlv-anchor (Unbound) from your server configuration.

The effect is that the reverse DNS for our IPv6 range 2001:630:210::/44 and our JANET-specific IPv4 ranges 193.60.80.0/20 and 193.63.252.0/32 can no longer be validated.

Other Cambridge zones which cannot be validated are our RFC 1918 reverse DNS address space (because of the difficulty of distributing trust anchors); private.cam.ac.uk; and most of our Managed Zone Service zones. This may change because we would like to improve our DNSSEC coverage.

DNSSEC root key rollover postponed

2017-09-29 - News - Tony Finch

In the bumper July news item there is a note about the DNSSEC root key rollover, which has been in careful preparation this year.

ICANN announced last night that the DNSSEC root key rollover has been postponed, and will no longer take place on the 11th October. The delay is because telemetry data reveals that too many validators do not trust the new root key.

Split views for private.cam.ac.uk

2017-09-27 - News - Tony Finch

Since private.cam.ac.uk was set up in 2002, our DNS servers have returned a REFUSED error to queries for private zones from outside the CUDN. Hiding private zones from the public Internet is necessary to avoid a number of security problems.

In March the CA/Browser Forum decided that after the 8th September 2017, certificate authorities must check CAA DNS records before issuing certificates. CAA records specify restrictions on which certificate authorities are permitted to issue certificates for a particular domain.

However, because names under private.cam.ac.uk cannot be resolved on the public Internet outside the CUDN, certificate authorities became unable to successfully complete CAA checks for private.cam.ac.uk. The CAA specification RFC 6844 implies that a CA should refuse to issue certificates in this situation.

In order to fix this we have introduced a split view for private.cam.ac.uk.

There are now two different versions of the private.cam.ac.uk zone: a fully-populated internal version, same as before; and a completely empty external version.

With the split view, our authoritative servers will give different answers to different clients: devices on the CUDN will get full answers from the internal version of private.cam.ac.uk, and devices on the public Internet will get negative empty answers (instead of an error) from the external version.
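
In named.conf terms the arrangement is shaped like this (a simplified sketch, not our actual configuration; the cudn ACL stands for the CUDN's address ranges):

view "cudn" {
    match-clients { cudn; };
    zone "private.cam.ac.uk" {
        type master;
        file "private.cam.ac.uk.full";
    };
};
view "world" {
    match-clients { any; };
    zone "private.cam.ac.uk" {
        type master;
        file "private.cam.ac.uk.empty";
    };
};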

There is no change to the "stealth secondary" arrangements for replicating the private.cam.ac.uk zone to other DNS servers on the CUDN.

The authoritative server list for private.cam.ac.uk has been pruned to include just the UIS authdns servers which have the split view configuration. Our thanks to the Computer Lab and the Engineering Department for providing authoritative service until this change.

A Cambridge Catalog Zone

2017-09-06 - News - Tony Finch

Catalog Zones are a new feature in BIND 9.11 which allow a secondary server to automatically configure itself using a specially-formatted zone. The isc.org knowledge base has an introduction to catalog zones and Jan-Piet Mens has some notes on his catalog zone tests.

We can use this new feature to make "stealth secondary" configurations much shorter and lower-maintenance. Accordingly, there is now a catz.arpa.cam.ac.uk catalog zone corresponding to our recommended stealth secondary configuration, and our sample BIND configuration has been updated with notes on how to use it.
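
A stealth secondary running BIND 9.11 needs roughly the following fragment; this is illustrative (the master address shown is authdns0's), so check the sample configuration for the authoritative details:

options {
    // ... existing options ...
    catalog-zones {
        zone "catz.arpa.cam.ac.uk" default-masters { 131.111.8.37; };
    };
};

zone "catz.arpa.cam.ac.uk" {
    type slave;
    masters { 131.111.8.37; };
};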

Background

This started off with some testing of the in-progress BIND 9.12 implementation of RFC 8198, which allows a validating DNSSEC resolver to use NSEC records to synthesize negative responses. (This spec is known as the Cheese Shop after an early draft which refers to a Monty Python sketch, https://tools.ietf.org/html/draft-wkumari-dnsop-cheese-shop / https://tools.ietf.org/html/rfc8198)

RFC 8198 is very effective at suppressing unnecessary queries, especially to the root DNS servers and the upper levels of the reverse DNS. A large chunk of my DNS server configuration previously tried to help with that by adding a lot of locally-served empty zones (as specified by RFC 6761 etc.). With the cheese shop all that becomes redundant.

The other big chunk of my configuration is the stealth slave list. I have previously not investigated catalog-zones in detail, since they aren't quite expressive enough for use by our central DNS servers, and in any case their configuration is already automated. But it's just right for the stealth slave configuration on my test server (and ppsw, etc.)

Setting up a Cambridge catalog zone was not too difficult. Altogether it allowed me to delete over 100 zone configurations from my test server.

Deleting "localhost" entries from the cam.ac.uk DNS zone

2017-09-01 - News - Tony Finch

Between the 4th and 8th September we will delete all the localhost entries from the cam.ac.uk DNS zone. This change should have no effect, except to avoid certain obscure web security risks.

RFC 1537, "Common DNS Data File Configuration Errors", says "all domains that contain hosts should have a localhost A record in them." and the cam.ac.uk zone has followed this advice since the early 1990s (albeit not entirely consistently).

It has belatedly come to our attention that this advice is no longer considered safe, because localhost can be used to subvert web browser security policies in some obscure situations.

Deleting our localhost DNS records should have no effect other than fixing this security bug and cleaning up the inconsistency. End-user systems handle queries for localhost using their hosts file, without making DNS queries, and without using their domain search list to construct queries for names like localhost.cam.ac.uk. We verified this by analysing query traffic on one of the central DNS resolvers, and the number of unwanted queries was negligible, less than one every 15 minutes, out of about 1000 queries per second.

CST delegated, plus DNSSEC-related news

2017-07-18 - News - Tony Finch

From October, the Computer Laboratory will be known as the Department of Computer Science and Technology.

Our colleagues in the CL have set up the zone cst.cam.ac.uk to go with the new name, and it has been added to our sample nameserver configuration file.

The first root DNSSEC key rollover is happening

The new key (tag 20326) was published on 11th July, and validating resolvers that follow RFC 5011 rollover timing will automatically start trusting it on the 10th August. There's a lot more information about the root DNSSEC key rollover on the ISC.org blog. I have added some notes on how to find out about your server's rollover state on our DNSSEC validation page.

DNSSEC lookaside validation is deprecated

The DLV turndown was announced in 2015 and the dlv.isc.org zone is due to be emptied in 2017. You should delete any dnssec-lookaside option you have in your configuration to avoid complaints in named's logs.

Annoyingly, we were relying on DLV as a stop-gap while waiting for JISC to sign their reverse DNS zones. Some of our IPv4 address ranges and our main IPv6 allocation are assigned to us from JISC. Without DLV these zones can no longer be validated.

BIND 9.11

2017-07-11 - News - Tony Finch

The central DNS servers have been upgraded from BIND 9.10 to BIND 9.11, which has a number of new features, a few of which are particularly relevant to us.

On the authoritative servers, the minimal-any anti-DDOS feature was developed by us and contributed to isc.org. Happily we no longer have to maintain this as a patch.

On the recursive servers, there are a couple of notable features.

Firstly, BIND 9.11 uses EDNS cookies to identify legitimate clients so they can bypass DDoS rate limiting. Unfortunately EDNS options can encounter bugs in old badly-maintained third-party DNS servers. We are keeping an eye out for problems and if necessary we can add buggy servers to a badlist of those who can't have cookies.

Secondly, we now have support for "negative trust anchors" which provide a workaround for third party DNSSEC failures. Fortunately we have not so far had significant problems due to the lack of this feature.

BIND CVE-2017-3142 and CVE-2017-3143

2017-06-30 - News - Tony Finch

In case you have not already seen it, last night ISC.org announced a serious vulnerability in BIND: if you have a server which allows dynamic DNS UPDATE then a remote attacker may be able to alter your zones without proper authentication. For more details see:

Note that update-policy local; uses a well-known TSIG key name, and does not include any IP address ACL restrictions, so it is extremely vulnerable to attack. To mitigate this you can replace update-policy local; with

allow-update { !{ !localhost; any; }; key local-ddns; };

This denies updates that come from everywhere except localhost, and then allows updates with the built-in local-ddns key. For a longer explanation, see https://kb.isc.org/article/AA-00723/0/Using-Access-Control-Lists-ACLs-with-both-addresses-and-keys.html You can still use nsupdate -l with this configuration.

Our master DNS server has very strict packet filters which should be effective at mitigating this vulnerability until I can update the servers.

April BIND security release

2017-04-13 - News - Tony Finch

Yesterday evening there was a BIND security release fixing three vulnerabilities.

The most serious one is CVE-2017-3137 which can crash recursive servers. (It is related to the previous DNAME/CNAME RRset ordering bugs which led to security releases in January and November.)

The other vulnerabilities are in DNS64 support (which I don't think any of us use) and in the rndc control channel (which is mainly a worry if you have opened up read-only access in BIND 9.11).

More details on the bind-announce list, https://lists.isc.org/pipermail/bind-announce/2017-April/thread.html

I have patched the central DNS servers and the ppsw-specific resolvers.

An update on Cloudflare

2017-03-20 - News - Tony Finch

The UIS no longer plans to deploy Cloudflare on a large scale; we will use Cloudflare only for www.cam.ac.uk.

As such the automated Cloudflare provisioning system described previously has been decommissioned.

Security upgrade

2017-03-06 - News - Tony Finch

A number of security vulnerabilities in the IP Register web user interface have been fixed.

Read more ...

SPF records

2017-01-30 - News - Tony Finch

The Sender Policy Framework is a way to publish in the DNS which mail servers may send email "from" a particular mail domain. It uses specially formatted TXT records alongside the mail domain's MX records.

Over the last several months, we have added SPF records for mail domains under cam.ac.uk which have mail hosted offsite. The most common offsite host is Microsoft Office 365 Exchange Online, but we have a few others using survey or mailshot services.
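
A typical record for a mail domain hosted on Exchange Online looks something like this (an illustrative example, not a copy of any real record):

example.cam.ac.uk.  IN TXT  "v=spf1 include:spf.protection.outlook.com -all"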

These SPF records are managed by the IP Register / Hostmaster team, in co-operation with the Mail Support / Postmaster team. Please email us if you would like any changes.

Name servers need patching: four BIND CVEs

2017-01-12 - News - Tony Finch

ISC.org have just announced several denial-of-service vulnerabilities in BIND's handling of DNS responses. Recursive DNS servers are particularly vulnerable.

I am in the process of patching our central DNS servers; you should patch yours too.

These bugs appear to be a similar class of error to the previous BIND CVE a couple of months ago.

Streaming replication from PostgreSQL to the DNS

2016-12-23 - Future - Tony Finch

This entry is backdated - I'm writing this one year after I made this experimental prototype.

Our current DNS update mechanism runs as an hourly batch job. It would be nice to make DNS changes happen as soon as possible.

user interface matters

Instant DNS updates have tricky implications for the user interface.

At the moment it's possible to make changes to the database in between batch runs, knowing that broken intermediate states don't matter, and with plenty of time to check the changes and make sure the result will be OK.

If the DNS is updated immediately, we need a way for users to be able to prepare a set of inter-related changes, and submit them to the database as a single transaction.

(Aside: I vaguely imagine something like a shopping-cart UI that's available for collecting more complicated changes, though it should be possible to submit simple updates without a ceremonial transaction.)

This kind of UI change is necessary even if we simply run the current batch process more frequently. So we can't reasonably deploy this without a lot of front-end work.

back-end experiments

Ideally I would like to keep the process of exporting the database to the DNS and DHCP servers as a purely back-end matter; the front-end user interface should only be a database client.

So, assuming we have a better user interface, we would like to be able to get instant DNS updates by improvements to the back end without any help from the front end.

PostgreSQL has a very tempting replication feature called "logical decoding", which takes a replication stream and turns it into a series of database transactions. You can write a logical decoding plugin which emits these transactions in whatever format you want.

With logical decoding, we can (with a bit of programming) treat the DNS as a PostgreSQL replication target, with a script that looks something like pg_recvlogical | nsupdate.

I wrote a prototype along these lines, which is published at https://git.uis.cam.ac.uk/x/uis/ipreg/pg-decode-dns-update.git
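
The shape of the pipeline is roughly as follows; the slot and plugin names are illustrative, and massage-transactions stands for the wrapper script discussed below:

$ pg_recvlogical --dbname ipreg --slot dns_update \
      --create-slot --plugin pg_decode_dns_update
$ pg_recvlogical --dbname ipreg --slot dns_update --start -f - |
      massage-transactions | nsupdate -l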

status of this prototype

The plugin itself works in a fairly satisfactory manner.

However it needs a wrapper script to massage transactions before they are fed into nsupdate, mainly to split up very large transactions that cannot fit in a single UPDATE request.

The remaining difficult work is related to starting, stopping, and pausing replication without losing transactions. In particular, during initial deployment we need to be able to pause replication and verify that the replicated updates are faithfully reproducing what the batch update would have done. We can use the same pause/batch/resume mechanism to update the parts of the DNS that are not maintained in the database.

At the moment we are not doing any more work in this area until the other prerequisites are in place.

Cloudflare

2016-11-29 - News - Tony Finch

Cloudflare is a web content delivery network with an emphasis on denial-of-service protection.

The UIS are aiming to deploy Cloudflare in front of the University's most prominent / sensitive web sites; this service might be extended more widely to other web sites, though it is not currently clear if this will be feasible.

There is a separate document with more details of how the IP Register database and Cambridge DNS setup support Cloudflare.

Recursive servers need patching: BIND CVE 2016-8864

2016-11-01 - News - Tony Finch

ISC.org have just announced a denial-of-service vulnerability in BIND's handling of DNAME records in DNS responses. Recursive DNS servers are particularly vulnerable.

I am in the process of patching our central DNS servers; you should patch yours too.

(This bug was encountered by Marco Davids of SIDN Labs, and I identified it as a security vulnerability and reported it to ISC.org. You can find us in the acknowledgments section of the security advisory.)

Urgent patching required: BIND CVE 2016-2776

2016-10-05 - News - Tony Finch

On 28th September we wrote:

Yesterday evening, ISC.org announced a denial-of-service vulnerability in BIND's buffer handling. The crash can be triggered even if the apparent source address is excluded by BIND's ACLs (allow-query).

All servers are vulnerable if they can receive request packets from any source.

If you have not yet patched, you should be aware that this bug is now being actively exploited.

Urgent patching required: BIND CVE 2016-2776

2016-09-28 - News - Tony Finch

Yesterday evening, ISC.org announced a denial-of-service vulnerability in BIND's buffer handling. The crash can be triggered even if the apparent source address is excluded by BIND's ACLs (allow-query).

All servers are vulnerable if they can receive request packets from any source.

Most machines on the CUDN are protected to a limited extent from outside attack by the port 53 packet filter. DNS servers that have an exemption are much more at risk.

http://www.ucs.cam.ac.uk/network/infoinstitutions/techref/portblock

I am in the process of patching our central DNS servers; you should patch yours too.

(This is another bug found by ISC.org's fuzz testing campaign; they have slowed down a lot since the initial rush that started about a year ago; the last one was in March.)

recovering from the DNS update service outage

2016-04-06 - News - Tony Finch

Sorry about the extended lack of DNS updates today.

Unfortunately the VM host system lost the RAID set that held the filesystem for our DNS master server (amongst others). We determined that it would be faster to rebuild some servers from scratch rather than waiting for more intensive RAID recovery efforts.

The DNS master server is set up so it can be rebuilt from scratch without too much difficulty - all the data on its filesystem comes from our configuration management systems, and from the IP register and MZS databases.

The main effect of this is that the zone transfers following the rebuild will be full transfers from scratch - incremental transfers are not possible. There is likely to be some additional load which slows down zone transfers while everything catches up.

BIND CVE-2016-1286 etc.

2016-03-10 - News - Tony Finch

Last night the ISC announced another security release of BIND to fix three vulnerabilities. For details see https://lists.isc.org/pipermail/bind-announce/2016-March/thread.html

Probably the most risky is CVE-2016-1286 which is a remote denial-of-service vulnerability in all versions of BIND without a workaround. CVE-2016-1285 can be mitigated, and probably is already mitigated on servers with a suitably paranoid configuration. CVE-2016-2088 is unlikely to be a problem.

I have updated the central DNS servers to BIND 9.10.3-P4.

I have also made a change to the DNS servers' name compression behaviour. Traditionally, BIND used to compress domain names in responses so they match the case of the query name. Since BIND 9.10 it has tried to preserve the case of responses from servers, which can lead to case mismatches between queries and answers. This exposed a case-sensitivity bug in Nagios, so after the upgrade it falsely claimed that our resolvers were not working properly! I have added a no-case-compress clause to the configuration so our resolvers now behave in the traditional manner.

recursive DNS server packet filters

2016-03-02 - News - Tony Finch

Yesterday I changed the iptables packet filters on the central recursive DNS servers, 131.111.8.42 and 131.111.12.20, to harden them against denial of service attacks from outside the CUDN.

Previously we were rejecting queries from outside the CUDN with DNS-level REFUSED responses; now, TCP connections from outside the CUDN are rejected at the network layer (with ICMP port unreachable, which clients report as connection refused).
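
Conceptually the rule looks like this, with a single illustrative prefix standing in for the full list of CUDN ranges; plain REJECT sends ICMP port unreachable:

iptables -A INPUT -p tcp --dport 53 ! -s 131.111.0.0/16 -j REJECT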

This change should not have any visible effect; I am letting you know because others who run DNS servers on the CUDN might want to make a similar change, and because there is some interesting background.

For most purposes, incoming DNS queries are blocked by the JANET border packet filters. http://www.ucs.cam.ac.uk/network/infoinstitutions/techref/portblock You only really need an exemption to this block for authoritative DNS servers. If you are running recursive-only DNS servers that are exempted from the port 53 block, you should consider changing your packet filters.

The particular reason for this change is that BIND's TCP connection listener is trivially easy to flood. The inspiration for this change is a cleverly evil exploit announced by Cloudflare earlier this week which relies on TCP connection flooding. Although their particular attack doesn't work with BIND, it would still be unpleasant if anyone tried it on us.

I have published a blog article with more background and context at http://fanf.livejournal.com/141807.html

DNS DoS mitigation by patching BIND to support draft-ietf-dnsop-refuse-any

2016-02-05 - Progress - Tony Finch

Last weekend one of our authoritative name servers (authdns1.csx.cam.ac.uk) suffered a series of DoS attacks which made it rather unhappy. Over the last week I have developed a patch for BIND to make it handle these attacks better.

The attack traffic

On authdns1 we provide off-site secondary name service to a number of other universities and academic institutions; the attack targeted imperial.ac.uk.

For years we have had a number of defence mechanisms on our DNS servers. The main one is response rate limiting, which is designed to reduce the damage done by DNS reflection / amplification attacks.

However, our recent attacks were different. Like most reflection / amplification attacks, we were getting a lot of QTYPE=ANY queries, but unlike reflection / amplification attacks these were not spoofed, but rather were coming to us from a lot of recursive DNS servers. (A large part of the volume came from Google Public DNS; I suspect that is just because of their size and popularity.)

My guess is that it was a reflection / amplification attack, but we were not being used as the amplifier; instead, a lot of open resolvers were being used to amplify, and they in turn were making queries upstream to us. (Consumer routers are often open resolvers, but usually forward to their ISP's resolvers or to public resolvers such as Google's, and those query us in turn.)

What made it worse

Because from our point of view the queries were coming from real resolvers, RRL was completely ineffective. But some other configuration settings made the attacks cause more damage than they might otherwise have done.

I have configured our authoritative servers to avoid sending large UDP packets which get fragmented at the IP layer. IP fragments often get dropped and this can cause problems with DNS resolution. So I have set

max-udp-size 1420;
minimal-responses yes;

The first setting limits the size of outgoing UDP responses to an MTU which is very likely to work. (The ethernet MTU minus some slop for tunnels.) The second setting reduces the amount of information that the server tries to put in the packet, so that it is less likely to be truncated because of the small UDP size limit, so that clients do not have to retry over TCP.

This works OK for normal queries; for instance a cam.ac.uk IN MX query gets a svelte 216 byte response from our authoritative servers but a chubby 2047 byte response from our recursive servers which do not have these settings.
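
You can compare the sizes yourself using authdns0 and recdns0; the exact numbers will drift as the zones change:

$ dig +dnssec @131.111.8.37 cam.ac.uk mx | grep 'MSG SIZE'
$ dig +dnssec @131.111.8.42 cam.ac.uk mx | grep 'MSG SIZE'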

But ANY queries blow straight past the UDP size limit: the attack queries for imperial.ac.uk IN ANY got obese 3930 byte responses.

The effect was that the recursive clients retried their queries over TCP, and consumed the server's entire TCP connection quota. (Sadly BIND's TCP handling is not up to the standard of good web servers, so it's quite easy to nadger it in this way.)

draft-ietf-dnsop-refuse-any

We might have coped a lot better if we could have served all the attack traffic over UDP. Fortunately there was some pertinent discussion in the IETF DNSOP working group in March last year which resulted in draft-ietf-dnsop-refuse-any, "providing minimal-sized responses to DNS queries with QTYPE=ANY".

This document was instigated by Cloudflare, who have a DNS server architecture which makes it unusually difficult to produce traditional comprehensive responses to ANY queries. Their approach is instead to send just one synthetic record in response, like

cloudflare.net.  HINFO  ( "Please stop asking for ANY"
                          "See draft-jabley-dnsop-refuse-any" )

In the discussion, Evan Hunt (one of the BIND developers) suggested an alternative approach suitable for traditional name servers. They can reply to an ANY query by picking one arbitrary RRset to put in the answer, instead of all of the RRsets they have to hand.

The draft says you can use either of these approaches. They both allow an authoritative server to make the recursive server go away happy that it got an answer, and without breaking odd applications like qmail that foolishly rely on ANY queries.

I did a few small experiments at the time to demonstrate that it really would work OK in the real world (unlike some of the earlier proposals) and they are both pretty neat solutions (unlike some of the earlier proposals).

Attack mitigation

So draft-ietf-dnsop-refuse-any is an excellent way to reduce the damage caused by the attacks, since it allows us to return small UDP responses which reduce the downstream amplification and avoid pushing the intermediate recursive servers on to TCP. But BIND did not have this feature.

I did a very quick hack on Tuesday to strip down ANY responses, and I deployed it to our authoritative DNS servers on Wednesday morning for swift mitigation. But it was immediately clear that I had put my patch in completely the wrong part of BIND, so it would need substantial re-working before it could be more widely useful.

I managed to get back to the patch on Thursday. The right place to put the logic was in the fearsome query_find() which is the top-level query handling function and nearly 2400 lines long! I finished the first draft of the revised patch that afternoon (using none of the code I wrote on Tuesday), and I spent Friday afternoon debugging and improving it.

The result is this patch, which adds a minimal-qtype-any option. I'm currently running it on my toy nameserver, and I plan to deploy it to our production servers next week to replace the rough hack.

I have submitted the patch to ISC.org; hopefully something like it will be included in a future version of BIND. And it prompted a couple of questions about draft-ietf-dnsop-refuse-any that I posted to the DNSOP working group mailing list.

January BIND security release

2016-01-20 - News - Tony Finch

Last night the ISC published yet another security release of BIND.

For details, please see the announcement messages: https://lists.isc.org/pipermail/bind-announce/2016-January/thread.html

The central DNS servers have been upgraded to BIND 9.10.3-P3.

BIND security release

2015-12-17 - News - Tony Finch

On Tuesday night the ISC published security releases of BIND which fix a couple of remote denial of service vulnerabilities. If you are running a recursive DNS server then you should update as soon as possible.

If you build your own BIND packages linked to OpenSSL 1.0.1 or 1.0.2 then you should also be aware of the OpenSSL security release that occurred earlier this month. The new versions of BIND will refuse to build with vulnerable versions of OpenSSL.

For more information see the bind-announce list, https://lists.isc.org/pipermail/bind-announce/2015-December/thread.html

The central nameservers and the resolvers on the central mail relays were updated to BIND 9.10.3-P2 earlier today.

Isaac Newton Institute delegated

2015-10-22 - News - Tony Finch

There are two new zones in the sample nameserver configuration,

  • newton.cam.ac.uk
  • 145.111.131.in-addr.arpa

These have been delegated like the other domains managed by the Faculty of Mathematics.

Cutting a zone with DNSSEC

2015-10-21 - Progress - Tony Finch

This week we will be delegating newton.cam.ac.uk (the Isaac Newton Institute's domain) to the Faculty of Mathematics, who have been running their own DNS since the very earliest days of Internet connectivity in Cambridge.

Unlike most new delegations, the newton.cam.ac.uk domain already exists and has a lot of records, so we have to keep them working during the process. And for added fun, cam.ac.uk is signed with DNSSEC, so we can't play fast and loose.

In the absence of DNSSEC, it is mostly OK to set up the new zone, get all the relevant name servers secondarying it, and then introduce the zone cut. During the rollout, some servers will be serving the domain from the old records in the parent zone, and other servers will serve the domain from the new child zone, which occludes the old records in its parent.

But this won't work with DNSSEC because validators are aware of zone cuts, and they check that delegations across cuts are consistent with the answers they have received. So with DNSSEC, the process you have to follow is fairly tightly constrained to be basically the opposite of the above.

The first step is to set up the new zone on name servers that are completely disjoint from those of the parent zone. This ensures that a resolver cannot prematurely get any answers from the new zone - they have to follow a delegation from the parent to find the name servers for the new zone. In the case of newton.cam.ac.uk, we are lucky that the Maths name servers satisfy this requirement.

The second step is to introduce the delegation into the parent zone. Ideally this should propagate to all the authoritative servers promptly, using NOTIFY and IXFR.

(I am a bit concerned about DNSSEC software which does validation as a separate process after normal iterative resolution, which is most of it. While the delegation is propagating it is possible to find the delegation when resolving, but get a missing delegation when validating. If the validator is persistent at re-querying for the delegation chain it should be able to recover from this; but quick propagation minimizes the problem.)
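
While the change propagates it is worth checking that the parent's delegation and the child's apex agree; something like the following, where the Maths server name is illustrative:

$ dig +norecurse @authdns0.csx.cam.ac.uk newton.cam.ac.uk ns
$ dig +norecurse @dns0.maths.cam.ac.uk newton.cam.ac.uk ns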

After the delegation is present on all the authoritative servers, and old data has timed out of caches, the new child zone can (if necessary) be added to the parent zone's name servers. In our case the central cam.ac.uk name servers and off-site secondaries also serve the Maths zones, so this step normalizes the setup for newton.cam.ac.uk.

(lack of) DNS root hints

2015-09-28 - News - Tony Finch

Another change I made to sample.named.conf on Friday was to remove the explicit configuration of the root name server hints. I was asked why, so I thought I should explain to everyone.

BIND comes with a built-in copy of the hints, so there is no need to explicitly configure them. It is important to keep BIND up-to-date for security reasons, so the root hints should not be stale. And even if they are stale, the only negative effect is a warning in the logs.

So I regard explicitly configuring root hints as needless extra work.

It is worth noting that the H-root name server IP addresses are going to change on the 1st December 2015. We will not be making any special effort in response since normal BIND updates will include this change in due course.

There is a history of root name server IP address changes at http://root-servers.org/news.html

New CUDN-wide private addresses 10.128.0.0/9

2015-09-25 - News - Tony Finch

You should be aware of our previous announcements about changing the status of 10.128.0.0/9 to CUDN-wide private address space.

The central name servers now have DNS zones for 10.128.0.0/9. There are not yet any registrations in this address space, so the zones are currently almost empty. We have updated the name server configuration advice to cover these new zones.

https://jackdaw.cam.ac.uk/ipreg/nsconfig/

On the CUDN the RFC 1918 address block 10.0.0.0/8 is divided in two. The bottom half, 10.0.0.0/9, is for institution-private usage and is not routed on the CUDN. The top half, 10.128.0.0/9, was previously reserved; it has now been re-assigned as CUDN-wide private address space.

To provide DNS for 10.0.0.0/8 we have a mechanism for conveniently sharing the zone 10.in-addr.arpa between institution-private and CUDN-wide private uses. The arrangement we are using is similar to the way 128.232.0.0/16 is divided between the Computer Lab and the rest of the University.

We have two new zones for this address space,

  • 10.in-addr.arpa
  • in-addr.arpa.private.cam.ac.uk

The sample nameserver configuration has been updated to include them.

Institutions that are using the bottom half, 10.0.0.0/9, should provide their own version of 10.in-addr.arpa with DNAME redirections to in-addr.arpa.private.cam.ac.uk for the CUDN-wide addresses.
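
My guess at the shape of such a zone, with one DNAME per /16 in the top half; check the name server configuration advice for the authoritative scheme:

$ORIGIN 10.in-addr.arpa.
128   DNAME   128.10.in-addr.arpa.private.cam.ac.uk.
129   DNAME   129.10.in-addr.arpa.private.cam.ac.uk.
; ... and so on up to 255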

BIND security update

2015-09-03 - News - Tony Finch

Last night the ISC released new versions of BIND, 9.9.7-P3 and 9.10.2-P4, which address a couple of remote denial-of-service vulnerabilities, CVE-2015-5722 (DNSKEY parsing bug) and CVE-2015-5986 (OPENPGPKEY parsing bug). There is some background information on the recent spate of security releases at https://www.isc.org/blogs/summer_security_vulnerabilities/

If you are running BIND as a recursive DNS server you should update it urgently. We will be patching the central DNS servers this morning.

CVE-2015-5477: critical remote crash bug in BIND

2015-07-29 - News - Tony Finch

If you have a DNS server running BIND, you should apply the latest security patch as soon as possible.

The bind-announce mailing list has the formal vulnerability notification and release announcements:

The authors of BIND have also published a blog post emphasizing that there are no workarounds for this vulnerability: it affects both recursive and authoritative servers and I understand that query ACLs are not sufficient protection.

Our central DNS servers authdns* and recdns* have been patched.

More frequent DNS updates

2015-02-18 - News - Tony Finch

DNS updates now occur every hour at 53 minutes past the hour. (There is a mnemonic for the new timing of DNS updates: 53 is the UDP and TCP port number used by the DNS.) Previously, the interval between DNS update runs was four hours.

The update job takes a minute or two to run, after which changes are immediately visible on our public authoritative DNS servers, and on our central recursive servers 131.111.8.42 and 131.111.12.20.

We have also reduced the TTL of our DNS records from 24 hours to 1 hour. (The time-to-live is the maximum time old data will remain in DNS caches.) This shorter TTL means that users of other recursive DNS servers around the University and elsewhere will observe DNS changes within 2 hours of changes to the IP Register database.

There are two other DNS timing parameters which were reduced at the time of the new DNS server rollout.

The TTL for negative answers (in response to queries for data that is not present in the DNS) has been reduced from 4 hours to 1 hour. This can make new entries in the DNS available faster.

Finally, we have reduced the zone refresh timer from 4 hours to 30 minutes. This means that unofficial "stealth secondary" nameservers will fetch DNS updates within 90 minutes of a change being made to the IP Register database. Previously the delay could be up to 8 hours.
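
Both timers are visible in the zone's SOA record: the refresh interval is the second numeric field and the negative TTL is the last.

$ dig +short cam.ac.uk soa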

DNS server rollout report

2015-02-16 - Progress - Tony Finch

Last week I rolled out my new DNS servers. It was reasonably successful - a few snags but no showstoppers.

Read more ...

New DNS servers

2015-02-15 - News - Tony Finch

The DNS servers have been replaced with an entirely new setup.

The immediate improvements are:

  • Automatic failover for recursive DNS servers. There are servers in four different locations, two live, two backup.

  • DNSSEC signing moved off authdns0 onto a hidden master server, with support for signing Managed Zone Service domains.

There are extensive improvements to the DNS server management and administration infrastructure:

  • Configuration management and upgrade orchestration moved from ad-hoc to Ansible.

  • Revision control moved from SCCS to git, including a history of over 20,000 changes dating back to 1990.

  • Operating system moved from Solaris to Linux, to make better use of our local knowledge and supporting infrastructure.

DNS server upgrade

2015-02-09 - News - Tony Finch

This week we will replace the DNS servers with an entirely new setup. This change should be almost invisible: the new servers will behave the same as the old ones, and each switchover from an old to a new server will only take a few seconds so should not cause disruption.

The rollout will switch over the four service addresses on three occasions this week. We are avoiding changes during the working day, and rolling out in stages so we are able to monitor each change separately.

Tuesday 10 February, 18:00 -

  • Recursive server recdns1, 131.111.12.20 (expected outage 15s)

Wednesday 11 February, 08:00 -

  • Recursive server recdns0, 131.111.8.42 (expected outage 15s)
  • Authoritative server authdns1, 131.111.12.37 (expected outage 40s)

Thursday 12 February, 18:00 -

  • Authoritative server authdns0, 131.111.8.37 (expected outage 40s)

There will be a couple of immediate improvements to the DNS service, with more to follow:

  • Automatic failover for recursive DNS servers. There are servers in three different locations, two live, one backup, and when the West Cambridge Data Centre comes online there will be a second backup location.

  • DNSSEC signing moved off authdns0 onto a hidden master server, with support for signing Managed Zone Service domains.

There are extensive improvements to the DNS server management and administration infrastructure:

  • Configuration management and upgrade orchestration moved from ad-hoc to Ansible. The expected switchover timings above are based on test runs of the Ansible rollout / backout playbooks.

  • Revision control moved from SCCS to git, including a history of over 20,000 changes dating back to 1990.

  • Operating system moved from Solaris to Linux, to make better use of our local knowledge and supporting infrastructure.

Recursive DNS rollout plan - and backout plan!

2015-01-30 - Progress - Tony Finch

The last couple of weeks have been a bit slow, being busy with email and DNS support, an unwell child, and surprise 0day. But on Wednesday I managed to clear the decks so that on Thursday I could get down to some serious rollout planning.

My aim is to do a forklift upgrade of our DNS servers - a tier 1 service - with negligible downtime, and with a backout plan in case of fuckups.

Read more ...

BIND patches as a byproduct of setting up new DNS servers

2015-01-17 - Progress - Tony Finch

On Friday evening I reached a BIG milestone in my project to replace Cambridge University's DNS servers. I finished porting and rewriting the dynamic name server configuration and zone data update scripts, and I was - at last! - able to get the new servers up to pretty much full functionality, pulling lists of zones and their contents from the IP Register database and the managed zone service, and with DNSSEC signing on the new hidden master.

There is still some final cleanup and robustifying to do, and checks to make sure I haven't missed anything. And I have to work out the exact process I will follow to put the new system into live service with minimum risk and disruption. But the end is tantalizingly within reach!

In the last couple of weeks I have also got several small patches into BIND.

Read more ...

Recursive DNS server failover with keepalived --vrrp

2015-01-09 - Progress - Tony Finch

I have got keepalived working on my recursive DNS servers, handling failover for testdns0.csi.cam.ac.uk and testdns1.csi.cam.ac.uk. I am quite pleased with the way it works.

Read more ...

Network setup for Cambridge's new DNS servers

2015-01-07 - Progress - Tony Finch

The SCCS-to-git project that I wrote about previously was the prelude to setting up new DNS servers with an entirely overhauled infrastructure.

Read more ...

Uplift from SCCS to git

2014-11-27 - Progress - Tony Finch

My current project is to replace Cambridge University's DNS servers. The first stage of this project is to transfer the code from SCCS to Git so that it is easier to work with.

Ironically, to do this I have ended up spending lots of time working with SCCS and RCS, rather than Git. This was mainly developing analysis and conversion tools to get things into a fit state for Git.

If you find yourself in a similar situation, you might find these tools helpful.

Read more ...

The early days of the Internet in Cambridge

2014-10-30 - Progress - Tony Finch

I'm currently in the process of uplifting our DNS development / operations repository from SCCS (really!) to git. This is not entirely trivial because I want to ensure that all the archival material is retained in a sensible way.

I found an interesting document from one of the oldest parts of the archive, which provides a good snapshot of academic computer networking in the UK in 1991. It was written by Tony Stonely, aka <ajms@cam.ac.uk>. AJMS is mentioned in RFC 1117 as the contact for Cambridge's IP address allocation. He was my manager when I started work at Cambridge in 2002, though he retired later that year.

The document is an email discussing IP connectivity for Cambridge's Institute of Astronomy. There are a number of abbreviations which might not be familiar...

  • Coloured Book: the JANET protocol suite
  • CS: the University Computing Service
  • CUDN: the Cambridge University Data Network
  • GBN: the Granta Backbone Network, Cambridge's duct and fibre infrastructure
  • grey: short for Grey Book, the JANET email protocol
  • IoA: the Institute of Astronomy
  • JANET: the UK national academic network
  • JIPS: the JANET IP service, which started as a pilot service early in 1991; IP traffic rapidly overtook JANET's native X.25 traffic, and JIPS became an official service in November 1991, about when this message was written
  • PSH: a member of IoA staff
  • RA: the Mullard Radio Astronomy Observatory, an outpost at Lords Bridge near Barton, where some of the dishes sit on the old Cambridge-Oxford railway line. (I originally misunderstood the reference as the Rutherford Appleton Laboratory, a national research institute in Oxfordshire.)
  • RGO: The Royal Greenwich Observatory, which moved from Herstmonceux to the IoA site in Cambridge in 1990
  • Starlink: a UK national DECnet network linking astronomical research institutions

Edited to correct the expansion of RA and to add Starlink.

    Connection of IoA/RGO to IP world
    ---------------------------------

This note is a statement of where I believe we have got to and an initial
review of the options now open.

What we have achieved so far
----------------------------

All the Suns are properly connected at the lower levels to the
Cambridge IP network, to the national IP network (JIPS) and to the
international IP network (the Internet). This includes all the basic
infrastructure such as routing and name service, and allows the Suns
to use all the usual native Unix communications facilities (telnet,
ftp, rlogin etc) except mail, which is discussed below. Possibly the
most valuable end-user function thus delivered is the ability to fetch
files directly from the USA.

This also provides the basic infrastructure for other machines such as
the VMS hosts when they need it.

VMS nodes
---------

Nothing has yet been done about the VMS nodes. CAMV0 needs its address
changing, and both IOA0 and CAMV0 need routing set for extra-site
communication. The immediate intention is to route through cast0. This
will be transparent to all parties and impose negligible load on
cast0, but requires the "doit" bit to be set in cast0's kernel. We
understand that PSH is going to do all this [check], but we remain
available to assist as required.

Further action on the VMS front is stalled pending the arrival of the
new release (6.6) of the CMU TCP/IP package. This is so imminent that
it seems foolish not to await it, and we believe IoA/RGO agree [check].

Access from Suns to Coloured Book world
---------------------------------------

There are basically two options for connecting the Suns to the JANET
Coloured Book world. We can either set up one or more of the Suns as
full-blown independent JANET hosts or we can set them up to use CS
gateway facilities. The former provides the full range of facilities
expected of any JANET host, but is cumbersome, takes significant local
resources, is complicated and long-winded to arrange, incurs a small
licence fee, is platform-specific, and adds significant complexity to
the system managers' maintenance and planning load. The latter in
contrast is light-weight, free, easy to install, and can be provided
for any reasonable Unix host, but limits functionality to outbound pad
and file transfer either way initiated from the local (IoA/RGO) end.
The two options are not exclusive.

We suspect that the latter option ("spad/cpf") will provide adequate
functionality and is preferable, but would welcome IoA/RGO opinion.

Direct login to the Suns from a (possibly) remote JANET/CUDN terminal
would currently require the full Coloured Book package, but the CS
will shortly be providing X.29-telnet gateway facilities as part of
the general infrastructure, and can in any case provide this
functionality indirectly through login accounts on Central Unix
facilities. For that matter, AST-STAR or WEST.AST could be used in
this fashion.

Mail
----

Mail is a complicated and difficult subject, and I believe that a
small group of experts from IoA/RGO and the CS should meet to discuss
the requirements and options. The rest of this section is merely a
fleeting summary of some of the issues.

Firstly, a political point must be clarified. At the time of writing
it is absolutely forbidden to emit smtp (ie Unix/Internet style) mail
into JIPS. This prohibition is national, and none of Cambridge's
doing. We expect that the embargo will shortly be lifted somewhat, but
there are certain to remain very strict rules about how smtp is to be
used. Within Cambridge we are making best guesses as to the likely
future rules and adopting those as current working practice. It must
be understood however that the situation is highly volatile and that
today's decisions may turn out to be wrong.

The current rulings are (inter alia)

        Mail to/from outside Cambridge may only be grey (Ie. JANET
        style).

        Mail within Cambridge may be grey or smtp BUT the reply
        address MUST be valid in BOTH the Internet AND Janet (modulo
        reversal). Thus a workstation emitting smtp mail must ensure
        that the reply address contained is that of a current JANET
        mail host. Except that -

        Consenting machines in a closed workgroup in Cambridge are
        permitted to use smtp between themselves, though there is no
        support from the CS and the practice is discouraged. They
        must remember not to contravene the previous two rulings, on
        pain of disconnection.

The good news is that a central mail hub/distributer will become
available as a network service for the whole University within a few
months, and will provide sufficient gateway function that ordinary
smtp Unix workstations, with some careful configuration, can have full
mail connectivity. In essence the workstation and the distributer will
form one of those "closed workgroups", the workstation will send all
its outbound mail to the distributer and receive all its inbound mail
from the distributer, and the distributer will handle the forwarding
to and from the rest of Cambridge, UK and the world.

There is no prospect of DECnet mail being supported generally either
nationally or within Cambridge, but I imagine Starlink/IoA/RGO will
continue to use it for the time being, and whatever gateway function
there is now will need preserving. This will have to be largely
IoA/RGO's own responsibility, but the planning exercise may have to
take account of any further constraints thus imposed. Input from
IoA/RGO as to the requirements is needed.

In the longer term there will probably be a general UK and worldwide
shift to X.400 mail, but that horizon is probably too hazy to rate more
than a nod at present. The central mail switch should in any case hide
the initial impact from most users.

The times are therefore a'changing rather rapidly, and some pragmatism
is needed in deciding what to do. If mail to/from the IP machines is
not an urgent requirement, and since they will be able to log in to
the VMS nodes it may not be, then the best thing may well be to await
the mail distributer service. If more direct mail is needed more
urgently then we probably need to set up a private mail distributer
service within IoA/RGO. This would entail setting up (probably) a Sun
as a full JANET host and using it as the one and only (mail) route in
or out of IoA/RGO. Something rather similar has been done in Molecular
Biology and is thus known to work, but setting it up is no mean task.
A further fall-back option might be to arrange to use Central Unix
facilities as a mail gateway in similar vein. The less effort spent on
interim facilities the better, however.

Broken mail
-----------

We discovered late in the day that smtp mail was in fact being used
between IoA and RA, and the name changing broke this. We regret having
thus trodden on existing facilities, and are willing to help try to
recover any required functionality, but we believe that IoA/RGO/RA in
fact have this in hand. We consider the activity to fall under the
third rule above. If help is needed, please let us know.

We should also report a sideline problem we encountered and which will
probably be a continuing cause of grief. CAVAD, and indeed any similar
VMS system, emits mail with reply addresses of the form
"CAVAD::user"@....  This is quite legal, but the quotes are
syntactically significant, and must be returned in any reply.
Unfortunately the great majority of Unix systems strip such quotes
during emission of mail, so the reply address fails. Such stripping
can occur at several levels, notably the sendmail (ie system)
processing and one of the most popular user-level mailers. The CS
is fixing its own systems, but the problem is replicated in something
like half a million independent Internet hosts, and little can be done
about it.

Other requirements
------------------

There may well be other requirements that have not been noticed or,
perish the thought, we have inadvertently broken. Please let us know
of these.

Bandwidth improvements
----------------------

At present all IP communications between IoA/RGO and the rest of the
world go down a rather slow (64Kb/sec) link. This should improve
substantially when it is replaced with a GBN link, and to most of
Cambridge the bandwidth will probably become 1-2Mb/sec. For comparison,
the basic ethernet bandwidth is 10Mb/sec. The timescale is unclear, but
sometime in 1992 is expected. The bandwidth of the national backbone
facilities is of the order of 1Mb/sec, but of course this is shared with
many institutions in a manner hard to predict or assess.

For Computing Service,
Tony Stoneley, ajms@cam.cus
29/11/91

Some small changes to sample.named.conf

2014-07-23 - News - Tony Finch

The UIS have taken over the management of the DNS entries for the MRC Biostatistics Unit subnets 193.60.[86-87].x. As a result, the zones

  • 86.60.193.in-addr.arpa
  • 87.60.193.in-addr.arpa

can now be slaved from the authdns*.csx servers by hosts within the CUDN, and they have been added to the sample BIND configuration at

https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf

Those who feel they must slave enough reverse zones to cover the whole CUDN may want to include them. These zones are not yet signed, but we expect them to be within a week or two.
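
If you are adding them by hand rather than taking the whole sample file, the usual slave pattern is all that is needed. A minimal sketch (the file name is your choice; 131.111.8.37 and 131.111.12.37 are the authdns addresses mentioned elsewhere on this page):

zone "86.60.193.in-addr.arpa" {
    type slave;
    file "slave/86.60.193.in-addr.arpa";
    masters { 131.111.8.37; 131.111.12.37; };
};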

A number of cosmetic changes to the comments in the sample configuration have also been made, mostly bringing up to date matters such as which versions of BIND are still actively supported by ISC.

Those who use an explicit root hints file may want to note that a new version was issued in early June, adding an IPv6 address to B.ROOT-SERVERS.NET. The copy at https://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache was updated.

SRV records

2014-07-14 - News - Chris Thompson

There is now a service_ops web page available which allows authorised users to create service (SRV) records in the DNS for names in the domains to which they have access. See the service_ops help page for more details.
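
For reference, an SRV record (RFC 2782) maps a service and protocol at a domain name to a target host and port; the names and numbers below are invented for illustration:

; priority=10 weight=5 port=5060 target=voip.dept.cam.ac.uk
_sip._tcp.dept.cam.ac.uk. 3600 IN SRV 10 5 5060 voip.dept.cam.ac.uk.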

SSHFP records

2014-07-08 - News - Chris Thompson

SSH fingerprint records can now be added to the DNS via the IP registration database. Such records are described in RFC 4255 as updated by RFC 6594.

More information can be found on the IP Register SSHFP documentation page.
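
As a sketch of how these records get used (the host name and fingerprint here are invented): an SSHFP record carries an algorithm number, a fingerprint type, and the fingerprint itself, and OpenSSH can be told to check it with the VerifyHostKeyDNS option - which is only really trustworthy when the zone is DNSSEC-signed and the answer validates:

$ dig +short sshfp host.dept.cam.ac.uk
1 1 dd465c09cfa51fb45020cc83316fff21b9ec74ac

$ ssh -o VerifyHostKeyDNS=yes host.dept.cam.ac.uk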

DHCP-related data in the IP registration database

2014-07-07 - News - Chris Thompson

Many users of the IP registration database web interface will have noticed the appearance some time ago of mac and dhcp_group fields on the single_ops page, as well as related changes visible via the table_ops page.

These were intended in the first instance for maintaining a DHCP service for internal use in the UCS. It was perhaps unwise of us to make them visible outside and raise users' expectations prematurely. It remains a work in progress, and we have had to make changes of detail that affected some of those who had set these fields. The notes here describe the current state.

Although the single_ops page doesn't make this obvious, the mac and dhcp_group fields are properties of the IP address rather than the box. If a box or vbox has multiple IP addresses, each one can have its own values for them. The fields are cleared automatically when the IP address is rescinded.

MAC addresses can be entered in any of the usual formats but are displayed as colon-separated. Because the intent is to support DHCP servers, MAC addresses (if set) are required to be unique within any particular mzone/lan combination. A non-null dhcp_group value is intended to indicate non-default DHCP options. To support automated processing, it must correspond to a registered dhcp_group object for the given mzone/lan which can be created, modified or deleted via table_ops. The values should contain only alphanumeric, hyphen and underline characters.

The degree to which any of this is of use to users outside the UIS is currently very limited. We do intend to add more usability features, though.

Representing network access controls in the database

2014-05-20 - News - Chris Thompson

The scheme described in news item 2008-12-15 has been reworked to represent a larger number of references to specific IP addresses from the various parts of the CUDN infrastructure. The intention remains the same: to prevent such IP addresses being rescinded or reused without appropriate changes being made to the CUDN configuration.

There are now four "anames" used instead of three:

  • janet-filter.net.private.cam.ac.uk for exceptions at the CUDN border routers, often permitting some network traffic that would otherwise be blocked. This is essentially the same as the old janet-acl.net.private.cam.ac.uk which is temporarily an alias.

  • cudn-filter.net.private.cam.ac.uk for exceptions at internal CUDN routers. This includes the old high-numbered port blocking, where it is still in use, but also many other sorts of exception which were previously not represented. The old name cudn-acl.net.private.cam.ac.uk is temporarily an alias.

  • cudn-blocklist.net.private.cam.ac.uk for addresses for which all IP traffic is completely blocked, usually as the result of a security incident. This is essentially the same as the old block-list.net.private.cam.ac.uk which is temporarily an alias.

  • cudn-config.net.private.cam.ac.uk for addresses that are referred to in the CUDN routing infrastructure. This is completely new.

Both IPv4 and IPv6 addresses may appear in these lists (although at the moment only cudn-config has any IPv6 addresses).

Requests for the creation or removal of network access control exceptions, or explanations of existing ones, should in most cases be sent to network-support@uis.cam.ac.uk in the first instance, who will redirect them if necessary. However, the CERT team at cert@cam.ac.uk are solely responsible for the cudn-blocklist contents in particular.

Upgrade to BIND 9.9.5 may possibly cause problems

2014-02-18 - News - Chris Thompson

The most recently released BIND versions (9.9.5, 9.8.7, 9.6-ESV-R11) have implemented a more pedantic interpretation of the RFCs in the area of compressing responses. It is just possible that this will cause problems for some resolving software. ISC have written a Knowledge Base article about it, which can be found at https://kb.isc.org/article/AA-01113

In particular, the name in the answer section of a response may now have a different case from that in the question section (which will always be identical to that in the original query). Previously they would (after decompression) have been identical. Resolvers are meant to use case-insensitive comparisons themselves, but this change could expose non-conformance in this area.
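
As an illustration (name and address invented), a post-upgrade response to a mixed-case query might look like this, with the two sections no longer agreeing in case:

;; QUESTION SECTION:
;WWW.Dept.Cam.AC.UK.        IN  A

;; ANSWER SECTION:
www.dept.cam.ac.uk.  3600   IN  A   192.0.2.1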

However, experiments we have performed so far, and information from the DNS community at large, suggest that such non-conformance is quite rare. We are therefore planning to upgrade the CUDN central nameservers (both authoritative and recursive) to BIND 9.9.5 over the next few days. Please keep an eye out for any problems that might be caused by the change, and let us (hostmaster at ucs.cam.ac.uk) know as soon as possible, while we still have the option of backing off.

The "consolidated reverse zone" in-addr.arpa.cam.ac.uk is now signed

2014-01-26 - News - Chris Thompson

In November, I wrote to this list (cs-nameservers-announce):

As regards DNSSEC validation, cl.cam.ac.uk now has a chain of trust from the root zone. We expect that 232.128.in-addr.arpa will also have one before long.

The latter happened earlier in January. That made it sensible to sign the "consolidated reverse zone" in-addr.arpa.cam.ac.uk which provides reverse lookup results for IPv4 addresses in the range 128.232.[128-255].x. This has now been done, and the results of such reverse lookup can be fully validated using chains of trust from the root zone.

There is more information at

https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-signed.html
  • Moved to https://www.dns.cam.ac.uk/domains/signed.html

https://jackdaw.cam.ac.uk/ipreg/nsconfig/consolidated-reverse-zones.html
  • Moved to https://www.dns.cam.ac.uk/domains/reverse/

which have been brought up to date.

Computer Laboratory zones are now signed

2013-11-19 - News - Chris Thompson

First, a note that the IPv6 reverse zone delegated to the Computer Laboratory, 2.0.2.1.2.0.0.3.6.0.1.0.0.2.ip6.arpa, has been added to the list of zones in https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf that can be slaved stealthily within the CUDN. Some of the commentary in that file has also been brought up to date.

The main news is that the zones

  • cl.cam.ac.uk
  • 232.128.in-addr.arpa
  • 2.0.2.1.2.0.0.3.6.0.1.0.0.2.ip6.arpa

are now all signed. They are therefore much larger than before, and have larger and more frequent incremental updates. Those who are slaving them may need to be aware of that.

As regards DNSSEC validation, cl.cam.ac.uk now has a chain of trust from the root zone. We expect that 232.128.in-addr.arpa will also have one before long. The IP reverse zone has DS (delegation signer) records in 1.2.0.0.3.6.0.1.0.0.2.ip6.arpa, but that itself can be validated only via the dlv.isc.org lookaside zone, as JANET have not yet signed its parent zone 0.3.6.0.1.0.0.2.ip6.arpa (despite an 18-month-old promise on their part).

New xlist_ops web page

2013-07-01 - News - Chris Thompson

There is a new xlist_ops page that provides more general list operations on the IP registration database than does the list_ops page. In particular it allows downloads of lists of boxes, vboxes, cnames or anames, and uploads to perform bulk operations on multihomed boxes, vboxes, cnames or (for registrars only) anames. See the xlist_ops help page for details.

The opportunity has been taken to make a number of other small modifications. The order of links in the standard page header has been altered, and multihome_ops has been relegated to a link from the preferred box_ops page.

Removing some registrations in dlv.isc.org

2012-07-08 - News - Chris Thompson

The following will be relevant primarily to those who are performing DNSSEC validation.

It will soon be the second anniversary of the date on which the root zone was signed, 15 July 2010. By now, everyone seriously into the business of DNSSEC validation should be using a trust anchor for the root zone, whether or not they also use lookaside validation via a trust anchor for dlv.isc.org. The latter facility was always meant to be an aid to early deployment of DNSSEC, not a permanent solution. While it remains useful to cover the many unsigned gaps in the tree of DNS zones, it no longer seems appropriate to have dlv.isc.org entries for DNS zones that can be validated via a chain of trust from the root zone.

Therefore, on or about 15 July 2012, we shall be dropping the entries for the two zones cam.ac.uk and 111.131.in-addr.arpa from the dlv.isc.org zone, as these have now had chains of trust from the root zone for well over a year. We will be retaining the entries for a number of our signed reverse zones whose parent zones are not yet signed - for details see

https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-signed.html
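
One way to check that a zone now has a chain of trust from the root is to ask for its DS record in the parent zone; a non-empty answer (keytag, algorithm number, digest type and digest) that validates means lookaside is no longer needed for that zone:

$ dig +short ds cam.ac.uk
$ dig +short ds 111.131.in-addr.arpa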

Phasing out of the old IPv6 address block

2012-02-07 - News - Chris Thompson

Nearly all IPv6 use within the CUDN has now been transferred from the old block 2001:630:200::/48 to the new block 2001:630:210::/44. Therefore the following changes have been made to the sample configuration for stealth servers at https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf

  1. The old block has been dropped from the definition of the "camnets" ACL.

  2. The reverse zone 0.0.2.0.0.3.6.0.1.0.0.2.ip6.arpa has been dropped from the list of those which may be slaved.

We advise you to make the corresponding changes in your nameserver configurations if they are relevant.

New BIND vulnerability CVE-2011-4313

2011-11-16 - News - Chris Thompson

ISC have issued the BIND advisory

http://www.isc.org/software/bind/advisories/cve-2011-4313

It concerns a bug, thought to be remotely exploitable, that crashes recursive nameservers, and they have provided new BIND versions (9.4-ESV-R5-P1, 9.6-ESV-R5-P1, 9.7.4-P1, 9.8.1-P1) which are proof against crashing from this cause, although the precise sequence of events that leads to it remains obscure.

Although we are not aware of any local nameservers that have been affected by this problem, several other sites have been badly affected in the last 24 hours.

The CUDN central recursive nameservers at 131.111.8.42 & 131.111.12.20 are now running BIND 9.8.1-P1.
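
If you want to check what version a BIND server reports (many servers deliberately hide or disguise this, so treat it as a hint rather than gospel), the traditional query is:

$ dig +short @131.111.8.42 version.bind txt chaos
"9.8.1-P1"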

IPv6 addresses of the CUDN central nameservers

2011-08-23 - News - Chris Thompson

The IPv6 routing prefixes for the vlans on which the CUDN central nameservers are located are being altered. As a result, their IPv6 addresses are changing as follows:

               Old IPv6 address          New IPv6 address
authdns0.csx   2001:630:200:8080::d:a0   2001:630:212:8::d:a0
authdns1.csx   2001:630:200:8120::d:a1   2001:630:212:12::d:a1
recdns0.csx    2001:630:200:8080::d:0    2001:630:212:8::d:0
recdns1.csx    2001:630:200:8120::d:1    2001:630:212:12::d:1

The new addresses are working now, and the old addresses will continue to work as well until Monday 5 September, when they will be removed. If you are using them (e.g. in nameserver or stub resolver configuration files) you should switch to the new addresses (or the IPv4 ones) before then.
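
For example, a stub resolver pointed at the central recursive nameservers over IPv6 would end up with resolv.conf entries like these (the recdns0/recdns1 addresses from the table above):

nameserver 2001:630:212:8::d:0
nameserver 2001:630:212:12::d:1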

The comments in the sample configuration file

https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf

about using IPv6 addresses to address the nameservers have been modified appropriately.

Two ISC advisories about BIND

2011-07-05 - News - Chris Thompson

ISC have issued two BIND advisories today:

http://www.isc.org/software/bind/advisories/cve-2011-2464

This affects most still-supported versions of BIND. A suitably crafted UPDATE packet can trigger an assertion failure. Apparently not yet seen in the wild...

http://www.isc.org/software/bind/advisories/cve-2011-2465

This affects only users of Response Policy Zones in 9.8.x.

Fixed versions are 9.6-ESV-R4-P3, 9.7.3-P3 and 9.8.0-P4.

New IPv6 address block for Cambridge

2011-06-21 - News - Chris Thompson

You may be aware that we have been negotiating with JANET for a larger IPv6 address block. These negotiations have (eventually) been successful. We are being allocated 2001:630:210::/44, and the existing use of 2001:630:200::/48 will be phased out over (we hope) the next few months. Details of how the new space will be divided up will be available from Networks in due course.

As immediate consequences, the following changes have been made to http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf:

  • The "camnets" ACL has 2001:630:210::/44 added to it.

  • The reverse zone "1.2.0.0.3.6.0.1.0.0.2.ip6.arpa" is listed as available for (stealth) slaving.

Of course, the reverse zone has nothing significant in it yet! But if you are slaving the existing IPv6 reverse zone, you should probably start slaving the new one as well.
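
In named.conf terms the ACL change amounts to something like the following (a sketch, not a verbatim extract from the sample file); the new reverse zone then gets an ordinary slave stanza like the other CUDN zones:

acl camnets {
    128.232.0.0/16;       // existing entries, abbreviated here
    131.111.0.0/16;
    2001:630:210::/44;    // the new IPv6 block
};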

There will of course be other changes during the transition that may affect local nameserver administrators. In particular the IPv6 addresses of the CUDN central authoritative and recursive nameservers will change at some point: this list will be informed before that happens.

A few minor issues while I have your attention:

  1. The zone amtp.cam.ac.uk (old name for damtp.cam.ac.uk) is no longer delegated, and is about to vanish entirely. If you are still slaving it even after the message here on 9 March, now is the time to stop.

  2. There has been another small change to the official root hints file ftp://ftp.internic.net/domain/named.cache, and the copy at http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache has been updated accordingly. The change is the addition of an IPv6 address for d.root-servers.net, and rather appropriately it was made on "IPv6 day".

  3. My description of the BIND vulnerability CVE-2011-1910 was defective in two directions:

    • It isn't necessary to have DNSSEC validation turned on to be vulnerable to it.

    • On the other hand, only moderately recent versions of BIND are vulnerable: old enough ones are not.

    The information at

    http://www.isc.org/software/bind/advisories/cve-2011-1910

    about which versions are affected is accurate (bearing in mind that some OS vendors make their own changes without altering the version number). If you are compiling from source, I can advise you on the code fragment to look for.

BIND high severity advisory CVE-2011-1910

2011-05-27 - News - Chris Thompson

ISC have issued a high severity advisory

http://www.isc.org/software/bind/advisories/cve-2011-1910

and several fixed BIND versions are now available (9.4-ESV-R4-P1, 9.6-ESV-R4-P1, 9.7.3-P1, 9.8.0-P2).

This bug can only be exercised if DNSSEC validation is turned on, but that is increasingly becoming the default setup these days.

New box_ops web page

2011-05-22 - News - Chris Thompson

There is a new box_ops page which can be used as an alternative to the multihome_ops page to manipulate the registrations of hosts ("boxes" in the terminology of the IP registration database) with more than one IP address.

Its functions and display are simpler than those of multihome_ops and more in line with those of the other web pages. Unlike multihome_ops it supports the addition or removal of IPv6 addresses (if any are assigned to the user's management zones) as well as IPv4 ones. However, it is lacking some of the facilities available with multihome_ops such as: using wildcards with display, selecting by address, and displaying detailed properties of the associated IP address objects.

We hope to add at least some of these facilities to box_ops (and to other pages, such as vbox_ops) in due course, and to eliminate the necessity to keep multihome_ops in its current form. The main reason for releasing box_ops now in this somewhat undeveloped state is its support for IPv6 addresses.

Changes to sample.named.conf for delegated Maths domains

2011-03-09 - News - Chris Thompson

There is a new version of the sample nameserver configuration at

http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf

The following zones, which were not previously delegated, have been added:

  • maths.cam.ac.uk
  • statslab.cam.ac.uk
  • 20.111.131.in-addr.arpa

The following zone, which is being phased out, has been removed:

  • amtp.cam.ac.uk

There are no other changes.

Consolidated reverse zone in-addr.arpa.cam.ac.uk

2011-03-03 - News - Chris Thompson

We have progressed past step (2), as in:

  2. If all goes well, during the first week in March we will get the delegations of the 32 zones replaced by DNAMEs in the parent zone 232.128.in-addr.arpa.

with thanks to the Computer Lab hostmaster for his co-operation. We have no reports of any problems at this stage.

The sample nameserver configuration

https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf

has been updated to remove the 32 zones [224-255].232.128.in-addr.arpa from the list that may be slaved. (Apart from some modifications to the comments before "in-addr.arpa.cam.ac.uk", that is the only change.)

If you are slaving any or all of these 32 reverse zones, you should stop doing so now. Sometime next week we will start logging such slaving activity, and alert the administrators of any hosts involved.

The target date for step (3), the complete removal of these 32 reverse zones, remains Monday 14 March.

Consolidated reverse zone in-addr.arpa.cam.ac.uk

2011-02-25 - News - Chris Thompson

We performed step (1) -

  1. On Monday 21 February we will replace the 32 zones [224-255].232.128.in-addr.arpa by versions using DNAMEs that indirect into in-addr.arpa.cam.ac.uk.

on Monday as planned, but had to back off for 12 out of the 32 zones (those covering PWF subnets) because of a problem with a local script used in the PWF re-imaging process. This has now been fixed, and all 32 zones are using indirecting DNAMEs again.

At present we do not think that this delay will significantly affect the schedule for steps (2) and (3). If you are experiencing any problems which you think might be related to these changes, please contact hostmaster at ucs.cam.ac.uk as soon as possible.

Consolidated reverse zone in-addr.arpa.cam.ac.uk

2011-02-16 - News - Chris Thompson

We are planning to extend the IP address range covered by the consolidated reverse zone in-addr.arpa.cam.ac.uk, described here last November, to include 128.232.[224-255].x. The web page

http://jackdaw.cam.ac.uk/ipreg/nsconfig/consolidated-reverse-zones.html

has been updated with the planned schedule and some new advice for users of Windows DNS Server.

To summarise:

  1. On Monday 21 February we will replace the 32 zones [224-255].232.128.in-addr.arpa by versions using DNAMEs that indirect into in-addr.arpa.cam.ac.uk.

  2. If all goes well, during the first week in March we will get the delegations of the 32 zones replaced by DNAMEs in the parent zone 232.128.in-addr.arpa.

  3. If all still goes well, we plan to remove the 32 zones [224-255].232.128.in-addr.arpa completely on Monday 14 March.

The schedule is rather tight because we want to complete this work during full term if possible. If there have to be substantial delays, some of the later steps will be postponed until after Easter.

BIND users who want to slave zones providing reverse lookup for substantially the whole CUDN should slave "in-addr.arpa.cam.ac.uk" and "232.128.in-addr.arpa" (the latter from the CL nameservers) if they are not already doing so, and they should cease slaving the 32 zones [224-255].232.128.in-addr.arpa after step (2) but before step (3). [There will be a further announcement here when step (2) has been completed.]

Windows DNS Server users should note that we no longer recommend that they stealth slave any zones; see

http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/dnssec.html

If you do feel you must continue such stealth slaving, the earlier link contains advice about which versions support zones containing DNAMEs and which do not. In particular, those using Windows 2003 or 2003R2 should cease slaving any of the zones [224-255].232.128.in-addr.arpa as soon as possible, before step (1).

Consolidated reverse zone in-addr.arpa.cam.ac.uk

2010-11-26 - News - Chris Thompson

The sample nameserver configuration at

http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf

has been updated to include the zone in-addr.arpa.cam.ac.uk. Until recently this was contained within the cam.ac.uk zone, but it is now a separate (unsigned) delegated zone. It currently provides the reverse lookup records for IP addresses in the range 128.232.[128-223].x but we hope to extend that to cover the whole of 128.232.[128-255].x eventually.

A description of the zone and our plans for it can be found at

http://jackdaw.cam.ac.uk/ipreg/nsconfig/consolidated-reverse-zones.html

Please be reassured that there will be further announcements here (and probably elsewhere) before the extension to cover 128.232.[224-255].x is implemented.

Signed root zone

2010-07-19 - News - Chris Thompson

As expected, the DNS root zone became properly signed on 15 July. See http://www.root-dnssec.org/ for details.

A trust anchor for the root zone has now been added to the configuration of the CUDN central recursive nameservers (at 131.111.8.42 & 131.111.12.20), in addition to the existing one for dlv.isc.org used for "lookaside validation". There is no immediate prospect of being able to drop the latter, as there are still huge gaps in the signed delegation tree (the "ac.uk" zone, for example).

For those running their own validating recursive nameservers, the pages

https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-validation.html
https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-testing.html

have been updated with some relevant information.

New root hints file, and validating DNSSEC-signed zones

2010-06-21 - News - Chris Thompson

A new version of the root zone hints file has been published, and http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache has been updated with a copy. The substantive change is the addition of an IPv6 address for i.root-servers.net. As usual with such changes, there is little urgency to update your copies.
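
For reference, a local copy of the hints file is the one named in the root zone stanza, roughly as follows (the file name is illustrative):

zone "." {
    type hint;
    file "db.cache";
};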

The rest of this posting is about validating DNSSEC-signed zones.

ICANN have held their first "key signing ceremony" and appear to be on target to sign the root zone on Thursday 15 July. See http://www.root-dnssec.org/ for details. We expect to be including a trust anchor for the signed root zone on the CUDN central recursive nameservers (131.111.8.42 and 131.111.12.20) shortly after it is available.

If you are operating a validating nameserver, there are issues about the supported signing algorithms. There are currently three important ones:

Mnemonic       Code   Supported by        Can be used with which
                      BIND versions[1]    negative responses

RSASHA1         5     9.4                 Only zones using NSEC
NSEC3RSASHA1    7     9.6                 Zones using NSEC or NSEC3[2]
RSASHA256       8     9.6.2 or 9.7        Zones using NSEC or NSEC3

[1] or later.

[2] but as NSEC3RSASHA1 is otherwise identical to RSASHA1, it is almost invariably used with zones using NSEC3 records.

Zones signed only with algorithms unsupported by particular software will be treated by them as unsigned.

Only RSASHA1 is officially mandatory to support according to current IETF standards, but as the intention is to sign the root zone with RSASHA256, it will become effectively mandatory as well. (Other organisations are already assuming this. For example, Nominet have signed the "uk" top-level domain using RSASHA256, although they do not intend to publish a trust anchor for it other than by having a signed delegation in the root zone.)

Therefore, if you want to be able to use a trust anchor for the root zone you will need software that supports the RSASHA256 algorithm, e.g. BIND versions 9.6.2 / 9.7 or later. As an aid for checking this, the test zone dnssec-test.csi.cam.ac.uk is now signed using RSASHA256. For details on how to test, see http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-testing.html
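
A quick way to see which algorithm a zone is signed with is the third field of its DNSKEY records: 5 is RSASHA1, 8 is RSASHA256. For example (key material elided):

$ dig +short dnskey dnssec-test.csi.cam.ac.uk
256 3 8 AwEAAc...
257 3 8 AwEAAd...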

There are no immediate plans to change the algorithm used to sign our production DNS zones from RSASHA1.

cs-nameservers-announce copies on ucam.comp.tcp-ip newsgroup

2010-06-09 - News - Chris Thompson

On Sep 30 2008, I wrote:

It has been the practice to post copies of messages posted to the cs-nameservers-announce mailing list to the local newsgroup ucam.comp.tcp-ip.

The local newsgroups ucam.* are expected to be phased out before long, so I propose that we discontinue this practice. If anyone feels differently, please let us know.

At that time, we received pleas to continue the copies to the ucam.comp.tcp-ip newsgroup for as long as it remained in existence (which has in fact been much longer than was then anticipated). However, its demise now really is imminent; see e.g.

http://ucsnews.csx.cam.ac.uk/articles/2010/03/30/newsgroups-and-bulletin-boards

Therefore I have removed the references to ucam.comp.tcp-ip from the mailing list description and from the sample.named.conf file, and this message will be the last one copied to the newsgroup.

Updates to sample.named.conf

2010-04-26 - News - Chris Thompson

There is a new sample configuration for nameservers on the CUDN at

http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf

The following changes have been made:

Some references relating to DNSSEC validation have been added. For more details, though, consult as before

http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-validation.html

A recommended setting for "max-journal-size" is included. Without this, the journal files for incrementally updated zones will grow indefinitely, and for signed zones in particular they can become extremely large.
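
In the options block it is a one-line setting; the value here is only an example, not a quotation from the sample file:

options {
    // stop incremental-update journal files growing indefinitely
    max-journal-size 2m;
};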

The most significant change concerns the zone private.cam.ac.uk. Previously, there was no delegation for this zone in cam.ac.uk. However, we have found that with the most recent versions of BIND, defining private.cam.ac.uk as either "type stub" or "type forward" in combination with using DNSSEC validation, led to validation failures due to BIND's inability to prove private.cam.ac.uk unsigned while cam.ac.uk is signed.

On consideration, we have decided to create a delegation for private.cam.ac.uk after all. (The only effect for users outside the CUDN should be that they will consistently get a REFUSED response for names in that zone, instead of sometimes getting NXDOMAIN instead.) This will also allow us to increase the number of official nameservers for private.cam.ac.uk (within the CUDN, obviously), and perhaps to sign it without having to advertise a trust anchor for it by special means.

Nameservers on the CUDN should therefore either slave private.cam.ac.uk, or not define it at all in their configuration. (Using "type stub" or "type forward" will continue to work for non-validating servers, but should be phased out.)

However, our corresponding reverse zones 16.172.in-addr.arpa through 30.172.in-addr.arpa cannot be delegated from the parent zone "172.in-addr.arpa". Luckily there are delegations there to the IANA "black hole" (AS112) servers, and this suffices to make the zones provably unsigned. Any of "type slave", "type stub" or "type forward" can be used for these zones (with or without validation) and one of them must be used or reverse lookups will fail.
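
As a sketch, the forwarding variant for one of these zones might look like this, repeated for each of 16.172 through 30.172 (here forwarding to the CUDN central recursive nameservers; slave or stub stanzas are equally valid, as noted above):

zone "16.172.in-addr.arpa" {
    type forward;
    forward only;
    forwarders { 131.111.8.42; 131.111.12.20; };
};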

Rationale for the recent changes to recommended AD configurations

2010-02-03 - News - Chris Thompson

You will probably have seen Andy Judd's message to ucam-itsupport last Friday announcing new recommendations for Active Directory and Windows DNS Server configurations within the CUDN, described more fully at

http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/ad_dns_config_info.html

These were the result of discussions between our PC Support group and our Hostmaster group. This message gives part of the background to our thinking, and some points may be relevant to institutions not using Windows DNS Server at all.

It will be no surprise that the advice not to ("stealth") slave zones from the CUDN central (authoritative) nameservers was motivated by the deficiencies of the various versions of Windows DNS Server when slaving signed zones (not to mention other defects in its treatment of unknown DNS record types and SOA serial number handling). Not slaving zones such as cam.ac.uk does have the disadvantage that resolving of names and addresses of hosts local to the institution may fail if it is cut off from the rest of the CUDN, but we think this should be tolerated because of the other advantages.

The advice to forward requests not resolved locally to the CUDN central (recursive) nameservers may seem contrary to advice given in https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf and in previous messages to this list. In the case of Windows DNS Server configurations the primary intent was to make sure that queries for names in private.cam.ac.uk and corresponding reverse lookups worked correctly. (In the past, when zones had to be configured laboriously via a GUI, many were omitted from Windows DNS Server setups.) However, there is the more general point that the central servers provide DNSSEC validation for the increasing proportion of names for which it is available, and forwarding requests to them takes advantage of that if validation is not being performed locally. We should admit, though, that the communication path between the institution and the CUDN central nameservers is not yet secured cryptographically. (If there is a fully functional validating recursive nameserver local to the institution, that could of course be used instead of the CUDN central nameservers.)

Another issue is the likelihood that we will be changing the set of reverse zones available for slaving during the next year. In particular we are likely to want to extend the scheme described at http://people.pwf.cam.ac.uk/cet1/prune-reverse-zones which we are already using for reverse lookup of the 128.232.[128-223].x range to cover 128.232.[224-255].x as well, eliminating the 32 individual zones used for the latter range at present.

Unicode in the IP registration database

2009-12-09 - News - Chris Thompson

(These changes first went into service on 2009-12-07, but had to be withdrawn due to problems with uploads, in particular. They are now back in service.)

When the Jackdaw Oracle server was moved to new hardware and upgraded to Oracle 10 earlier this year, the opportunity was taken to change the encoding it uses for character data from ISO Latin 1 to UTF-8. However, little change will have been apparent at the user interfaces, because translations from and to ISO Latin 1 were made for input and output.

This has now been changed so that all interfaces use UTF-8. In particular, the IP registration web pages now use UTF-8 encoding, and so do files downloaded from the list_ops page. Files uploaded should also be in UTF-8: invalid encodings (such as might be caused by using the ISO Latin 1 encoding instead) will be replaced by Unicode replacement characters '�' (U+FFFD).
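
For example, the single byte 0xE9 is "é" in ISO Latin 1 but is not valid UTF-8 on its own, whereas the two-byte sequence 0xC3 0xA9 is the UTF-8 encoding of the same character:

uploaded bytes    interpreted as      stored as
0xE9              invalid UTF-8       � (U+FFFD)
0xC3 0xA9         UTF-8 "é"           é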

Only those fields that are allowed to contain arbitrary text (such as equipment, location, owner, sysadmin, end_user, remarks) are affected by this change. Values (the great majority) that are in 7-bit ASCII will not be affected, because ASCII is the common subset of ISO Latin 1 and UTF-8.

We have identified a few values in the IP registration data which have suffered the unfortunate fate of being converted from ISO Latin 1 to UTF-8 twice. We will be contacting the relevant institutional COs about them.

Problem with SOA serial numbers and Windows DNS Server

2009-09-29 - News - Chris Thompson

In conjunction with PC Support we suggest the following guidelines for dealing with Windows DNS servers in the wake of the SOA serial number wrap-around:

All zones which are copied from any of the UCS servers (cam.ac.uk, private.cam.ac.uk, and the reverse zones) need to be refreshed so they have a serial number which starts 125... rather than 346... The serial number can be found in the Start of Authority tab for the zones properties.

To refresh the zones, try the following steps:

  1. In a DNS MMC select the DNS server, right click and select clear cache. For any zone you copy, right click and select Transfer from Master. Check the serial number for the zone once it has loaded.

    If the serial number hasn't been updated you may have tried too soon; wait a couple more minutes and try again. However, if after ten minutes it still hasn't updated, you can also try:

  2. If the serial number hasn't been updated delete the zone, clear the cache and re-create. Check the serial number once it has fully loaded.

  3. Final resort: delete the zone, clear the cache, delete the files from C:\Windows\System32\DNS then re-create.

In most cases methods 1 or 2 will work.

If you have older copies of notes from the Active Directory course that you are using as a reference, don't: check your configuration information at the following locations instead.

http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/configureserver.html

http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/dnssec.html

Incidentally, Windows 2008 DNS Server is not immune to the problem (but method 1 above should normally work for it).

Problem with SOA serial numbers and Windows DNS Server

2009-09-28 - News - Chris Thompson

Last Saturday (26 September) we started to change SOA serial numbers for the zones managed by the Computing Service from "seconds since 1900" to "seconds since 1970" (the latter being familiar as the POSIX time_t value). We had made sure that this was an increase in RFC 1982 (published August 1996) terms. No version of BIND has any problem with this.
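
The arithmetic, in outline: RFC 1982 counts a change as an increase when the new serial is ahead of the old by less than 2^31, modulo 2^32. With round numbers of the right magnitude (the real serials differ slightly):

old serial (seconds since 1900):  ~3460000000   (starts 346...)
new serial (seconds since 1970):  ~1254000000   (starts 125...)

(1254000000 - 3460000000) mod 2^32 = 2088967296 < 2^31 = 2147483648

So the numerically smaller value is, in serial-number arithmetic, an increase.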

Unfortunately, we did not foresee that many versions of Windows DNS Server (apparently even those as late as Windows 2003 R2) cannot cope with this change, repeatedly attempting to transfer the zone at short intervals and discarding the result. We are seeing a great deal of churning on our authoritative nameservers as a result. (This affects servers that are fetching from 131.111.12.73 [fakedns.csx.cam.ac.uk] as well.)

It is too late for us to undo this change. If you are running Windows DNS Server and are failing to fetch cam.ac.uk and similar DNS zones, you should discard your existing copy of the zone(s). Andy Judd advises us that you "need to delete the zone in a DNS MMC and then delete the zone files from C:\Windows\System32\dns and C:\Windows\System32\dns\backup, then re-create the zone". Please ask Hostmaster and/or PC Support for assistance if necessary.

We shall be contacting the administrators of the hosts that are causing the most continuous zone-fetching activity on our servers.

Two reverse zones to be signed on Tuesday 29 September

2009-09-24 - News - Chris Thompson

We judge that the signing of the DNS zone cam.ac.uk since 3 August has been a success. We intend to start signing two reverse zones

111.131.in-addr.arpa
0.0.2.0.0.3.6.0.1.0.0.2.ip6.arpa

next Tuesday morning, 29 September 2009.

For those who stealth slave either or both of these zones, but cannot cope with signed zones, unsigned versions will remain available from fakedns.csx.cam.ac.uk [131.111.12.73]. Other relevant information may be found via the DNSSEC-related links on

https://jackdaw.cam.ac.uk/ipreg/nsconfig/

In future, we may not always announce when particular zones are expected to become signed.

Any problems should be referred to hostmaster@ucs.cam.ac.uk

cam.ac.uk to be signed on Monday 3 August

2009-07-31 - News - Chris Thompson

We intend to make cam.ac.uk a signed DNS zone next Monday morning, 3 August 2009. We believe that those most likely to be adversely affected are the Windows DNS Server clients within the CUDN that are slaving it. The following is taken from

http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-windows.html

which we will update in the light of experience.


Only Windows 2008 R2 is practically trouble-free in this context. Earlier versions will generate very large numbers of messages in the system log about unknown record types, and may not result in a usable copy of the zone.

However, with Windows 2003 R2 or Windows 2008 you can use the registry option described at

(using the 0x2 setting) and this should allow you to slave a signed zone, although not actually to use the signatures.

For other versions, or in any case if problems arise, you can slave the zone from 131.111.12.73 [fakedns.csx.cam.ac.uk] instead of from 131.111.8.37 and/or 131.111.12.37. This server provides unsigned versions of all the zones described as available for slaving from the latter addresses in

http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf

for transfer to clients within the CUDN. It should not be used for any other purpose.


Any problems should be referred to hostmaster@ucs.cam.ac.uk.

BIND security alert

2009-07-29 - News - Chris Thompson

If you are using BIND and are not already aware of it, please see the security advisory at https://www.isc.org/node/474

This is a high-severity denial-of-service bug which is being exploited in the wild. Nameservers are vulnerable if

  • They have any zone of "type master", whose name is known to the attacker. Note that this includes zones such as "localhost" (but apparently not BIND's generated "automatic empty zones").

  • The attacker can get a DNS update request through to the server. For example, those with a port 53 block at the CUDN border router can be attacked (directly) only from within the CUDN. Access controls within BIND cannot protect against the vulnerability.

Those who use versions of BIND supplied with their operating system should look for advisories from their respective suppliers.

SOA serial numbers in UCS-maintained zones

2009-07-21 - News - Chris Thompson

This should only be of concern to those who look at SOA serial numbers for diagnostic information. Up to now we have used the

<4-digit-year><2-digit-month><2-digit-day><2-more-digits>

format for the zones the Computing Service maintains. We are about to switch to using "seconds since 1900-01-01" (not 1970-01-01, because we need the change to be an increase, in RFC 1982 terms). This is part of the preparations for using DNSSEC-signed zones, where some SOA serial increases are imposed by BIND as part of the re-signing operations.

All of our zones now contain an HINFO record at the apex which contains version information in the old format; e.g.

$ dig +short hinfo cam.ac.uk
"SERIAL" "2009072120"

We expect these to remain a human-readable version indication, although not necessarily in exactly this format.

More about DNSSEC validation, and signing the cam.ac.uk zone

2009-07-01 - News - Chris Thompson

The web page describing how to set up DNSSEC validation on your own recursive nameservers, using the lookaside validation zone dlv.isc.org, has been updated and is now at

http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-validation.html

We continue to make progress towards signing cam.ac.uk. The previous signed near-clone "cam.test" will be removed at the end of this week. Instead we have a new such zone "dnssec-test.csi.cam.ac.uk" which is properly delegated and registered at dlv.isc.org. Instructions on how to slave it or validate against it are at

http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-testing.html

We have had almost no feedback so far. We would like to hear from anyone who has successfully slaved it, but even more from those who tried and failed. We believe that much old nameserver software will be unable to cope, and expect to have to provide "dumbed-down" unsigned versions of the signed zones for such clients. We need to estimate how large the demand will be for such a service.

Recursive nameservers using DNSSEC validation

2009-06-16 - News - Chris Thompson

We have been using DNSSEC validation on the main recursive nameservers at

131.111.8.42   or 2001:630:200:8080::d:0
131.111.12.20  or 2001:630:200:8120::d:1

since the morning of Tuesday 9 June, and no significant problems have arisen. We now expect this state to persist indefinitely.

Therefore, will all those who kindly assisted us by pointing their resolvers at the testing validating nameservers please switch back to using the regular ones. We shall be monitoring the use of the testing addresses and in due course contacting those who are still using them. Eventually they will be reused for other testing purposes.

More about DNSSEC validation, and signing the cam.ac.uk zone

2009-05-28 - News - Chris Thompson

Further to the request posted on 6 May to try using the testdns*.csi validating nameservers (and with thanks to the few who did so!) there have been some queries as to how you can configure DNSSEC validation in your own recursive nameservers. There are some notes on that here:

http://people.pwf.cam.ac.uk/cet1/dnssec-validation.html

As a separate but related exercise, we plan to sign our own zones, starting with cam.ac.uk, as soon as we can. To investigate the problems involved, we have set up a signed almost-clone of cam.ac.uk, called cam.test, and made it available in various ways within the CUDN. Some of the things you could try doing with it are described here:

http://people.pwf.cam.ac.uk/cet1/signed-cam.html

[The fact that these web pages are in a personal space rather than in, say, http://jackdaw.cam.ac.uk/ipreg/ emphasizes their temporary and provisional nature. Please don't let that stop you reading them!]

Recursive nameservers using DNSSEC validation available for testing

2009-05-06 - News - Chris Thompson

We are hoping to turn on DNSSEC ("Secure DNS") validation in our main central recursive nameservers within the next few weeks. We have set up testing nameservers with essentially the expected configuration, and Computing Service staff have already been exercising them. You are now invited to do the same: details can be found at

http://people.pwf.cam.ac.uk/cet1/dnssec-testing.html

Use of DNAMEs for reverse lookup of CUDN addresses

2009-03-18 - News - Chris Thompson

We have started to use a scheme involving DNAMEs (domain aliases) for the reverse lookup of some IP addresses within the CUDN. The primary motivation is to reduce the number of individual reverse zones. A description of the mechanism, written for an audience not restricted to the university, can be found in

http://people.pwf.cam.ac.uk/cet1/prune-reverse-zones
  • Moved to https://www.dns.cam.ac.uk/domains/reverse/

At the moment we are using this method for these address ranges:

  • 192.153.213.*
  • 192.84.5.*

    • these subnets are or will be used for CUDN infrastructure (although within the CUDN, the corresponding reverse zones are not listed in the sample.named.conf configuration)
  • 128.232.[128-223].*

    • some of this address space will be used for Eduroam

Some nameserver software (especially Windows DNS Server) may be unable to cope with zones containing DNAMEs: they will have to avoid stealth slaving (for example) 232.128.in-addr.arpa. We don't believe that any stub resolvers fail to cope with the "synthesised CNAMEs" generated from DNAMEs, although at least some versions of the glibc resolver log warning messages about the DNAME (but give the right answer anyway). If anyone experiences problems as a result of what we are doing, please let us know.
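
In zone-file terms the scheme looks roughly like this (names simplified for illustration): a single DNAME in the parent reverse zone aliases a whole /24's worth of names into the consolidated zone, and resolvers are handed synthesised CNAMEs:

; in 232.128.in-addr.arpa:
150.232.128.in-addr.arpa.    IN DNAME  150.232.128.in-addr.arpa.cam.ac.uk.

; so a PTR query for 128.232.150.1 is answered via the synthesised:
1.150.232.128.in-addr.arpa.  IN CNAME  1.150.232.128.in-addr.arpa.cam.ac.uk.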

In the light of experience, we may later extend this scheme to other address ranges, e.g. 128.232.[224-255].* which is currently covered by 32 separate reverse zones. However, we will give plenty of warning before making such a change.

Balancing the use of CUDN central recursive nameservers

2009-03-18 - News - Chris Thompson

The Computing Service provides two general-purpose recursive nameservers for use within the CUDN, at IPv4 addresses 131.111.8.42 and 131.111.12.20 (or IPv6 addresses 2001:630:200:8080::d:0 and 2001:630:200:8120::d:1).

Historically, there were compelling reasons to favour 131.111.8.42 over 131.111.12.20, and therefore to list them in that order in resolver configurations. The machine servicing 131.111.12.20 was severely overloaded and often had much poorer response times.

For the last two years, this has not been the case. The two services run on machines with equal power and for nearly all locations within the CUDN there is no reason to prefer one over the other. Since last September, one of them has been in our main machine room on the New Museums Site, and one at Redstone, providing improved physical redundancy.

However, we observe that the load on 131.111.8.42 is still several times that on 131.111.12.20, presumably as a result of the historical situation. For a while now we have been randomising the order in which the two addresses appear in the "nameservers:" line generated when the "register" or "infofor*" functions are used on the ipreg/single_ops web page, but we suspect that COs rarely act on that when actually setting up resolver configurations.

We would like to encourage you to do a bit of randomising yourselves, or even to deliberately prefer 131.111.12.20 to redress the current imbalance. If you have resolvers which support it, and you are configuring only these two addresses as nameservers, then you could sensibly use "options rotate" to randomise the order they are tried within a single host. (Unfortunately, this doesn't work well if you have a preferred local resolver and want to use the two CS nameservers only as backups.)
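
For example, in resolv.conf both of the following help: listing 131.111.12.20 first, and asking the resolver to rotate:

nameserver 131.111.12.20
nameserver 131.111.8.42
options rotate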

Frequency of DNS updates

2009-03-17 - News - Chris Thompson

This message, like the earlier ones referred to, was sent to ucam-itsupport at lists because it is of concern to all IPreg database updaters, not just to stealth slave administrators. However, it has been plausibly suggested that they ought to have been sent to cs-nameservers-announce at lists as well, if only so that they appear in its archives. Therefore, this one is being so sent!

Subsequent to the changes of schedule to every 12 hours (September) and every 6 hours (November), we have now made a further increase in the number of (potential) updates to our DNS zones. Currently the regular update job runs at approximately

01:00, 09:00, 13:00, 17:00 and 21:00

each day (the exact times are subject to variation and should not be relied upon). We are reserving the 05:00 slot, at which actual changes would be very rare, for other maintenance activity.

The "refresh" parameter for these zones has also been reduced from 6 hours to 4 hours: this is the amount by which stealth slaves may be out of date (in the absence of network problems). The TTL values for individual records remains 24 hours: this is how long they can remain in caches across the Internet.

Representing network access controls in the database

2008-12-15 - News - Chris Thompson

(Updated and partly obsoleted on 2014-05-20)

(Updated 2009-01-13)

Various exceptions to the general network access controls are applied at CUDN routers for some individual IP addresses. Some of these are at the border routers between the CUDN and JANET, and others at the individual CUDN routers interfacing to institutional networks.

We have implemented a scheme which we hope will enable us to keep better control over these exceptions. When an exception is created for a registered IP address, that address is added to one of the following anames:

  • janet-acl.net.private.cam.ac.uk for exceptions at the border routers, usually permitting some network traffic that would otherwise be blocked,

  • cudn-acl.net.private.cam.ac.uk for exceptions at the local CUDN routers, usually allowing some use of high-numbered ports for those vlans for which such a restriction is imposed, and

  • block-list.net.private.cam.ac.uk for addresses for which all IP traffic is completely blocked, usually as the result of a security incident.

As long as the attachment to the aname remains, it prevents the main registration from being rescinded. The intent is that this will result in the institutional COs requesting removal of the exception at that point.

If the IP address is not registered, then it is first registered as reserved.net.cam.ac.uk or reserved.net.private.cam.ac.uk as appropriate, and then processed as above. This prevents it being reused while the exception still exists. (Some of these cases arise because the scheme did not exist in the past, and there are several now-unregistered IP addresses whose exceptions were never removed.)

Note that this apparatus only deals with exceptions for individual IP addresses, not those for whole subnets.

Requests for the creation or removal of network access control exceptions should be sent to cert@cam.ac.uk.

cs-nameservers-announce copies on ucam.comp.tcp-ip newsgroup

2008-09-30 - News - Chris Thompson

It has been the practice to post copies of messages posted to the cs-nameservers-announce mailing list to the local newsgroup ucam.comp.tcp-ip. This is promised both in the descriptive text for the mailing list, and in the initial comments in

The local newsgroups ucam.* are expected to be phased out before long, so I propose that we discontinue this practice. If anyone feels differently, please let us know.

The archives of the mailing list are accessible to non-members, at

and there is no intention to change that.

pmms.cam.ac.uk zone should no longer be slaved

2008-08-03 - News - Chris Thompson

The zone pmms.cam.ac.uk has been removed from the list of zones that may be slaved given in

http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf

Historically, this zone was a clone of "dpmms.cam.ac.uk", but it is now essentially empty and will soon be removed entirely. If your nameserver currently slaves pmms.cam.ac.uk, you should remove it from its configuration file as soon as is convenient.

Independently, some comments have been added to the sample configuration file about IPv6 addresses that can be used as alternatives to the IPv4 ones for fetching zones or forwarding requests, for those whose nameservers themselves have IPv6 connectivity.

Multiple DNS implementations vulnerable to cache poisoning

2008-07-09 - News - Chris Thompson

There has been a lot of recent publicity, some of it rather garbled, on this subject. Please refer to

for an authoritative account. The remainder of this note refers specifically to what to do if you are running a recursive nameserver using BIND. (Authoritative-only servers have [almost] no cache and are not affected.)

For full details, see http://www.isc.org/ , especially the links under "Hot Topics" - "Upgrade Now!". In summary, ISC have released the following new versions:

  if you are using          upgrade to         or, if you are prepared
                                               to use a "beta" version
  BIND 9.5.x                9.5.0-P1           9.5.1b1
  BIND 9.4.x                9.4.2-P1           9.4.3b2
  BIND 9.3.x                9.3.5-P1           -
  BIND 9.2.x (or earlier)   (no fix available - time to move!)

Note that the earlier round of changes in July last year (versions 9.2.8-P1, 9.3.4-P1, 9.4.1-P1, 9.5.0a6), that improve defences against cache poisoning by randomising query ids, are no longer considered adequate. The new fixes rework the randomisation of query ids and also randomise the UDP port numbers used to make queries. Note that if you specify a specific port in the "query-source" setting, e.g. to work your way through a recalcitrant firewall, you will lose much of the advantage of the new fixes.
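
Concretely, a named.conf options setting of the following kind defeats the port randomisation (a sketch, not a recommendation):

  // undermines the fix: every query leaves from one predictable port
  query-source address * port 53;

  // preferable: let BIND choose an unpredictable port for each query
  query-source address *;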

If you are not in a position to upgrade, you can forward all requests to other recursive nameservers that you trust. The recursive nameservers provided by the Computing Service, at IP addresses 131.111.8.42 and 131.111.12.20, are now running BIND 9.4.2-P1 and can be used in this way by hosts on the CUDN.
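
A minimal named.conf sketch of such a stop-gap configuration:

  options {
      // send all recursive lookups to the CS recursive nameservers
      forwarders { 131.111.8.42; 131.111.12.20; };
      // do not fall back to our own (unpatched) iterative resolution
      forward only;
  };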

If you need advice about this, please contact hostmaster@ucs.cam.ac.uk.

General relaxation of the rules on use of sub-domains

2008-05-13 - News - Chris Thompson

The restructuring of the database to allow free use of sub-domains, mooted in a previous article, has now been implemented.

As before, all names in the database have an associated domain whose value must be in a predefined table and is used to control user access. However this can now be any suffix part of the name following a dot (or it can be the whole name). If a CO has access to the domain dept.cam.ac.uk, then they can register names such as foobar.dept.cam.ac.uk (as previously) or foo.bar.dept.cam.ac.uk, or even dept.cam.ac.uk alone (although this last may be inadvisable). Such names can be used for "boxes" as registered and rescinded via the single_ops page, and also (to the rather limited extent that COs have delegated control over them) for vboxes and anames.

There are cases when one already registered domain name is a suffix of another, e.g. sub.dept.cam.ac.uk and dept.cam.ac.uk. Often these are in the same management zone and the longer name is present only to satisfy the previously enforced constraints. In these cases we shall phase out the now unnecessary domain. However, in a few cases they are in different management zones, with different sets of COs having access to them. It is possible for a CO with access only to dept.cam.ac.uk to register a name such as foobar.sub.dept.cam.ac.uk, but its domain part will be taken as dept.cam.ac.uk and not sub.dept.cam.ac.uk. This is likely to cause confusion, and we will be relying on the good sense of COs to avoid such situations.

For CNAMEs, the mechanism using strip_components described in the previous article still exists at the moment, but it will soon be replaced by a cname_ops web page in which the domain part is deduced automatically, as for the other database object types mentioned above, rather than having to be specified explicitly. (Now implemented, 2008-06-05.)

We advise that COs should not use sub-domains too profligately, and plan their naming schemes carefully. Any questions about the new facilities should be emailed to us.

IPv6 addresses for the root nameservers

2008-02-05 - News - Chris Thompson

Six of the thirteen root nameservers are now advertising IPv6 addresses. There is some background information about this change at

There is also a new root hints file with these addresses added, and the copy at

http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache

has been updated.

Of course, the IPv6 addresses are not useful if your nameserver does not (yet) have IPv6 connectivity, but they should do no harm, and on general principles it's inadvisable to let one's root hints file get too out of date.

More changes to the sample nameserver configuration

2008-01-30 - News - Chris Thompson

A number of changes have been made to the sample configuration for "stealth" nameservers at

http://jackdaw.cam.ac.uk/ipreg/nsconfig/

None of these require urgent action.

First, the set of locally defined empty reverse zones, intended to stop queries for the corresponding IP addresses being sent to the Internet's root nameservers, has been brought into line with those created automatically by BIND 9.4 and later. Some of the IP address ranges covered are larger than before, while some are smaller. If you are actually running BIND 9.4 or later, then you can omit most of these zone definitions, but note that "0.in-addr.arpa" should not yet be omitted (as of BIND 9.4.2), and nor should those for the RFC1918 institution-wide private addresses.
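
Each of the locally defined empty zones is declared in the usual way, along these lines (using the db.null file described below):

  zone "0.in-addr.arpa" {
      type master;
      file "db.null";   // an empty zone: just SOA and NS records
  };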

There are new versions of the zone files db.null, db.localhost, and db.localhost-rev. The first has been made identical to that which BIND 9.4 generates internally, except that the SOA.mname value is "localhost" rather than a copy of the zone name (this avoids a warning message from BIND when it is loaded). The other two, intended to provide forward and reverse lookup for the name "localhost", have been modified in a similar way. These files no longer have "sample" in their name, because they no longer require any local modification before being used by BIND.

Some changes to sample.named.conf have been made in support of IPv6. The CUDN IPv6 range 2001:630:200::/48 has been added to the "camnets" ACL definition: this becomes relevant if you are running a nameserver providing service over IPv6. The corresponding reverse zone "0.0.2.0.0.3.6.0.1.0.0.2.ip6.arpa" has been added to the list that can be slaved from 131.111.8.37 and 131.111.12.37: it may be desirable to do that if your nameserver is providing a lookup service to clients on IPv6-enabled networks, whether it uses IPv6 itself or not.
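
In outline, the ACL change simply adds the IPv6 prefix alongside the existing IPv4 ranges (heavily abridged here; see sample.named.conf for the authoritative list):

  acl camnets {
      131.111.0.0/16;     // existing CUDN IPv4 ranges (abridged)
      128.232.0.0/16;
      2001:630:200::/48;  // CUDN IPv6 range, newly added
  };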

In addition, a number of comments have been corrected or clarified. Note in particular that BIND does not require a "controls" statement in the configuration file to make run-time control via the "rndc" command work. See the comments for more details. It should only rarely be necessary to actually restart a BIND daemon due to a change in its configuration.
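
For instance, after editing the configuration it should normally suffice to run one of the following (assuming rndc can reach the server's control channel):

  rndc reconfig   # re-read named.conf, picking up new or removed zones
  rndc reload     # re-read the configuration and reload changed zones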

Change of address of a root nameserver

2007-11-03 - News - Chris Thompson

The IP address of one of the root nameservers, l.root-servers.net, has changed from 198.32.64.12 to 199.7.83.42. (Such changes are rare: the last one was in January 2004.)

If you are running a nameserver with a root hints zone file, that should be updated. There are a number of ways of generating a new version, but the official with-comments one is at

ftp://ftp.internic.net/domain/named.root

and there is a copy of that locally at

http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache

Modern versions of BIND 9 have a compiled-in version of the root hints zone to use if none is defined in the configuration file. As a result of this change, the compiled-in version will be out of date for existing BIND versions: a corrected version has been promised for the next versions of BIND 9.3.x, 9.4.x and 9.5.x.

Using a slightly out-of-date root hints zone is unlikely to cause serious problems, but it is something that should not be allowed to persist indefinitely.
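
One quick check is to compare your hints against the live data, for example (a sketch using dig):

  dig +short l.root-servers.net a    # should now return 199.7.83.42
  dig +norec @199.7.83.42 . ns       # confirm the new address answers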

Relaxation of the name-to-domain rules for CNAMEs

2007-08-23 - News - Chris Thompson

All names in the database have an associated domain whose value must be in a predefined table and is used to control user access. Until now, the domain has always been formed by stripping exactly one leading component from the name. This leads, for example, to the often unwelcome advice to use www-foo.dept.cam.ac.uk rather than www.foo.dept.cam.ac.uk.

We have tentative plans to restructure the database to liberalise this constraint everywhere, but this is a major undertaking and will not happen soon. However, we have been able to provide partial relief in the special case of CNAMEs.

In the table_ops page under object type cname there is now a field strip_components. This can be set to a number which controls how many leading components are stripped from the name value to convert it to a domain. (Note that it has no effect on the treatment of target_name.) For example, setting it to 2 for www.foo.dept.cam.ac.uk associates it with the domain dept.cam.ac.uk rather than the (probably non-existent) domain foo.dept.cam.ac.uk. Leaving the field null is equivalent to setting it to 1. (0 is an allowed value, but note that creating a CNAME dept.cam.ac.uk is disallowed if there is a mail domain with that name.)

Changes to the sample nameserver configuration

2007-08-20 - News - Chris Thompson

Three changes have been made to the sample configuration for "stealth" slave nameservers on the CUDN.

First, the configuration files have been moved from

ftp://ftp.cus.cam.ac.uk/pub/IP/Cambridge

to

http://jackdaw.cam.ac.uk/ipreg/nsconfig

and internal references have been adjusted to match. The old location will contain copies of the updated files only for a very limited overlap period.

Second, the sample.named.conf file now recommends use of

notify no;

in the "options" statement. BIND is by default profligate with its use of notify messages, and a purely stealth nameserver can and should dispense with them. See the comments in the file for what to do if you also master or officially slave other DNS zones.

Third, comments in the file previously suggested that one could use a "type forward" zone for private.cam.ac.uk. Although this does work for the corresponding private reverse zones, it does not for the forward zone if cam.ac.uk itself is being slaved. In that case, if you don't want to slave the whole of private.cam.ac.uk, then you should use a "type stub" zone instead. See the new comments for details.
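
A sketch of the stub-zone approach (the file name here is invented; the masters are as in sample.named.conf):

  zone "private.cam.ac.uk" {
      type stub;
      masters { 131.111.8.37; 131.111.12.37; };
      file "stub.private.cam.ac.uk";
  };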

Recent problems with CUDN central nameservers

2007-07-16 - News - Chris Thompson

In the normal state, one machine hosts 131.111.8.37 [authdns0.csx] and 131.111.8.42 [recdns0.csx] while another hosts 131.111.12.37 [authdns1.csx] and 131.111.12.20 [recdns1.csx]. (On each machine the different nameservers run in separate Solaris 10 zones.) On the evening of Friday 13 July, work being done on the second machine (in preparation for keeping the machines running during the electrical testing work on Sunday) caused it to lose power unexpectedly, and recovery took us some time, so that the authdns1 and recdns1 services were unavailable from about 17:24 to 19:20.

Unfortunately, our recovery procedure was flawed, and introduced creeping corruption into the filing system. The relevant machine became unusable at about 14:45 today (Monday 16 July). In order to get the most important services functional again,

  • the recursive nameserver at 131.111.12.20 [recdns1.csx] was moved to a new Solaris 10 zone on the machine already hosting authdns0 & recdns0: this was functional from about 15:45 (although there were some short interruptions later);

  • the non-recursive authoritative nameserver at 131.111.12.37 [authdns1.csx] had its address added to those being serviced by the authdns0 nameserver at about 20:10 this evening.

Of course, we hope to get the failed machine operational again as soon as possible, and authdns1 & recdns1 will then be moved back to it.

Please send any queries about these incidents or their consequences to hostmaster@ucs.cam.ac.uk.

Splitting of authoritative from recursive nameservers

2007-04-23 - News - Chris Thompson

A few weeks ago we told you

We currently plan to lock down the recursive nameservers at 131.111.8.42 and 131.111.12.20, so that they do not respond to queries from outside the CUDN and also do not allow zone transfers, during the first week of the Easter term (23-27 April). We will update you on this closer to the time.

We now intend to make these changes, at least insofar as zone transfers are concerned, early on Thursday 26 April.

We would like to thank all those who made changes to nameservers in their jurisdiction to fetch DNS zones from 131.111.8.37 / 131.111.12.37 instead. Logging has shown that the number of hosts still fetching from 131.111.8.42 / 131.111.12.20 is now quite small. Some final reminders will be sent to those who still have not made the change.

Splitting of authoritative from recursive nameservers

2007-03-21 - News - Chris Thompson

Some minor changes have been made to the sample configuration for "stealth" slave nameservers on the CUDN at

ftp://ftp.cus.cam.ac.uk/pub/IP/Cambridge/sample.named.conf

Firstly, one of the MRC-CBU subnets was incorrectly omitted from the "camnets" ACL, and has been added.

Secondly, questions were asked about the setting of "forwarders" in the "options" statement, and so I have added some comments about that. We used to recommend its use, but have not done so for some time now, except in situations where the nameserver doing the forwarding does not have full access to the Internet. However, if query forwarding is used, it should always be to recursive nameservers, hence to 131.111.8.42 and 131.111.12.20 rather than to the authoritative but non-recursive nameservers at 131.111.8.37 and 131.111.12.37.

We are now logging all outgoing zone transfers from 131.111.8.42 and 131.111.12.20, and will be contacting users who have not made the change to fetch from 131.111.8.37 and 131.111.12.37 instead, as time and effort permit. Help us by making the change before we get around to you!

We currently plan to lock down the recursive nameservers at 131.111.8.42 and 131.111.12.20, so that they do not respond to queries from outside the CUDN and also do not allow zone transfers, during the first week of the Easter term (23-27 April). We will update you on this closer to the time.

Splitting of authoritative from recursive nameservers

2007-03-08 - News - Chris Thompson

There is a new version of the sample configuration for "unofficial" slave nameservers on the CUDN at

ftp://ftp.cus.cam.ac.uk/pub/IP/Cambridge/sample.named.conf

This is a major revision, which includes new reverse zones, advice on access control settings, and several other changes. However the most important, and one which anyone managing such a slave nameserver should act on as soon as possible, is that the zones which were previously being fetched from

 masters { 131.111.8.42; 131.111.12.20; };

should now be fetched from

 masters { 131.111.8.37; 131.111.12.37; };

instead. The background to this is described below.

We are in the process of separating the authoritative nameservers for the Cambridge University DNS zones from those providing a recursive DNS lookup service for clients on the CUDN. To minimise the pain, it is the latter which have to retain the existing IP addresses. When the transformation is complete we will have

authdns0.csx.cam.ac.uk [131.111.8.37]
authdns1.csx.cam.ac.uk [131.111.12.37]

providing non-recursive authoritative access to our zones (and zone transfer for appropriate zones to clients on the CUDN) while

recdns0.csx.cam.ac.uk [131.111.8.42]
recdns1.csx.cam.ac.uk [131.111.12.20]

will provide a recursive lookup service to CUDN clients (but not zone transfers), and no service at all outside the CUDN.

Announcement list converted to Mailman

2006-10-04 - News - Chris Thompson

The mailing list cs-nameservers-announce@lists.cam.ac.uk has been converted from an old-style list to a Mailman list. (See https://www.lists.cam.ac.uk for background information.)

The list options attempt to match the previous state of affairs. The list is moderated, and subscription requires administrator action (but you can now request it via the web pages as well as by message). On the other hand, unsubscription by end users is enabled.

Digests are not available. Archives will be kept and can be read even by non-members.

Non-interactive use of the IP registration database

2006-05-08 - News - Chris Thompson

There are situations in which there is a requirement for non-interactive access to the IP registration database. A new method of using the web interface has been provided, in which cookies with a long life can be downloaded and used to authenticate subsequent non-interactive https access, for example by using programs such as curl.

See the download-cookie page on Jackdaw for a more complete description of the scheme. At the moment only the list_ops page can be used with downloaded cookies for the ipreg realm, and it requires a certain amount of reverse engineering to be used with a non-interactive tool. Pages more suitable for this sort of use may be provided later in the light of experience. The current state is quite experimental and we would ask anyone planning to use it in production to let us know.
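
In outline, use with curl might look something like this (the cookie file name and URL details are invented for illustration; see the download-cookie page for the real parameters):

  # after downloading a long-lived cookie from Jackdaw into ipreg.cookie
  curl --cookie ipreg.cookie 'https://jackdaw.cam.ac.uk/ipreg/list_ops'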

Some departments and colleges are using firewall software written by Ben McKeegan at Netservers Ltd., which interacts with the IP registration database using the old method of authentication via an Oracle account password. A version of this software that uses downloaded cookies as described above is under development and we hope it will be available soon.

For several reasons we want to restrict the number of people who have SQL-level access to the underlying Oracle database, and there has been a recent purge of unused Oracle accounts. If you have good reason to need such access to the IP registration part of the database, please let us know.

More delegated control of CNAMEs

2005-12-19 - News - Chris Thompson

Up until now ordinary users of the IP registration database have only been allowed to modify certain fields (target_name, purpose, remarks) in existing CNAMEs, and in particular not to create or delete them. These restrictions have now been removed. CNAMEs for which both the name and the target_name are in domains which the user has access to can be freely created, updated and deleted, subject to the existing consistency constraints: for example, that the target_name actually refers to an existing database record.

Such operations can be done using the table_ops page after selecting object type cname, in ways that will be familiar to those who have performed modifications to existing CNAMEs in the past. We recognise that this interface is somewhat clunky, and a tailored cname_ops web page may be made available in the future.

There is a maximum number of CNAMEs associated with each management zone in the database; these limits can be altered only by us. They have been set high enough that we do not expect sensible use of CNAMEs to reach them very often. Users will be expected to review their existing CNAMEs before asking for an increase.