Using the op command-line tool with Ansible works fairly nicely.
You need to install the 1Password command-line tool.
You need a recent enough Ansible with the community.general collection installed, so that it includes the onepassword lookup plugin.
To try out an example, create an op.yml file containing:
---
- hosts: localhost
  tasks:
    - name: test 1password
      debug:
        msg: <{{ lookup("onepassword", "Mythic Beasts", field="username") }}>
You might need to choose an item other than Mythic Beasts if you don't have a login with them.
Initialize op and start a login session by typing:
eval $(op signin)
Then see if Ansible works:
ansible-playbook op.yml
Amongst the Ansible verbiage, I get the output:
ok: [localhost] => {
    "msg": "<hostmaster@cam.ac.uk>"
}
Some more detailed notes follow...
I want it to be easy to keep secrets encrypted when they are not in use. Things like ssh private keys, static API credentials, etc. "Not in use" means when not installed on the system that needs them.
In particular, secrets should normally be encrypted on any systems on which we run Ansible, and decrypted only when they need to be deployed.
And it should be easy enough that everyone on the team is able to use it.
I wrote regpg to tackle this problem in a way I consider to be safe. It is modestly successful: people other than me use it, in more places than just Cambridge University Information Services.
But it was not the right tool for the Network Systems team in which I work. It isn't possible for a simple wrapper like regpg to fix gpg's usability issues: in particular, it's horrible if you don't have a unix desktop, and it's horrible over ssh.
Since I wrote regpg we have got 1Password set up for the team. I have used 1Password for my personal webby login things for years, and I'm happy to use it at work too.
There are a couple of ways to use 1Password for ops automation...
First I looked at the relatively new support for "secrets automation" with 1Password. It is based around a 1Password Connect server, which we would install on site. This can provide an application with short-term access to credentials on demand via a REST API. (Sounds similar to other cloudy credential servers such as Hashicorp Vault or AWS IAM.)
However, the 1Password Connect server needs credentials to get access to our vaults, and our applications that use 1Password Connect need API access tokens. And we need some way to deploy these secrets safely. So we're back to square 1.
The op command has basically the same functionality as 1Password's GUIs. It has a similar login model, in that you type in your passphrase to unlock the vault, and it automatically re-locks after an inactivity timeout. (This is also similar to the way regpg relies on the gpg agent to cache credentials so that an Ansible run can deploy lots of secrets with only one password prompt.)
So op is clearly the way to go, though there are a few niggles:
The op configuration file contains details of the vaults it has been told about, including your 1Password account secret key in cleartext. So the configuration file is sensitive and should be kept safe. (It would be better if op stored the account secret key encrypted using the user's password.)
op signin uses an environment variable to store the session key, which is not ideal because it is easy to accidentally leak the contents of environment variables. It isn't obvious that a collection of complicated Ansible playbooks can be trusted to handle environment variables carefully.
It sometimes requires passing secrets on the command line, which exposes them to all users on the system. For instance, the documented way to find out whether a session has timed out is with a command line like:
$ op signin --session $OP_SESSION_example example
I have reported these issues to the 1Password developers.
Ansible's community.general collection includes some handy wrappers around the op command, in particular the onepassword lookup plugin. (I am not so keen on the others because the documentation suggests to me that they do potentially unsafe things with Ansible variables.)
One of the problems I had with regpg was bad behaviour that occurred when an Ansible playbook was started when the gpg agent wasn't ready; the fix was to add a task to the start of the Ansible playbook which polls the gpg agent in a more controlled manner. I think a similar preflight task might be helpful for op:
check if there is an existing op session; if not, prompt for a passphrase to start a session
set up a wrapper command for op that gets the session key from a more sensible place than the environment
To refresh a session safely, and work around the safety issue with op signin mentioned above, we can test the session using a benign command such as op list vaults or op get account, and run op signin if that fails.
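A minimal sketch of that preflight check, assuming the v1 op CLI and an account shorthand of "example" (both placeholders):

# Test the session with a benign command; only prompt to sign in if it fails.
if ! op list vaults >/dev/null 2>&1; then
    eval "$(op signin example)"
fi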
The wrapper script can be as simple as:
#!/bin/sh
OP_SESSION_example=SQUEAMISHOSSIFRAGE /usr/local/bin/op "$@"
Assuming there is somewhere sensible and writable on $PATH ...
cam.ac.uk domains.
Underscores are now allowed in the names and targets of CNAME records so that they can be used for non-hostname purposes.
One exception to that is the special Cisco Jabber zones supported by the phone service. There is now a link from our stealth secondary DNS documentation to the Cisco Jabber documentation, but there are tricky requirements and caveats, so you need to take care.
The rest of this item is the story of how we discovered the need for these warnings.
Cisco Jabber is designed around a classic enterprise-style internal/external network architecture with firewalls and DNS views, which doesn't fit the University very well. The special Jabber DNS SRV records (_cisco-uds etc.) have been set up on the phone system's own DNS servers, which are able to support the special split views more easily than the central DNS servers.
If the network requirements are satisfied then you can see the Jabber internal view records, but in practice most clients should see the DNS SRV records for the external view.
With many people working from home, our colleagues in the telecoms office found that Jabber was not working as expected. After some investigation it became apparent that the internal view _cisco-uds DNS SRV records were leaking: often Virgin Media's DNS servers would return the wrong answers, and sometimes the various public DNS resolvers would as well.
This was very mysterious.
We could not find any configuration problems with the phone system's DNS servers, nor with the central DNS servers, nor with the contents of the DNS zones.
After much head-scratching and many red herrings and blind alleys, I worked out that one of the public DNS servers for the cam.ac.uk zone was configured as a secondary for Jabber's special internal _cisco-uds view. There was a 1-in-6 chance that people outside the University would get the wrong records, depending on which of our 6 public DNS servers their resolver happened to talk to.
So we've corrected the configuration mistake, and improved our documentation to reduce the risk of it happening again. But there's a bit more we can do.
One of the things that made this hard to debug was that the usual consistency checking tools such as Zonemaster did not spot the mistake. DNSviz encountered the problem, which gave me a bit of a clue, but DNSviz isn't designed to systematically examine all of a zone's nameservers in the way that Zonemaster does.
The reason Zonemaster didn't find the problem is that it examines a zone's own nameservers for consistency, but it doesn't check that all the zone's parent's nameservers have consistent delegations. In our Jabber case it was one of the parent zone (cam.ac.uk) servers that was doing the wrong thing with the child _cisco-uds zone.
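For future reference, a check along these lines would have caught it: query each of the parent zone's nameservers directly and compare the answers (the exact SRV owner name here is illustrative, not taken from our configuration).

# Ask every cam.ac.uk nameserver for the child name, bypassing recursion.
for ns in $(dig +short cam.ac.uk NS); do
    echo "== $ns"
    dig @"$ns" +norecurse +noall +answer +authority _cisco-uds._tcp.cam.ac.uk SRV
done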
We have a Zonemaster script for checking all our zones, but it currently uses a rather out-of-date version. I'm hoping that after some operating system upgrades it will be more convenient to use a recent version of Zonemaster, and it will make sense to add some extra checks so that Zonemaster can spot and complain about mistakes like our Cisco Jabber leakage.
This release removes TYPE65534 records from the list of DNSSEC-related types that nsdiff ignores.
TYPE65534 is the private type that BIND uses to keep track of incremental signing. These records usually end up hanging around after signing is complete, cluttering up the zone. It would be neater if they were removed automatically.
In fact, it's safe to try to remove them using DNS UPDATE: if the records can be removed (because signing is complete), they will be; if they can't be removed then they are quietly left in place, and the rest of the update is applied.
After this change you can clean away TYPE65534 records using nsdiff or nsvi. In our deployment, nspatch runs hourly and will now automatically clean TYPE65534 records when they are not needed.
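For instance, a manual cleanup might look something like this (the zone name is a placeholder, and this assumes the usual setup where nsdiff's nsupdate script is piped straight into nsupdate):

# nsdiff emits an nsupdate script for the differences, which after this
# release can include deleting leftover TYPE65534 records.
nsdiff botolph.cam.ac.uk | nsupdate -l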
Since September we have blocked use-application-dns.net, which tells Firefox not to use DoH by default. This is not strictly necessary, since Firefox does not plan to enable DoH for users in the EU and UK, but we set up the block when the Firefox policy seemed to be much more gung-ho, and we have left it in place.
We explained the reasons for blocking use-application-dns.net on our DNS blog when the block was set up in September. It's a tricky balance of several desirable but conflicting goals, and the outcome is not so great - see the blog for the gory details.
Despite that, we are in favour of encrypted DNS and it has been supported on the University's central resolvers for well over a year.
We have instructions for setting up encrypted DNS lookups with Firefox and various DNS resolvers. As a bonus you can also enable encrypted server name indication (ESNI) to reduce information leaks during TLS connection setup.
The main caveat is that our resolvers are only available for use on the CUDN, so you will not be able to use this setup on highly mobile devices.
If you run your own DNS resolvers, there's no particular need to do anything about Firefox and DoH at this time.
If your resolvers forward queries to our central resolvers, then use-application-dns.net will already be blocked for you. If your server is set up as a stealth secondary, then the sample.named.conf guide includes instructions for subscribing to our DNS RPZ blocks.
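If you want to check whether a resolver applies the block, query the canary domain through it; a blocking resolver returns NXDOMAIN (this check is my addition, not from the guide):

dig use-application-dns.net A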
Otherwise, it's still OK to leave things as they are, because Firefox is not doing DoH by default for us (yet?).
We use keepalived to determine which of the physical servers is in live service. It does two things for us:
We can move live service from one server to another with minimal disruption, so we can patch and upgrade servers without downtime.
The live DNS service can recover automatically from things like server hardware failure or power failure.
This note is about coping with network outages, which are more difficult.
This morning there was some planned maintenance work on our data centre network which did not go smoothly. This caused more disruption than expected to our DNS service: although the DNS service was moved out of the affected building yesterday, DNS was not properly isolated from the network faults.
Until the network problems have been fixed, the DNS servers in the affected building have been completely disabled, so that their connectivity problems do not affect the live DNS servers in the other two buildings.
Normally we run with two live servers, one hot standby, and one test server. At the moment we have only the two live servers which are acting as hot standby for each other.
The rest of this note has more detailed background information about how our DNS servers cope with failures, and why this mitigation was chosen.
Keepalived is an implementation of VRRP, the virtual router redundancy protocol. VRRP is one of a number of first hop redundancy protocols that were originally designed to provide better resilience for a subnet's default gateway. But these protocols are also useful for application servers: as well as DNS servers, keepalived is often used with HAProxy to make web servers more resilient.
VRRP uses periodic multicast packets to implement an election protocol between the servers. The server with the highest priority is elected as the winner. It becomes the live server by using ARP to take over the service IP address. For our DNS servers, 131.111.8.42 is a floating address that moves to whichever server wins a VRRP election. There is another instance of VRRP for 131.111.12.20 so that a different server will win its election, so we have two different live DNS servers.
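On the servers themselves you can see which one currently holds a floating address, for example (the interface name here is illustrative):

ip addr show dev eth0 | grep 131.111.8.42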
When there is an outage in the data centre ethernet, a DNS server can no longer see VRRP multicast packets from the other servers, so it considers itself to be the highest priority server available and elects itself the winner. Of course the other servers are likely to do the same, so we end up with multiple servers that think they currently own the live service address. This "split brain" situation will be resolved when connectivity is restored.
Recovering from a split brain involves two things:
At the VRRP level, the servers must see each other's multicast packets and recognise their correct position in the priority order and whether they should win the election or not;
At the ethernet level, the service IP address and MAC address need to be associated with the right physical switch port so that the right server gets the live traffic.
Our normal practice for planned network maintenance is to move the live DNS service away from the servers that will be affected by the work, so that service doesn't have to bounce around unnecessarily. Keepalived takes a few seconds to move the live server, and the standby server will have an empty cache, neither of which are good for DNS performance.
However, the idle server that is affected by the planned maintenance will go into split brain mode. When connectivity is restored, the split brain needs to be healed, and this can involve some disruption at the ethernet level which may briefly affect the live server. There may be longer disruption if connectivity is intermittent.
Our DNS keepalived configuration uses dynamic server priorities to move live service around without reconfiguring keepalived. Reconfiguring it requires restarting it, which can cause disruptive re-elections if the servers win the restart races in the wrong order.
To prevent split brain in planned outages, we need to exclude the affected DNS server completely. However, it isn't possible to exclude a server just using dynamic priorities. Although keepalived has a FAULT state which stops it electing itself the winner, in our version this is based on the ethernet link status and can't be scripted dynamically.
So with our current setup, the only way to stop a server from going into split brain mode is to stop keepalived completely. This removes the server from all the live and test DNS VRRP clusters. It can be safely removed from and added to the clusters, without causing unwanted re-elections, if it is also configured with the lowest dynamic priorities before keepalived is stopped.
In the past we generally have not bothered to disable keepalived during planned network outages. However in cases (like this morning) where the network maintenance does not go well, it can disrupt the live DNS servers even when it is not expected to.
So I think as an improvement for the future we will plan to stop keepalived on DNS servers that are going to be affected by network maintenance, just in case. We have enough DNS redundancy to be able to take one or two servers completely out of service temporarily, and still have 2N resilience.
Thanks to David Carter for reporting DNS problems this morning.
Summary
DNSSEC validators should continue to treat SHA-1 signatures as secure until DNSSEC signers have had enough time to perform algorithm rollovers and eliminate SHA-1 from the vast majority of signed zones.
At the time of writing, the root zone has 1517 TLDs, of which 1376 are signed (91%). 141 (9%) are insecure and 274 (18%) use SHA-1. Two out of the five regional Internet registries use SHA-1 for their reverse DNS zones.
In an ideal world, everyone should have paid attention to the reasons that the root zone chose SHA-256 rather than SHA-1 when it was signed 10 years ago. In the second-best world, everyone should have paid attention to the first real demonstration of a collision attack on SHA-1 five years ago.
Despite many very clear recommendations (from NIST, from 5 years of practical SHA-1 attacks), we have been slow to eliminate SHA-1 from DNSSEC. I hope these articles will help to get rid of it faster.
Some applications that use DNSSEC will stop working if DNSSEC validation is disabled; some will quietly become insecure.
In my movie-plot attack scenario I said the attacker's target was using CAA records to protect against unauthorised X.509 certificates. The CA/Browser forum baseline requirements say that CAs must do DNSSEC validation for CAA lookups, so our victim's defence should have been solid. This CAA + DNSSEC defence was supposed to restrict what CAs and ACME accounts can request certificates. Without that restriction, ACME is vulnerable to interception attacks.
There is a similarly quiet fall-back to insecure behaviour for DANE TLSA authentication of mail servers if DNSSEC validation is disabled.
The breakage is more obvious for ssh. If a site is relying on SSHFP records in the DNS for authenticating servers, common ssh implementations will not cache server key fingerprints in the way they do for traditional trust-on-first-use ssh connections. If DNSSEC stops validating SSHFP records then ssh connections will stop working, and instead prompt a human to authenticate server key fingerprints (for interactive uses) or generate cronspam.
At the moment, the cost and difficulty of a SHA-1 collision attack is large, even in a lab setup with expert cryptographers. The practicalities of applying an attack in the real world make it even more challenging: predicting the RRSIG timing parameters, scheduling a DNS update to sub-second accuracy, having a spare supercomputer available at the right time, ...
Although this is much less than cryptographic levels of security, it is still hard to find a payoff that would make an attack worthwhile: there is a lot of lower-hanging fruit. For instance, in my movie-plot scenario I set things up so that it made more sense to attack DNSSEC than plaintext HTTP by assuming that CAA validationmethods and accounturi records are published and checked. But that draft spec has not yet been deployed, so ACME is a softer target than DNSSEC.
There are still a lot of sites that depend on DNSSEC SHA-1 algorithms. Things will break if SHA-1 validation is abruptly disabled. SHA-1 is not yet so easily cracked that it is useful to attackers.
My recommendation is that DNSSEC validators should continue to treat SHA-1 as secure from the protocol point of view. They should continue to reject responses that fail to validate as expected, and they should set the "AD" authenticated data bit in responses that do validate.
DNSSEC signers need to stop using SHA-1 as soon as they can. At the moment signers (those 18% of TLDs, those 40% of RIRs, those however-many subdomains) are responsible for the deprecation schedule. They should be encouraged by friendly help and advocacy to plan and implement their algorithm rollovers.
Disabling SHA-1 validation at this point in time is not friendly or helpful. We should plan to deprecate it after the number of zones signed with SHA-1 has fallen to a small enough level.
1028 commits to IP Register
121 commits to Superglue
48 commits to git.uis.cam.ac.uk
26 commits to BIND
4287 IP Register / MZS support messages (about 6% more than last year)
2786 cronspam messages (less than half of last year, mainly due to changes in the MWS)
Server reshuffle (Jan, Nov, Dec)
January was continuing upgrades/renames from the end of 2018.
November was initial work on splitting authoritative servers from zone transfer servers. (This is still work in progress.)
December was abolishing the old authdns.csx names (nearly done!).
Wholesale delegation cleanup (Jan, Sep, Oct, Nov, Dec)
This was in support of:
the authdns.csx -> auth.dns server renaming
the withdrawal of sns-pb.isc.org and moving our off-site secondary service to Mythic Beasts
upgrading DNSSEC from RSASHA1 to ECDSA256
ensuring all our domains have consistent ownership and contact information - there was an error rate of more than 10% due to mistakes and omissions in manual maintenance
The development work involved:
porting the web site automation code from CasperJS to WebDriver
new integration code for Mythic Beasts
extending the scope to cover domain ownership and contact information as well as DNS delegations
improvements to the way we manage encrypted secrets
Zonemaster DNS rule checking for all zones
Overall this took a lot longer than I would have liked. This automation code has been a barely-working mess since 2015, but at last now it is close to the point of being releasable production code.
We now have fast, automated consistency checking and enforcement across our domains. The anomaly rate has been pushed down from somewhere over 10% to near zero.
Porting web front end from Jackdaw to www.dns.cam.ac.uk (Apr, May, Jun, Jul, Aug)
Building on 2018's work on the web site infrastructure.
Ported Jackdaw's Oracle + mod_perl platform and web application framework - simplifying and moving to the DNS web server.
Reskinned the IP Register web forms to Project Light.
This project is more than half done, but it had to go on the back burner after more urgent work turned up, which is somewhat irritating.
February: added a light-weight self-service migration tool, with documentation.
September: determined timetable for migration and shut-down.
Less busy this year.
ANAME draft dropped due to technical difficulties; the consensus was to pursue different solutions to the general problem.
Received thanks in RFC 8482 (minimal ANY responses), RFC 8499 (DNS Terminology), RFC 8689 (SMTP Require TLS).
Superglue scripts for managing domain registrations and delegations
This is the code supporting the delegation cleanup project.
It is not quite up to a releasable standard - there are missing safety checks, missing documentation, missing build/install scripts.
Support for YAML metadata alongside encrypted secrets. This was for Superglue's login credentials, but it also led to improvements in IP Register's secret handling in several places.
Improved handling of CDS and CDNSKEY records.
Twenty-six patches committed to BIND9.
Several improvements to rndc
Cryptography improvements: deprecated SHA-1, upgraded default RSA key size.
Better support for CDS and CDNSKEY records.
Numerous others.
Short term:
Split authoritative servers from zone transfer servers
Sort out RPZ + RBL subscriptions
Deploy replacement hardware for recursive DNS servers
Finish Superglue; publish ReGPG and Superglue on CPAN; proper Debian package builds
Longer term:
Operating system refresh
Finish new IP Register web front end
Start porting IP Register database from Oracle to PostgreSQL
cam.ac.uk domains.
The web user interface for the MZS has moved to https://mzs.dns.cam.ac.uk/; the old names redirect to the new place.
You can now manage TXT records in your zones in the MZS.
The expiry date of each zone (when its 5 year billing period is up) is now tracked in the MZS database and is visible in the web user interface.
I'll use botolph.cam.ac.uk as the example zone. I'll assume the rollover is from algorithm 5 (RSASHA1) to algorithm 13 (ECDSA-P256-SHA-256).
First we add the new algorithm to the zone alongside the old one.
Run the following commands in your key directory:
dnssec-keygen -L 24h -a 13 -f ksk botolph.cam.ac.uk
dnssec-keygen -L 24h -a 13 botolph.cam.ac.uk
Algorithm 13 is ECDSA-P256-SHA-256.
Each algorithm needs two keys, one with the KSK flag, and a ZSK without a special flag.
The -L 24h flag sets the TTL on the DNSKEY records. This is optional but it's a good idea if the zone's default TTL is shorter.
In my setup I use group access to make the keys readable by named:
chgrp named Kbotolph.cam.ac.uk.*.private
chmod g+r Kbotolph.cam.ac.uk.*.private
Then get named to reload the keys:
rndc loadkeys botolph.cam.ac.uk
Or you can use a bigger hammer to reload everything:
rndc reload
Look at the name server logs. You should see the keys being reloaded, followed by a lot of zone transfer activity caused by the zone being signed with the new keys.
If you look at the zone now, you should see four DNSKEY records (two algorithm 5 and two algorithm 13) and two sets of signatures, like RRSIG DNSKEY 5 and RRSIG DNSKEY 13.
dig +dnssec botolph.cam.ac.uk DNSKEY
When a zone is signed with a new algorithm, named keeps track of progress with TYPE65534 records. You can tidy them away after it has finished signing with:
rndc signing -clear all botolph.cam.ac.uk
Now you need to wait for the longest TTL in your zone to pass. For instance with cam.ac.uk this was 48h because we have some subdomain delegations with long TTLs.
You need to ensure that all records that are only signed with the old algorithm have expired from caches. After the longest TTL has expired, everywhere should be seeing records signed with both algorithms, so they will be able to successfully validate records using the new algorithm instead of the old one.
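A spot-check before moving on (my suggestion): ask for an RRset in the zone with DNSSEC data and confirm that signatures for both algorithms (5 and 13) are present.

dig +dnssec botolph.cam.ac.uk SOA | grep RRSIG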
Now we can swap the chain of trust from the old algorithm to the new one. There are two ways to do this, depending on how recent your software is and your parent zone works.
For subdomains of cam.ac.uk and other UIS zones (reverse DNS, etc.) you can use CDS records. CDS records are instructions from the child zone to its parent zone saying what the DS records should be. They are configured by setting timing parameters on your keys.
Have a look at the public key files for your zone. This will show you the timing metadata as well as the DNSKEY records:
grep ^ Kbotolph.cam.ac.uk.*.key
Look for the key files with a first line like:
; This is a key-signing key, keyid XXXXX, for botolph.cam.ac.uk.
Ensure there are no CDS records for the old algorithm, and add CDS records for the new algorithm, like this:
dnssec-settime -Dsync now Kbotolph.cam.ac.uk.+005+XXXXX
dnssec-settime -Psync now Kbotolph.cam.ac.uk.+013+YYYYY
-Dsync means delete sync records
-Psync means publish sync records
"sync records" is the term BIND uses for CDS and CDNSKEY records.
Then (if necessary) fix the permissions, and get named to reload the keys:
chgrp named Kbotolph.cam.ac.uk.*.private
chmod g+r Kbotolph.cam.ac.uk.*.private
rndc loadkeys botolph.cam.ac.uk
Now wait. Our systems will automatically notice the requested change, using my dnssec-cds program. We will send you a confirmation email when the change has been put into effect. (This is not yet entirely automated.)
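While you wait, you can check what the zone is advertising; the CDS RRset should now cover only the new algorithm (this check is my addition):

dig +noall +answer botolph.cam.ac.uk CDS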
If you are running an older version of BIND, or for zones under most other parent domains, you will need to explicitly update the delegation with the new DS record. (Automatic updates with CDS records are supposed to work for RIPE reverse DNS, but it seems RIPE-NCC have not yet added support as expected.)
As before, look at the key files to identify the key-signing keys. Get the DS record for the new algorithm like this:
dnssec-dsfromkey -2 Kbotolph.cam.ac.uk.+013+YYYYY
Using whatever facilities the parent zone provides (e.g. email to ip-register@uis.cam.ac.uk), replace the old DS record(s) with the new one.
After you have confirmed that the DS record has been changed, you need to wait for the parent's DS TTL to pass. For subdomains of cam.ac.uk and other UIS zones this is 24 hours; in other cases (such as for 111.131.in-addr.arpa) it is often 48 hours.
This wait ensures that the old algorithm is no longer being relied on as part of the chain of trust, before it can be safely decommissioned.
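One way to watch for the change (my suggestion, not part of the official procedure) is to poll the DS RRset and check that only the new algorithm appears:

dig +noall +answer botolph.cam.ac.uk DS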
Once again this is done by updating the key timing parameters:
dnssec-settime -D now -I now Kbotolph.cam.ac.uk.+005+KKKKK
dnssec-settime -D now -I now Kbotolph.cam.ac.uk.+005+ZZZZZ
-D is when the DNSKEY record is deleted from the zone
-I is when the key becomes inactive, no longer used for signing records
You need to do this for both the key-signing key and the zone-signing key for the old algorithm.
Then (if necessary) fix the permissions, and tell named to put the change into effect:
chgrp named Kbotolph.cam.ac.uk.*.private
chmod g+r Kbotolph.cam.ac.uk.*.private
rndc loadkeys botolph.cam.ac.uk
After that, named no longer needs the old keys, so they can be removed from the filesystem.
That's it!
There are some changes still in progress that will make this somewhat smoother with newer versions of BIND.
The first improvement will be to combine the steps before the DS change. At the moment there is a bug that stops you from scheduling new CDS records in the future. I have submitted a fix; when the fix is released, you will be able to do the key generation and schedule the CDS swap in one go:
dnssec-keygen -L 24h -a 13 -Psync now+50h -f ksk botolph.cam.ac.uk
dnssec-keygen -L 24h -a 13 botolph.cam.ac.uk
dnssec-settime -Dsync now+50h Kbotolph.cam.ac.uk.+005+XXXXX
rndc loadkeys botolph.cam.ac.uk
Another improvement will come from an updated dnssec-checkds that knows about CDS records. (It works but I have not properly submitted it upstream yet.) This will allow you to automatically and safely verify that a DS change has taken effect.
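Hypothetical usage, assuming the updated tool keeps the existing command-line interface:

dnssec-checkds botolph.cam.ac.uk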
When a DS change has been confirmed, you can immediately schedule the decommissioning of the old algorithm, with commands like:
dnssec-settime -D now+50h -I now+50h Kbotolph.cam.ac.uk.+005+KKKKK
dnssec-settime -D now+50h -I now+50h Kbotolph.cam.ac.uk.+005+ZZZZZ
This works now, but you might want something to remind you to clear away unused key files.
Further in the future will be fully automated rollovers, but I will discuss that another time.
On the 7th January, a new more flexible and efficient collision attack against SHA-1 was announced: SHA-1 is a shambles. SHA-1 is deprecated but still used in DNSSEC, and this collision attack means that some attacks against DNSSEC are now merely logistically challenging rather than being cryptographically infeasible.
As a consequence, anyone who is using a SHA-1 DNSKEY algorithm (algorithm numbers 7 or less) should upgrade. The recommended algorithms are 13 (ECDSAP256SHA256) or 8 (RSASHA256, with 2048 bit keys).
Update: I have written a follow-up note about SHA-1 and DNSSEC validation
In 2017 the SHAttered attack demonstrated the first SHA-1 collision. This was not an immediate disaster for DNSSEC because SHAttered required the start of the input to have a special structure that causes a collision, and there are not many input formats that are malleable enough to accommodate the attack. SHAttered made PDF files with colliding SHA-1 hashes, but DNSSEC avoided the worst.
The SHAmbles attack is a chosen prefix collision, so an attacker can construct two input prefixes with complete freedom, and calculate suffixes that will make their SHA-1 hashes collide.
For an attack against DNSSEC, the two prefixes will be a pair of RRsets: one consisting of superficially benign trojan horse records with an owner name that is under the control of the attacker; and the other being attack records with an owner name that the attacker is targeting. Something like,
trojan-horse.example.  3600 IN TXT ...
attack-target.example. 3600 IN TXT ...
The signature metadata, owner names and other DNS rubric appear at the start of the input to the signature algorithm, and the suffixes that make them collide need to be smuggled into the right-hand-side of two DNS records.
For the purpose of this analysis, we assume that the attacker has full control over the network, so they are able to intercept and spoof traffic at will. In this situation, DNSSEC should be able to detect and prevent spoofed DNS records, so the attacker can only deny service without being able to successfully alter the DNS.
(The argument is that pwning the network should look easy compared to the difficulty of breaking the cryptography.)
Our attacker doesn't have access to any endpoints or people in their target, except for a small toehold. They want to use DNSSEC to expand their toehold to compromise other systems, for example an intranet site that accepts passwords from staff in the target organization.
One way the attacker can get interesting effects from spoofing the DNS is to make some third party believe that the attacker is in control of a domain name that they are not. For example, cloud providers such as Amazon, Google, and Microsoft commonly authenticate control over a domain name using TXT records in the DNS; this is also how the ACME dns-01 challenge authenticates a request for a TLS certificate.
Our attacker wants to use a SHAmbles collision to make it possible to spoof the DNS despite DNSSEC. They need to get their trojan horse records into the same zone that contains the target name. The zone owner will sign the trojan horse records, and if their signature algorithm uses SHA-1, the attacker can take the trojan horse signature and attach it to their attack RRset to make it look cryptographically legitimate.
TXT records are ideal for a SHAmbles collision. Section 1.1 of the SHAmbles paper says, "Our attack uses one partial block for the birthday stage, and 9 near-collision blocks." This is illustrated in figure 7 (p.20) in the paper, which shows the attack needs 588 bytes.
A TXT record can contain arbitrary binary data. It consists of a series of substrings, each with a 1-byte count followed by that much data. The SHAmbles collision needs more than one 255 byte substring to work, so some trickery is needed.
Our attacker's trojan horse records might look roughly like:
$ORIGIN _acme-challenge.toehold.example.
@ 3600 IN TXT "innocuous stuff"
@ 3600 IN TXT "\255\123...\42\69\0" ""...
The second record contains the collision blocks that make the attack work. The attacker has to ensure that the collision blocks sort after the innocuous prefix (in DNSSEC canonical sorting order), which can be as simple as ensuring that the second record is longer than the first.
The record containing the collision blocks may need to start with some arbitrary padding to align with SHA-1's 512-bit input block boundaries. The substring length octets inside the collision blocks probably cannot be controlled, so our attacker needs to add a trailer to re-align the substring lengths to the end of the TXT record. The trailer is just 255 zero bytes: the first part of the trailer uses up any remaining space in the last substring of the collision blocks, and the rest of the trailer is interpreted as zero-length substrings.
The attack records need to be constructed to line up with the collision blocks in a similar way to the trojan horse records, but everything else can be different. Our attacker mainly cares about having a different owner name and different TXT contents:
$ORIGIN _acme-challenge.intranet.example.
@ 3600 IN TXT "gfj9Xq...Rg85nM"
@ 3600 IN TXT "\255\222...\53\88\0\0" ""...
In these examples @ is short for the current $ORIGIN setting. This notation is just to stop the lines from bumping into the right margin and wrapping.
How does our attacker get their trojan horse records into the DNS zone so that they are signed? A plausible scenario is that the attacker's toehold is a host on some shared infrastructure that provides hosts with limited DNS update capabilities. Perhaps each host has a TSIG key that allows it to publish an ACME dns-01 challenge for its own name but nothing else.
A DNSSEC signature covers a bit more than just the signed records: there is also some RRSIG metadata. This identifies the key that signed the records, and has inception and expiration times.
To make the chosen prefix collision work, our attacker has to correctly guess what these RRSIG fields will be when the zone signs the trojan horse records, so that the attacker can ensure that the attack records collide. In practice, it is easy to predict the RRSIG fields by controlling the time when the trojan horse DNS update is submitted.
Our attacker has pwned the network and gained access to a low-value host; they want to gain access to something more juicy. They plan to get a TLS certificate for their target so that they can intercept logins over https without being detected.
In this situation they would normally use ACME http-01, but our attacker is thwarted because the site publishes CAA validationmethods and accounturi records protected with DNSSEC. But it's only RSASHA1, so they have an opening.
Our attacker asks Let's Encrypt for a certificate for their target, and gets an ACME challenge. They rapidly calculate a chosen-prefix SHA-1 collision to construct their trojan horse records, and at a carefully controlled time they update the DNS zone using the TSIG key they found in their toehold.
The attacker gets the signature from the trojan horse records and uses it to make an RRSIG record for their attack records. Then they spoof DNS responses to Let's Encrypt containing their attack records, convincing Let's Encrypt that our attacker has legitimate control over the target, thereby getting a TLS certificate.
Then the attacker uses the certificate to intercept TLS traffic to the target, and get some privileged login credentials from the decrypted https traffic.
This attack is supposed to be approximately as plausible as the tech scenes in action movies. One of the ways tech in movies is often implausible is that cryptographic keys are broken by brute force in a ridiculously short period of time, instead of taking longer than the heat death of the universe.
In our attacker's scenario, they need to run a SHAmbles attack within the expiry time of an ACME challenge. This expiry time is at least a few days, so less than a factor of 10x more difficult than the proof-of-concept SHAmbles attack.
I think this is enough to argue that SHA-1 in DNSSEC is practically broken, in cases where permission to update a zone is shared.
Our example scenario used TXT records, but there are a number of other DNS record types that provide enough space to smuggle in SHA-1 collision blocks, and which an attacker might be able to use for mischief.
TLSA records can be used to authenticate TLS certificates for mail servers and other protocols. TLSA records can contain RSA keys (selector = 1 for public key, matching type = 0 for no digest) which are large enough to hide collision blocks.
The most juicy target for an attacker is to get a signature over their choice of DNSKEY records, which would allow them to freely create their own signatures for any records in the zone.
Most zones have two kinds of keys, KSKs (key-signing keys) which only sign the DNSKEY records, and ZSKs (zone-signing keys) which sign the rest of the zone. The DS records in the parent zone make the link in the chain of trust. The DS records contain hashes of the KSKs, and the DNSKEY records are only trusted if they are signed by at least one of those KSKs.
In a zone set up with a KSK/ZSK split like this, our attacker can only get records signed by a ZSK, so they are unable to get a working malicious DNSKEY.
Some zones are set up with CSKs (combined signing keys) which sign the whole zone including the DNSKEY records. In a zone using CSKs our attacker can obtain a working DNSKEY under their control.
Alongside a zone's DNSKEY records, there may be CDNSKEY and CDS records. These records are instructions to the parent zone saying what the DS records should look like. If our attacker can get signed malicious CDNSKEY or CDS records, they may be able to persuade the zone's parent to install the attacker's choice of DS records. That would also allow the attacker to freely create their own signatures for any records in the zone.
Like DNSKEY records, CDNSKEY and CDS records must be signed by the zone's KSK. So zones with separate KSK and ZSK keys are safer against collision attacks than zones with a CSK.
Our scenario was something like an enterprise environment; but the most prominent situation where a zone can be updated by multiple parties is a top-level domain. Any domain registrant can get an arbitrary set of DS records signed by a TLD.
At the time of writing there are 274 TLDs using algorithm 5 (RSASHA1) or 7 (RSASHA1-NSEC3-SHA1).
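A rough way to reproduce that survey, assuming your network can transfer the root zone from one of the root servers that permit AXFR:

# Count delegations whose DS records use a SHA-1 DNSSEC algorithm (5 or 7).
dig @f.root-servers.net . AXFR > root.zone
awk '$4 == "DS" && ($6 == 5 || $6 == 7) { print $1 }' root.zone | sort -u | wc -l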
What prevents TLDs being vulnerable to SHAmbles is that the payload of a well-formed DS record is too small to hold enough collision blocks. So, provided there are enough syntax checks, it is probably not feasible at the moment for an attacker to make a trojan horse DS RRset that collides with an attack DS RRset for a different domain.
However there are some TLDs (such as .de) that allow some subdomains to insert records directly into the TLD without a delegation. This greatly increases the risk of a chosen-prefix collision attack (though .de uses RSASHA256 so it is safe against SHAmbles).
Another risky practice is hosting providers that use the same public/private key pair for large numbers of zones. There are well over 200,000 zones which share keys, and some keys are shared by over 140,000 zones.
In this situation an attacker might be able to get legitimate control over a zone with the same key as their target's zone, even though those zones are different. This attacker does not need to be surreptitious with their trojan horse records: they can set up chosen prefix collision attacks in their own zone and use their signatures to attack other zones in the same hosting setup.
In 2008, a chosen prefix collision attack against MD5 was used to create a rogue X.509 CA certificate. By 2015, it was evident that SHA-1 would soon be vulnerable to a similar attack. In 2016 the CA/Browser forum baseline requirements were updated to require that certificate serial numbers are assigned using at least 64 bits of randomness. This protects certificates against chosen prefix attacks because an attacker cannot predict the prefix on the trojan horse certificate.
The predictability of RRSIG records is similar to the predictability of X.509 certificates 10 years ago. Can we add randomness to RRSIG records to make chosen prefix collisions harder? There is some space to add randomness to the inception and expiration times.
In BIND by default the inception time is one hour before the current time (to allow for inaccurate clocks). You could subtract about 10 to 12 bits of randomness (less than 1 hour 10 minutes) without causing problems. And by default the expiration time is 30 days in the future. This could have about 16 bits of randomness added (up to about 8 hours).
More adventurously, it might be possible to randomise the original TTL field, for up to 32 more bits of randomness. The original TTL field is required to match the TTL of the records being signed, so it would be a protocol violation to randomise it. But validators might not be able to detect that a violation has occurred, in which case randomising it would be benign. Some experimentation is needed to find out if this guess matches reality!
Whenever a DNS zone is signed with a SHA-1 DNSKEY algorithm it is vulnerable to chosen prefix collision attacks. This is a problem when a zone accepts updates from multiple parties, such as:
shared infrastructure where many hosts can update records in a common zone (as in the scenario above)
registries that publish records on behalf of registrants, such as top-level domains
It is also a problem when a key is re-used by multiple zones.
Zones using algorithm numbers 7 or less should be upgraded. The recommended algorithms are 13 (ECDSAP256SHA256) or 8 (RSASHA256, with 2048 bit keys).
For extra protection against chosen prefix collision attacks, zones should not share keys, and they should have separate ZSKs and KSKs.
DNSSEC zone signing software should provide extra protection against chosen prefix collisions by adding more randomness to the inception and expiration times in RRSIG records.
Software implementing CDNSKEY and CDS checks must ensure that the records are properly signed by a KSK, not just a ZSK.
Top-level domain registry software must not accept over-sized DS records.
Corrections
The number of domains with shared keys was erroneously large.
I originally thought the SHA-1 input block size is 20 bytes, like its output size, but in fact the block size is 64 bytes. This means a collision requires more space than stated in earlier versions of this article, but this does not significantly affect the implications for DNSSEC. The attack outline now explains how to accommodate the larger collision blocks.
Clarified reference to SHAttered colliding PDF files.
Our previous news item on DNS delegation updates explained that we are changing the DNSSEC signature algorithm on all UIS zones from RSA-SHA-1 to ECDSA-P256-SHA-256. Among the reasons I gave was that SHA-1 is rather broken.
Today I learned that SHA-1 is a shambles: a second SHA-1 collision has been constructed, so it is now more accurate to say that SHA-1 is extremely broken.
The new "SHAmbles" collision is vastly more affordable than the 2017 "SHAttered" collision and makes it easier to construct practical attacks.
As well as the UIS zones (which are now mostly off RSA-SHA-1), Maths and the Computer Lab have a number of zones signed with RSA-SHA-1. These should also be upgraded to a safer algorithm. I will be contacting the relevant people directly to co-ordinate this change.
I have written some more detailed notes on the wider implications of SHA-1 chosen prefix collisions and DNSSEC.
This note starts with two action items for those for whom we provide secondary DNS. Then, a warning for those who secondary our zones, including stealth secondaries.
There are still a few more delegation updates to do, including for cam.ac.uk itself, which will happen in the new year. There will be further announcements near the time.
We are replacing the ISC SNS with secondary DNS service provided by Mythic Beasts.
If you have DNS zones that currently list sns-pb.isc.org in their NS records, please update them at your convenience to use the Mythic Beasts servers listed below. These servers are already configured with your zones.
The replacement servers are:
ns1.mythic-beasts.com (in Dallas)
ns2.mythic-beasts.com (in London)
ns3.mythic-beasts.com (in Amsterdam)
For zones that use ns2.ic.ac.uk (also in London) we are just using ns1 and ns3, and skipping ns2.mythic-beasts.com.
Mythic Beasts are the domain registrar we use for the Managed Zone Service. They also provide non-JANET network connectivity for commercial tenants on the CUDN. Outside the University, they are well known for hosting the Raspberry Pi web site.
Last year we started a DNS server renaming / renumbering project. That has been on hold for much of this year while we got some necessary infrastructure in place, and while other work took priority.
The delegations for almost all of our zones have now been updated to use the new authoritative DNS server names like auth0.dns.cam.ac.uk instead of authdns0.csx.cam.ac.uk.
Still remaining to do are cam.ac.uk itself, and a number of reverse DNS zones related to IP address space suballocated by JANET. These should be completed early in the new year.
If you have any zones that still use the old names, please update them to the new names.
A wholesale delegation clean-up is a good opportunity to make some wholesale DNSSEC improvements. Doing them at the same time saves us from repeating a lot of the same kinds of correctness checks.
We are changing the signature algorithm on all our zones from RSA-SHA-1 (and a few cases of RSA-SHA-256) to ECDSA-P256-SHA-256. This improves things in a couple of ways:
ECDSA has much smaller signatures than RSA, which leads to smaller DNS packet sizes. This helps to avoid difficulties related to packet fragmentation and fallback to TCP.
Our RSA key sizes are rather too small, and SHA-1 is rather broken. Both were in serious need of upgrading to a better security level.
All Managed Zone Service domains are now signed with ECDSA. (A few lack secure delegations owing to missing third-party support.)
Most of our reverse DNS zones are now signed with ECDSA. (Reverse DNS zones related to IP address space suballocated by JANET and Mythic Beasts lack secure delegations.)
After the holidays we will do the algorithm rollover for our large zones, cam.ac.uk, 111.131.in-addr.arpa, and in-addr.arpa.cam.ac.uk. During the rollover the zones will have two sets of signatures, so they will be approximately 50% larger. When the rollover is complete they will be about 25% smaller than before. The rollover process will take a few days, to allow for the long time-to-live on DNS delegations.
DNS servers need to run with at least twice as much RAM as they use in normal operations, to allow for certain kinds of reconfiguration that need two copies of a zone in memory. So the rollovers should not cause problems for properly provisioned servers.
So I thought it might be worth writing a little tutorial describing how I am using WebDriver. These notes have nothing to do with my scripts or the DNS; it's just about the logistics of scripting a web site.
WebDriver: the standard remote control protocol for web browsers, originating in but now somewhat separate from the Selenium project.
How to use geckodriver to automate Firefox.
Scripting JSON-over-HTTP: use the programming language of your choice, so long as you have convenient libraries for REST-flavoured web APIs.
I'm going to use the command-line program HTTPie in the examples because it makes ad-hoc experiments pretty easy.
HTML: you need to be comfortable looking at the source code of web pages.
CSS selectors: you need to be able to write CSS selectors to pick the web page elements you want your script to act on.
Xpath: sometimes CSS selectors aren't powerful enough, so it's helpful to be able to write Xpath queries or at least navigate this Xpath cheat sheet.
Firefox dev tools: the web page inspector makes this work so much easier. (The other tools are not so relevant.)
A lot of the existing web browser automation ecosystem is oriented around testing (specifically Selenium and the node.js framework webdriver.io), but my purpose is to script web sites that don't provide the APIs I need.
Get Firefox if you don't already have it.
Download a copy of geckodriver for your system, unpack it, and copy it to ~/bin or some other suitable place on your $PATH.
geckodriver is a proxy between the standard WebDriver protocol and Firefox's less convenient native "marionette" remote-control protocol.
In a terminal window, run geckodriver. It will sit there waiting for something to happen. Keep the terminal open; geckodriver will use it for logging.
geckodriver's default WebDriver endpoint is a web server running on localhost port 4444. Open a second terminal window and start a WebDriver session by running:
$ echo '{}' | http POST http://localhost:4444/session
HTTP/1.1 200 OK
content-type: application/json; charset=utf-8

{
    "value": {
        "capabilities": { ... snip ... },
        "sessionId": "570b8399-bc01-2745-b37b-ed6c641156b3"
    }
}
geckodriver should start a new copy of Firefox with an ephemeral profile (so it won't have your cookies or history or settings or extensions etc.). The address bar will have a stripey orange background and a little picture of a robot so you know it is being automated.
HTTPie prints a JSON response containing a lot of information about the browser. The important part is the session ID, like
"sessionId": "570b8399-bc01-2745-b37b-ed6c641156b3"
All the actions you perform on the browser will be associated with this session by using a URL prefix like
http://localhost:4444/session/570b8399-bc01-2745-b37b-ed6c641156b3
This URL is really long so let's call it $wds for "WebDriver session".
sessionId=570b8399-bc01-2745-b37b-ed6c641156b3
wds=http://localhost:4444/session/$sessionId
Now, make the browser navigate to a URL with the command:
$ http -v POST $wds/url url=http://www.dns.cam.ac.uk

POST /session/570b8399-bc01-2745-b37b-ed6c641156b3/url HTTP/1.1
Host: localhost:4444
Content-Type: application/json

{
    "url": "http://www.dns.cam.ac.uk"
}

HTTP/1.1 200 OK
content-type: application/json; charset=utf-8

{
    "value": null
}
If you see this purple web site then you have started scripting a browser!
As you have seen, it is easy to start a session with the default settings. Normally when starting a session I use options like:
{ "capabilities": { "alwaysMatch": { "timeouts": { "implicit": 2000, "pageLoad": 60000 }, "moz:firefoxOptions": { "args": [ "-headless" ] } } } }
The "implicit" timeout is to do with waiting for page elements to appear. By default it is 0 milliseconds, but I set it to 2 seconds. I am not convinced this is as helpful as I hoped because I have still had to write code that polls the browser waiting for Javascript to finish faffing around.
The "pageLoad" timeout is by default 300000 milliseconds (5 minutes) which is ridiculous. I have set it to 60 seconds which is still a lot more generous than should be necessary.
I normally leave the "moz:firefoxOptions" member out, because I'm normally doing interactive development and I need to see what my script is doing. But this example shows how a fully-automated and operational script would start a session. (Annoyingly, geckodriver returns a "moz:headless" capability, but it doesn't accept it in requests, so we have to send it a longer version.)
It's best not to quit Firefox or kill geckodriver when there is an active session because it's possible to leave remnants of the ephemeral browser profile cluttering up your disk. Instead, delete the WebDriver session as follows, which quits the browser and deletes its ephemeral profile. (I'm including a reminder of what $wds is short for - your sessionId will be different!)
$ sessionId=570b8399-bc01-2745-b37b-ed6c641156b3
$ wds=http://localhost:4444/session/$sessionId
$ http DELETE $wds
HTTP/1.1 200 OK
content-type: application/json; charset=utf-8

{
    "value": null
}
Once the session is deleted, you can start a new one re-using the same geckodriver (but you can't have multiple concurrent sessions). Or you can safely kill an idle geckodriver which has no active session.
When I am writing a script to control a web site, I work with several windows:
Firefox under control of geckodriver (not in headless mode), for seeing what my script does to the web page
Firefox web page inspector, for working out the CSS selectors for the HTML elements I want to manipulate (this can be docked as part of the main browser window but I prefer to separate it)
An editor window for writing my script
A terminal window for running my script and logging a trace of the WebDriver protocol JSON messages, or for experiments with HTTPie
Another terminal window where geckodriver chatters (this is less informative and not necessary to keep visible)
Most WebDriver interaction consists of pairs of HTTP requests:
locate an element
do something with the element
The WebDriver protocol has several ways to locate elements:
css selector
link text
partial link text
tag name
xpath
Let's try an example:
$ http -v $wds/element using='link text' value='About this site'

POST /session/c33be620-65b5-6944-bc41-cff38a372823/element HTTP/1.1
... headers ...

{
    "using": "link text",
    "value": "About this site"
}

HTTP/1.1 200 OK
... headers ...

{
    "value": {
        "element-6066-11e4-a52e-4f735466cecf": "8a6f5a50-d197-c84f-a2b3-cae767dc6dab"
    }
}
Grab the ID out of the response, and try this action:
$ elem="8a6f5a50-d197-c84f-a2b3-cae767dc6dab"
$ echo {} | http POST $wds/element/$elem/click
You should see the "About this site" menu appear on the web page.
The request has an object with a "using" member containing the location strategy, in this case "link text", and a "value" member that should identify the element we want.
For obscure reasons, element IDs are returned in an object with a member named element-6066-11e4-a52e-4f735466cecf. This is a fixed string that is part of the protocol; it isn't an ID! The element ID in this example is "8a6f5a50-d197-c84f-a2b3-cae767dc6dab".
In the rest of this tutorial, when I locate an element I will set the elem shell variable to the element's ID. You will need to substitute the actual ID you get from your WebDriver response.
In my WebDriver code I have a different representation of elements in the web page, which is a lot more convenient than the WebDriver protocol representation.
Because I use them so heavily, a simple string is interpreted as a CSS selector.
Other locator strategies are represented like { "link text": "About this site" } because it's much shorter to omit the "using" and "value" strings.
Or if the element has already been located, it is represented in raw WebDriver form like { "element-6066-11e4-a52e-4f735466cecf": "8a6f5a50-d197-c84f-a2b3-cae767dc6dab" }.
Whenever an action method in my code (such as click
) is passed a
locator rather than a raw WebDriver element, it automatically makes an
element
request to locate the element. This neatly wraps up the two
steps of locate and action for me.
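My code isn't shown here, but the shape of the wrapper can be sketched as a shell function (a hypothetical helper, again assuming jq):

    # locate an element by CSS selector, then click it
    wd_click() {
        local elem
        elem=$(http -b $wds/element using='css selector' value="$1" |
            jq -r '.value."element-6066-11e4-a52e-4f735466cecf"')
        echo '{}' | http -b POST $wds/element/$elem/click
    }
    # usage: wd_click 'nav a.menu-toggle'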
Sometimes I explicitly locate elements. This typically happens when
I'm dealing with sub-elements such as rows of a table or fields in a
form. It's neater to use a $wds/element/$elem/element
sub-element
request than to use string concatenation to build CSS or
Xpath selectors.
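For instance, having put a table row's ID in $elem, a request like this finds the first cell within that row (the selector and IDs are illustrative):

    $ http -b $wds/element/$elem/element using='css selector' value='td'
    {
        "value": {
            "element-6066-11e4-a52e-4f735466cecf": "...a sub-element ID..."
        }
    }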
The element
request returns either one element or an error.
    $ http $wds/element using='link text' value='weasels'
    HTTP/1.1 404 Not Found
    ... headers ...

    {
        "value": {
            "error": "no such element",
            "message": "Unable to locate element: weasels",
            ... snip ...
        }
    }
In my WebDriver scripts, the low-level HTTP request code catches errors like this, reports the problem and aborts the script. This is usually good, because the script will not blunder on when its idea of what is happening diverges from reality.
There is also an elements
request which can be used to find multiple
elements in one go (such as the rows of a table) or test whether an
element exists.
    $ http $wds/elements using='link text' value='weasels'
    HTTP/1.1 200 OK
    ... headers ...

    {
        "value": []
    }
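The multiple-element case returns a list of element objects. For example, to fetch every row of a hypothetical table:

    $ http -b $wds/elements using='css selector' value='#wd-table tr'
    {
        "value": [
            { "element-6066-11e4-a52e-4f735466cecf": "...row 1 ID..." },
            { "element-6066-11e4-a52e-4f735466cecf": "...row 2 ID..." }
        ]
    }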
There are several WebDriver requests for inspecting elements.
The one that I have found most useful is the text request, which I
have used to look at the page to check that things are working as
expected, for extracting status messages, etc.
    $ http -b $wds/element using='css selector' value='h1'
    {
        "value": {
            "element-6066-11e4-a52e-4f735466cecf": "1ec41bf0-63cb-dc43-9b7e-728779d7b920"
        }
    }
    $ elem="1ec41bf0-63cb-dc43-9b7e-728779d7b920"
    $ http -b $wds/element/$elem/text
    {
        "value": "Overview"
    }
And I use the property/value
request for getting the current state
of a form. When I'm looking at a pre-filled form that might need
changes I can use this to avoid submitting if changes turn out not to
be necessary.
My main reason for writing WebDriver scripts is to automatically fill in forms. This is superficially easy, but there are traps for the unwary.
Let's navigate to this tutorial page and get the id of the simple text box that appears just below.
    $ http -b POST $wds/url \
        url=http://www.dns.cam.ac.uk/news/2019-12-12-webdriver.html
    {
        "value": null
    }
    $ http -b $wds/element using='css selector' value='#wd-text'
    {
        "value": {
            "element-6066-11e4-a52e-4f735466cecf": "7cfbe5ea-903e-c945-898b-d3182852691c"
        }
    }
    $ elem="7cfbe5ea-903e-c945-898b-d3182852691c"
We can enter something in the box:
    $ http -b $wds/element/$elem/value text='badger'
    {
        "value": null
    }
You should see a badger in the wd-text
box. If you run the command
more than once, you will see multiple badgers in the box.
The value
request does not set the value of a form input as you
might hope. Instead it simulates typing!
So, to correctly fill a text input you need to clear it first, like:
    $ echo '{}' | http POST $wds/element/$elem/clear
    {
        "value": null
    }
    $ http -b $wds/element/$elem/value text='snake'
    {
        "value": null
    }
Then you can be sure you have only a snake.
Because it pretends to type at an element, the value
request is no
use for setting the value of a menu.
    $ http -b $wds/element using='css selector' value='#wd-sel'
    {
        "value": {
            "element-6066-11e4-a52e-4f735466cecf": "9b4fa642-d7e2-e942-a90a-b6700d1b9eef"
        }
    }
    $ elem="9b4fa642-d7e2-e942-a90a-b6700d1b9eef"
    $ http -b $wds/element/$elem/value text='bcde'
    {
        "value": null
    }
    $ http -b $wds/element/$elem/property/value
    {
        "value": "cdef"
    }
If you try this you will find it doesn't select the option as
expected - my property/value
request read back "cdef" not "bcde"!
(It doesn't even behave in anything like a way that I can understand!)
Instead you need to click on the relevant option, like:
    $ http -b $wds/element using='css selector' \
        value='#wd-sel option[value="bcde"]'
    {
        "value": {
            "element-6066-11e4-a52e-4f735466cecf": "450dc8b3-aa9c-b241-b15b-3b66cdefa91a"
        }
    }
    $ elem="450dc8b3-aa9c-b241-b15b-3b66cdefa91a"
    $ echo '{}' | http POST $wds/element/$elem/click
    {
        "value": null
    }
In cases where the option values don't have straightforward meanings, I have found it helpful to use Xpath to match the option text, like:
    $ http -b $wds/element using='xpath' \
        value='//select[@id="wd-sel"]/option[text()="bcde"]'
    {
        "value": {
            "element-6066-11e4-a52e-4f735466cecf": "450dc8b3-aa9c-b241-b15b-3b66cdefa91a"
        }
    }
There are a few other tricky cases that I have encountered.
One of my scripts has to deal with a pop-up date picker. Fortunately I can just type into the date box and ignore the picker - except that the picker pops over another element that I want to click on. In that situation, WebDriver returns an error saying you can't click on an obscured element.
So I had to make my script click elsewhere to dismiss the date picker, before clicking on the obscured drop-down menu.
Most WebDriver actions return a response after the action has completed, so scripts don't have to worry about all the multi-process machinery that is making it work.
However, when a click activates some JavaScript that does the actual thing, WebDriver returns a response immediately. There are cases where the thing is slow (such as performing a back-end API request) so it is fairly obvious that the WebDriver script gets a response before the browser is done.
My scripts handle this by repeatedly making elements
requests until
the expected element appears. There's a timeout in case something
unexpected happens.
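In shell the same polling pattern might look like this crude sketch (the #status-done selector is hypothetical, and it assumes jq):

    # wait up to 30 seconds for the expected element to appear
    found=0
    for i in $(seq 30); do
        found=$(http -b $wds/elements using='css selector' value='#status-done' |
            jq '.value | length')
        [ "$found" -gt 0 ] && break
        sleep 1
    done
    [ "$found" -gt 0 ] || { echo 'timed out' >&2; exit 1; }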
You also need to beware of cases where the thing is fast (such as manipulating the DOM to adjust a form) because that can lead to tricky race conditions between the WebDriver script turn-around time and the JavaScript completion time.
That is basically everything I have needed to learn about WebDriver to make it useful for scripting web sites.
I have found that most of the work scripting a site is finding out how to automatically navigate the site while ensuring that it is working as expected. WebDriver itself has not been much of a pain point!
There are a bunch of other things that you can do with WebDriver, such as manipulating windows and taking screenshots, but I haven't needed them.
Along the way I think I became convinced there's an opportunity for a significant improvement.
Since Rachel Kroll wrote about "make before break" recently, the phrase has been on my mind. Rachel's article is a great example of why I am working towards a transactional API and user interface for IP Register.
More generally, make-before-break is a fundamental safety technique (try Go Ape to see it in another context) and it's a core part of the way I go about things. It's why I check web servers work before changing the DNS and why I play tricks with redirects to provision Let's Encrypt certificates before a web server goes live.
This afternoon's work was a series of configuration changes that were planned, tested, and put into production. By keeping the dependencies in mind and following the make-before-break rule, I could do it without booking downtime in an out-of-hours at-risk period.
The "bad quick-and-dirty hack" was to use our hidden primary server as a zone transfer relay/fanout server for a third-party vendor RPZ block list. This violated our security principle that the primary server should not talk to the outside world. An appallingly bad choice, driven (if I remember correctly) by the limited length of the ACL the vendor would allow (smaller than our outward-facing DNS server clusters) and because I had not yet developed a plan for separate zone transfer relay servers.
The most difficult part of the reshuffle plan was to change the service architecture to add these zone transfer relay servers. Although I sketched out the idea a year ago, I did not really start planning how to get there from here until now. This afternoon was the smallest possible first step.
Since the new RPZ feeds will work basically the same as our existing feed, fixing the "bad quick-and-dirty hack" by taking a small step towards separate zone transfer relay servers also solves the immediate need.
The zone transfer servers are going to take over the IPv4 addresses
currently used by auth0.dns
and auth1.dns
since there are lots of
other people with configurations that have those addresses wired in
for zone transfers, and rather than creating work for others, it's
comparatively easy for me to change a few glue records.
Our DNS server configuration has a static part provisioned by Ansible; and a dynamic part provisioned from our database back-ends and a simplified configuration file. On an orthogonal axis are the different flavours of server: authoritative, recursive, hidden primary, and (embryonic) zone transfer servers.
The work went roughly as follows. Each point was a plan / code / test / deploy cycle.
Prepare ACLs in static config
There was some accumulated cruft that needed cleaning up, but the
significant parts were to add zone transfer source declarations
(masters
clauses in BIND configuration files) for the new RPZ
provider and for the embryonic zone transfer servers.
PREP: add new as-yet-unused config clauses
Create new dynamic config for zone transfer servers
We're turning auth servers into xfer servers so this was just the tiniest necessary difference: xfer servers relay the RPZ block lists, but auth servers don't.
NEEDS: not actually any of the new static config clauses, because the new RPZ block lists are not quite ready
PREP: new as-yet-unused config file
Create new static config for xfer servers
Again the tiniest necessary difference.
NEEDS: the dynamic config from the previous step
PREP: new as-yet-unused config file
Add auth0 and auth1 to RPZ vendor ACL
PREP: they need to be able to get the zones!
Reconfigure auth0 and auth1 to use xfer config
Still acting as auth servers because the config has hardly diverged at all.
NEEDS: the looser ACLs from the previous step
NEEDS: the config from the two steps before that
MAKE: zone transfer servers can now relay RPZ block lists
Reconfigure recursive servers
Get RPZ block lists from auth0 / auth1 (future zone transfer relay servers) instead of hidden primary. A user-facing change so always needs extra care and attention.
NEEDS: static config from first step
NEEDS: RPZ block lists on xfer servers from previous step
FLIP: from the old and busted to the new hotness
Fix zone transfers for RPZ block lists
By mistake the auth/xfer servers were getting the RPZ block lists from our hidden primary, not directly from the vendor, because that's what our recursive servers used to do. This kind of latent bug is why you double-check before removing things that are not supposed to be used any more...
Drop RPZ block list from hidden primary
BREAK: recursive servers used to depend on this
Drop hidden primary from RPZ vendor ACL
BREAK: zone transfers for RPZ block lists needed this
Remove evil firewall hole on hidden primary
BREAK: zone transfers for RPZ block lists needed this
That looks a lot neater in terms of planning and execution than it felt like at the time. I think that's because the scruffy indecisiveness happened mostly when I was working out what I needed to do, and I didn't do anything until I was sure it was heading in the right direction and not going to break anything.
Maybe you can see some of the difficulty in the number of PREP steps. There could have been fewer steps if the configuration modules were more self-contained.
One of the things about the DNS server config that makes me uncomfortable is that the dynamic and static configurations are quite tightly coupled, in that the dynamic config depends on ACLs whose names are defined in the static config. So although they look like separate parts of the system from the way the code is laid out, careless changes to either of them can easily break things.
Some of this discomfort dates back to when I redid the server setup with Ansible: before that the DNS server configuration scripts were more unified. I moved half of the configuration to Ansible without cutting the dependencies. My excuse is that it was a mad rush and I was already redesigning too many parts of the system!
It would be fairly easy to move the definition of the ACLs into the dynamic configuration script, so that it can produce a self-contained configuration file with ACLs expanded in-line instead of being referred to by name. This would also make the dynamic configuration more like a catalog zone, which I would like to use to make our recursive and authoritative servers more self-configuring.
But RPZ block lists are more difficult to decouple in this way. Partly this is because the dependency is in the opposite direction: zone configs depend on the static config for ACL names, but for RPZ block lists the static config also depends on the zones. We have a mildly tricky bootstrapping hack to cut this loop on a freshly provisioned server. And a minor planning blight for the future is that RPZ config depending on catalog zones is a no-no.
The main blocker to fixing this is that the simplified zone config is really nice: one line per zone expands to 4 or 5 lines of config for 3 or 4 flavours of server (with help from a short script). So the challenge is to come up with something similarly short that keeps the different flavours of server in sync, without the mistakes of the past.
Perhaps this is another case where YAML and Jinja2 will displace Perl, for those parts of the dynamic config that are not (in practice) dynamic...
Both YAML and Markdown are terrible in several ways.
YAML is ridiculously over-complicated and its minimal syntax can hide minor syntax errors turning them into semantic errors. (A classic example is a list of two-letter country codes, in which Norway (NO) is transmogrified into False.)
Markdown is poorly defined, and has a number of awkward edge cases where its vagueness causes gotchas. It has spawned several dialects to fill in some of its inadequacies, which causes compatibility problems.
However, they are both extremely popular and relatively pleasant to write and read.
For this web site, I have found that a couple of simple sanity checks are really helpful for avoiding cockups.
One of YAML's peculiarities is its idea of storing multiple documents in a stream.
A YAML document consists of a ---
followed by a YAML value. You can
have multiple documents in a file, like these two:
    ---
    document: one
    ---
    document: two
YAML values don't have to be key/value maps: they can also be simple strings. So you can also have a two-document file like:
    ---
    one
    ---
    two
YAML has a complicated variety of multiline string
syntaxes. For the simple case of a
preformatted string, you can use the |
sigil. This document is like
the previous one, except that the strings have newlines:
    --- |
    one
    --- |
    two
The source files for this web site each start with something like this (using this page as an example, and cutting off after the title):
    ---
    tags: [ progress ]
    authors: [ fanf2 ]
    --- |
    YAML and Markdown
    =================
This is a YAML stream consisting of two documents, the front matter (a key/value map) and the Markdown page body (a preformatted string).
There's a fun gotcha. I like to use underline for headings because it helps to make them stand out in my editor. If I ever have a three-letter heading, that splits the source file into a third YAML document. Oops!
So my static site generator's first sanity check is to verify there are exactly two YAML documents in the file.
Aside: There is also a YAML document end marker, ...
, but I have
not had problems with accidentally truncated pages because of it!
Practically everything (terminals, editors, pagers, browsers...) by default has tab stops every 8 columns. It's a colossal pain in the arse to have to reconfigure everything for different tab stops, and even more of a pain in the arse if you have to work on projects that expect different tab stop settings. (PostgreSQL is the main offender among the projects I have worked with, bah.)
I don't mind different coding styles, or different amounts of indentation, so long as the code I am working on has a consistent style. I tend to default to KNF (the Linux / BSD kernel normal form) if I'm working on my own stuff, which uses one tab = one indent.
The only firm opinion I have is that if you are not using 8 column tab stops and tabs for indents, then you should use spaces for indents.
Markdown uses indentation for structure, either a 4-space indent or a tab indent. This is a terrible footgun if tabs are displayed in the default way and you accidentally have a mixture of spaces and tabs: an 8 column indent might be one indent level or two, depending on whether it is a tab or spaces, and the difference is mostly invisible.
So my static site generator's second sanity check is to ensure there are no tabs in the Markdown.
This is a backup check, in case my editor configuration is wrong and unintentionally leaks tabs.
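My static site generator does these checks internally, but both can be approximated in shell. This sketch assumes the YAML document markers are flush-left, as they are in my source files:

    # crude approximations of the two sanity checks
    src=page.md
    tab=$(printf '\t')

    docs=$(grep -c -x -e '---' -e '--- |' "$src")
    [ "$docs" -eq 2 ] || echo "$src: expected 2 YAML documents, found $docs" >&2

    grep -q "$tab" "$src" && echo "$src: tab character in Markdown" >&2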
For full details, please see the announcement messages: https://lists.isc.org/pipermail/bind-announce/2019-October/thread.html
The vulnerabilities affect two features that are new in BIND 9.14, mirror zones and QNAME minimization, and we are not affected because we are not using either feature.
"Query name minimization" is a DNS privacy enhancement that changes the resolver algorithm to avoid leaking details of queries to the root and top-level domain name servers.
As noted when we upgraded to BIND 9.14, we have disabled QNAME minimization to avoid interoperability problems with the current algorithm.
Our resolvers don't do any query forwarding either, so we avoid this vulnerability twice over.
The aim of this feature is to allow a resolver to host its own "hyperlocal" copy of the DNS root zone. This can speed up queries that are not in the resolver's cache. Unlike previous ways of configuring a hyperlocal root zone, mirror zones do proper DNSSEC validation of the zone contents to ensure they are not tampered with.
There is another recent feature, negative answer synthesis, which uses the results of DNSSEC validation to generate negative answers from the contents of the resolver's cache, without having to query authoritative servers.
These two features have substantially the same effect, of reducing the amount that resolvers need to make long-distance queries to find out that a mistyped domain name doesn't exist. But mirror zones are basically only useful for the root zone, and they require special configuration; whereas negative answer synthesis is useful for many other parts of the DNS, and needs no configuration.
So we don't use mirror zones, and we aren't at risk from this vulnerability.
(Geoff Huston wrote Expanding the DNS Root: Hyperlocal vs NSEC Caching which comes to the same conclusions.)
Since 2015 we have used the Internet Systems Consortium secondary name service (ISC SNS) to provide off-site DNS service for University domains.
ISC announced yesterday that the SNS is closing down in January 2020, so we need alternative arrangements.
We have not yet started to make any specific plans, so this is just to let you know that there will be some changes in the next few months. We will let you know when we have more details.
I have a collection of domain registration management scripts called superglue, which have always been an appalling barely-working mess that I fettle enough to get some task done then put aside in a slightly different barely-working mess.
I have reduced the mess a lot by coming up with a very simple convention for storing login credentials. It is much more consistent and safe than what I had before.
One of the things superglue
always lacked is a coherent way to
handle login credentials for registr* APIs. It predates regpg by a
few years, but regpg
only deals with how to store the secret parts
of the credentials. The part that was awkward was how to store the
non-secret parts: the username, the login URL, commentary about what
the credentials are for, and so on. The IP Register system also has
this problem, for things like secondary DNS configuration APIs and
database access credentials.
There were actually two aspects to this problem.
My typical thoughtless design process for the superglue
code that
loaded credentials was like, we need a username and a password, so
we'll bung them in a file separated by a colon. Oh, this service needs
more than that, so we'll have a multi-line file with fieldname colon
value on each line. Just terrible.
I decided that the best way to correct the sins of the past would be to use an off-the-shelf format, so I can delete half a dozen ad-hoc parsers from my codebase. I chose YAML not because it is good (it's not) but because it is well-known, and I'm already using it for Ansible playbooks and page metadata for this web server's static site generator.
When designing regpg I formulated some guidelines for looking after secrets safely.
From our high-level perspective, secrets are basically blobs of random data: we can't usefully look at them or edit them by hand. So there is very little reason to expose them, provided we have tools (such as regpg) that make it easy to avoid doing so.
Although regpg isn't very dogmatic, it works best when we put each secret in its own file. This allows us to use the filename as the name of the secret, which is available without decrypting anything, and often all the metadata we need.
That weasel word "often" tries to hide the issue that when I wrote it two years ago I did not have an answer to the question, what if the filename is not all the metadata we need?
I have found that my ad-hoc credential storage formats are very bad
for secret hygiene. They encourage me to use the sinful regpg edit
command, and decrypt secrets just to look at the non-secret parts, and
generally expose secrets more than I should.
If the metadata is kept in a separate cleartext YAML file, then the comments in the YAML can explain what is going on. If we strictly follow the rule that there's exactly one secret in an encrypted file and nothing else, then there's no reason to decrypt secrets unnecessarily: everything we need to know is in the cleartext YAML file.
I have released regpg-1.10, which includes ReGPG::Login, a Perl library for loading credentials stored in my new layout convention. It's about 20 simple lines of code.
Each YAML file example-login.yml
typically looks like:
    # commentary explaining the purpose of this login
    ---
    url: https://example.com/login
    username: alice
    gpg_d:
      password: example-login.asc
The secret is in the file example-login.asc
alongside. The library
loads the YAML and inserts into the top-level object the decrypted
contents of the secrets listed in the gpg_d
sub-object.
For cases where the credentials need to be available without someone
present to decrypt them, the library looks for a decrypted secret file
example-login
(without the .asc
extension) and loads that instead.
The code loading the file can also list the fields that it needs, to provide some protection against cockups. The result looks something like,
    my $login = read_login $login_file, qw(username password url);
    my $auth = $login->{username}.':'.$login->{password};
    my $authorization = 'Basic ' . encode_base64 $auth, '';
    my $r = LWP::UserAgent->new->post($login->{url},
        Authorization => $authorization,
        Content_Type => 'form-data',
        Content => [ hello => 'world' ] );
Secret storage in the IP Register system is now a lot more coherent, consistent, better documented, safer, ... so much nicer than it was. And I got to delete some bad code.
I only wish I had thought of this sooner!
Most of this article is background information explaining the rationale for this change. The last section below gives an outline of the implementation details.
There is a widespread effort amongst software developers and network
operators to improve DNS privacy. A major part of the work
is to encrypt DNS traffic. The University's central DNS
resolvers, rec.dns.cam.ac.uk
, support encrypted DNS. Recent
versions of Android automatically encrypt DNS when the network's DNS
servers support DNS-over-TLS, and Firefox can be configured to encrypt
its DNS traffic using DNS-over-HTTPS; our documentation for encrypted
DNS explains more.
(We have also supported DNSSEC since 2009; DNSSEC is about DNS data authentication and integrity, but it does not provide encryption or privacy.)
DNS-over-HTTPS ("DoH") is a straightforward method for tunnelling DNS queries over HTTPS requests. In isolation it seems to be a ridiculously over-complicated way to encrypt DNS, but there are a couple of contexts where it makes some sense.
DoH allows applications running in restricted environments (i.e. JavaScript in a web browser) to make DNS queries when the only network requests they can make are some kind of HTTP. This was the main scenario discussed when the DoH specification was being prepared, but in practice almost nobody does this.
DoH allows you to make encrypted DNS queries when the network doesn't support encrypted DNS and/or blocks port 853 (DNS-over-TLS). This is the way DoH is being deployed in Firefox.
The developers of Firefox are very keen to deploy encrypted DNS as quickly as they can. Their plan is that by default Firefox will use DoH to bypass the network provider's DNS servers and the operating system's DNS configuration and instead use Cloudflare's centralized DNS service. Firefox will start doing this for users in the USA later this month.
The benefits they see are that:
Your network provider won't be able to use their DNS server traffic to see which web sites you have visited, and sell that data to third parties.
Your network provider won't be able to redirect DNS answers to different places. (DNSSEC can also prevent that.)
Your network provider can't censor DNS to block access to sites.
Firefox can avoid very badly performing DNS servers, although DoH to Cloudflare is slower on average.
The Firefox developers trust Cloudflare's DNS not to misbehave in these ways.
Tunnelling DNS over HTTP(s) in this way is not a new idea. What is different is Firefox's plan to deploy it as a mass-market default. This has caused widespread consternation.
The DNS is a very convenient point of control for network security.
DNS telemetry can identify infected devices that are trying to contact malware command-and-control servers
DNS blocks can help to protect against phishing and stop ads.
The big UK ISPs use the DNS as part of their system for blocking access to child pornography and other officially censored web pages.
The discussion around Firefox's deployment of DoH has been remarkably
bad-tempered. Part of the problem is that Firefox is removing a
security mechanism without providing a replacement. Network providers
and enterprises block malware and phishing on their DNS servers, and
home users use software like Pi-Hole or custom hosts
files
to block malware and ads. Firefox's DoH implementation will stop these
blocks from working.
There is also an awkward question about consent. Until now, network providers have relied on the user's sign-up agreement to give consent to the provider's overall approach to managing their network (DNS and everything else) as a bundle. Don't like it? Choose another provider. Firefox is using choice of software as implied consent to change the DNS configuration and bypass existing DNS-related security mechanisms.
More awkwardly, it isn't reasonable to expect the vast majority of people to make an informed choice about their DNS configuration or give meaningful consent to any changes. We can't demand that they spend time learning arcane details so they can understand the implications. Even the experts can't predict the consequences of Firefox's DoH deployment.
To be honest, the DNS isn't a particularly good place to implement a security policy. For instance, a mobile device can bypass the University's DNS anti-phishing blocks by just switching from eduroam WiFi to 3G/4G cellular.
It would be better if your computer came with software that made it easy to subscribe to block lists, inspect them, edit them, and remove them if they are more annoying than useful. And if it were easy for network providers to publish block lists and make you aware of them. Then your anti-phishing / anti-malware / anti-spam protections would not depend on where you get your DNS from.
Browsers come with a Safe Browsing block list by default, but this is a proprietary Google service which others can't easily contribute to, and it isn't designed to be easily inspected.
Ad blocking software such as uBlock Origin allows you to subscribe to block lists but it isn't easy for network providers to offer their own custom lists.
But this isn't an ideal world, and at present DNS blocks are the most workable way to provide anti-phishing / anti-malware / anti-spam protections to lots of people.
The most recent Firefox blog article about DoH mentions a few of the problems they have encountered with their DoH rollout:
Cloudflare can't access private DNS and split DNS for private networks.
Parental controls are broken (along with other DNS blocks, but the article doesn't mention other reasons for blocking sites).
Managed devices on enterprise networks must not bypass the managed enterprise DNS.
The Firefox developers have decided that other issues are less important to them than encrypted DNS:
DoH to a distant Cloudflare DNS server is slower than to a local DNS server. On average DoH is slower.
Centralizing DNS on Cloudflare reduces the healthy diversity of the Internet.
Redirecting DNS queries to Cloudflare sends web browsing metadata outside the EU to an American company.
To avoid problems with Firefox's implementation of DoH, network
providers can tell Firefox to use the default DNS settings
for the network and operating system. This is done by blocking
DNS queries for use-application-dns.net
.
The University's DNS blocks include use-application-dns.net
.
This will have no effect in the short term, because Firefox is not yet
planning to roll out DoH in the UK, but we think it is worth deploying
the change sooner rather than later.
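From a machine on the CUDN you can verify the canary block with dig; the output is abbreviated here, but the significant part is the NXDOMAIN status:

    $ dig @rec.dns.cam.ac.uk use-application-dns.net A +noall +comments
    ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: ...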
This is regrettable because we would prefer more widespread use of encrypted DNS, but Firefox's default settings are too ham-fisted and problematic. They aren't giving us a nice way to tell Firefox to use our DoH service so all we can do is disable it by default and encourage enthusiastic users to configure it manually.
After we made this change, the Firefox developers announced that they have no plans to roll out DoH to Cloudflare as the default setting for users in the UK. However they are continuing to consult interested parties so the situation is still in flux.
You have an existing web site, say www.botolph.cam.ac.uk
, which is
set up with good TLS security.
It has permanent redirects from http://…
to https://…
and from
bare botolph.cam.ac.uk
to www.botolph.cam.ac.uk
. Permanent
redirects are cached very aggressively by browsers, which take
"permanent" literally!
The web site has strict-transport-security with a long lifetime.
You want to migrate it to a new server.
If you want to avoid an outage, the new server must have similarly good TLS security, with a working certificate, before the DNS is changed from the old server to the new server.
But you can't easily get a Let's Encrypt certificate for a server until after the DNS is pointing at it.
As in my previous note, we can use the fact that Let's Encrypt will follow redirects, so we can provision a certificate on the new server before changing the DNS.
In the http virtual hosts for all the sites that are being migrated
(both botolph.cam.ac.uk
and www.botolph.cam.ac.uk
in our example),
we need to add redirects like
    Redirect /.well-known/acme-challenge/ \
        http://{{newserver}}/.well-known/acme-challenge/
where {{newserver}}
is the new server's host name (or IP address).
This redirect needs to match more specifically than the existing
http
-> https
redirect, so that Let's Encrypt is sent to the new
server, while other requests are bounced to TLS.
Run the ACME client to get a certificate for the web sites that are
migrating. The new server needs to serve ACME challenges for the web
site names botolph.cam.ac.uk
and www.botolph.cam.ac.uk
from the
{{newserver}}
default virtual host. This is straightforward with
the ACME client I use, dehydrated.
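With dehydrated that can be as simple as listing all the migrating names on one line of domains.txt so they end up in a single certificate (names from our running example):

    # /etc/dehydrated/domains.txt on the new server
    botolph.cam.ac.uk www.botolph.cam.ac.uk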
It should now be safe to update the DNS to move the web sites from the old server to the new one. To make sure, there are various tricks you can use to test the new server before updating the DNS [1] [2].
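For example, curl's --resolve option can aim a test request at the new server without touching the DNS (here 192.0.2.1 stands in for the new server's address):

    $ curl -sI --resolve www.botolph.cam.ac.uk:443:192.0.2.1 \
        https://www.botolph.cam.ac.uk/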
This is roughly in order of priority.
This has been my main focus over the last few months. The remaining work is:
the list, xlist, and table ops pages;
more friendly input validation;
documentation, both internals and user-facing;
deployment!
As a possible alternative to IP Register v4. I should have enough headspace to look at this properly after the v3 rollout.
The current recursive DNS server hardware is nearly 5 years old so it is due for a refresh. (The recursive servers are bare metal to avoid dependency loops with other services.)
The DNS systems need to move from Debian 9 "Stretch" to 10 "Buster". It's probably easiest to do this after the new hardware has arrived.
This is a leftover from the renaming project started last year. This involves:
completing the registr* client automation for maintaining our domain delegations;
updating our secondary DNS arrangements, probably including separating zone transfers and secondary service from our authoritative servers;
After that there will be more work moving the back-end IP Register database off Oracle on Jackdaw, then more work on improving the web user interface. The details are murky.
Previously, eligibility was restricted to (basically) being located in the EU. The change extends eligibility for domains owned by individuals to EU citizens everywhere.
Organizations in the UK that have .eu domain names will still need to give them up at Brexit.
Thanks to Alban Milroy for bringing this to our attention.
I'm super keen to hear any complaints you have about the existing user interface. Please let ip-register@uis.cam.ac.uk know of anything you find confusing or awkward! Not everything will be addressed in this round of changes but we'll keep them in mind for future work.
Jackdaw has separate pages to download an API cookie and manage API cookies. The latter is modal and switches between an overview list and a per-cookie page.
In v3 they have been combined into a single page (screenshot below) with less modality, and I have moved the verbiage to a separate API cookie documentation page.
While I was making this work I got terribly confused that my v3 cookie page did not see the same list of cookies as Jackdaw's manage-cookies page, until I realised that I should have been looking at the dev database on Ruff. The silliest bugs take the longest to fix...
Today I have started mocking up a v3 "single ops" page. This is a bit of a challenge, because the existing page is rather cluttered and confusing, and it's hard to improve within the constraint that I'm not changing its functionality.
I have re-ordered the page to be a closer match to the v3 box ops
page. The main difference is that the
address
field is near the top because it is frequently used as a
primary search key.
There is a downside to this placement, because it separates the address from the other address-related fields which are now at the bottom: the address's mzone, lan, subnet, and the mac and dhcp group that are properties of the address rather than properties of the box.
On the other hand, I wanted to put the address-related fields near the
register
and search
buttons to hint that they are kind of related:
you can use the address-related fields to get the database to
automatically pick an address for registration following those
constraints, or you can search for boxes within the constraints.
Did you know that (like table ops but unlike most other pages) you can use SQL-style wildcards to search on the single ops page?
Finally, a number of people said that the mzone / lan boxes are super awkward, and they explicitly asked for a drop-down list. This breaks the rule against no new functionality, but I think it will be simple enough that I can get away with it. (Privileged users still get the boxes rather than a drop-down with thousands of entries!)
On top of the previous refactorings, the new code is quite a lot smaller than the existing web interface on Jackdaw.
| page  | v2 lines | v3 lines | change |
|-------|----------|----------|--------|
| box   | 283      | 151      | 53%    |
| vbox  | 327      | 182      | 55%    |
| aname | 244      | 115      | 47%    |
| cname | 173      | 56       | 32%    |
| mx    | 253      | 115      | 45%    |
| srv   | 272      | 116      | 42%    |
| motd  | 51       | 22       | 43%    |
| totp  | 66       | 20       | 30%    |
There are two primary goals for the ipreg v3 web interface:
Move the web interface off Jackdaw
Make it possible to move the database off Jackdaw and Oracle and on to PostgreSQL
The move to PostgreSQL will allow us to change the schema, APIs, and web user interface more significantly. The "v4" label is for those longer term plans.
This work does not change the user-visible structure of the ipreg web interface: there is the same set of ops pages, which do the same things, and report errors by dumping Oracle error codes into the web page.
The aim is for v3 to be just enough to get off Jackdaw, remaining compatible with API clients apart from the change of URL.
In the previous article I outlined some of the changes to the framework code as it has moved off Jackdaw. A lot of this is necessary because of differences in how the web app authenticates to Oracle, and to support parallel running between Oracle and PostgreSQL while the back end is ported later on.
While refactoring and porting the ops pages to the new framework, it has become more clear where the trouble-spots are for moving the database to PostgreSQL. There will be more work needed to improve the separation between the web and database layers.
To make this project more rewarding, I am making some small improvements that should make the web pages nicer to look at and slightly less annoying to use. Minor things like:
Lists instead of drop-down menus for choosing things like IP addresses of multihomed boxes
More lenient handling of extra whitespace in input boxes, or trailing dots on domain names
Better autofocus and more useful default button
So far I have ported the simpler pages. The idea is to get the hang of things and solve many of the problems in an easier setting, to avoid getting bogged down by having to deal with too much at once in the more complicated pages.
You can't try these out yourselves yet, because v3 is still in heavy development and often broken. But if you have any reactions, positive or negative, please let ip-register@uis.cam.ac.uk know!
For reference, here's the existing box_ops
page on Jackdaw.
(Click to embiggen any of these screenshots.)
Top right, there is a small chevron which allows you to compactify or embiggen the Project Light header and footer. I am showing the pages in compact mode. (This chevron also works on the main www.dns.cam.ac.uk web site and the git.uis.cam.ac.uk server.)
Bottom middle in the footer is the DNS update time confirmation. (On www.dns.cam.ac.uk this slot has the page timestamp.)
The Project Light navigation menu has the links to the ops pages, and highlights the current page instead of disappearing it.
The sidebar has the other bits and bobs from the header of the old pages. The "connection" box bottom right might become more interesting during the port from Oracle to PostgreSQL.
Labels for mandatory fields are bold.
Press enter for the display
action.
Privileged users get a few extra options. I'm going to use my main account to take most of the screenshots so that I can show what the forms look like when populated.
Each edit
button copies a record into the input fields below so that
the record can be manipulated. In this example, I hit edit
then
del
on the IPv4 address of a DNS server, which provoked an error
from Oracle because of an aname constraint check.
[ This data is very stale - "ruff" is a clone of jackdaw that we are using for development work. So this example is from before the DNS server rename. Disregard any recent "last updated" notes in the screenshots below, because they are a result of me testing, they are not live data. ]
The first and simplest page. When there is a message of the day it is displayed at the top of the sidebar.
The last and most complicated page so far.
I always found the old service_ops page more confusing than I would like. So, there is now a placeholder hint about the format of the srv name field, and the input boxes are now in the same order as in the DNS.
Since the ipreg security revamp about two and a half years ago, we have used TOTP second factor authentication for privileged users to protect against compromise of UIS passwords. This page allows us to display a QR code for enrolling new devices for ourselves or each other. No, you don't get to see my QR code :-)
Occasionally the update process breaks. It is written fairly conservatively so that if anything unexpected happens it stops and waits for someone to take a look. Some parts of the build process are slightly unreliable, typically parts that push data to other systems. Many of these push actions are not absolutely required to work, and it is OK to retry when the build job runs again in an hour.
Over time we have made the DNS build process less likely to fail-stop, as we have refined the distinction between actions that must work and actions that can be retried in an hour. But the build process changes, and sometimes the new parts fail-stop when they don't need to. That happened earlier this week, which prompted us to add the last update time stamp, so you have a little more visibility into how the system is working (or not).
https://rec.dns.cam.ac.uk/
. (Our DNS servers are only available on
the CUDN so this setting isn't suitable for mobile devices.)
Very recent versions of Firefox also support encrypted server name indication. When connecting to a web server the browser needs to tell the web server which site it is looking for. HTTPS does this using Server Name Indication, which is normally not encrypted unlike the rest of the connection. ESNI fixes this privacy leak.
To enable ESNI, go to about:config
and verify that
network.security.esni.enabled
is true
.
Unfortunately the Managed Zone Service operating system upgrade this evening failed when we attempted to swap the old and new servers. As a result the MZS admin web site is unavailable until tomorrow.
DNS service for MZS domains is unaffected.
We apologise for any inconvenience this may cause.
This vulnerability affects all supported versions of BIND.
Hot on the heels of our upgrade to 9.14.2 earlier this week, I will be patching our central DNS servers to 9.14.3 today. There should be no visible interruption to service.
I have a smallish number of web servers (currently 3) and a smallish number of web sites (also about 3). I would like any web server to be able to serve any site, and dynamically change which site is on which server for failover, deployment canaries, etc.
If server 1 asks Let's Encrypt for a certificate for site A, but site A is currently hosted on server 0, the validation request will not go to server 1 so it won't get the correct response. It will fail unless server 0 helps server 1 to validate certificate requests from Let's Encrypt.
I considered various ways that my servers could co-operate to get certificates, but they all required extra machinery for authentication and access control that I don't currently have, and which would be tricky and important to get right.
However, there is a simpler option based on HTTP redirects. Thanks to Malcolm Scott for reminding me that ACME http-01 validation requests follow redirects! The Let's Encrypt integration guide mentions this under "picking a challenge type" and "central validation servers".
Instead of redirecting to a central validation server, a small web server cluster can co-operate to validate certificates. It goes like this:
server 1 requests a cert for site A
Let's Encrypt asks site A for the validation response, but this request goes to server 0
server 0 discovers it has no response, so it speculatively replies with a 302 redirect to one of the other servers
Let's Encrypt asks the other server for the validation response; after one or two redirects it will hit server 1 which does have the response
This is kind of gross, because it turns 404 "not found" errors into 302 redirect loops. But that should not happen in practice.
My configuration to do this is a few lines of mod_rewrite. Yes, this doesn't help with the "kind of gross" aspect of this setup, sorry!
The rewrite runes live in a catch-all port 80 <VirtualHost>
which
redirects everything (except for Let's Encrypt) to https. I am not
using the dehydrated-apache2
package any more; instead I have copied
its <Directory>
section that tells Apache it is OK to serve
dehydrated's challenge responses.
I use Ansible's Jinja2 template module to install the
configuration and fill in a couple of variables: as usual,
{{inventory_hostname}}
is the server the file is installed on, and
in each server's host_vars
file I set {{next_acme_host}}
to the
next server in the loop. The last server redirects to the first one,
like web0 -> web1 -> web2 -> web0. These are all server host names,
not virtual hosts or web site names.
    <VirtualHost *:80>
        ServerName {{inventory_hostname}}
        RewriteEngine on
        # https everything except acme-challenges
        RewriteCond %{REQUEST_URI} !^/.well-known/acme-challenge/
        RewriteRule ^ https://%{SERVER_NAME}%{REQUEST_URI} [L,R=301]
        # serve files that exist
        RewriteCond /var/lib/dehydrated/acme-challenges/$1 -f
        RewriteRule ^/.well-known/acme-challenge/(.*) \
            /var/lib/dehydrated/acme-challenges/$1 [L]
        # otherwise, try alternate server
        RewriteRule ^ http://{{next_acme_host}}%{REQUEST_URI} [R=302]
    </VirtualHost>

    <Directory /var/lib/dehydrated/acme-challenges/>
        Options FollowSymlinks
        Options -Indexes
        AllowOverride None
        Require all granted
    </Directory>
The main consequence of this upgrade is that we will be implementing the DNS Flag Day protocol changes. The DNS resolvers will no longer have code to work around broken and buggy domain names. In the past these domains would have been very slow to resolve, whereas in the future they will be more likely to fail completely.
There are very few domains that are broken in this way: most of them were fixed in 2018 during the preparation for the DNS Flag Day.
It is possible for us to configure a workaround for broken domain names. You can use our resolver consistency test page; if our resolvers don't work when other public resolvers do work, you can report the problem to ip-register@uis.cam.ac.uk.
We are skipping from 9.12 to 9.14 because BIND now has an odd/even version numbering scheme: 9.13 was the development version that became 9.14. There is a chart displaying the BIND release numbering plan and support schedule in the BIND 9.14 release announcement
BIND 9.14 includes a DNS privacy enhancement called "query name minimization". This changes the resolver algorithm to avoid leaking details of queries to the root and top-level domain name servers.
QNAME minimization is on by default in BIND 9.14 but we will turn it off. Unfortunately the current implementation causes problems with lame delegations, and there isn't any way to exclude particular broken domain names from QNAME minimization.
There is work in progress to make BIND's QNAME minimization algorithm more lenient, so I hope we will be able to turn it on when BIND 9.16 is released next year.
At the same time as the BIND upgrade, we will also upgrade our implementation of DNS-over-HTTPS and DNS-over-TLS from OpenResty 1.13 to 1.15. OpenResty is a distribution of NGINX with support for application development in Lua, which is used for our doh101 implementation of encrypted DNS.
I have spent the last week or so trying to get from a proof of concept to something workable. Much of this work has been on the security checks. The old UI has:
Cookie validation (for Oracle sessions)
Raven authentication
TOTP authentication for superusers
Second cookie validation for TOTP
CSRF checks
There was an awkward split between the Jackdaw framework and the ipreg-specific parts which meant I needed to add a second cookie when I added TOTP authentication.
In the new setup I have upgraded the cookie to modern security levels, and it handles both Oracle and TOTP session state.
    my @cookie_attr = (
        -name     => '__Host-Session',
        -path     => '/',
        -secure   => 1,
        -httponly => 1,
        -samesite => 'strict',
    );
The various "middleware" authentication components have been split out of the main HTTP request handler so that the overall flow is much easier to see.
There is some fairly tricky juggling in the old code between:
CGI request object
WebIPDB HTTP request handler object
IPDB database handle wrapper
Raw DBI handle
The CGI object is gone. The mod_perl
Apache2 APIs are sufficient
replacements, and the HTML generation functions are being
replaced by mustache templates. (Though there is some programmatic
form generation in table_ops
that might be awkward!)
I have used Moo roles to mixin the authentication middleware bits to the main request handler object, which works nicely. I might do the same for the IPDB object, though that will require some refactoring of some very old skool OO perl code.
The plan is to port the rest of the ops pages as directly as possible. There is going to be a lot of refactoring, but it will all be quite superficial. The overall workflow is going to remain the same, just more purple.
mod_perl
code into shape, I happily
deleted a lot of database connection management code that I had
inherited from Jackdaw's web server. Today I had to put it all back
again.
There is a neat module called Apache::DBI which hooks mod_perl
and DBI together to provide a transparent connection cache: just throw
in a use
statement, throw out dozens of lines of old code, and you
are pretty much done.
Today the clone of Jackdaw that I am testing against was not available (test run for some maintenance work tomorrow, I think) and I found that my dev web server was no longer responding. It started OK but would not answer any requests. I soon worked out that it was trying to establish a database connection and waiting at least 5 minutes (!) before giving up.
There is a long discussion about timeouts in the DBI documentation which specifically mentions DBD::Oracle as a problem case, with some lengthy example code for implementing a timeout wrapper around DBI::connect.
This is a terrible documentation anti-pattern. Whenever I find myself giving lengthy examples of how to solve a problem I take it as a whacking great clue that the code should be fixed so the examples can be made a lot easier.
In this case, DBI should have connection timeouts as standard.
If you read past the examples in DBI(3pm) there's a reference to a more convenient module which provides a timeout wrapper that can be used like this:
    if (timeout_call($connect_timeout, sub {
            $dbh = DBI->connect(@connect_args);
            moan $DBI::errstr unless $dbh;
        })) {
        moan "database connection timed out";
    }
The problem is that there isn't a convenient place to put this timeout code where it should be, so that Apache::DBI can use it transparently.
So I resurrected Jackdaw's database connection cache. But not exactly - I looked through it again and I could not see any extra timeout handling code. My guess is that hung connections can't happen if the database is on the same machine as the web server.
The existing code uses Perl CGI functions for rendering the HTML, with no styling at all. I'm replacing this with mustache templates using the www.dns.cam.ac.uk Project Light framework. So far I have got the overall navigation structure working OK, and it's time to start putting forms into the pages.
I fear this reskin is going to be disappointing, because although it's
superficially quite a lot prettier, the workflow is going to be the
same - for example, the various box_ops
etc. links in the existing
user interface become Project Light local navigation tabs in the new
skin. And there are still going to be horrible Oracle errors.
I have upgraded our DNS servers to 9.12.4-P1 to address the TCP socket exhaustion vulnerability.
At the same time I have also relinked BIND with a more recent version of OpenSSL, so it is now able to validate the small number of domains that use the new Ed25519 DNSSEC algorithm.
Jackdaw is a complicated beast with a long history.
Back in the distant past, when dinosaurs roamed the land, Jackdaw was a graph/network database management system written in BCPL for the IBM mainframe Phoenix. (Philip Hazel has written a bit about the old Jackdaw in his memoir.) It supported a number of uses, including the user admin database.
Before Phoenix was decommissioned in 1995, a new Jackdaw was created using Oracle, initially just for the user admin database. The IP Register database was added in around 2000-2002. (It is actually just a schema within the same overall database.)
As well as the database, Jackdaw has a mod_perl
framework for
developing web applications on Oracle called WebDBI. The IP Register
web UI is one of these applications.
Some of Jackdaw's complexity is because it suffered from the 1990s problem of not enough computers, so it is designed to support flexible testing on a single server:
There are multiple instances of Oracle, of potentially
different versions, such as jackdaw
, jdawtest
, jdawdev
,
etc.
There are production and test instances of the web server, running
on different ports. The choice of web server also determines the
version of the mod_perl
framework libraries.
There are production and test instances of the web applications, selected using an extended URL path.
The exact boundary between what is framework code and what is application code is somewhat murky to me, and I have not found it very easy as relatively unprivileged user of Jackdaw to use this testing setup.
As well as the various prod / test switches, there are a number of configuration variables that depend on which part of the web site you are looking at:
Is anonymous access allowed? This is used for things like the old email address search page.
Are long-term cookies allowed? They are used for automated access to the IP Register database.
Which authentication realm? This is used to tie long-term cookies to particular applications.
Separately, the IP Register UI has its own bodged TOTP two-factor authentication for privileged users, which is separate from Jackdaw's authentication system.
Jackdaw has its own HTTP request dispatch logic which implements the prod / test switch and the authentication process. It also determines whether the request is served by Perl code or by a file on disk, rather than using Apache's own configuration to do this.
I needed to work through all these details to understand how to get it working on another web server, so that I can move the IP Register web interface off Jackdaw while still using Jackdaw's Oracle database.
I have stripped down the code a lot while moving it to my web server.
The reason for moving the web front end is to make it easier to replace the Oracle back end with Postgres. So it is helpful to have less complexity.
So I have removed most of the code to implement the various options I've listed above. The only configuration that is still required is which Oracle instance to connect to, which is now fixed alongside the prod / test / dev options by the Ansible playbooks, and no longer chosen dynamically.
Together with my colleague Ujjwal Das who is responsible for Jackdaw,
we have also somewhat simplified the authentication between the web
server and Oracle. On Jackdaw, the web server switches between the
www
and raven_user
Oracle users. The new web server only has a
wwwdns
user.
The www
user allows unauthenticated access to a few very restricted
parts of the user admin database, and it can run a stored procedure to
verify authentication responses from Raven and set up login sessions.
For users with login cookies, the server changes to raven_user and
runs another stored procedure that checks the cookie and sets a
sys_context('raven', 'user') variable to identify the
logged-in user; this is used by the IP Register database views for
access control.
The wwwdns user has no anonymous access to the database, except for
the Raven login stored procedures. So it is less privileged than the
web server on Jackdaw wrt the user admin parts of the database. This
change means that I could not have used Jackdaw's code verbatim even
if I wanted to.
The rough plan is to finish the authentication support:
long-term cookie management
TOTP two-factor authentication
Then bring over the IP Register user interface. There will be a bit of a reskin but it will still have the same unfriendly behaviour.
Of course, even a dev server needs a TLS certificate, especially because these experiments will be about authentication. Until now I have obtained certs from the UIS / Jisc / QuoVadis, but my dev server is using Let's Encrypt instead.
In order to get a certificate from Let's Encrypt using the http-01
challenge, I need a working web server. In order to start the web
server with its normal config, I need a certificate. This poses a bit
of a problem!
My solution is to install Debian's ssl-cert package, which creates a self-signed certificate. When the web server does not yet have a certificate (if the QuoVadis cert isn't installed, or dehydrated has not been initialized), Ansible temporarily symlinks the self-signed cert for use by Apache, like this:
    - name: check TLS certificate exists
      stat:
        path: /etc/apache2/keys/tls-web.crt
      register: tls_cert

    - when: not tls_cert.stat.exists
      name: fake TLS certificates
      file:
        state: link
        src: /etc/ssl/{{ item.src }}
        dest: /etc/apache2/keys/{{ item.dest }}
      with_items:
        - src: certs/ssl-cert-snakeoil.pem
          dest: tls-web.crt
        - src: certs/ssl-cert-snakeoil.pem
          dest: tls-chain.crt
        - src: private/ssl-cert-snakeoil.key
          dest: tls.pem
The dehydrated and dehydrated-apache2 packages need a little configuration. I needed to add a cron job to renew the certificate and a hook script to reload Apache when the cert is renewed, and to tell dehydrated which domains should be in the cert. (See below for details of these bits.)
After installing the config, Ansible initializes dehydrated if necessary - the creates check stops Ansible from running dehydrated again after it has created a cert.
    - name: initialize dehydrated
      command: dehydrated -c
      args:
        creates: /var/lib/dehydrated/certs/{{inventory_hostname}}/cert.pem
Having obtained a cert, the temporary symlinks get overwritten with links to the Let's Encrypt cert. This is very similar to the snakeoil links, but without the existence check.
    - name: certificate links
      file:
        state: link
        src: /var/lib/dehydrated/certs/{{inventory_hostname}}/{{item.src}}
        dest: /etc/apache2/keys/{{item.dest}}
      with_items:
        - src: cert.pem
          dest: tls-web.crt
        - src: chain.pem
          dest: tls-chain.crt
        - src: privkey.pem
          dest: tls.pem
      notify:
        - restart apache
After that, Apache is working with a proper certificate!
The cron script chatters into syslog, but if something goes wrong it should trigger an email (tho not a very informative one).
    #!/bin/bash
    set -eu -o pipefail
    (
        dehydrated --cron
        dehydrated --cleanup
    ) | logger --tag dehydrated --priority cron.info
The hook script only needs to handle one of the cases:
    #!/bin/bash
    set -eu -o pipefail
    case "$1" in
    (deploy_cert)
        apache2ctl configtest &&
            apache2ctl graceful
        ;;
    esac
The configuration needs a couple of options added:
    - copy:
        dest: /etc/dehydrated/conf.d/dns.sh
        content: |
          EMAIL="hostmaster@cam.ac.uk"
          HOOK="/etc/dehydrated/hook.sh"
The final part is to tell dehydrated the certificate's domain name:
    - copy:
        content: "{{inventory_hostname}}\n"
        dest: /etc/dehydrated/domains.txt
For production, domains.txt needs to be a bit more complicated. I have a template like the one below. I have not yet deployed it; that will probably wait until the cert needs updating.
    {{hostname}}
    {% if i_am_www %}
    www.dns.cam.ac.uk dns.cam.ac.uk
    {% endif %}
I have upgraded our DNS servers to 9.12.3-P4 to address the memory leak vulnerability.
Stop BIND from generating SHA-1 DS and CDS records by default, per RFC 8624

Teach dnssec-checkds about CDS and CDNSKEY

Teach superglue to use CDS/CDNSKEY records, with similar logic to dnssec-checkds

The "similar logic" is implemented in dnssec-dsfromkey, so I don't actually have to write the code more than once. I hope this will also be useful for other people writing similar tools!
Some of my small cleanup patches have been merged into BIND. We are currently near the end of the 9.13 development cycle, so this work is going to remain out of tree for a while until after the 9.14 stable branch is created and the 9.15 development cycle starts.
So now I need to get to grips with dnssec-coverage and dnssec-keymgr.
The purpose of the dnssec-checkds improvements is so that it can be used as a safety check.
During a KSK rollover, there are one or two points when the DS records in the parent need to be updated. The rollover must not continue until this update has been confirmed, or the delegation can be broken.
I am using CDS and CDNSKEY records as the signal from the key management and zone signing machinery for when DS records need to change. (There's a shell-style API in dnssec-dsfromkey -p, but that is implemented by just reading these sync records, not by looking into the guts of the key management data.) I am going to call them "sync records" so I don't have to keep writing "CDS/CDNSKEY"; "sync" is also the keyword used by dnssec-settime for controlling these records.
The dnssec-keygen and dnssec-settime commands (which are used by dnssec-keymgr) schedule when changes to a key will happen.
There are parameters related to adding a key: when it is published in the zone, when it becomes actively used for signing, etc. And there are parameters related to removing a key: when it becomes inactive for signing, when it is deleted from the zone.
There are also timing parameters for publishing and deleting sync records. These sync times are the only timing parameters that say when we must update the delegation.
The point of the safety interlock is to prevent any breaking key changes from being scheduled until after a delegation change has been confirmed. So what key timing events need to be forbidden from being scheduled after a sync timing event?
Events related to removing a key are particularly dangerous. There are some cases where it is OK to remove a key prematurely, if the DS record change is also about removing that key, and there is another working key and DS record throughout. But it seems simpler and safer to forbid all removal-related events from being scheduled after a sync event.
However, events related to adding a key can also lead to nonsense. If we blindly schedule creation of new keys in advance, without verifying that they are also being properly removed, then the zone can accumulate a ridiculous number of DNSKEY records. This has been observed in the wild surprisingly frequently.
There must be no KSK changes of any kind scheduled after the next sync event.
This rule applies regardless of the flavour of rollover (double DS, double KSK, algorithm rollover, etc.)
Whereas for ZSKs, dnssec-coverage ensures rollovers are planned for some fixed period into the future, for KSKs it must check correctness up to the next sync event, then ensure nothing will occur after that point.
In dnssec-keymgr, the logic should be:
If the current time is before the next sync event, ensure there is key coverage until that time and no further.
If the current time is after all KSK events, use dnssec-checkds to verify the delegation is in sync.
If dnssec-checkds reports an inconsistency and we are within some sync interval dictated by the rollover policy, do nothing while we wait for the delegation update automation to work.
If dnssec-checkds reports an inconsistency and the sync interval has passed, report an error because operator intervention is required to fix the failed automation.
If dnssec-checkds reports everything is in sync, schedule keys up to the next sync event. The timing needs to be relative to this point in time, since any delegation update delays can make it unsafe to schedule relative to the last sync event.
At the moment I am still not familiar with the internals of dnssec-coverage and dnssec-keymgr, so there's a risk that I might have to re-think these plans. But I expect this simple safety rule will be a solid anchor that can be applied to most DNSSEC key management scenarios. (However, I have not thought hard enough about recovery from breakage or compromise.)
The basic setup that will be necessary on the child is:
Write a policy configuration for dnssec-keymgr.
Write a cron job to run dnssec-keymgr at a suitable interval. If the parent does not run dnssec-cds then this cron job should also run superglue or some other program to push updates to the parent. (A sketch follows below.)
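For illustration, the cron job could be as simple as the following sketch. The key directory and the superglue invocation are assumptions - superglue's command-line interface is still in flux:

    # hypothetical crontab fragment: update key timings according to
    # policy, then push any resulting delegation changes to the parent
    0 * * * *  dnssec-keymgr -K /var/cache/bind/keys && superglue cam.ac.uk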
The KSK rollover process will be driven by dnssec-keymgr, but it will not talk directly to superglue or dnssec-cds, which make the necessary changes. In fact it can't talk to dnssec-cds because that is outside the child's control.
So, as specified in RFC 7344, the child will advertise the desired state of its delegation using CDS and CDNSKEY records. These are read by dnssec-cds or superglue to update the parent. superglue will be loosely coupled, and able to work with any DNSSEC key management software that publishes CDS records.
The state of the keys in the child is controlled by the timing parameters in the key files, which are updated by dnssec-keymgr as determined by the policy configuration. At the moment it generates keys to cover some period into the future. For KSKs, I think it will make more sense to generate keys up to the next DS change, then stop until dnssec-checkds confirms the parent has implemented the change, before continuing. This is a bit different from the ZSK coverage model, but future coverage for KSKs can't be guaranteed because coverage depends on future interactions with an external system which cannot be assumed to work as planned.
Teach dnssec-checkds about CDS and CDNSKEY

Teach dnssec-keymgr to set "sync" timers in key files, and to invoke dnssec-checkds to avoid breaking delegations.

Teach dnssec-coverage to agree with dnssec-keymgr about sensible key configuration.

Teach superglue to use CDS/CDNSKEY records, with similar logic to dnssec-checkds
Stop BIND from generating SHA-1 DS and CDS records by default, per draft-ietf-dnsop-algorithm-update
This release adds CDS and CDNSKEY records to the list of DNSSEC-related types that are ignored, since by default nsdiff expects them to be managed by the name server, not as part of the zone file. There is now a -C option to revert to the previous behaviour.
    131.111.8.42  recdns0.csx.cam.ac.uk -> rec0.dns.cam.ac.uk
    131.111.12.20 recdns1.csx.cam.ac.uk -> rec1.dns.cam.ac.uk
A digression for the historically curious: the authdns and recdns names date from May 2006, when they were introduced to prepare for separating authoritative and recursive DNS service.
Until 2006, 131.111.8.42 was known as chimaera.csx.cam.ac.uk. It had been our primary DNS server since September/October 1995. Before then, our DNS was hosted on CUS, the Central Unix Service.
And 131.111.12.20 had been known as c01.csi.cam.ac.uk (or comms01) since before my earliest records in October 1991.
In the end I decided to rewrite the superglue-janet script in Perl, since most of superglue is already Perl and I would like to avoid rewriting all of it. This is still work in progress; superglue is currently an unusable mess, so I don't recommend looking at it right now :-)
Rather than using an off-the-shelf library, I have a very thin layer (300 lines of code, 200 lines of docs) that wraps WebDriver HTTP+JSON calls in Perl subroutines. It's designed for script-style usage, so I can write things like this (quoted verbatim):
    # Find the domain's details page.
    click '#commonActionsMenuLogin_ListDomains';
    fill '#MainContent_tbDomainNames' => $domain,
         '#MainContent_ShowReverseDelegatedDomains' => 'selected';
    click '#MainContent_btnFilter';
This has considerably less clutter than the old PhantomJS / CasperJS code!
I don't really understand the concurrency model between the WebDriver server and the activity in the browser. It appears to be remarkably similar to the way CasperJS behaved, so I guess it is related to the way JavaScript's event loop works (and I don't really understand that either).
The upshot is that in most cases I can click on a link, and the WebDriver response comes back after the new page has loaded. I can immediately interact with the new page, as in the code above.
However there are some exceptions.
On the JISC domain registry web site there are a few cases where selecting from a drop-down list triggers some JavaScript that causes a page reload. The WebDriver request returns immediately, so I have to manually poll for the page load to complete. (This also happened with CasperJS.) I don't know if there's a better way to deal with this than polling...
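What I do for now is poll, along the lines of this sketch. Here wait_for is an illustrative helper, not part of the real wrapper, and it assumes a find subroutine that throws when the selector does not match:

    use Time::HiRes qw(sleep);

    # poll until an element appears after a JavaScript-triggered reload;
    # "find" is a hypothetical selector-lookup primitive that dies on failure
    sub wait_for {
        my $selector = shift;
        for (1 .. 50) {
            return if eval { find($selector); 1 };
            sleep 0.2;
        }
        die "timed out waiting for $selector";
    }

    # hypothetical drop-down that reloads the page when changed
    fill '#MainContent_ddlExample' => 'some value';
    wait_for('#MainContent_btnFilter');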
I am not a fan of the WebDriver protocol specification. It is written as a description of how the code in the WebDriver server / browser behaves, written in spaghetti pseudocode.
It does not have any abstract syntax for JSON requests and responses - no JSON schema or anything like that. Instead, the details of parsing requests and constructing responses are interleaved with details of implementing the semantics of the request. It is a very unsafe style.
And why does the WebDriver spec include details of how to HTTP?
This work is part of two ongoing projects:
I need to update all our domain delegations to complete the server renaming.
I need automated delegation updates to support automated DNSSEC key rollovers.
So I'm aiming to get superglue into a usable state, and hook it up to BIND's dnssec-keymgr.
The major DNS resolver providers have declared February 1st to be DNS Flag Day. (See also the ISC blog item on the DNS flag day.)
DNS resolvers will stop working around broken authoritative DNS servers that do not implement EDNS correctly. The effect will be that DNS resolution may fail in some cases where it used to be slow.
The flag day will take effect immediately on some large public resolvers. In Cambridge, it will take effect on our central resolvers after they are upgraded to BIND 9.14, which is the next stable branch due to be released Q1 this year.
I'm running the development branch 9.13 on my workstation, which already includes the Flag Day changes, and I haven't noticed any additional breakage - but then my personal usage is not particularly heavy nor particularly diverse.
Last week the old DNSSEC root key was revoked, so DNSSEC validators that implement RFC 5011 trust anchor updates should have deleted the old key (tag 19036) from their list of trusted keys.
For example, on one of my resolvers the output of rndc managed-keys now includes the following. (The tag of the old key changed from 19036 to 19164 when the revoke flag was added.)
    name: .
    keyid: 20326
        algorithm: RSASHA256
        flags: SEP
        next refresh: Fri, 18 Jan 2019 14:28:17 GMT
        trusted since: Tue, 11 Jul 2017 15:03:52 GMT
    keyid: 19164
        algorithm: RSASHA256
        flags: REVOKE SEP
        next refresh: Fri, 18 Jan 2019 14:28:17 GMT
        remove at: Sun, 10 Feb 2019 14:20:18 GMT
        trust revoked
This is the penultimate step of the root key rollover; the final step is to delete the revoked key from the root zone.
    131.111.8.42  recdns0.csx.cam.ac.uk -> rec0.dns.cam.ac.uk
    131.111.12.20 recdns1.csx.cam.ac.uk -> rec1.dns.cam.ac.uk
Although there should not be much that depends on the old names, we are giving you a warning in case things like monitoring systems need reconfiguration.
This is part of the ongoing DNS server reshuffle project.
I have upgraded the IP Register DHCP servers twice this year. In February they were upgraded from Ubuntu 12.04 LTS to 14.04 LTS, to cope with 12.04's end of life, and to merge their setup into the main ipreg git repository (which is why the target version was so old). So their setup was fairly tidy before the Debian 9 upgrade.
Unlike most of the IP Register systems, the dhcp servers are stateful: their dhcpd.leases files must be preserved across reinstalls.
The leases file is a database (in the form of a flat text file in ISC
dhcp config file format) which closely matches the state of the network.
If it is lost, the server no longer knows about IP addresses in use by existing clients, so it can issue duplicate addresses to new clients, and hilarity will ensue!
So, just before rebuilding a server, I have to stop the dhcpd and take a copy of the leases file. And before the dhcpd is restarted, I have to copy the leases file back into place.
This isn't something that happens very often, so I have not automated it yet.
In February, I hacked around with the Ansible playbook to ensure the dhcpd was not started before I copied the leases file into place. This is an appallingly error-prone approach.
Yesterday, I turned that basic idea into an Ansible variable that controls whether the dhcpd is enabled. This avoids mistakes when fiddling with the playbook, but it is easy to forget.
This morning I realised a much neater way is to disable the entire dhcpd role if the leases file doesn't exist. This prevents the role from starting the dhcpd on a newly reinstalled server before the old leases file is in place. After the server is up, the check is a no-op.
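In outline, the guard looks something like this sketch, assuming Debian's usual /var/lib/dhcp/dhcpd.leases path and an include_role layout (which is not necessarily how my real playbook is structured):

    - name: check for an existing leases file
      stat:
        path: /var/lib/dhcp/dhcpd.leases
      register: leases

    # the whole role is skipped on a freshly reinstalled server
    # until the old leases file has been copied back into place
    - name: configure and start the DHCP server
      include_role:
        name: dhcpd
      when: leases.stat.exists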
This is a lot less error-prone. The only requirement for the admin is knowledge about the importance of preserving dhcpd.leases...
The other pitfall in my setup is that monit will restart dhcpd if it is missing, so it isn't easy to properly stop it.
My dhcpd_enabled Ansible variable takes care of this, but I think it would be better to make a special shutdown playbook, which can also take a copy of the leases file.
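Such a playbook might look roughly like this sketch - the group name is made up, and the service names are Debian's:

    ---
    - hosts: dhcp_servers
      tasks:
        - name: stop monit so it cannot restart dhcpd behind our backs
          service:
            name: monit
            state: stopped
        - name: stop the DHCP server
          service:
            name: isc-dhcp-server
            state: stopped
        - name: save a copy of the leases file on the control machine
          fetch:
            src: /var/lib/dhcp/dhcpd.leases
            dest: saved-leases/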
1457 commits
4035 IP Register / MZS support messages
5734 cronspam messages
New DNS web site (Feb, Mar, Jun, Sep, Oct, Nov)
This was a rather long struggle with a lot of false starts, e.g. February / March finding that Perl Template Toolkit was not very satisfactory; realising after June that the server naming and vhost setup was unhelpful.
End result is quite pleasing
IP Register API extensions (Aug)
API access to xlist_ops
MWS3 API generalized for other UIS services
Now in active use by MWS, Drupal Falcon, and to a lesser extent by the HPC OpenStack cluster and the new web Traffic Managers. When old Falcon is wound down we will be able to eliminate Gossamer!
Server upgrade / rename (Dec)
Lots of Ansible review / cleanup. Satisfying.
Prototype setup for PostgreSQL replication using repmgr (Jan)
Prototype infrastructure for JSON-RPC API in Typescript (April, May)
DHCP servers upgraded to match rest of IP Register servers (Feb)
DNS servers upgraded to BIND 9.12, with some serve-stale related problems. (March)
Local patches all now incorporated upstream :-)
git.uis continues, hopefully not for much longer
Took over as the main author of draft-ietf-dnsop-aname. This work is ongoing.
Received thanks in RFC 8198 (DNSSEC negative answer synthesis), RFC 8324 (DNS privacy), RFC 8482 (minimal ANY responses), RFC 8484 (DNS-over-HTTPS).
Ongoing maintenance of regpg.
This has stabilized and reached a comfortable feature plateau.
Created doh101, a DNS-over-TLS and DNS-over-HTTPS proxy.
Initial prototype in March at the IETF hackathon.
Revamped in August to match final IETF draft.
Deployed in production in September.
Fifteen patches committed to BIND9.
CVE-2018-5737; extensive debugging work on the serve-stale feature.
Thanked by ISC.org in their annual review.
Significant clean-up and enhancement of my qp trie data structure, used by Knot DNS. This enabled much smaller memory usage during incremental zone updates.
https://gitlab.labs.nic.cz/knot/knot-dns/issues/591
Update superglue delegation maintenance script to match the current state of the world. Hook it in to dnssec-keymgr and get automatic rollovers working.
Rewrite draft-ietf-dnsop-aname again, in time for IETF104 in March.
Server renumbering, and xfer/auth server split, and anycast. When?
Port existing ipreg web interface off Jackdaw.
Port database from Oracle on Jackdaw to PostgreSQL on my servers.
Develop new API / UI.
Re-do provisioning system for streaming replication from database to DNS.
Move MZS into IP Register database.
Last year, EURID (the registry for .eu domain names) sent out a notice about the effect of Brexit on .eu domain names registered in the UK. The summary is that .eu domains may only be registered by organizations or individuals in the EU, and unless any special arrangements are made (which has not happened) this will not include the UK after Brexit, so UK .eu domain registrations will be cancelled.
https://eurid.eu/en/register-a-eu-domain/brexit-notice/
Other European country-class TLDs may have similar restrictions (for instance, Italy's .it).
Sadly we cannot expect our government to behave sensibly, so you have to make your own arrangements for continuity of your .eu domain.
The best option is for you to find one of your collaborators in another EU country who is able to take over ownership of the domain.
We have contacted the owners of .eu domains registered through our Managed Zone Service. Those who registered a .eu domain elsewhere should contact their DNS provider for detailed support.
Edited to add: Thanks to Elliot Page for pointing out that this problem may apply to other TLDs as well as .eu.
Edited to add (2019-07-22): There has been an update to the .eu eligibility criteria.
I have some old code called superglue-janet which drives the JISC / JANET / UKERNA domain registry web site. The web site has some dynamic JavaScript behaviour, and it looks to me like the browser front-end is relatively tightly coupled to the server back-end in a way that I expected would make reverse engineering unwise. So I decided to drive the web site using browser automation tools. My code is written in JavaScript, using PhantomJS (a headless browser based on QtWebKit) and CasperJS (convenience utilities for PhantomJS).
PhantomJS is now deprecated, so the code needs a re-work. I also want to use TypeScript instead, where I would previously have used JavaScript.
The modern way to do things is to use a full-fat browser in headless mode and control it using the standard WebDriver protocol.
For Firefox this means using the geckodriver proxy which is a Rust program that converts the WebDriver JSON-over-HTTP protocol to Firefox's native Marionette protocol.
[Aside: Marionette is a full-duplex protocol that exchanges JSON messages prefixed by a message length. It fits into a similar design space to Microsoft's Language Server Protocol, but LSP uses somewhat more elaborate HTTP-style framing and JSON-RPC message format. It's kind of a pity that Marionette doesn't use JSON-RPC.]
The WebDriver protocol came out of the Selenium browser automation project where earlier (incompatible) versions were known as the JSON Wire Protocol.
I thought it would make sense to write the WebDriver client in TypeScript. The options seemed to be:
selenium-webdriver, which has Selenium's bindings for node.js. This involves a second proxy written in Java which goes between node and geckodriver. I did not like the idea of a huge wobbly pile of proxies.
webdriver.io aka wdio, a native node.js WebDriver client. I chose to try this, and got it going fairly rapidly.
I had enormous difficulty getting anything to work with wdio and TypeScript. It turns out that the wdio typing was only committed a couple of days before my attempt, so I had accidentally found myself on the bleeding edge. I can't tell whether my failure was due to lack of documentation or brokenness in the type declarations...
I need to find a better WebDriver client library. The wdio framework is very geared towards testing rather than general automation (see the wdio "getting started" guide for example) so if I use it I'll be talking to its guts rather than the usual public interface. And it won't be very stable.
I could write it in Perl but that wouldn't really help to reduce the amount of untyped code I'm writing :-)
Ensure both new and old names are in the DNS
Rename the host in ipreg/ansible/bin/make-inventory and run the script
Run ipreg/ansible/bin/ssh-knowhosts to update ~/.ssh/known_hosts
Rename host_vars/$SERVER and adjust the contents to match a previously renamed server (mutatis mutandis)
For recursive servers, rename the host in ipreg/ansible/roles/keepalived/files/vrrp-script and ipreg/ansible/inventory/dynamic
Ask infra-sas@uis to do the root privilege parts of the netboot configuration - rename and/or new OS version as required
For DHCP servers, save a copy of the leases file by running:
    ansible-playbook dhcpd-shutdown-save-leases.yml \
        --limit $SERVER
Run the preseed.yml playbook to update the unprivileged parts of the netboot config
Reboot the server, tell it to netboot and do a preseed install
Wait for that to complete
For DHCP servers, copy the saved leases file to the server.
Then run:
    ANSIBLE_SSH_ARGS=-4 ANSIBLE_HOST_KEY_CHECKING=False \
        ansible-playbook -e all=1 --limit $SERVER main.yml
Update the rest of the cluster's view of the name
    git push
    ansible-playbook --limit new main.yml
Done:
Live and test web servers, which were always Stretch, so they served as a first pass at getting the shared parts of the Ansible playbooks working
Live and test primary DNS servers
Live x 2 and test x 2 authoritative DNS servers
One recursive server
To do:
Three other recursive servers
Live x 2 and test x 1 DHCP servers
Here are a few notes on how the project has gone so far.
I ought to be better at writing a checklist of actions for processes like this. I kept forgetting to do things like renaming host_vars files and re-running the Ansible inventory build script with the new name before firing off a reinstall.
It isn't a surprise (especially after the recent DHCP / monit cockup) that there has been rather too much divergence between the Ansible playbooks and the running servers (i.e. it was non-zero) and a hidden accumulation of small bootstrapping bugs. This is mainly due to sticking with an LTS OS release for too long, because there wasn't enough pressure to upgrade.
I'm planning to address this by sticking closer to Debian's release schedule. This means upgrading every other year, except that the 9 "Stretch" -> 10 "Buster" upgrade will be in one year to match up with Debian's schedule.
Ansible by default gathers "facts" about target servers when it starts: information about things like the OS, CPU, memory, networking.
When I started using Ansible this seemed like more complication than I needed. It was easier to hard-code things that never changed in my setup. And that remained true for several years.
But I have found a few uses this year:
doh101 uses OS facts (Debian vs Ubuntu, version number) to automatically select the right OpenResty package
Automatically installing the VMware tools when appropriate without me explicitly stating which servers are VMs and which are hardware
The network interfaces have different names in Trusty and Stretch, so it was most convenient to use facts to parameterize that part of the network configuration
One problem with the latter is that the playbooks do the wrong thing when I'm upgrading a server with a stale fact cache. I have a script for building (most of) my Ansible inventory, which also constructs /etc/hosts and ssh known_hosts files. It now also deletes the fact cache which helps (especially when I am renaming the servers).
Several times I have got the ordering in my playbooks wrong, by starting a daemon before I've installed all of its config files. This is usually benign, except when there is a mistake in the config file.
When you try to fix the config file in this situation, the rc script will abort when its config file check fails on the old broken file, causing Ansible to abort before it installs the fixed config file.
After the DHCP / monit cockup, I tried to use Ansible dependencies to describe that (e.g.) named installs a monit config fragment. But this led to prodding the daemon before installing the config file.
In the end I resolved this by putting the monit role after the named role in the top-level playbooks (so the order is correct), and invoking the reconfigure monit handler from the named role. This is a semi-implicit dependency - Ansible will fail if the playbook does not invoke both roles, which is good, but it doesn't use Ansible's explicit dependency support.
This works but it isn't very satisfactory.
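For the record, the shape of this arrangement is roughly as follows (a sketch with invented file and fragment names):

    # roles/named/tasks/main.yml - install the fragment and poke monit
    - name: install monit config fragment for named
      copy:
        src: monit-named.conf
        dest: /etc/monit/fragments/named
      notify: reconfigure monit

    # roles/monit/handlers/main.yml - the handler lives in the monit role
    - name: reconfigure monit
      command: monit reload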
One of the most troublesome parts of my Ansible setup (in terms of the amount of churn per line of code and bugs needing fixed) has been the interface_reconfig module. Debian's ifup / ifdown programs don't have a good way to check if the current running state of the network interfaces matches what is specified in /etc/network/interfaces.
My interface_reconfig module parses /etc/network/interfaces and the output from ip addr show and works out if it should bounce the network. For dynamic IP addresses (managed by keepalived on some of my servers) I put specially formatted comments in /etc/network/interfaces to tell interface_reconfig it should ignore them.
This is pretty simple in principle, but it has been awkward keeping up with changes in Ansible. The interface_reconfig module now supports diff mode and check mode, and it gets all configuration from /etc/network/interfaces so I don't have to repeat the list of network interfaces in the playbooks.
On our primary server, the ipreg home directory (where the DNS update machinery works) was set up to allow admins logging in as themselves to do things. I think I inherited this from the old ip-register VM [it was actually a Solaris Zone] which in turn came from /group/Internet on CUS.
There were a number of minor annoyances trying to maintain the right permissions, and in practice we weren't using the shared group setup: it's better to run the scripts on my workstation for ad-hoc stuff, and when running on the live primary server it's safer to log in as ipreg@ to avoid permissions screwups.
I do not have separate (floating) service IP addresses for most of my servers, so when upgrading the primary DNS server I can't upgrade the standby and swap which server is live and which is standby.
Instead, I adjusted the primary DNS server's packet filter configuration to isolate it from the other servers, so that they would never be able to see partially-empty zones while the rebuild is in progress. This worked quite nicely.
-test in server names

The new naming scheme has explicit -test tags in the names of the servers that are not live. This follows our server naming guidelines, so that our colleagues elsewhere in the department could easily see how much they should care about something working or not.
It has turned out to work very nicely with a bit of Ansible inventory automation, so I no longer have to manually list which servers are live and which are test.
However it doesn't fit well with floating service IP addresses. If my DNS server addressing becomes more dynamic then the -test names may go away.
The only package that caused significant headaches due to the upgrade was Keepalived.
My old keepalived-1.2 dual stack configuration did not work with keepalived-1.3, because 1.3 is stricter about VRRP protocol conformance, and my old dual stack config was inadvertently making use of some very odd behaviour by Keepalived.
I was previously trying to advertise all IPv4 and IPv6 addresses over VRRP, but VRRP is not supposed to be dual stack in that way. What Keepalived actually did was advertise 0.0.0.0 (sic!) in place of each IPv6 address. The new config uses virtual_ipaddress_excluded for the IPv6 addresses; they still follow the IPv4 addresses but now only the IPv4 addresses are advertised over VRRP. I previously didn't use virtual_ipaddress_excluded because the documentation doesn't explain what it does.
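The relevant part of the new config looks roughly like this trimmed sketch - interface, virtual_router_id, priority, etc. are omitted, the prefix lengths are assumptions, and the instance name is invented - using one of the recursive service addresses as an example:

    vrrp_instance recdns {
        # only the IPv4 address is advertised in VRRP packets
        virtual_ipaddress {
            131.111.8.42/23
        }
        # the IPv6 address follows the instance state
        # without being advertised
        virtual_ipaddress_excluded {
            2001:630:212:8::d:2/64
        }
    }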
The new server naming scheme is more regular than the old one, which has allowed me to make some nice simplifications to the keepalived configuration. This meant I needed to update the health checker script, so I took the opportunity to rewrite it from shell to perl, so it is a bit less horrible.
Unfortunately, the way I am using the health checker script to implement dynamic server priorities isn't really how keepalived expects to work. It logs to syslog every time one of the scripts exits non-zero, which happens several times each second on my servers. I've configured syslog to discard the useless noise, but it would be better if there was no noise in the first place.
After the upgrades are complete, I need to change the NS records in all of our zones and in their delegations, so that I can get rid of the old names.
This is going to require a bit of programming work to update my delegation maintenance scripts, which are rather stale and neglected. But this will also be a step towards automatic DNSSEC key rollovers.
I'm not sure what will happen after that. If we are ready for the IPv6 renumbering, I should try to get that out of the way reasonably swiftly. But I would like to work on porting the IP Register web front-end off Jackdaw.
authdns1.csx.cam.ac.uk for several minutes, during which our other authoritative servers will be available to provide DNS service.
The primary server ipreg.csi.cam.ac.uk will also be rebuilt, which will involve reconstructing all our DNS zone files from scratch. (This is less scary than it sounds, because the software we use for the hourly DNS updates makes it easy to verify that DNS zones are the same.)
These upgrades will cause secondary servers to perform full zone transfers of our zones, since the incremental transfer journals will be lost.
authdns0.csx.cam.ac.uk and upgrade its operating system from Ubuntu 14.04 "Trusty" to Debian 9 "Stretch".
During the upgrade our other authoritative servers will be available to provide DNS service. After the upgrade, secondary servers are likely to perform full zone transfers from authdns0 since it will have lost its incremental zone transfer journal.
Next week, we will do the same for authdns1.csx.cam.ac.uk and for ipreg.csi.cam.ac.uk (the primary server).
During these upgrades the servers will have their hostnames changed to auth0.dns.cam.ac.uk, auth1.dns.cam.ac.uk, and pri0.dns.cam.ac.uk, at least from the sysadmin point of view. There are lots of references to the old names which will continue to work until all the NS and SOA DNS records have been updated. This is an early step in the DNS server renaming / renumbering project.
I arrived at work late on Tuesday morning to find that the DHCP servers were sending cronspam every minute from monit. monit thought dhcpd was not working, although it was.
A few minutes before I arrived, a colleague had run our Ansible playbook to update the DHCP server configuration. This was the trigger for the cronspam.
We are using monit as a basic daemon supervisor for our critical services. The monit configuration doesn't have an "include" facility (or at least it didn't when we originally set it up) so we are using Ansible's "assemble" feature to concatenate configuration file fragments into a complete monit config.
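The assembly step is along these lines - a sketch, with an invented fragment directory path:

    - name: assemble complete monit config from fragments
      assemble:
        src: /etc/monit/fragments
        dest: /etc/monit/monitrc
        validate: monit -t -c %s
      notify: restart monit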
The problem was that our Ansible setup didn't have any explicit dependencies between installing monit config fragments and reassembling the complete config and restarting monit.
Running the complete playbook caused the monit config to be reassembled, so an incorrect but previously inactive config fragment was activated, causing the cronspam.
How was there an inactive monit config fragment on the DHCP servers?
The DHCP servers had an OS upgrade and reinstall in February. This was when the spammy broken monit config fragment was written.
What were the mistakes at that time?
The config fragment was not properly tested. A good monit config is normally silent, but in this case we didn't check that it sent cronspam when things are broken, which would have revealed that the config fragment was not actually installed properly.
The Ansible playbook was not verified to be properly idempotent. It should be possible to wipe a machine and reinstall it with one run of Ansible, and a second run should be all green. We didn't check the second run properly. Check mode isn't enough to verify idempotency of "assemble".
During routine config changes in the nine months since the servers were reinstalled, the usual practice was to run the DHCP-specific subset of the Ansible playbook (because that is much faster) so the bug was not revealed.
There was a lot more anxiety than there should have been when debugging this problem, because at the time the Ansible playbooks were going through a lot of churn for upgrading and reinstalling other servers, and it wasn't clear whether or not this had caused some unexpected change.
This gets close to the heart of the matter:
There are other issues related to being a (nearly) solo developer, which makes it easier to get into bad habits. The DHCP server config has the most contributions from colleagues at the moment, so it is not really surprising that this is where we find out the consequences of the bad habits of soloists.
It turns out that monit and dhcpd do not really get along. The monit UDP health checker doesn't work with DHCP (which was the cause of the cronspam) and monit's process checker gets upset by dhcpd being restarted when it needs to be reconfigured.
The monit DHCP UDP checker has been disabled; the process checker needs review to see if it can be useful without sending cronspam on every reconfig.
There should be routine testing to ensure the Ansible playbooks committed to the git server run green, at least in check mode. Unfortunately it's risky to automate this because it requires root access to all the servers; at the moment root access is restricted to admins in person.
We should be in the habit of running the complete playbook on all the servers (e.g. before pushing to the git server), to detect any differences between check mode and normal (active) mode. This is necessary for Ansible tasks that are skipped in check mode.
This incident also highlights longstanding problems with our low bus protection factor and lack of automated testing. The resolutions listed above will make some small steps to improve these weaknesses.
authdns0.maths (131.111.20.101) and authdns1.maths (131.111.20.202).
I have updated sample.named.conf and catz.arpa.cam.ac.uk to refer to these new servers for the 11 Maths zones. Also, I have belatedly added the Computer Lab's new reverse DNS range for 2a05:b400:110::/48.
The stealth secondary server documentation now includes separate, simpler configuration files for forwarding BIND resolvers, and for stealth secondaries using catz.arpa.cam.ac.uk. (As far as I can tell I am still the only one using catalog zones at the moment! They are pretty neat, though.)
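For anyone curious, the catalog zone setup on a stealth secondary boils down to something like this named.conf sketch, where 192.0.2.1 stands in for the master server's real address:

    options {
        // required so member zones from the catalog can be added
        allow-new-zones yes;
        catalog-zones {
            zone "catz.arpa.cam.ac.uk"
                default-masters { 192.0.2.1; };
        };
    };

    zone "catz.arpa.cam.ac.uk" {
        type slave;
        masters { 192.0.2.1; };
    };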
The new site is mostly the old (sometimes very old) documentation that was hosted under https://jackdaw.cam.ac.uk/ipreg/. It has been reorganized and reformatted to make it easier to navigate; for example some pages have been rescued from the obscurity of the news archives. There are a few new pages that fill in some of the gaps.
The old pages (apart from the IP Register database interface) will shortly be replaced by redirects to their new homes on the new site.
Our DNS news mailing list has been renamed to uis-dns-announce; those who were subscribed to the old cs-nameservers-announce list have been added to the new list. This mailing list is for items of interest to those running DNS servers on the CUDN, but which aren't of broad enough relevance to bother the whole of ucam-itsupport.
There are now Atom feeds for DNS news available from https://www.dns.cam.ac.uk/news/.
This news item is also posted at https://www.dns.cam.ac.uk/news/2018-11-20-web-site.html
The new site is part of the project to move the IP Register database off Jackdaw. The plan is:
New web server; evict documentation. (done)
Replicate IP Register web user interface on new server. (This work will mostly be about porting Jackdaw's bespoke "WebDBI" mod_perl / Oracle application framework.)
Move the IP Register database off Jackdaw onto a new PostgreSQL database, without altering the external appearance. (This will involve porting the schema and stored procedures, and writing a test suite.)
After that point we should have more tractable infrastructure, making it easier to provide better user interface and APIs.
The new site is written in Markdown. The Project Light templates use Mustache, because it is programming-language-agnostic, so it will work with the existing mod_perl scripts, and with TypeScript in the future.
I have a number of long-term projects which can have much greater success within the University and impact outside the University by collaborating with people from other organizations in person. Last week was a great example of that, with significant progress on CDS (which I did not anticipate!), ANAME, and DNS privacy, which I will unpack below.
The DNS Operations Analysis and Research Centre holds peripatetic workshops twice a year, typically just before an ICANN or RIR meeting. They are attended by root server operators, TLD operators, large DNS service providers, DNS software suppliers, and others. CENTR is the association of European country-code TLD registries.
The RIPE NCC is the regional Internet registry for Europe and the Middle East. Earlier this year, the University of Cambridge became a local Internet registry (i.e. an organization responsible for sub-allocating IP address space) and as new members we got a couple of free tickets to attend a RIPE meeting. As well as a significant proportion of the DNS-OARC crowd, the RIPE meeting included a lot more network operators, academic researchers, and IETFers.
Both DNS-OARC and RIPE meetings are technical conferences that regularly include a lot of material that's highly relevant to my work; RIPE of course has a much broader agenda across network operations. I recommend having a look through the DNS-OARC 29 presentation archive and the RIPE77 presentation archive because I will only mention a few key items here.
I also wrote daily notes on my personal blog: Fri Sat Sun Mon Tue Wed Thu Fri; this report has a few highlights from those notes.
One of the pain points the University shares with other DNSSEC sites is the difficulty of managing DNSSEC keys. Partly this is due to gaps in the functionality of tools (though this is improving), but more challenging are the poor interfaces available for keeping secure delegations up-to-date, typically either a bespoke API (such as RIPE's) or no API at all (such as for .ac.uk).
CDS records (RFC7344, RFC8078) are a mechanism for automating DNSSEC delegation maintenance that does not need any extra APIs or credentials. It has the potential to greatly simplify operations, if it is deployed.
CDS is not yet widely adopted: the only registries I know that support it are .ch (Switzerland), .li (Liechtenstein) and .cz (Czechia).
Last year, I wrote dnssec-cds to implement the parent side of the CDS protocol, which is part of the tooling we need within the University for managing delegations to the Computer Laboratory, Maths, and others. I contributed dnssec-cds to BIND 9.12 so that ISC can take over maintenance, and with the hope that it would encourage wider deployment.
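The intended parent-side usage is roughly as follows - a sketch from my memory of the manual, so check the flags and paths against the real documentation before relying on it:

    # fetch the child's CDS records (with their signatures) ...
    dig +dnssec +noall +answer @192.0.2.1 cl.cam.ac.uk CDS > child.cds
    # ... then verify them against the existing dsset file for the
    # child zone and update it in place
    dnssec-cds -f child.cds -d /srv/dns/dsset -i.bak cl.cam.ac.uk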
In Amsterdam I met Ondřej Caletka of CESNET (the Czech national research and education network) who has done some work on using CDS records to drive updates via the RIPE database API, which he presented in the RIPE77 DNS-WG meeting.
I also met Anand Buddhdev who is in charge of the RIPE NCC's DNS services. (He presented a status update to the DNS-WG.) Anand encouraged me to voice support for Ondřej's work, with the aim of getting consensus for a policy change, so that RIPE NCC will implement CDS checking for reverse DNS zones.
There is a good chance that this will happen, which will be a nice simplification to our operations, and CESNET's, and others in the RIPE region.
Another commonly-felt DNS awkwardness is the restriction on how CNAME records can be used. This makes it needlessly difficult to set up web sites, adding to our support overheads. It makes our database schema and API more complicated than necessary.
There are a number of proprietary extensions to the DNS which work around this problem, called things like "ANAME", "ALIAS", "CNAME flattening", etc. (but not to be confused with the different thing that the IP Register database calls an "aname"). Since last year the IETF DNSOP working group has been developing a standardized version of ANAME.
Earlier this year it became clear that no-one liked the ANAME draft, including its authors. Eventually, I proposed a radically simplified ANAME, which was generally viewed as a good improvement. So I have taken over as principal author and editor of the ANAME draft.
I completed the initial rewrite of the draft shortly before going to Amsterdam. While I was there I discussed it with several people (especially Evan Hunt, the previous author!), getting some good suggestions for improvements. I also spoke to staff from a number of DNS vendors who have proprietary ANAME-like features; they are generally keen on the idea of standardization, improving interoperability, and the opportunity to reduce technical debt.
The IETF DNSOP working group chairs are very supportive of this effort. There's still a fair amount of work remaining, but it looks like there's a reasonable chance of success.
In recent years the IETF has stepped up its efforts to encrypt all the things. The DNS privacy project is leading the encryption effort for the DNS, along with other privacy-enhancing improvements.
These changes are going to have a big effect on DNS operations, by moving traffic off UDP on port 53 to encrypted transports on other ports.
I have written an implementation of DNS-over-TLS and DNS-over-HTTPS called doh101, which is running on the University's DNS resolvers.
I presented a lightning talk at the DNS-OARC meeting to give some idea of the number of people who are already using DNS-over-TLS (dozens of people in Cambridge, but less than 1%).
There was much valuable discussion of the consequences of DNS privacy, intended or not, especially following the talks from Sara Dickinson (one, two, three) and Ólafur Guðmundsson (four, five). There may be some useful measurements we can make, even though our traffic levels are small - I got some good suggestions from Florian Streibelt (thanks to Theresa Enghardt for introducing us).
This was just the main highlights of the week; there were also a number of other useful take-aways, which I will not repeat here (see the links to my daily notes above). It was a busy and productive trip!
    Before: http://dnsviz.net/d/root/W79zYQ/dnssec/
    After:  http://dnsviz.net/d/root/W790GQ/dnssec/
There's a lot of measurement happening, e.g. graphs of the view from the RIPE Atlas distributed Internet measurement system at: https://nlnetlabs.nl/
I have gently prodded our resolvers with rndc flushname . so they start using the 2017 key immediately, rather than waiting for the TTL to expire, since I am travelling tomorrow. I expect there will be a fair amount of discussion about the rollover at the DNS-OARC meeting this weekend...
At the moment (Wednesday mid-afternoon) we have about
29,000 - 31,000 devices on the wireless network
3900 qps total on both recursive servers
about 15 concurrent DoT clients (s.d. 4)
about 7qps DoT (s.d. 5qps)
5s TCP idle timeout
6.3s mean DoT connection time (s.d. 4s - most connections are just over 5s, they occasionally last as long as 30s; mean and s.d. are not a great model for this distribution)
DoT connections very unbalanced, 10x fewer on 131.111.8.42 than on 131.111.12.20
The rule of thumb that number of users is about 10x qps suggests that we have about 70 Android Pie users, i.e. about 0.2% of our userbase.
If you run a DNSSEC validating resolver, you should double check that it trusts the 2017 root key. ICANN have some instructions at the link below; if in doubt you can ask ip-register at uis.cam.ac.uk for advice.
ICANN's DNSSEC trust anchor telemetry data does not indicate any problems for us; however the awkward cases are likely to be older validators that predate RFC 8145.
I am away for the DNS-OARC and RIPE meetings starting on Friday, but I will be keeping an eye on email. This ought to be a non-event but there hasn't been a DNSSEC root key rollover before so there's a chance that lurking horrors will be uncovered.
]]>I have upgraded our DNS servers to 9.12.2-P2 mainly to address the referral interoperability problem, though we have not received any reports that this was causing noticable difficulties.
There is a security issue related to Kerberos authenticated DNS updates; I'll be interested to hear if anyone in the University is using this feature!
Those interested in DNSSEC may have spotted the inline-signing bug that is fixed by these patches. We do not use inline-signing but instead use nsdiff to apply changes to signed zones, and I believe this is also true for the signed zones run by Maths and the Computer Lab.
Traditional unencrypted DNS using UDP or TCP on port 53 ("Do53")
DNS-over-TLS on port 853 - RFC 7858
DNS-over-HTTPS on port 443 - RFC 8484
Amongst other software, Android 9 "Pie" uses DoT when possible and you can configure Firefox to use DoH.
There is more detailed information about Cambridge's DNS-over-TLS and DNS-over-HTTPS setup on a separate page.
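If you want to poke at DNS-over-TLS by hand, kdig from the knot-dnsutils package can make a one-off query over TLS, using one of the central resolvers as an example:

    # a single query over TLS to port 853 on a central resolver
    kdig +tls @131.111.8.42 cam.ac.uk SOA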
list_ops page. In order to allow automated registration of systems with IPv6 addresses, it is now possible to use long-term downloaded cookies for the xlist_ops page as well.
serve-stale feature. This helps to make the DNS more resilient when there are local network problems or when DNS servers out on the Internet are temporarily unreachable. After many trials and tribulations we have at last successfully enabled serve-stale.
Popular websites tend to have very short DNS TTLs, which means the DNS stops working quite soon when there are network problems. As a result, network problems look more like DNS problems, so they get reported to the wrong people. We hope that serve-stale will reduce this kind of misattribution.
The original attempt to roll out serve-stale was rolled back after one of the recursive DNS servers crashed. My normal upgrade testing wasn't enough to trigger the crash, which happened after a few hours of production load.
Since this was a crash that could be triggered by query traffic, it counted as a security bug. After I reported it to ISC.org, there followed a lengthy effort to reproduce it in a repeatable manner, so that it could be debugged and fixed.
I have a tool called adns-masterfile which I use for testing server upgrades and suchlike. I eventually found that sending lots of reverse DNS queries was a good way to provoke the crash; the reverse DNS has quite a large proportion of broken DNS servers, which exercise the serve-stale machinery.
The best I was able to do was get the server to crash after 1 hour; I could sometimes get it to crash sooner, but not reliably. I used a cache dump (from rndc dumpdb) truncated after the .arpa TLD so it contained 58MB of reverse DNS, nearly 700,000 queries. I then set up several concurrent copies of adns-masterfile to run in loops. The different copies tended to synchronize with each other, because when one of them got blocked on a broken domain name the others would catch up. So I added random delays between each run to encourage different copies to make queries from different parts of the dump file.
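The harness amounted to something like this shell sketch - the dump file name and delay range are illustrative, and it assumes adns-masterfile takes the dump file as its argument:

    # several concurrent query loops over the same cache dump,
    # de-synchronized by random delays between runs
    for i in 1 2 3 4; do
        while true; do
            sleep $((RANDOM % 60))
            adns-masterfile arpa-dump.db
        done &
    done
    wait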
It was difficult for our friends at ISC.org to provoke a crash. After valgrind failed to provide any clues, I tried using Mozilla's rr debugger, which supports record/replay time-travel debugging with efficient reverse execution. It allowed me to bundle up the binary, libraries, and execution trace and send them to ISC.org so they could investigate what happened in detail.
I waited for BIND 9.12.2 before deploying the fixed serve-stale implementation because earlier versions had very verbose logging that could not easily be turned off.
I submitted a patch that moved serve-stale logging to a separate category so that it can be turned on and off independently of other logging or moved to a separate file. This was merged for the 9.12.2 release, which made it usable in production.
I also investigated options for better testing of new versions before putting them into production. The disadvantage of adns-masterfile is that it makes a large number of unique queries, whereas the CVE-2018-5737 crash required repetition.
I now have a little script which can extract queries from tcpdump output and replay them against another server. I can use it to mirror production traffic onto a staging server, and let it soak for several hours before performing a live/staging switch-over.
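The script is roughly equivalent to this sketch - the awk field positions for tcpdump's DNS output and the staging address are assumptions:

    # capture query names and types from live traffic ...
    tcpdump -l -n -i eth0 dst port 53 and udp |
    awk '$7 ~ /\?$/ { print $8, substr($7, 1, length($7) - 1) }' |
    # ... and replay each query against the staging server
    while read qname qtype; do
        dig +tries=1 +time=1 @192.0.2.53 "$qname" "$qtype" >/dev/null
    done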
Earlier this year we had a number of network outages during which we lost connectivity to about half of the Internet. I was quite keen to get serve-stale deployed before these outages were fixed, so I could observe it working; however I lost that race.
The outages were triggered by work on the network links between our CUDN border equipment and JANET's routers. In theory, this should not have affected our connectivity, because traffic should have seamlessly moved to use our other JANET uplink.
However, the routing changes propagated further than expected: they appeared to one of JANET's connectivity providers as route withdrawals and readvertisements. If more than a few ups and downs happened within a number of minutes, our routes were deemed to be flapping, triggering the flap-damping protection mechanism. Flap-damping meant our routes were ignored for 20 minutes by JANET's connectivity provider.
Addressing this required quite a lot of back-and-forth between network engineers in three organizations. It hasn't been completely eliminated, but the flap-damping has been made less sensitive, and we have amended our border router work processes to avoid multiple up/down events.
As part of our planning for more eagerly rolling out IPv6, we concluded that our existing allocation from JISC (2001:630:210::/44) would not be large enough. There are a number of issues:
A typical allocation to a department might be a /56, allowing for 256 subnets within the department - the next smaller allocation of /60 is too small to allow for future growth. We only had space for 2048 such /56 allocations, or many fewer if we needed to make any /52 allocations for large institutions.
There is nowhere near enough room for ISP-style end-user allocations, such as a /64 per college bedroom or a /64 per device on eduroam.
As a result, we have asked RIPE NCC (the European regional IP address registry) to become an LIR (local internet registry) in our own right. This entitles us to get our own provider-independent ISP-scale IPv6 allocations, amongst other things.
We have now been allocated 2a05:b400::/32 and we will start planning to roll out this new address range and deprecate the old one.
We do not currently have any detailed plans for this process; we will make further announcements when we have more news to share. Any institutions that are planning to request IPv6 allocations might want to wait until the new prefix is available, or talk to networks@uis.cam.ac.uk if you have questions.
The first bit of technical setup for the new address space is to create the reverse DNS zone, 0.0.4.b.5.0.a.2.ip6.arpa. This is now present and working on our DNS servers, though it does not yet contain anything interesting! We have updated the sample stealth secondary nameserver configuration to include this new zone. If you are using the catalog zone configuration your nameserver will already have the new zone.
Edited to add: Those interested in DNSSEC might like to know that this new reverse DNS zone is signed with ECDSA P256 SHA256, whereas our other zones are signed with RSA SHA1. As part of our background project to improve DNSSEC key management, we are going to migrate our other zones to ECDSA as well, which will reduce the size of our zones and provide some improvement in cryptographic security.
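For reference, generating ECDSA P256 SHA256 keys for a zone uses the standard BIND tools, along these lines:

    # key-signing key and zone-signing key for the new reverse zone
    dnssec-keygen -a ECDSAP256SHA256 -f KSK 0.0.4.b.5.0.a.2.ip6.arpa
    dnssec-keygen -a ECDSAP256SHA256 0.0.4.b.5.0.a.2.ip6.arpa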
Some of you need to take action to ensure your validating resolvers are properly configured.
There is more information at https://www.icann.org/resources/pages/ksk-rollover
ICANN have started publishing IP addresses of resolvers which are providing RFC 8145 trust anchor telemetry information that indicates they do not yet trust the new KSK. The announcement is at https://mm.icann.org/pipermail/ksk-rollover/2018-June/000418.html
IP addresses belonging to our central DNS resolvers appear on this list: 2001:630:212:8::d:2 and 2001:630:212:12::d:3
ICANN's data says that they are getting inconsistent trust anchor telemetry from our servers. Our resolvers trust both the old and new keys, so their TAT signals are OK; however our resolvers are also relaying TAT signals from other validating resolvers on the CUDN that only trust the old key.
I am going to run some packet captures on our resolvers to see if I can track down where the problem trust anchor telemetry signals are coming from, so that I can help you to fix your resolvers before the rollover.
ucam.biz zone, we are going to enter them into the IP Register database, so that these non-CUDN IP addresses appear directly in the cam.ac.uk zone.
There are a few reasons for this:
Both the ucam.biz
and IP Register setups are a bit
fiddly, but the database is more easily scripted;
It reduces the need for us to set up separate HTTP redirections on the web traffic managers;
It reduces problems with ACME TLS certificate authorization at off-site web hosting providers;
It is closer to what we have in mind for the future.
The new setup registers off-site IP addresses in an OFF-SITE
mzone,
attached to an off-site
vbox. The addresses are associated with web
site hostnames using aname
objects. This slightly round-about
arrangement allows for IP addresses that are used by multiple web
sites.
The resulting records appear directly in cam.ac.uk.
]]>
The serve-stale vulnerability (CVE-2018-5737) is the one that we encountered on our live servers on the 27th March.
There are still some minor problems with serve-stale
which will be
addressed by the 9.12.2 release, so I plan to enable it after the next
release.
On the server side, the prepared transaction is a JSON-RPC request blob which can be updated with HTTP PUT or PATCH. Ideally the server should be able to verify that the result of the PATCH is a valid JSON-RPC blob so that it doesn't later try to perform an invalid request. I am planning to do API validity checks using JSON schema.
This design allows the prepared transaction storage to be just a simple JSON blob store, ignorant of what the blob is for except that it has to match a given schema. (I'm not super keen on nanoservices so I'll just use a table in the ipreg database to store it, but in principle there can be some nice decoupling here.)
It also suggests a more principled API design: An immediate
transaction (typically requested by an API client) might look like the
following (based on JSON-RPC version 1.0 system.multicall
syntax):
{
    jsonrpc: "2.0",
    id: 0,
    method: "rpc.transaction",
    params: [
        { jsonrpc: "2.0", id: 1, method: ... },
        { jsonrpc: "2.0", id: 2, method: ... },
        ...
    ]
}
When a prepared transaction is requested (typically by the browser UI) it will look like:
{
    jsonrpc: "2.0",
    id: 0,
    method: "rpc.transaction",
    params: { prepared: "#" }
}
The "#" is a relative URI referring to the blob stored on the JSON-RPC endpoint (managed by the HTTP methods other than POST) - but it could in principle be any URI. (Tho this needs some thinking about SSRF security!) And I haven't yet decided if I should allow an arbitrary JSON pointer in the fragment identifier :-)
If we bring back rpc.multicall
(JSON-RPC changed the reserved prefix
from system.
to rpc.
) we gain support for prepared
non-transactional batches. The native batch request format becomes a
special case abbreviation of an in-line rpc.multicall
request.
]]>We enabled serve-stale on our recursive DNS servers, and after a few hours one of them
crashed messily. The automatic failover setup handled the crash
reasonably well, and I disabled serve-stale
to avoid any more
crashes.
How did this crash slip through our QA processes?
My test server is the recursive resolver for my workstations, and the primary master for my personal zones. It runs a recent development snapshot of BIND. I use it to try out new features, often months before they are included in a release, and I help to shake out the bugs.
In this case I was relatively late enabling serve-stale
so I was
only running it for five weeks before enabling it in production.
It's hard to tell whether a longer test at this stage would have exposed the bug, because there are relatively few junk queries on my test server.
Usually when I roll out a new version of BIND, I will pre-heat the cache of an upgraded standby server before bringing it into production. This involves making about a million queries against the server based on a cache dump from a live server. This also serves as a basic smoke test that the upgrade is OK.
I didn't do a pre-heat before enabling serve-stale
because it was
just a config change that can be done without affecting service.
But it isn't clear that a pre-heat would have exposed this bug because the crash required a particular pattern of failing queries, and the cache dump did not contain the exact problem query (though it does contain some closely related ones).
An alternative might be to use live traffic as test data, instead of a
static dump. A bit of code could read a dnstap
feed on a live
server, and replay the queries against another server. There are two
useful modes:
test traffic: replay incoming (recursive client-facing) queries; this reproduces the current live full query load on another server for testing, in a way that is likely to have reproduced yesterday's crash.
continuous warming: replay outgoing (iterative Internet-facing) queries; these are queries used to refill the cache, so they are relatively low volume, and suitable for keeping a standby server's cache populated.
There are a few cases where researchers have expressed interest in DNS
query data, of either of the above types. In order to satisfy them we
would need to be able to split a full dnstap
feed so that recipients
only get the data they want.
This live DNS replay idea needs a similar dnstap
splitter.
A few hours after the item below, we disabled the new serve-stale
feature following problems on one of our recursive DNS servers. We are
working with ISC.org to get
serve-stale
working better.
Original item follows:
The DNS servers are now running BIND 9.12.1. This version fixes an interoperability regression that affected resolution of bad domains with a forbidden CNAME at the zone apex.
We have also enabled the new serve-stale
feature, so that
when a remote DNS server is not available, our resolvers will return
old answers instead of a failure. The max-stale-ttl
is set to
one hour, which should be long enough to cover short network problems,
but not too long to make malicious domains hang around long after they
are taken down.
In other news, the DNS rebuild scripts (that run at 53 minutes past each hour) have been amended to handle power outages and server maintenance more gracefully. This should avoid most of the cases where the DNS build has stopped running due to excessive caution.
]]>Our central server network spans four sites across Cambridge, so it has a decent amount of resilience against power and cooling failures, and although it is a single layer two network, it is using some pretty fancy Cisco Nexus switches to provide plenty of redundant connectivity.
We have four recursive DNS servers, one at each site, usually two live and two hot spare. They are bare metal machines, which are intended to be able to boot up and provide service even if everything else is broken, provided they have power and cooling and network in at least one site.
The server network has several VLANs, and our resolver service addresses are on two of them: 131.111.8.42 is on VLAN 808, and 131.111.12.20 is on VLAN 812. So that any of the servers can provide service on either address, their switch ports are configured to deliver VLAN 808 untagged (so the servers can be provisioned using PXE booting without any special config) and VLAN 812 tagged.
There is strict reverse path filtering on the server network routers, so I have to make sure my resolvers use the correct VLAN depending on the source address. The trick is to use policy routing to match source addresses, since the normal routing table only looks at destination addresses.
The servers run Ubuntu, so this is configured in /etc/network/interfaces
by adding a couple of up
and down
commands. Here's an example;
there are four similar blocks in the config, for VLAN 808 and VLAN
812, and for IPv4 and IPv6.
iface em1.812 inet static
    address 131.111.12.{{ ifnum }}
    netmask 24
    up   ip -4 rule add from 131.111.12.0/24 table 12
    down ip -4 rule del from 131.111.12.0/24 table 12
    up   ip -4 route add default table 12 via 131.111.12.62
    down ip -4 route del default table 12 via 131.111.12.62
On Sunday we had some scheduled power work in one of our machine rooms. On Monday I found that the server in that room was not answering correctly over IPv6.
The machine had mostly booted OK, but it had partially failed to configure its network interfaces: everything was there except for the IPv6 policy routing, which meant that answers over IPv6 were being sent out of the wrong interfaces and dropped by the routers.
The logs were not completely clear, but it looked like the server had booted faster than the switch that it was connected to, so it had tried to configure its network interfaces when there was no network.
One approach might have been to add a script that waits for the
network to come up in /etc/network/if-pre-up.d
. But this is likely
to be unreliable in bad situations where it is extra important that
the server boots predictably.
The other approach, suggested by David McBride, was to try disabling
IPv6 duplicate address detection. He found the dad-attempts
option
in the
interfaces(5)
man page, which looked very promising.
Edited to add: Chris Share pointed out that there is a third option:
DAD can be disabled using
sysctl net.ipv6.conf.default.accept_dad=0
which is probably simpler than individually nobbling each network interface.
I went downstairs to the machine room in our office building to try booting a server with the ethernet cable unplugged. This nicely reproduced the problem.
I then tried adding the dad-attempts
option, and booting again. The
server booted successfully!
No need for a horrible pre-up script, yay!
The ifupdown
man pages are not very good at explaining how the
program works: they don't explain the /etc/network/if-*.d
hook
scripts, nor how the dad-attempts
option works.
I dug around in its source code, and I found that ifupdown
's DAD
logic is implemented by the script /lib/ifupdown/settle-dad.sh
,
which polls the output of ip -6 address list
. If it times out while
the address is still marked "tentative" (because the network is down)
the script declares failure, and ifupdown
breaks.
The other key part is the nodad
option to ip -6 addr add
, which is
undocumented.
This made it somewhat harder to find the fix and understand it. Bah.
I've now disabled duplicate address detection on my DNS servers, though I might have gone a bit far by disabling it on my VMs as well as the recursive servers. The point of DAD is to avoid accidentally breaking the network, so it's a bit arrogant to turn it off. On the other hand, if I have misconfigured duplicate IPv6 addresses, I have almost certainly done the same for IPv4, so I have still accidentally broken the network...
]]>The /update API endpoint that I outlined turns out to be basically JSON-RPC 2.0, so it seems to be worth making the new IP Register API follow that spec exactly.
However, there are a couple of difficulties wrt transactions.
The current
not-an-API list_ops
page
runs each requested action in a separate transaction. It should be
possible to make similar multi-transaction batch requests with the new
API, but my previous API outline did not support this.
A JSON-RPC batch request is a JSON array of request objects, i.e. the
same syntax as I previously described for /update
transactions,
except that JSON-RPC batches are not transactional. This is good for
preserving list_ops
functionality but it loses one of the key points
of the new API.
There is a simple way to fix this problem, based on a fairly
well-known idea. XML-RPC doesn't have batch requests like JSON-RPC,
but they were retro-fitted by defining
a system.multicall
method
which takes an array of requests and returns an array of responses.
We can define transactional JSON-RPC requests in the same style, like this:
{ "jsonrpc": "2.0", "id": 0, "method": "transaction", "params": [ { "jsonrpc": "2.0", "id": 1, "method": "foo", "params": { ... } }, { "jsonrpc": "2.0", "id": 2, "method": "bar", "params": { ... } } ] }
If the transaction succeeds, the outer response contains a "result" array of successful response objects, exactly one for each member of the request params array, in any order.
If the transaction fails, the outer response contains an "error" object, which has "code" and "message" members indicating a transaction failure, and an "error" member which is an array of response objects. This will contain at least one failure response; it may contain success responses (for actions which were rolled back); some responses may be missing.
Edited to add: I've described some more refinements to this idea
]]>First, a really nice DNSSEC-related performance enhancement is RFC 8198 negative answer synthesis: BIND can use NSEC records to generate negative responses, rather than re-querying authoritative servers. Our current configuration includes a lot of verbiage to suppress junk queries, all of which can be removed because of this new feature.
Second, a nice robustness improvement: when upstream authoritative DNS servers become unreachable, BIND will serve stale records from its cache after their time-to-live has expired. This should improve your ability to reach off-site servers when there are partial connectivity problems, such as DDoS attacks against their DNS servers.
Third, an operational simplifier: by default BIND will limit journal files to twice the zone file size, rather than letting them grow without bound. This is a patch I submitted to ISC.org about three years ago, so it has taken a very long time to get included in a release! This feature means I no longer need to run a patched BIND on our servers.
Fourth, a DNSSEC automation tool, dnssec-cds
. (I mentioned this in a
message I sent to this list back in October.) This is I think my largest
single contribution to BIND, and (in contrast to the previous patch) it
was one of the fastest to get committed! There's still some more work
needed before we can put it into production, but we're a lot closer.
There are numerous other improvements, but those are the ones I am particularly pleased by. Now, what needs doing next ...
]]>This week I finished incorporating the DHCP server Ansible setup into
the ipreg
Ansible setup. (It should really have been done last year.)
The old DHCP repository was an interesting beast. It was my first
proper try with Ansible (and the UCS/UIS's first!) and it was set up
in a mad rush as part of our office move project in autumn 2013. The
combination of git
and Ansible was very effective, and using DHCP to
renumber the old staff network went surprisingly smoothly.
regpg
Both the git
server I had set up earlier in 2013 and the DHCP
service used the same setup for storing secrets: I had a tar file encrypted with gpg in password-protected symmetric cipher mode.
Last year as part of the git
server refresh, I wrote a new utility
for managing secrets called regpg.
This uses a gpg
public-key encrypted file for each secret.
It has turned out to be much more pleasant to use, and it has been adopted by a number of other teams in the UIS and the Sanger Institute. A nice success.
Along with the DHCP server refresh, I have converted the ipreg
repository to use regpg
instead of the old secrets.tar.gz.asc.
This includes using regpg
to manage the DNSSEC private key files,
which is a major step towards automated key management, as I
looked forward to back in October.
Another thing I worked out during the git
server refresh was a
better way of configuring iptables.
The model I use is very basic, inherited from the Hermes packet
filters that David Carter originally set up before I started to work
with him in 2002. On Hermes the firewall configuration is a simple
formulaic list of rules that is directly fed to iptables
- no
abstractions.
The thing that makes this model simple is to only explicitly allow or block incoming TCP SYN packets; all other TCP packets are allowed, but if the SYN didn't get through, they will get rejected by the TCP state machine (rather than the packet filter). All outgoing TCP connections are implicitly allowed.
UDP isn't so simple - permitted flows have to be explicitly allowed in both directions. This isn't too bad because UDP tends to be simpler - just NTP, DNS, and occasionally DHCP.
For the ipreg
repository, I used Ansible Jinja2 templates to reduce
the amount of repetition in the firewall configuration. This worked
OK, except I made it too DRY - there was a confusing muddle of header
and footer and common templates, and it wasn't clear what they did.
When I refreshed the git
server, I got more relaxed about repeating
myself. Instead, I made a collection of filter modules in the form of
little templates, which each do a single thing like allow NTP, or
serve HTTP, etc.
I've expanded this scheme for the ipreg
servers, and it works very
nicely. It even reduces repetition in cases where a little module can
be independent of v4/v6. For example:
# IPtables configuration for authoritative DNS servers
#
{% set cudn = cudn4 %}
{% set privileged = privileged4 %}
{% include "iptables.header.j2" %}
{% include "iptables.drop-multicast.v4.j2" %}
{% include "iptables.allow-ntp.v4.j2" %}
{% include "iptables.allow-ssh-admin.j2" %}
{% include "iptables.allow-rndc.j2" %}
{% include "iptables.serve-dns-auth.j2" %}
{% include "iptables.footer.j2" %}
And one of the included files:
# serve-dns-auth
#
# DNS UDP responses and UDP+TCP queries from everywhere
# (Responses mainly arise from refresh queries.)
#
-A INPUT -p udp -m udp --sport 53 -j ACCEPT
-A INPUT -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 53 --syn -j ACCEPT
#
I have a build of BIND 9.12.0 which I expect to roll out next week.
This is a significant milestone because we will no longer need to patch BIND. We have a few patches in the current production build:
Drop packets matching a DDoS signature: I inherited this patch from Chris Thompson, and it has been mostly redundant since BIND added support for RRL (response rate limiting);
Re-sign in bigger chunks: another Chris Thompson patch, a nice tweak but not vital;
Transitional support for my minimal-qtype-any
option, which was
renamed to minimal-any
when it was incorporated upstream. I've
renamed it in our configuration so this isn't needed any more;
Automatic tuning of max-journal-size
: this patch dates back to
my revamp of the DNS servers in 2014/2015 and it has taken a very
long time (and one significant re-work) to get incorporated
upstream.
BIND 9.12 also includes a really nice DNSSEC-related performance enhancement: it can use NSEC records to synthesize negative responses, rather than re-querying authoritative servers. Our current configuration includes a lot of verbiage to suppress junk queries, all of which can be removed because of this new feature.
It also includes my dnssec-cds
utility, for automatically updating
DS records in a parent zone based on CDS records in the child zone, as
I described back in October. Deploying this in
production will be another thing to look forward to.
Once that is done, I need to sort out the much-delayed RPZ web site, to which people will be redirected when they try to visit a blocked site.
And when that is done, it will be full speed ahead on the IP Register database port to PostgreSQL.
]]>I want the new user interface to be search-oriented. The best existing example within the UIS is Lookup. The home page is mostly a search box, which takes you to a search results page, which in turn has links to per-object pages, which in turn are thoroughly hyperlinked.
You should be able to search on almost all of the fields in almost all
objects with the one search box. It does the job of the various
info_for
buttons on the old single_ops
page and the search actions
on the old table_ops
page.
The "almost" needs to be a bit deft: for example, it doesn't make
sense to match the domain in boxes (etc.) or the mzone in v4_addresses (etc.), since for the typical user a search will match
everything, which isn't entirely useful. Instead a match should lead
to something useful like an mzone
summary page.
Ideas for special search syntax:
CIDR range - [ 11.22.33.44/24 ]
arbitrary range - [ 11.22.33.44 - 55.66.77.88 ]
match in field - [ sysadmin:fanf2 ]
In all cases the search box is repeated at the top of the page to allow you to adjust the search.
one match - straight to per-object page
multiple matches - each result needs to be something like an object type and name (and any other primary key fields), plus any other matching fields as a search snippet
address range - might need a special results layout to do the job
of the old range_ops
page
A single object's page should do the job of the old table_ops
page,
i.e. access to all the fields of the corresponding table.
As well as that it should also do the job of the higher-level pages
such as box_ops
and service_ops
, i.e. also display the relevant
entries from the joined tables such as v4_address or srv.
The fields should be liberally hyperlinked, typically to a search results page listing related objects, e.g. click on a hostname to list objects related to the same name, click on an email address to list objects mentioning it in the metadata, etc.
All this search functionality relates to
the /query
API endpoint
I briefly mentioned before. I expect it will be most straightforward
to make this resource look at the client's Accept:
header, so it
returns HTML to browsers and JSON to API clients. (That is, not trying
to be a heavy JavaScript single-page app.)
I'm currently not sure if it makes sense to include the hyperlinks in
the JSON version. That would suggest a greater intention to use
HATEOAS than is actually the case, because the /update
endpoint is
definitely not RESTful.
As on the old UI, on a per-object page you should be able to update the object.
On the existing UI, objects are displayed in an editable form, but that doesn't allow for hyperlinking, so the new UI will have to be more like Lookup where you have to press a button to switch to edit mode.
It would be nice to be able to infer your intent from which fields you edit. Perhaps, if you edit nothing, you can delete; if you edit anything except the primary key, you are modifying; if you edit only the primary key, you are renaming; if you edit both, you are creating a new object. Dunno if this will be simpler or more confusing - at least, the button will need to change to make the action clear.
The new UI also needs a way to create a new object from scratch.
This is more directly discoverable than clone-and-hack, and there might not be a relevant object to clone. This might be a separate tab on a full-fat Project Light page, or a top-right link in Lookup's Lite Light version.
After hitting the update button, you are returned to the object's page for a successful create/modify/rename; for an error or a delete, you are returned to the edit page so you can either correct the error or re-create the deleted object.
You should be able to perform any of the five kinds of update either immediately, or add the change to your pending transaction. Maybe the easiest way to indicate this is with a tickybox next to the update button.
In this case you are taken to the pending transaction page, which lists the actions, and any warnings if they will not be able to complete successfully.
On this page you should be able to remove actions from the transaction, and commit it.
The pending transaction page should be another tab alongside the create object tab on a full-fat Project Light page; they should be top-right links in Lookup's Lite Light version.
The /update
API endpoint
I described before is OK if you allow JavaScript in the browser;
the Project Light style basically requires JavaScript already so this
is not an awkward additional requirement.
Some edit features are best done with some in-browser code - the DWIM button I mentioned above; editing arbitrary JSON object metadata; and adjusting the form to match the selected type of a new object.
The /query
endpoint will return either HTML UI responses or JSON
API responses, but the query strings will have the same meaning in
both cases. The query string will be easily visible to developers in
the URL.
For /update
the API will not be so easily visible. It would be a good idea to add an advanced mode that shows the API requests and responses, to help developers find out how to perform specific actions by copying their own manual examples.
No more raw Oracle error messages!
]]>I'm following the ANAME work with great interest because it will make certain configuration problems much simpler for us. I have made some extensive ANAME review comments.
An ANAME is rather different from what the IP Register database calls
an aname
object. An aname
is a name for a set of existing IP
addresses, which can be an arbitrary subset of the combined addresses
of multiple boxes or vboxes, whereas an ANAME copies all the
addresses from exactly one target name.
There is more about
the general problem of aliases in the IP Register database
in one of the items I posted in December. I am still unsure how the
new aliasing model might work; perhaps it will become clearer when I have a better idea of the existing aname implementation and its limitations.
This week I have been helping
Mark Andrews and Evan Hunt
to track down a bug in BIND9. The problem manifested as named
occasionally failing to re-sign a DNSSEC zone; the underlying cause
was access to uninitialized memory.
It was difficult to pin down, partly because there is naturally a lot of nondeterminism in uninitialized memory bugs, but there is also a lot of nondeterminism in the DNSSEC signing process, and it is time-dependent so it is hard to re-run a failure case, and normally the DNSSEC signing process is very slow - three weeks to process a zone, by default.
Oct 9 - latent bug exposed
Nov 12 - first signing failure
I rebuild and restart my test DNS server quite frequently, and the bug is quite rare, which explains why it took so long to appear.
Nov 18 - Dec 6 - Mark fixes several signing-related bugs
Dec 28 - another signing failure
Jan 2 - I try adding some debugging diagnostics, without success
Jan 9 - more signing failures
Jan 10 - I make the bug easier to reproduce
Mark and Evan identify a likely cause
Jan 11 - I confirm the cause and fix
The incremental re-signing code in named
is tied into BIND's core
rbtdb
data structure (the red-black tree database). This is tricky
code that I don't understand, so I mostly took a black-box approach to
try to reproduce it.
I started off by trying to exercise the signing code harder. I set up a test zone with the following options:
# signatures valid for 1 day (default is 30 days)
# re-sign 23 hours before expiry
# (whole zone is re-signed every hour)
sig-validity-interval 1 23;
# restrict the size of a batch of signing to examine
# at most 10 names and generate at most 2 signatures
sig-signing-nodes 10;
sig-signing-signatures 2;
I also populated the zone with about 500 records (not counting DNSSEC records) so that several records would get re-signed each minute.
This helped a bit, but I often had to wait a long time before it went
wrong. I wrote a script to monitor the zone using rndc zonestatus
, so
I could see if the "next resign time" matches the zone's earliest
expiring signature.
There was quite a lot of flailing around trying to exercise the code harder, by making the zone bigger and changing the configuration options, but I was not successful at making the bug appear on demand.
To make it churn faster, I used dnssec-signzone
to construct a version
of the zone in which all the signatures expire in the next few minutes:
rndc freeze test.example
dig axfr test.example | grep -v RRSIG |
    dnssec-signzone -e now+$((86400 - 3600 - 200)) \
        -i 3600 -j 200 \
        -f signed -o test.example /dev/stdin
rm -f test.example test.example.jnl
mv signed test.example
# re-load the zone
rndc thaw test.example
# re-start signing
rndc sign test.example
I also modified BIND's re-signing co-ordination code; normally each batch will re-sign any records that are due in the next 5 seconds; I reduced that to 1 second to keep batch sizes small, on the assumption that more churn would help - which it did, a little bit.
But the bug still took a random amount of time to appear, sometimes within a few minutes, sometimes it would take ages.
Mark (who knows the code very well) took a bottom-up approach; he ran
named
under valgrind
which identified an access to uninitialized
memory. (I don't know what led Mark to try valgrind
- whether he does
it routinely or whether he tried it just for this bug.)
Evan had not been able to reproduce the bug, but once the cause was identified it became clear where it came from.
The commit on the 9th October that exposed the bug was a change to BIND's memory management code, to stop it from deliberately filling newly-allocated memory with garbage.
Before this commit, the missing initialization was hidden by the memory fill, and the byte used to fill new allocations (0xbe) happened to have the right value (zero in the bottom bit) so the signer worked correctly.
Evan builds BIND in developer mode, which enables memory filling, which stopped him from being able to reproduce it.
I changed BIND to fill memory with 0xff which (if we were right) should provoke signing failures much sooner. And it did!
Then applying the one-character fix to remove the access to uninitialized memory made the signer work properly again.
BIND has a lot of infrastructure that tries to make C safer to use, for instance:
Run-time assertions to ensure that internal APIs are used correctly;
Canary elements at the start of most objects to detect memory overruns;
buffer
and region
types to prevent memory overruns;
A memory management system that keeps statistics on memory usage, and helps to debug memory leaks and other mistakes.
The bug was caused by failing to use buffers well, and hidden by the
memory management system.
The bug occurred when initializing an rdataslab
data structure, which
is an in-memory serialization of a set of DNS records. The records are
copied into the rdataslab
in traditional C style, without using a
buffer
. (This is most blatant when the code
manually serializes a 16 bit number
instead of using isc_buffer_putuint16
.) This code is particularly
ancient which might explain the poor style; I think it needs
refactoring for safety.
It's ironic that the bug was hidden by the memory management code - it's
supposed to help expose these kinds of bug, not hide them! Nowadays, the
right approach would be to link to jemalloc
or some other advanced
allocator, rather than writing a complicated wrapper around standard
malloc
. However that wasn't an option when BIND9 development started.
Memory bugs are painful.
]]>The IP Register API will provide programmatic access to the database for third-party clients; I would also like to use it for building the new user interface (i.e. dogfooding the API). My vague plan is for a modern web front end using TypeScript in the browser to directly access the API.
As is usual for modern services, the API will be based on JSON over HTTPS.
Moving a service in the DNS requires multiple actions: from the DNS UPDATE point of view, delete followed by add; from the SQL point of view, DELETE followed by INSERT, etc. These have to be performed as an atomic transaction to avoid outages due to queries that happen between delete and add.
A transaction in the UI needs to correspond to a transaction in the API, and in SQL, and in the DNS.
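To make the SQL side concrete, here's a minimal sketch of a service move as one transaction (the column layout here is hypothetical, not the real ipreg schema):

BEGIN;
-- drop the address record for the service's old home
DELETE FROM v4_address WHERE name = 'service.dept.cam.ac.uk';
-- register the new address; concurrent readers see either the old
-- row or the new one, never the gap in between
INSERT INTO v4_address (name, addr)
    VALUES ('service.dept.cam.ac.uk', '131.111.57.57');
COMMIT;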
This sort-of implies that some common API styles aren't a good fit:
Simple RPC doesn't fit because we effectively want to make multiple RPCs in one transaction;
REST doesn't fit because we want one request to affect multiple resources.
Instead, the style I favour is more like a DNS UPDATE request, i.e. a list of update actions, although each action will be higher-level.
The API basically needs two endpoints:
/query
for read-only lookups and searches;
/update
for transactional modifications.
The /query
endpoint will accept GET requests with URL-encoded lookup
and/or search parameters (details TBD).
I outlined below that the user interface needs two kinds of update: a one-shot update for simple cases; and a shopping-cart style UI for constructing more complicated transactions piecemeal.
The shopping-cart UI implies the need for some per-user state in the database that saves the user's pending transaction. I want an edit action in the UI to translate to an edit action on the server, and not simply upload a replacement copy of the pending transaction. When a user has multiple active instances of the UI open, their state will get out of sync (unless we do complicated things with web sockets) so it's likely that replacing the pending transaction will overwrite a fresh version with a stale version, whereas an edit action should be safer.
A good way to describe edits to pending transactions is JSON Patch, RFC 6902, since it is standard and simple.
The /update
endpoint will support the following HTTP methods:
GET a copy of the pending transaction. When there is no pending
transaction it will return an empty array of actions, [].
PUT a complete replacement of the pending transaction.
PATCH to edit the pending transaction with JSON patch.
POST a zero-byte body, to perform the pending transaction.
POST a transaction to perform it immediately, without affecting the pending transaction. This is the normal method for programmatic API calls, and for simple one-shot UI actions.
An update request is a JSON array of update actions; each action is described by a JSON object (details TBD). It should be easy to translate a DNS UPDATE request into a JSON-over-HTTPS update request.
But also, an update request looks a bit like a JSON Patch request. I hope this won't be confusing!
In general an action will need to return some result, such as when automatically allocating IP addresses.
This means the response should be an array of results corresponding to
the array of actions. A result might be OK
(possibly with additional
data), or an error, or skip
if another action failed and this action
was rolled back.
A consequence of this is that the HTTP status code won't express the application-level status like REST APIs often do, but instead it will just refer to the HTTP-level status.
This framework is simple enough that I hope I won't need a top-level revision number. Instead I plan to just extend the query parameters and the update actions as required.
The plans I have for changing the data model mostly involve relaxing the rules, e.g. making the metadata fields optional and generalizing them, which is easy to do compatibly.
Initially there will need to be separate actions for boxes and vboxes; when this distinction is abolished, it should be possible to make the actions into close synonyms.
A more tricky area is cross-mzone aliases; we can probably restrict the current unprivileged cases before opening up the API, in order to avoid getting into an even more sticky situation.
]]>I used ora2pg to do a quick export of the IP Register database from Oracle to PostgreSQL. This export
included an automatic conversion of the table structure, and the
contents of the tables. It did not include the more interesting parts
of the schema such as the views, triggers, and stored procedures.
Before installing ora2pg
, I had to install the Oracle client
libraries. These are not available in Debian, but Debian's ora2pg
package is set up to work with the following installation process.
Get the Oracle Instant Client RPMs
from Oracle's web site. This is a free download, but you will need to create an Oracle account.
I got the basiclite
RPM - it's about half the size of the
basic
RPM and I didn't need full i18n. I also got the sqlplus
RPM so I can talk to Jackdaw directly from my dev VMs.
The libdbd-oracle-perl
package in Debian 9 (Stretch) requires
Oracle Instant Client 12.1. I matched the version installed on
Jackdaw, which is 12.1.0.2.0.
Convert the RPMs to debs (I did this on my workstation)
$ fakeroot alien oracle-instantclient12.1-basiclite-12.1.0.2.0-1.x86_64.rpm
$ fakeroot alien oracle-instantclient12.1-sqlplus-12.1.0.2.0-1.x86_64.rpm
Those packages can be installed on the dev VM, with libaio1 (which is required by Oracle Instant Client but does not appear in the package dependencies), and libdbd-oracle-perl and ora2pg.
sqlplus
needs a wrapper script that sets environment variables
so that it can find its libraries and configuration files. After
some debugging I found that although the documentation claims that glogin.sql is loaded from $ORACLE_HOME/sqlplus/admin/, in fact it is loaded from $SQLPATH.
To configure connections to Jackdaw, I copied tnsnames.ora and sqlnet.ora from ent.
ora2pg
By default, ora2pg
exports the table definitions of the schema we
are interested in (i.e. ipreg
). For the real conversion I intend to
port the schema manually, but ora2pg
's automatic conversion is handy
for a quick trial, and it will probably be a useful guide to
translating the data type names.
The commands I ran were:
$ ora2pg --debug
$ mv output.sql tables.sql
$ ora2pg --debug --type copy
$ mv output.sql rows.sql
$ table-fixup.pl <tables.sql >fixed.sql
$ psql -1 -f functions.sql
$ psql -1 -f fixed.sql
$ psql -1 -f rows.sql
The fixup script and SQL functions were necessary to fill in some gaps
in ora2pg
's conversion, detailed below.
Oracle treats the empty string as equivalent to NULL but PostgreSQL does not.
This affects constraints on the lan and mzone tables.
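For example, where Oracle's NOT NULL alone also rejected '', a ported constraint has to say so explicitly. A sketch, with a made-up constraint name:

-- Oracle: '' is NULL, so NOT NULL already forbids empty names;
-- PostgreSQL: '' is a real value and needs its own check
ALTER TABLE lan ADD CONSTRAINT lan_name_nonempty CHECK (name <> '');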
The Oracle substr
function supports negative offsets which index
from the right end of the string, but PostgreSQL does not.
This affects subdomain constraints on the unique_name, maildom, and service tables. These constraints should be
replaced by function calls rather than copies.
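A sketch of the shared-function approach, using PostgreSQL's right() where Oracle used a negative substr() offset (the function name is my invention):

-- Oracle's substr(name, -N) counts from the right; in PostgreSQL
-- use right(), written once instead of copied into each constraint
CREATE FUNCTION ends_with(name text, suffix text) RETURNS boolean
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT right(name, length(suffix)) = suffix $$;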
The ipreg schema uses raw columns for IP addresses and prefixes; ora2pg converted these to bytea.
The v6_prefix table has a constraint that relies on implicit conversion from raw to a hex string. PostgreSQL is stricter about types, so this expression needs to work on bytea directly.
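For instance, a length check can be expressed against the bytea value itself, with no hex-string round trip (a sketch, not the actual constraint):

-- a /64 prefix is exactly 8 octets
ALTER TABLE v6_prefix
    ADD CONSTRAINT v6_prefix_len CHECK (octet_length(prefix) = 8);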
There are a number of cases where ora2pg
represented named
unique constraints as unnamed constraints with named indexes.
This unnecessarily exposes an implementation detail.
There were a number of Oracle functions which PostgreSQL doesn't support (even with orafce), so I implemented them in the functions.sql file.
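As an illustration of the kind of shim functions.sql might contain (this particular example is a guess, not a quote from the file), Oracle's two-argument regexp_like() maps straight onto PostgreSQL's ~ operator:

-- stand-in for Oracle's regexp_like(string, pattern)
CREATE FUNCTION regexp_like(str text, pattern text) RETURNS boolean
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT str ~ pattern $$;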
The mzone_co, areader, and registrar tables reference the pers table in the jdawadm schema. These foreign key constraints need to be removed.
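Removing them is a one-liner each, along these lines (the constraint names are invented):

ALTER TABLE mzone_co  DROP CONSTRAINT IF EXISTS mzone_co_pers_fk;
ALTER TABLE areader   DROP CONSTRAINT IF EXISTS areader_pers_fk;
ALTER TABLE registrar DROP CONSTRAINT IF EXISTS registrar_pers_fk;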
There is a weird bug in ora2pg which mangles the regex [[:cntrl:]] into [[cntrl:]].
This is used several times in the ipreg
schema to ensure that
various fields are plain text. The regex is correct in the schema
source and in the ALL_CONSTRAINTS
table on Jackdaw, which is why
I think it is an ora2pg
bug.
There's another weird bug where a regexp_like(string,regex,flags)
expression is converted to string ~ regex, flags
which is nonsense.
There are other calls to regexp_like()
in the schema which do
not get mangled in this way, but they have non-trivial string
expressions whereas the broken one just has a column name.
The export of the data from Oracle and the import to PostgreSQL took an uncomfortably long time. The SQL dump file is only 2GB so it should be possible to speed up the import considerably.
]]>There is a lot of infrastructure work to do before I am in a position to make changes - principally, porting from Oracle to PostgreSQL, and developing a test suite so I can make changes with confidence.
Still, it's worth writing down my thoughts so far, so colleagues can see what I have in mind, and so we have some concrete ideas to discuss.
I expect to add to this list as thoughts arise.
anames
It's worth unpacking "Jackdaw" a bit.
There is Jackdaw the service, consisting of a number of Linux servers
that host an Oracle DBMS and Apache mod_perl
application framework.
In the Oracle DBMS there are a number of databases: a production
database that is also called jackdaw
, which is occasionally cloned
for operational reasons to make jdawdev
, jdawtest
, etc.
The original user of the Jackdaw database was the user administration system; the IP Register database is a separate schema within the same database.
The plan is to make the IP Register database its own self-contained system hosted on PostgreSQL.
Instead of multiple databases within a DBMS, there will be just one. The dev and test databases will be on separate Linux systems isolated from the production servers. (I'm currently working on a dev system using a cluster of VMs hosted on my workstation.)
Breaking away from the user-admin database makes it possible to use multiple schemas within the IP Register database. There's an obvious place where this can make a big improvement.
At the moment, the Perl IP Register front-ends pass around a $prefix
which privileged users can set to choose whether they are working at
their normal privilege level (my_
) or in read-all mode (ra_
) or in
access-all-areas mode (all_
). This means that most of the SQL
queries are full of string interpolations, which means the code is not
trivially obviously free of SQL injection bugs.
If each set of access control views is moved to its own schema, then
the database front end can set PostgreSQL's schema search path to choose between my, ra, and all schemas, and use static SQL instead of interpolating $prefix.
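A sketch of how that looks per connection (schema names as above; "all" would need quoting since it's a reserved word):

-- chosen once, according to the user's privilege level
SET search_path TO my;
-- thereafter the queries are static SQL; this resolves to my.box
SELECT * FROM box WHERE name = 'example';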
Dean Rasheed tells me I should look at row-level security as a better access control mechanism than IP Register's existing updatable views. In particular, it would allow us to give users read access to objects which they can't update.
I like the sound of this a lot, but it will require a lot more thought to work out how to make good use of it.
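A minimal sketch of the sort of thing row-level security allows, assuming hypothetical table and column names and a session setting carrying the logged-in user:

ALTER TABLE box ENABLE ROW LEVEL SECURITY;
-- everyone may read every row
CREATE POLICY box_read ON box FOR SELECT USING (true);
-- but updates are confined to the user's own mzones
CREATE POLICY box_write ON box FOR UPDATE
    USING (mzone IN (SELECT mzone FROM mzone_co
                     WHERE crsid = current_setting('ipreg.user')));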
There are a few cases where the IP Register database makes use of the fact that it is just a schema within the wider Jackdaw database.
Firstly, the mzone_co
, areader
, and registrar
tables refer to
people by their CRSID; there are constraints tying these tables to the
pers
table in the user-admin database. I will have more to say about
this below.
Secondly, the Gossamer system which handles the provisioning of Falcon web sites (and used to handle MWS sites) has feet in both camps: the authoritative details of the web sites are held in the user-admin database, and the DNS-specific information is copied to the IP Register database. The current provisioning scripts take advantage of the fact that these are actually the same database, and use transactions that span both user-admin and IP Register databases when provisioning sites.
Those are the only cases I am currently aware of...
There's a lot of unnecessary bureaucracy maintaining the IP Register access control lists.
Instead of manual maintenance by UIS staff, the access control lists should come from Lookup groups via LDAP, so each institution can maintain their own access list in the same way as they maintain other groups.
Currently I think that it should require a quick manual process for an
existing mzone_co
to mirror a new ACL from Lookup to the IP Register
database. This is because control over the DNS gives you huge amounts
of privilege, and I don't want a compromise of Lookup to automatically
lead to compromise of the DNS.
We currently have a rough prototype implementation of TOTP for registrar access to the IP Register database.
This should be made available as an option for all mzone_cos.
The TOTP secrets need to be added to the database, with a mechanism allowing IT staff within an institution to reset each other's TOTP secrets without having to talk to the UIS.
There's no real need or benefit from Gossamer using transactions that span the user-admin and IP Register databases - all the other provisioning systems that hang off Jackdaw are loosely coupled and eventually consistent, and this is true for the other web provisioning side of Gossamer.
We have a number of special interfaces to IP Register that (like Gossamer) need special privilege: the MWS3 API; the IP filter interlock; the RPZ pruner; ... These should all be refactored to follow a common set of design principles to be determined.
In every-day usage the IP Register database falls awkwardly between two stools: it isn't just a DNS backend, because each entry in the database requires certain metadata which isn't exported to the DNS; but the schema for this metadata is frequently too impoverished to be helpful for much of the information we want to record about DNS registrations.
As far as I can tell, this aspect of the IP Register database came from the manual / semi-automated practices of the IP Register team in the 1990s, and it has not been re-worked since then.
The PostgreSQL jsonb
data type allows us to attach schemaless data
to a row. I would like to replace all the IP Register metadata fields
that aren't used by the database with a jsonb
column.
All tables have a remarks column, which is the basic metadata to be replaced by jsonb; other fields to fold into it include purpose, equipment, location, sysadmin, etc. usw.
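A sketch of the conversion for one table (the per-table details would differ):

ALTER TABLE box ADD COLUMN meta jsonb NOT NULL DEFAULT '{}';
UPDATE box SET meta = jsonb_strip_nulls(jsonb_build_object(
    'remarks',   remarks,
    'purpose',   purpose,
    'equipment', equipment,
    'location',  location,
    'sysadmin',  sysadmin));
-- ...and then drop the old columns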
Questions of what metadata to ask for then become a matter for the user interface.
box vs vbox
The differences between boxes and vboxes are:
boxes and vboxes have different un-constrained metadata fields;
vboxes are hosted on some other box or vbox, as recorded in the
vbox_box
table.
After the jsonb
metadata change, the first difference goes away.
The second difference also becomes moot, because a box is equivalent
to a vbox without an entry in the vbox_box
table.
So this distinction should be eliminated.
Another hangover from the 1990s is the IP Register representation of LANs and subnets. It seems to be designed to make it convenient, when registering a box, to provide manual configuration details (router, subnet mask, name servers) to the admin of the box.
In practice an IP Register LAN typically corresponds to a VLAN as configured into the relevant network equipment - but the database does not record VLAN numbers. It seems to be a half-finished feature.
The IP Register database currently has bare minimum support for exporting static registrations to our DHCP servers: you can tie an IP address to a MAC address and a DHCP group, and that's it.
I would like to be able to represent the rest of our DHCP configuration in the database too: DHCP-enabled subnets, dynamic pools, etc.
I would like an IP Register LAN to correspond to a VLAN from the
network equipment point of view and a shared-network
from the DHCP
point of view, with a flag determining whether it is exported to the
DHCP servers. We should also be able to match up the contents of the
IP Register database to the router configurations more automatically.
The LAN rubric stuff then becomes useful as DHCP options.
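In schema terms this probably only needs a couple of extra columns on the lan table, something like (names invented):

ALTER TABLE lan
    ADD COLUMN vlan smallint CHECK (vlan BETWEEN 1 AND 4094), -- 802.1Q tag
    ADD COLUMN dhcp boolean NOT NULL DEFAULT false; -- export as a shared-network?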
I would also like to be able to represent DHCP pools in the database. In this area we need to be thinking about replacing old ISC DHCP with ISC Kea - the latter is database-backed which might or might not mean we will benefit from tying it directly to the IP Register database.
There are a couple of annoying limitations with the way the IP Register database models IPv6 subnets.
It requires the subnet prefix length to be 64 bits. We generally use long prefixes for link subnets, so we want to be able to represent them as such in the database.
We have a convention for IPv6 service identifiers, using addresses
of the form prefix::service:instance
where prefix is 64 bits,
and service and instance are 16 bits. We currently use a wiki page
for keeping track of service identifiers; this should be in the
database.
In IPv4, LAN names are associated with v4_address
objects, so a
subnet can be split between multiple logical LANs, and LANs can be
updated by mzone_cos. In IPv6, the LAN is associated with the
v6_prefix
and they require privileged access to update. This
should be delegated.
Ideally it should be possible to associate a LAN with a pair of a
v6_prefix
and a service identifier, so that static IPv6
addresses can be allocated automatically in a sensible way.
anames
The only delegated mechanism we have for making DNS aliases is with CNAMEs.
Any interesting alias setups have to be done with aname
objects,
which can only be created by privileged IP Register users. This
includes things like service aliases that combine multiple vboxes
(which we use for the ppsw
and recdns
services), bare www-less web
site names, and aliases for off-site services.
We also use anames as interlocks, to prevent records from being
deleted when there is some network configuration that refers to them.
(This is the ipfilter
mechanism.)
aname refinement
CNAME quotas should be abolished.
We should have an alias mechanism that is published in the DNS as the
addresses of the target, to avoid the restrictions on CNAME records.
We have something like this for domains hosted on the MZS; in the
future it should be based on
standard ANAME records
- which are not the same thing as IP Register anames.
The IP Register aname
table should be renamed to avoid confusion
with standard ANAMEs. (Renamed to what, though?)
I would like to delegate the ability to create IP Register anames, though this may hit a snag with the current mutable view access control mechanism.
Our process for setting up off-site servers is currently very bureaucratic. To a large extent this is because it requires awkward work-arounds for limitations in the IP Register database.
To fix this, we should allow staff to set up arbitrary CNAMEs, arbitrary standard ANAMEs, and arbitrary address records.
This is a knotty area, and I'm not entirely sure what the right approach is.
At the moment, staff who have access to multiple mzones can create objects that span mzones without restriction - e.g. a name from one mzone pointing at an address in another mzone.
I'm inclined to think this is a bad idea, because it has awkward edge cases where not all staff in a group have access to the same mzones, and when mzones split or merge.
Instead, ordinary users should only be able to create objects that are entirely within an mzone. If they need to create mzone-spanning links they can use unrestricted aliases.
Privileged cross-mzone objects might still be used for ipfilter
interlocks, and maybe for Falcon / MWS aliases.
A large proportion of the non-database records in the cam.ac.uk
zone
are TXT records. These fall into roughly three areas:
SPF records, which are managed by a special-purpose script;
DKIM / DMARC records;
Microsoft / Google / etc. domain authentication records.
SPF records are closely tied to mail domains. They should be moved
into the database, hooked into the maildom
table similarly to the
mx
table, to replace the current ad-hoc configuration file.
Staff should be able to create arbitrary TXT records, which would cover the other cases listed above. I'm not sure if it's worth implementing special cases for DMARC and DKIM - maybe syntax checks in the user interface will be enough.
The current IP Register database has a number of subroutines for converting IP addresses between packed octet strings (for storage in Oracle) and textual form (for presentation to users and export to the DNS).
ip4r
The ip4r
PostgreSQL extension provides native data types for IP
addresses and subnets. Using this would allow us to eliminate a lot
of awkward custom code.
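For example (a sketch, assuming the ip4r extension is installed on the server):

CREATE EXTENSION ip4r;
-- a native address type instead of raw/bytea plus conversion code
CREATE TABLE addr_example (addr ip4 PRIMARY KEY);
INSERT INTO addr_example VALUES ('131.111.8.42'); -- text I/O for free
-- range queries use native operators, e.g. everything in a /24:
SELECT addr FROM addr_example WHERE addr <<= '131.111.8.0/24'::ip4r;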
So far this is just one person's thoughts, and I have a fairly limited and unusual perspective on the database. Because of this I'm very keen to hear other people's ideas and feedback: Do you like what I wrote above? Did I miss anything out? Do you have any suggestions?
Please send email to ip-register@uis.cam.ac.uk!
]]>Sadly the documentation for how to do this is utterly appalling, so here's a rant.
The Debian installer documentation, appendix B.
https://www.debian.org/releases/stable/amd64/apbs02.html.en
Some relevant quotes:
Putting it in the correct location is fairly straightforward for network preseeding or if you want to read the file off a floppy or usb-stick. If you want to include the file on a CD or DVD, you will have to remaster the ISO image. How to get the preconfiguration file included in the initrd is outside the scope of this document; please consult the developers' documentation for debian-installer.
Note there is no link to the developers' documentation.
If you are using initrd preseeding, you only have to make sure a file named preseed.cfg is included in the root directory of the initrd. The installer will automatically check if this file is present and load it.
For the other preseeding methods you need to tell the installer what file to use when you boot it. This is normally done by passing the kernel a boot parameter, either manually at boot time or by editing the bootloader configuration file (e.g.
syslinux.cfg
) and adding the parameter to the end of the append line(s) for the kernel.
Note that we'll need to change the installer boot process in any case, in order to skip the interactive boot menu. But these quotes suggest that we'll have to remaster the ISO, to edit the boot parameters and maybe alter the initrd.
So we need to guess where else to find out how to do this.
https://wiki.debian.org/DebianInstaller
This suggests we should follow https://wiki.debian.org/DebianCustomCD
or use simple-cdd
.
simple-cdd
I tried simple-cdd
but it failed messily.
It needs parameters to select the correct version (it defaults to Jessie) and a local mirror (MUCH faster).
$ time simple-cdd --dist stretch \
    --debian-mirror http://ftp.uk.debian.org/debian
[...]
ERROR: missing required packages from profile default: less
ERROR: missing required packages from profile default: simple-cdd-profiles
WARNING: missing optional packages from profile default: grub-pc grub-efi
    popularity-contest console-tools console-setup usbutils acpi acpid
    eject lvm2 mdadm cryptsetup reiserfsprogs jfsutils xfsprogs
    debootstrap busybox syslinux-common syslinux isolinux

real    1m1.528s
user    0m34.748s
sys     0m1.900s
Sigh, looks like we'll have to do it the hard way.
Eventually I realise the hard version of making a CD image without
simple-cdd
is mostly about custom package selections, which is not
something I need.
This article is a bit more helpful...
https://wiki.debian.org/DebianInstaller/Preseed
It contains a link to...
https://wiki.debian.org/DebianInstaller/Preseed/EditIso
That requires root privilege and is a fair amount of faff.
That page in turn links to...
https://wiki.debian.org/DebianInstaller/Modify
And then...
https://wiki.debian.org/DebianInstaller/Modify/CD
This has a much easier way of unpacking the ISO using bsdtar
, and
instructions on rebuilding a hybrid USB/CD ISO using xorriso
. Nice.
Most of the rest of the page is about changing package selections which we already determined we don't need.
OK, so we have used bsdtar
to unpack the ISO, and we can see various
boot-related files. We need to find the right ones to eliminate the
boot menu and add the preseed arguments.
There is no syslinux.cfg
in the ISO so the D-I documentation's
example is distressingly unhelpful.
I first tried editing boot/grub/grub.cfg
but that had no effect.
There are two boot mechanisms on the ISO, one for USB and one for
CD/DVD. The latter is in isolinux/isolinux.cfg
.
Both must be edited (in similar but not identical ways) to get the effect I want regardless of the way the VM boots off the ISO.
Unpacking and rebuilding the ISO takes less than 3 seconds on my workstation, which is acceptably fast.
]]>Jackdaw and Raven handle authentication, so the IP Register database
only needs to concern itself with access control. It does this using
views defined with check option
, as is briefly described in
the database overview and visible in the
SQL view DDL.
There are three levels of access to the database:
the registrar
table contains privileged users (i.e. the UIS
network systems team) who have read/write access to everything via
the views with the all_
prefix.
the areader
table contains semi-privileged users (i.e. certain
other UIS staff) who have read-only access to everything via the
views with the ra_
prefix.
the mzone_co
table contains normal users (i.e. computer officers
in other institutions) who have read-write access to their
mzone(s) via the views with the my_
prefix.
Apart from a few special cases, all the underlying tables in the database are available in all three sets of views.
The first part of the view definitions
is where the IP Register database schema is tied to the authenticated
user. There are two kinds of connection: either a web connection
authenticated via Raven, or a direct sqlplus
connection
authenticated with an Oracle password.
SQL users are identified by Oracle's user
function; Raven users are
obtained from the sys_context()
function, which we will now examine
more closely.
We are fortunate that support for create view with check option
was
added to PostgreSQL by our colleague Dean Rasheed.
The sys_context()
function is a bit more interesting.
Jackdaw's mod_perl
-based API is called WebDBI, documented at
https://jackdaw.cam.ac.uk/webdbi/
There's some discussion of authentication and database connections at https://jackdaw.cam.ac.uk/webdbi/webdbi.html#authentication and https://jackdaw.cam.ac.uk/webdbi/webdbi.html#sessions but it is incomplete or out of date; in particular it doesn't mention Raven (and I think basic auth support has been removed).
The interesting part is the description of sessions. Each web server process makes one persistent connection to Oracle which is re-used for many HTTP requests. How is one database connection securely shared between different authenticated users, without giving the web server enormously privileged access to the database?
Instead of mod_ucam_webauth, WebDBI has its own implementation of the Raven protocol - see jackdaw:/usr/local/src/httpd/Database.pm.
This mod_perl code does not do all of the work; instead it calls stored procedures to complete the authentication. On initial login it calls raven_auth.create_raven_session() and for a returning user with a cookie it calls raven_auth.use_raven_session().
These raven_auth stored procedures set the authenticated user that is retrieved by the sys_context() call in the IP Register views - see jackdaw:/usr/local/src/httpd/raven_auth/.
Most of the logic is written in PL/SQL, but there is also an external procedure written in C which does the core cryptography - see jackdaw:/usr/local/oracle/extproc/RavenExtproc.c.
On the whole I like Jackdaw's approach to preventing the web server from having too much privilege, so I would like to keep it, though in a simplified form.
As far as I know, PostgreSQL doesn't have anything quite like sys_context() with its security properties, though you can get similar functionality using PL/Perl.
However, in the future I want more heavy-weight sessions that have more server-side context, in particular the "shopping cart" pending transaction.
So I think a better way might be to have a privileged session table, keyed by the user's cookie and containing their username and jsonb session data, etc. This table is accessed via security definer functions, with something like Jackdaw's create_raven_session(), plus functions for getting the logged-in user (to replace sys_context()) and for manipulating the jsonb session data.
We can provide ambient access to the cookie using the set session command at the start of each web request, so the auth functions can retrieve it using the current_setting() function.
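A minimal sketch of that arrangement in PostgreSQL might look like the following; all of the names are invented, and the real thing would need session expiry, rate limiting, and so forth:

create table web_session (
    cookie   text primary key,
    username text not null,
    expires  timestamptz not null,
    state    jsonb not null default '{}'
);

-- security definer: runs with the function owner's privileges, so
-- the web server's database user never reads the table directly
create function logged_in_user() returns text
    language sql security definer as $$
        select username from web_session
         where cookie = current_setting('ipreg.cookie')
           and expires > now();
    $$;

-- at the start of each web request:
set session ipreg.cookie = 'cookie-value-from-the-request';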
We have relaxed the policy on external references in the cam.ac.uk domain so that CNAMEs are no longer required; external references can refer to IP addresses when a hostname isn't available.
One of the reasons for the old policy was that the IP Register database only knows about IP addresses on the CUDN. However, an old caveat says, "CUDN policy is not defined by this database, rather the reverse." The old policy proved to be inconvenient both for the Hostmaster team and for our colleagues around the University who requested external references. We didn't see any benefit to compensate for this inconvenience, so we have relaxed the policy.
At the moment we aren't easily able to change the structure of the IP Register database. In order to work around the technical limitations, when we need to make an external reference to an IP address, the Hostmaster team will create the address records in the domain ucam.biz and set up a CNAME in the database from cam.ac.uk to ucam.biz. This is slightly more fiddly for the Hostmaster team but we expect that it will make the overall process easier.
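The result is a pair of records along these lines (names and address invented for illustration):

thing.cam.ac.uk.   CNAME  thing.ucam.biz.
thing.ucam.biz.    A      192.0.2.1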
I have a long-term background project to improve the way we manage our DNSSEC keys. We need to improve secure storage and backups of private keys, and the way we update public key digests in parent zones. As things currently stand it requires tricky and tedious manual work to replace keys, but it ought to be zero-touch automation.
We now have most of the pieces we need to support automatic key management.
regpg
For secure key storage and backup, we have a wrapper around GPG called regpg which makes it easier to repeatably encrypt files to a managed set of "recipients" (in GPG terminology). In this case the recipients are the sysadmins and they are able to decrypt the DNS keys (and other secrets) for deployment on new servers. With regpg the key management system will be able to encrypt newly generated keys but not able to decrypt any other secrets.
At the moment regpg is in use and sort-of available (at the link below) but this is a temporary home until I have released it properly.

Edited to link to the regpg home page.
dnssec-cds
There are a couple of aspects to DNSKEY management: scheduling the rollovers, and keeping delegations in sync.
BIND 9.11 has a tool called dnssec-keymgr which makes rollovers a lot easier to manage. It needs a little bit of work to give it proper support for delegation updates, but it's definitely the way of the future. (I don't wholeheartedly recommend it in its current state.)
For synchronizing delegations, RFC 7344 describes special CDS and CDNSKEY records which a child zone can publish to instruct its parent to update the delegation. There's some support for the child side of this protocol in BIND 9.11, but it will be much more complete in BIND 9.12.
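You can see whether a child zone is publishing CDS records for its parent to pick up with an ordinary query (the zone name here is a placeholder):

dig +noall +answer child.example. CDS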
I've written dnssec-cds, an implementation of the parent side, which was committed to BIND this morning. (Yay!) My plan is to use this tool for managing our delegations to the CL and Maths. BIND isn't an easy codebase to work with; the reason for implementing dnssec-cds this way is (I hope) to encourage more organizations to deploy RFC 7344 support than I could achieve with a standalone tool.
https://gitlab.isc.org/isc-projects/bind9/commit/ba37674d038cd34d0204bba105c98059f141e31e
Until our parent zones become enlightened to the ways of RFC 7344 (e.g. RIPE, JANET, etc.) I have a half-baked framework that wraps various registry/registrar APIs so that we can manage delegations for all our domains in a consistent manner. It needs some work to bring it up to scratch, probably including a rewrite in Python to make it more appealing.
All these pieces need to be glued together, and I'm not sure how long that will take. Some of this glue work needs to be done anyway for non-DNSSEC reasons, so I'm feeling moderately optimistic.
During the DNS OARC27 meeting at the end of last week, DLV was decommissioned by emptying the dlv.isc.org zone. The item on the agenda was titled "Deprecating RFC5074" - there are no slides because the configuration change was made live in front of the meeting.
If you have not done so already, you should remove any dnssec-lookaside (BIND) or dlv-anchor (Unbound) options from your server configuration.
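One way to check a BIND server, assuming named-checkconf from the BIND distribution is installed, is to search the parsed configuration:

named-checkconf -p /etc/bind/named.conf | grep dnssec-lookaside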
The effect is that the reverse DNS for our IPv6 range 2001:630:210::/44 and our JANET-specific IPv4 ranges 193.60.80.0/20 and 193.63.252.0/23 can no longer be validated.
Other Cambridge zones which cannot be validated are our RFC 1918 reverse DNS address space (because of the difficulty of distributing trust anchors); private.cam.ac.uk; and most of our Managed Zone Service zones. This may change because we would like to improve our DNSSEC coverage.
ICANN announced last night that the DNSSEC root key rollover has been postponed, and will no longer take place on the 11th October. The delay is because telemetry data reveals that too many validators do not trust the new root key.
Ever since private.cam.ac.uk was set up in 2002, our DNS servers have returned a REFUSED error to queries for private zones from outside the CUDN. Hiding private zones from the public Internet is necessary to avoid a number of security problems.
In March the CA/Browser Forum decided that after the 8th September 2017, certificate authorities must check CAA DNS records before issuing certificates. CAA records specify restrictions on which certificate authorities are permitted to issue certificates for a particular domain.
However, because names under private.cam.ac.uk cannot be resolved on the public Internet outside the CUDN, certificate authorities became unable to successfully complete CAA checks for private.cam.ac.uk. The CAA specification RFC 6844 implies that a CA should refuse to issue certificates in this situation.
In order to fix this we have introduced a split view for private.cam.ac.uk.
There are now two different versions of the private.cam.ac.uk zone: a fully-populated internal version, same as before; and a completely empty external version.
With the split view, our authoritative servers will give different answers to different clients: devices on the CUDN will get full answers from the internal version of private.cam.ac.uk, and devices on the public Internet will get negative empty answers (instead of an error) from the external version.
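You can see the difference with a query like the following; the server name is real but the host name is illustrative:

dig @authdns0.csx.cam.ac.uk test.private.cam.ac.uk A

From the CUDN this returns the real answer; from the public Internet it should now return a negative answer rather than REFUSED.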
There is no change to the "stealth secondary" arrangements for replicating the private.cam.ac.uk zone to other DNS servers on the CUDN.
The authoritative server list for private.cam.ac.uk has been pruned to include just the UIS authdns servers which have the split view configuration. Our thanks to the Computer Lab and the Engineering Department for providing authoritative service until this change.
We can use this new feature to make "stealth secondary" configurations much shorter and lower-maintenance. Accordingly, there is now a catz.arpa.cam.ac.uk catalog zone corresponding to our recommended stealth secondary configuration, and our sample BIND configuration has been updated with notes on how to use it.
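For the consumer side, the named.conf additions amount to something like this sketch; the syntax is from the BIND 9.11 ARM, the master address is illustrative, and the sample configuration linked above has the authoritative details:

options {
    catalog-zones { zone "catz.arpa.cam.ac.uk"; };
};
zone "catz.arpa.cam.ac.uk" {
    type slave;
    masters { 131.111.8.37; };    // illustrative
};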
This started off with some testing of the in-progress BIND 9.12 implementation of RFC 8198, which allows a validating DNSSEC resolver to use NSEC records to synthesize negative responses. (This spec is known as the Cheese Shop after an early draft which refers to a Monty Python sketch, https://tools.ietf.org/html/draft-wkumari-dnsop-cheese-shop / https://tools.ietf.org/html/rfc8198)
RFC 8198 is very effective at suppressing unnecessary queries especially to the root DNS servers and the upper levels of the reverse DNS. A large chunk of my DNS server configuration previously tried to help with that by adding a lot of locally-served empty zones (as specified by RFC 6761 etc.) With the cheese shop all that becomes redundant.
The other big chunk of my configuration is the stealth slave list. I have previously not investigated catalog-zones in detail, since they aren't quite expressive enough for use by our central DNS servers, and in any case their configuration is already automated. But it's just right for the stealth slave configuration on my test server (and ppsw, etc.)
Setting up a Cambridge catalog zone was not too difficult. Altogether it allowed me to delete over 100 zone configurations from my test server.
We have removed the localhost entries from the cam.ac.uk DNS zone. This change should have no effect, except to avoid certain obscure web security risks.
RFC 1537, "Common DNS Data File Configuration Errors", says "all domains that contain hosts should have a localhost A record in them," and the cam.ac.uk zone has followed this advice since the early 1990s (albeit not entirely consistently).
It has belatedly come to our attention that this advice is no longer considered safe, because localhost can be used to subvert web browser security policies in some obscure situations.
Deleting our localhost DNS records should have no effect other than fixing this security bug and cleaning up the inconsistency. End-user systems handle queries for localhost using their hosts file, without making DNS queries, and without using their domain search list to construct queries for names like localhost.cam.ac.uk. We verified this by analysing query traffic on one of the central DNS resolvers, and the number of unwanted queries was negligible, less than one every 15 minutes, out of about 1000 queries per second.
Our colleagues in the CL have set up the zone cst.cam.ac.uk to go with the new name, and it has been added to our sample nameserver configuration file.
The new key (tag 20326) was published on 11th July, and validating resolvers that follow RFC 5011 rollover timing will automatically start trusting it on the 10th August. There's a lot more information about the root DNSSEC key rollover on the ISC.org blog. I have added some notes on how to find out about your server's rollover state on our DNSSEC validation page.
The DLV turndown was announced in 2015 and the dlv.isc.org zone is due to be emptied in 2017. You should delete any dnssec-lookaside option you have in your configuration to avoid complaints in named's logs.
Annoyingly, we were relying on DLV as a stop-gap while waiting for JISC to sign their reverse DNS zones. Some of our IPv4 address ranges and our main IPv6 allocation are assigned to us from JISC. Without DLV these zones can no longer be validated.
On the authoritative servers, the minimal-any anti-DDOS feature was developed by us and contributed to isc.org. Happily we no longer have to maintain this as a patch.
On the recursive servers, there are a couple of notable features.
Firstly, BIND 9.11 uses EDNS cookies to identify legitimate clients so they can bypass DDoS rate limiting. Unfortunately EDNS options can encounter bugs in old badly-maintained third-party DNS servers. We are keeping an eye out for problems and if necessary we can add buggy servers to a badlist of those who can't have cookies.
Secondly, we now have support for "negative trust anchors" which provide a workaround for third party DNSSEC failures. Fortunately we have not so far had significant problems due to the lack of this feature.
Note that update-policy local; uses a well-known TSIG key name, and does not include any IP address ACL restrictions, so it is extremely vulnerable to attack. To mitigate this you can replace update-policy local; with

allow-update { !{ !localhost; any; }; key local-ddns; };

This denies updates that come from everywhere except localhost, and then allows updates with the built-in local-ddns key. For a longer explanation, see

https://kb.isc.org/article/AA-00723/0/Using-Access-Control-Lists-ACLs-with-both-addresses-and-keys.html

You can still use nsupdate -l with this configuration.
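For example, a quick local update session looks like this (the zone and record names are invented):

nsupdate -l <<'EOF'
update add test.dyn.example. 300 TXT "hello"
send
EOF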
Our master DNS server has very strict packet filters which should be effective at mitigating this vulnerability until I can update the servers.
The most serious one is CVE-2017-3137 which can crash recursive servers. (It is related to the previous DNAME/CNAME RRset ordering bugs which led to security releases in January and November.)
The other vulnerabilities are in DNS64 support (which I don't think any of us use) and in the rndc control channel (which is mainly a worry if you have opened up read-only access in BIND 9.11).
More details on the bind-announce list, https://lists.isc.org/pipermail/bind-announce/2017-April/thread.html
I have patched the central DNS servers and the ppsw-specific resolvers.
Cloudflare is no longer used in front of www.cam.ac.uk. As such the automated Cloudflare provisioning system described previously has been decommissioned.
The most terrifying vulnerability was cross-site request forgery. If an IP Register user who was logged in to Raven clicked a malicious link, any web site could make changes to the database.
This has been fixed by adding an XSRF token. The change should be invisible in normal use; if you get an XSRF error then there is a bug, so please let the IP Register team know.
The XSRF token is a hidden field in the form containing an expiry time (12 hours), a random number (which cannot be predicted by an attacker), and an HMAC signature generated using the Jackdaw session cookie and a secret. This guarantees that a form submission was created reasonably recently from a legitimate IP Register web page.
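The construction is roughly like this sketch using openssl; it illustrates the shape of the token, not the real code:

expiry=$(date -d '+12 hours' +%s)
nonce=$(openssl rand -hex 16)
# HMAC over the expiry and nonce, keyed with the session cookie
# and a server-side secret
sig=$(printf '%s.%s' "$expiry" "$nonce" |
      openssl dgst -sha256 -hmac "$cookie.$secret" -r | cut -d' ' -f1)
token="$expiry.$nonce.$sig"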
Non-interactive clients authenticated using long-term cookies are exempt from XSRF checks.
The web user interface assumed that all requests were POST, but it did not actually check the HTTP method. This allowed an XSRF attacker to make changes to the database using only a GET request.
Now, interactive GET requests have their query parameters cleared so that they cannot make changes. This is more consistent with HTTP semantics. This change should also be invisible since the IP Register web forms only use POST.
Non-interactive clients authenticated using long-term cookies may use GET requests for read-only list_* actions on the list_ops form.
Members of the IP Register team who have read/write access to the entire database must now use TOTP (time-based one-time passwords) to gain access to the web user interface, in addition to Raven authentication.
UIS passwords are promiscuously exposed to multiple computer systems, so they should not be relied on as the sole authenticator for privileged users. Before this change, if a system using UIS passwords was compromised, it would have been easy for the attacker to pwn our whole DNS, which in turn would have made it easy to compromise other systems.
This 2FA setup is a prototype, to give us some practical experience with running TOTP on a small scale. We would like to offer it more widely to other users of the IP Register database but we aren't currently able to do so.
As well as the security fixes, the IP Register web page header has been slightly adjusted.
The "clear" button is now a link, so that it doesn't provoke an XSRF failure.
The "debug" tickybox was broken. It now works, but is restricted to members of the IP Register team.
The "prefix" option which controls global read-only and read/write privileges for certain UIS staff has changed from a text box to a drop-down menu.
The "help" link has moved from the left to the right to make it more prominent.
Over the last several months, we have added SPF records for mail domains under cam.ac.uk which have mail hosted offsite. The most common offsite host is Microsoft Office 365 Exchange Online, but we have a few others using survey or mailshot services.
These SPF records are managed by the IP Register / Hostmaster team, in co-operation with the Mail Support / Postmaster team. Please email us if you would like any changes.
I am in the process of patching our central DNS servers; you should patch yours too.
These bugs appear to be a similar class of error to the previous BIND CVE a couple of months ago.
Our current DNS update mechanism runs as an hourly batch job. It would be nice to make DNS changes happen as soon as possible.
Instant DNS updates have tricky implications for the user interface.
At the moment it's possible to make changes to the database in between batch runs, knowing that broken intermediate states don't matter, and with plenty of time to check the changes and make sure the result will be OK.
If the DNS is updated immediately, we need a way for users to be able to prepare a set of inter-related changes, and submit them to the database as a single transaction.
(Aside: I vaguely imagine something like a shopping-cart UI that's available for collecting more complicated changes, though it should be possible to submit simple updates without a ceremonial transaction.)
This kind of UI change is necessary even if we simply run the current batch process more frequently. So we can't reasonably deploy this without a lot of front-end work.
Ideally I would like to keep the process of exporting the database to the DNS and DHCP servers as a purely back-end matter; the front-end user interface should only be a database client.
So, assuming we have a better user interface, we would like to be able to get instant DNS updates by improvements to the back end without any help from the front end.
PostgreSQL has a very tempting replication feature called "logical decoding", which takes a replication stream and turns it into a series of database transactions. You can write a logical decoding plugin which emits these transactions in whatever format you want.
With logical decoding, we can (with a bit of programming) treat the DNS as a PostgreSQL replication target, with a script that looks something like pg_recvlogical | nsupdate.
I wrote a prototype along these lines, which is published at https://git.uis.cam.ac.uk/x/uis/ipreg/pg-decode-dns-update.git
The plugin itself works in a fairly satisfactory manner.
However it needs a wrapper script to massage transactions before they are fed into nsupdate, mainly to split up very large transactions that cannot fit in a single UPDATE request.
The remaining difficult work is related to starting, stopping, and pausing replication without losing transactions. In particular, during initial deployment we need to be able to pause replication and verify that the replicated updates are faithfully reproducing what the batch update would have done. We can use the same pause/batch/resume mechanism to update the parts of the DNS that are not maintained in the database.
At the moment we are not doing any more work in this area until the other prerequisites are in place.
The UIS are aiming to deploy Cloudflare in front of the University's most prominent / sensitive web sites; this service might be extended more widely to other web sites, though it is not currently clear if this will be feasible.
There is a separate document with more details of how the IP Register database and Cambridge DNS setup support Cloudflare.
I am in the process of patching our central DNS servers; you should patch yours too.
(This bug was encountered by Marco Davids of SIDN Labs, and I identified it as a security vulnerability and reported it to ISC.org. You can find us in the acknowledgments section of the security advisory.)
Yesterday evening, ISC.org announced a denial-of-service vulnerability in BIND's buffer handling. The crash can be triggered even if the apparent source address is excluded by BIND's ACLs (allow-query).
All servers are vulnerable if they can receive request packets from any source.
If you have not yet patched, you should be aware that this bug is now being actively exploited.
All servers are vulnerable if they can receive request packets from any source.
Most machines on the CUDN are protected to a limited extent from outside attack by the port 53 packet filter. DNS servers that have an exemption are much more at risk.
http://www.ucs.cam.ac.uk/network/infoinstitutions/techref/portblock
I am in the process of patching our central DNS servers; you should patch yours too.
(This is another bug found by ISC.org's fuzz testing campaign; they have slowed down a lot since the initial rush that started about a year ago; the last one was in March.)
Unfortunately the VM host system lost the RAID set that held the filesystem for our DNS master server (amongst others). We determined that it would be faster to rebuild some servers from scratch rather than waiting for more intensive RAID recovery efforts.
The DNS master server is set up so it can be rebuilt from scratch without too much difficulty - all the data on its filesystem comes from our configuration management systems, and from the IP register and MZS databases.
The main effect of this is that the zone transfers following the rebuild will be full transfers from scratch - incremental transfers are not possible. There is likely to be some additional load which slows down zone transfers while everything catches up.
Probably the most risky is CVE-2016-1286 which is a remote denial-of-service vulnerability in all versions of BIND without a workaround. CVE-2016-1285 can be mitigated, and probably is already mitigated on servers with a suitably paranoid configuration. CVE-2016-2088 is unlikely to be a problem.
I have updated the central DNS servers to BIND 9.10.3-P4.
I have also made a change to the DNS servers' name compression behaviour.
Traditionally, BIND used to compress domain names in responses so they match the case of the query name. Since BIND 9.10 it has tried to preserve the case of responses from servers, which can lead to case mismatches between queries and answers. This exposed a case-sensitivity bug in Nagios, so after the upgrade it falsely claimed that our resolvers were not working properly! I have added a no-case-compress clause to the configuration so our resolvers now behave in the traditional manner.
Previously we were rejecting queries from outside the CUDN using DNS-level REFUSED responses; now, TCP connections from outside the CUDN are rejected at the network layer using ICMP connection refused.
This change should not have any visible effect; I am letting you know because others who run DNS servers on the CUDN might want to make a similar change, and because there is some interesting background.
For most purposes, incoming DNS queries are blocked by the JANET border packet filters. http://www.ucs.cam.ac.uk/network/infoinstitutions/techref/portblock You only really need an exemption to this block for authoritative DNS servers. If you are running recursive-only DNS servers that are exempted from the port 53 block, you should consider changing your packet filters.
The particular reason for this change is that BIND's TCP connection listener is trivially easy to flood. The inspiration for this change is a cleverly evil exploit announced by Cloudflare earlier this week which relies on TCP connection flooding. Although their particular attack doesn't work with BIND, it would still be unpleasant if anyone tried it on us.
I have published a blog article with more background and context at http://fanf.livejournal.com/141807.html
One of our authoritative name servers (authdns1.csx.cam.ac.uk) suffered a series of DoS attacks which made it rather unhappy. Over the last week I have developed a patch for BIND to make it handle these attacks better.

On authdns1 we provide off-site secondary name service to a number of other universities and academic institutions; the attack targeted imperial.ac.uk.
For years we have had a number of defence mechanisms on our DNS servers. The main one is response rate limiting, which is designed to reduce the damage done by DNS reflection / amplification attacks.
However, our recent attacks were different. Like most reflection / amplification attacks, we were getting a lot of QTYPE=ANY queries, but unlike reflection / amplification attacks these were not spoofed, but rather were coming to us from a lot of recursive DNS servers. (A large part of the volume came from Google Public DNS; I suspect that is just because of their size and popularity.)
My guess is that it was a reflection / amplification attack, but we were not being used as the amplifier; instead, a lot of open resolvers were being used to amplify, and they in turn were making queries upstream to us. (Consumer routers are often open resolvers, but usually forward to their ISP's resolvers or to public resolvers such as Google's, and those query us in turn.)
Because from our point of view the queries were coming from real resolvers, RRL was completely ineffective. But some other configuration settings made the attacks cause more damage than they might otherwise have done.
I have configured our authoritative servers to avoid sending large UDP packets which get fragmented at the IP layer. IP fragments often get dropped and this can cause problems with DNS resolution. So I have set
max-udp-size 1420;
minimal-responses yes;
The first setting limits the size of outgoing UDP responses to an MTU which is very likely to work. (The ethernet MTU minus some slop for tunnels.) The second setting reduces the amount of information that the server tries to put in the packet, so that it is less likely to be truncated because of the small UDP size limit, so that clients do not have to retry over TCP.
This works OK for normal queries; for instance a cam.ac.uk IN MX query gets a svelte 216 byte response from our authoritative servers but a chubby 2047 byte response from our recursive servers which do not have these settings.
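The sizes are easy to compare with dig, which reports them on its "MSG SIZE rcvd" line; 131.111.8.42 is one of our recursive servers:

dig @authdns0.csx.cam.ac.uk cam.ac.uk MX | grep 'MSG SIZE'
dig @131.111.8.42 cam.ac.uk MX | grep 'MSG SIZE'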
But ANY queries blow straight past the UDP size limit: the attack queries for imperial.ac.uk IN ANY got obese 3930 byte responses.
The effect was that the recursive clients retried their queries over TCP, and consumed the server's entire TCP connection quota. (Sadly BIND's TCP handling is not up to the standard of good web servers, so it's quite easy to nadger it in this way.)
We might have coped a lot better if we could have served all the attack traffic over UDP. Fortunately there was some pertinent discussion in the IETF DNSOP working group in March last year which resulted in draft-ietf-dnsop-refuse-any, "providing minimal-sized responses to DNS queries with QTYPE=ANY".
This document was instigated by Cloudflare, who have a DNS server architecture which makes it unusually difficult to produce traditional comprehensive responses to ANY queries. Their approach is instead to send just one synthetic record in response, like
cloudflare.net. HINFO ( "Please stop asking for ANY"
                        "See draft-jabley-dnsop-refuse-any" )
In the discussion, Evan Hunt (one of the BIND developers) suggested an alternative approach suitable for traditional name servers. They can reply to an ANY query by picking one arbitrary RRset to put in the answer, instead of all of the RRsets they have to hand.
The draft says you can use either of these approaches. They both allow an authoritative server to make the recursive server go away happy that it got an answer, and without breaking odd applications like qmail that foolishly rely on ANY queries.
I did a few small experiments at the time to demonstrate that it really would work OK in the real world (unlike some of the earlier proposals) and they are both pretty neat solutions (unlike some of the earlier proposals).
So draft-ietf-dnsop-refuse-any is an excellent way to reduce the damage caused by the attacks, since it allows us to return small UDP responses which reduce the downstream amplification and avoid pushing the intermediate recursive servers on to TCP. But BIND did not have this feature.
I did a very quick hack on Tuesday to strip down ANY responses, and I deployed it to our authoritative DNS servers on Wednesday morning for swift mitigation. But it was immediately clear that I had put my patch in completely the wrong part of BIND, so it would need substantial re-working before it could be more widely useful.
I managed to get back to the patch on Thursday. The right place to put the logic was in the fearsome query_find() which is the top-level query handling function and nearly 2400 lines long! I finished the first draft of the revised patch that afternoon (using none of the code I wrote on Tuesday), and I spent Friday afternoon debugging and improving it.
The result is this patch, which adds a minimal-qtype-any option. I'm currently running it on my toy nameserver, and I plan to deploy it to our production servers next week to replace the rough hack.
I have submitted the patch to ISC.org; hopefully something like it will be included in a future version of BIND. And it prompted a couple of questions about draft-ietf-dnsop-refuse-any that I posted to the DNSOP working group mailing list.
For details, please see the announcement messages: https://lists.isc.org/pipermail/bind-announce/2016-January/thread.html
The central DNS servers have been upgraded to BIND 9.10.3-P3.
If you build your own BIND packages linked to OpenSSL 1.0.1 or 1.0.2 then you should also be aware of the OpenSSL security release that occurred earlier this month. The new versions of BIND will refuse to build with vulnerable versions of OpenSSL.
For more information see the bind-announce list, https://lists.isc.org/pipermail/bind-announce/2015-December/thread.html
The central nameservers and the resolvers on the central mail relays were updated to BIND 9.10.3-P2 earlier today.
newton.cam.ac.uk
145.111.131.in-addr.arpa
These have been delegated like the other domains managed by the Faculty of Mathematics.
We are delegating newton.cam.ac.uk (the Isaac Newton Institute's domain) to the Faculty of Mathematics, who have been running their own DNS since the very earliest days of Internet connectivity in Cambridge.
Unlike most new delegations, the newton.cam.ac.uk domain already exists and has a lot of records, so we have to keep them working during the process. And for added fun, cam.ac.uk is signed with DNSSEC, so we can't play fast and loose.
In the absence of DNSSEC, it is mostly OK to set up the new zone, get all the relevant name servers secondarying it, and then introduce the zone cut. During the rollout, some servers will be serving the domain from the old records in the parent zone, and other servers will serve the domain from the new child zone, which occludes the old records in its parent.
But this won't work with DNSSEC because validators are aware of zone cuts, and they check that delegations across cuts are consistent with the answers they have received. So with DNSSEC, the process you have to follow is fairly tightly constrained to be basically the opposite of the above.
The first step is to set up the new zone on name servers that are completely disjoint from those of the parent zone. This ensures that a resolver cannot prematurely get any answers from the new zone - they have to follow a delegation from the parent to find the name servers for the new zone. In the case of newton.cam.ac.uk, we are lucky that the Maths name servers satisfy this requirement.
The second step is to introduce the delegation into the parent zone. Ideally this should propagate to all the authoritative servers promptly, using NOTIFY and IXFR.
(I am a bit concerned about DNSSEC software which does validation as a separate process after normal iterative resolution, which is most of it. While the delegation is propagating it is possible to find the delegation when resolving, but get a missing delegation when validating. If the validator is persistent at re-querying for the delegation chain it should be able to recover from this; but quick propagation minimizes the problem.)
After the delegation is present on all the authoritative servers, and old data has timed out of caches, the new child zone can (if necessary) be added to the parent zone's name servers. In our case the central cam.ac.uk name servers and off-site secondaries also serve the Maths zones, so this step normalizes the setup for newton.cam.ac.uk.
One of the changes I made to sample.named.conf on Friday was to remove the explicit configuration of the root name server hints. I was asked why, so I thought I should explain to everyone.
BIND comes with a built-in copy of the hints, so there is no need to explicitly configure them. It is important to keep BIND up-to-date for security reasons, so the root hints should not be stale. And even if they are stale, the only negative effect is a warning in the logs.
So I regard explicitly configuring root hints as needless extra work.
It is worth noting that the H-root name server IP addresses are going to change on the 1st December 2015. We will not be making any special effort in response since normal BIND updates will include this change in due course.
There is a history of root name server IP address changes at http://root-servers.org/news.html
]]>The central name servers now have DNS zones for 10.128.0.0/9. There are not yet any registrations in this address space, so the zones are currently almost empty. We have updated the name server configuration advice to cover these new zones.
https://jackdaw.cam.ac.uk/ipreg/nsconfig/
On the CUDN the RFC 1918 address block 10.0.0.0/8 is divided in two. The bottom half, 10.0.0.0/9, is for institution-private usage and is not routed on the CUDN. The top half, 10.128.0.0/9, was previously reserved; it has now been re-assigned as CUDN-wide private address space.
To provide DNS for 10.0.0.0/8 we have a mechanism for conveniently sharing the zone 10.in-addr.arpa between institution-private and CUDN-wide private uses. The arrangement we are using is similar to the way 128.232.0.0/16 is divided between the Computer Lab and the rest of the University.
We have two new zones for this address space:

10.in-addr.arpa
in-addr.arpa.private.cam.ac.uk

The sample nameserver configuration has been updated to include them.
Institutions that are using the bottom half, 10.0.0.0/9, should provide their own version of 10.in-addr.arpa with DNAME redirections to in-addr.arpa.private.cam.ac.uk for the CUDN-wide addresses.
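That is, the institution's private copy of 10.in-addr.arpa carries its own bottom-half data plus one DNAME per top-half /16, along these lines (one of 128 similar records; see the nsconfig instructions for the exact arrangement):

128.10.in-addr.arpa.  DNAME  128.10.in-addr.arpa.private.cam.ac.uk.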
If you are running BIND as a recursive DNS server you should update it urgently. We will be patching the central DNS servers this morning.
The bind-announce mailing list has the formal vulnerability notification and release announcements:
The authors of BIND have also published a blog post emphasizing that there are no workarounds for this vulnerability: it affects both recursive and authoritative servers and I understand that query ACLs are not sufficient protection.
Our central DNS servers authdns* and recdns* have been patched.
The update job takes a minute or two to run, after which changes are immediately visible on our public authoritative DNS servers, and on our central recursive servers 131.111.8.42 and 131.111.12.20.
We have also reduced the TTL of our DNS records from 24 hours to 1 hour. (The time-to-live is the maximum time old data will remain in DNS caches.) This shorter TTL means that users of other recursive DNS servers around the University and elsewhere will observe DNS changes within 2 hours of changes to the IP Register database.
There are two other DNS timing parameters which were reduced at the time of the new DNS server rollout.
The TTL for negative answers (in response to queries for data that is not present in the DNS) has been reduced from 4 hours to 1 hour. This can make new entries in the DNS available faster.
Finally, we have reduced the zone refresh timer from 4 hours to 30 minutes. This means that unofficial "stealth secondary" nameservers will fetch DNS updates within 90 minutes of a change being made to the IP Register database. Previously the delay could be up to 8 hours.
I have already written about scripting the recursive DNS rollout. I also used Ansible for the authoritative DNS rollout. I set up the authdns VMs with different IP addresses and hostnames (which I will continue to use for staging/testing purposes); the rollout process was:
Stop the Solaris Zone on the old servers using my zoneadm Ansible module;
Log into the staging server and add the live IP addresses;
Log into the live server and delete the staging IP addresses;
Update the hostname.
There are a couple of tricks with this process.
You need to send a gratuitous ARP to get the switches to update their forwarding tables quickly when you move an IP address. Solaris does this automatically but Linux does not, so I used an explicit arping -U command. On Debian/Ubuntu you need the iputils-arping package to get a version of arping which can send gratuitous ARPs. (The arping package is not the one you want. Thanks to Peter Maydell for helping me find the right one!)
If you remove a "primary" IPv4 address from an interface on Linux, it also deletes all the other IPv4 addresses on the same subnet. This is not helpful when you are renumbering a machine. To avoid this problem you need to set

sysctl net.ipv4.conf.eth0.promote_secondaries=1
The BIND configuration on my new DNS servers is rather different to the old ones, so I needed to be careful that I had not made any mistakes in my rewrite. Apart from re-reading configurations several times, I used a couple of tools to help me check.
I used bzl, the BIND zone list tool by JP Mens to get the list of configured zones from each of my servers. This helped to verify that all the differences were intentional.
The new authdns servers both host the same set of zones, which is the union of the zones hosted on the old authdns servers. The new servers have identical configs; the old ones did not.
The new recdns servers differ from the old ones mainly because I have been a bit paranoid about avoiding queries for martian IP address space, so I have lots of empty reverse zones.
I used my tool nsdiff to verify that the new DNS build scripts produce the same zone files as the old ones. (Except for the HINFO records which the new scripts omit.)
(This is not quite an independent check, because nsdiff is part of the new DNS build scripts.)
On Monday I sent out the DNS server upgrade announcement, with some wording improvements suggested by my colleagues Bob Dowling and Helen Sargan.
It was rather arrogant of me to give the expected outage times without any allowance for failure. In the end I managed to hit 50% of the targets.
The order of rollout had to be recursive servers first, since I did not want to swap the old authoritative servers out from under the old recursive servers. The new recursive servers get their zones from the new hidden master, whereas the old recursive servers get them from the authoritative servers.
The last server to be switched was authdns0, because that was the old master server, and I didn't want to take it down without being fairly sure I would not have to roll back.
The difference in running time between my recdns and authdns scripts bothered me, so I investigated and discovered that IPv4 was partially broken. Rob Bricheno helped by getting the router's view of what was going on. One of my new Linux boxes was ARPing for a testdns IP address, even after I had deconfigured it!
I fixed it by rebooting, after which it continued to behave correctly through a few rollout / backout test runs. My guess is that the problem was caused when I was getting gratuitous ARPs working - maybe I erroneously added a static ARP entry.
After that all switchovers took about 5 - 15 seconds. Nice.
I wrote a couple of scripts for checking rollout status and progress. wheredns tells me where each of our service addresses is running (old or new); pingdns repeatedly polls a server. I used pingdns to monitor when service was lost and when it returned during the rollout process.
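Neither script is anything clever; the effect of pingdns is roughly this crude stand-in loop (not the real script):

while true; do
    dig +short +time=1 +tries=1 @131.111.8.42 cam.ac.uk SOA || echo DOWN
    sleep 1
done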
On Tuesday shortly after 18:00, I switched over recdns1. This is our busier recursive server, running at about 1500 - 2000 queries per second during the day.
This rollout went without a hitch, yay!
Afterwards I needed to reduce the logging because it was rather too noisy. The logging on the old servers was rather too minimal for my tastes, but I turned up the verbosity a bit too far in my new configuration.
On Wednesday morning shortly after 08:00, I switched over recdns0. It is a bit less busy, running about 1000 - 1500 qps.
This did not go so well. For some reason Ansible appeared to hang when connecting to the new recdns cluster to push the updated keepalived configuration.
Unfortunately my back-out scripts were not designed to cope with a partial rollout, so I had to restart the old Solaris Zone manually, and recdns0 was unavailable for a minute or two.
Mysteriously, Ansible connected quickly outside the context of my rollout scripts, so I tried the rollout again and it failed in the same way.
As a last try, I ran the rollout steps manually, which worked OK although I don't type as fast as Ansible runs a playbook.
So in all there was about 5 minutes downtime.
I'm not sure what went wrong; perhaps I just needed to be a bit more patient...
After doing recdns0 I switched over authdns1. This was a bit less stressy since it isn't directly user-facing. However it was also a bit messy.
The problem this time was me forgetting to uncomment authdns1 from the Ansible inventory (its list of hosts). Actually, I should not have needed to uncomment it manually - I should have scripted it. The silly thing is that I had the testdns servers in the inventory for testing the authdns rollout scripts; the testdns servers were causing me some benign irritation (connection failures) when running ansible in the previous week or so. I should not have ignored this irritation and (like I did with the recdns rollout script) automated it away.
Anyway, after a partial rollout and manual rollback, it took me a few ansible-playbook --check runs to work out why Ansible was saying "host not found". The problem was due to the Jinja expansion in the following remote command, where the to variable was set to authdns1.csx.cam.ac.uk which was not in the inventory.
ip addr add {{hostvars[to].ipv6}}/64 dev eth0
You can reproduce this with a command like,
ansible -m debug -a 'msg={{hostvars["funted"]}}' all
After fixing that, by uncommenting the right line in the inventory, the rollout worked OK.
The other post-rollout fix was to ensure all the secondary zones had transferred OK. I had not managed to get all of our masters to add my staging servers to their ACLs, but this was not too hard to sort out using the BIND 9.10 JSON statistics server and the lovely jq command:

curl http://authdns1.csx.cam.ac.uk:853/json |
    jq -r '.views[].zones[] | select(.serial == 4294967295) | .name' |
    xargs -n1 rndc -s authdns1.csx.cam.ac.uk refresh
After that, I needed to reduce the logging again, because the authdns servers get a whole different kind of noise in the logs!
One mistake sneaked out of the woodwork on Wednesday, with fortunately small impact.
My colleague Rob Bricheno reported that client machines on 131.111.12.0/24 (the same subnet as recdns1) were not able to talk to recdns0, 131.111.8.42. I could see the queries arriving with tcpdump, but they were being dropped somewhere in the kernel.
Malcolm Scott helpfully suggested that this was due to Linux reverse path filtering on the new recdns servers, which are multihomed on both subnets. Peter Benie advised me of the correct setting,
sysctl net.ipv4.conf.em1.rp_filter=2
On Thursday evening shortly after 18:00, I did the final switch-over of authdns0, the old master.
This went fine, yay! (Actually, more like 40s than the expected 15s, but I was patient, and it was OK.)
There was a minor problem that I forgot to turn off the old DNS update cron job, so it bitched at us a few times overnight when it failed to send updates to its master server. Poor lonely cron job.
Over the weekend my email servers complained that some of their zones had not been refreshed recently. This was because four of our RFC 1918 private reverse DNS zones had not been updated since before the switch-over.
There is a slight difference in the cron job timings on the old and new setups: previously updates happened at 59 minutes past the hour, now they happen at 53 minutes past (same as the DNS port number, for fun and mnemonics). Both setups use Unix time serial numbers, so they were roughly in sync, but due to the cron schedule the old servers had a serial number about 300 higher.
BIND on my mail servers was refusing to refresh the zone because it had copies of the zones from the old servers with a higher serial number than the new servers.
I did a sneaky nsupdate add and delete on the relevant zones to update their serial numbers and everything is happy again.
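The trick is just an add followed by a delete, which leaves the zone contents unchanged but bumps the serial twice; the zone and record names here are invented:

nsupdate -l <<'EOF'
update add bump.10.in-addr.arpa. 0 TXT "serial bump"
send
update delete bump.10.in-addr.arpa. TXT
send
EOF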
They say a clever person can get themselves out of situations a wise person would not have got into in the first place. I think the main wisdom to take away from this is not to ignore minor niggles, and to write rollout/rollback scripts that can work forwards or backwards after being interrupted at any point. I won against the niggles on the ARP problem, but lost against them on the authdns inventory SNAFU.
But in the end it pretty much worked, with only a few minutes downtime and only one person affected by a bug. So on the whole I feel a bit like Mat Ricardo.
The immediate improvements are:
Automatic failover for recursive DNS servers. There are servers in four different locations, two live, two backup.
DNSSEC signing moved off authdns0 onto a hidden master server, with support for signing Managed Zone Service domains.
There are extensive improvements to the DNS server management and administration infrastructure:
Configuration management and upgrade orchestration moved from ad-hoc to Ansible.
Revision control moved from SCCS to git, including a history of over 20,000 changes dating back to 1990.
Operating system moved from Solaris to Linux, to make better use of our local knowledge and supporting infrastructure.
The rollout will switch over the four service addresses on three occasions this week. We are avoiding changes during the working day, and rolling out in stages so we are able to monitor each change separately.
Tuesday 10 February, 18:00 -
Wednesday 11 February, 08:00 -
Thursday 12 February, 18:00 -
There will be a couple of immediate improvements to the DNS service, with more to follow:
Automatic failover for recursive DNS servers. There are servers in three different locations, two live, one backup, and when the West Cambridge Data Centre comes online there will be a second backup location.
DNSSEC signing moved off authdns0 onto a hidden master server, with support for signing Managed Zone Service domains.
There are extensive improvements to the DNS server management and administration infrastructure:
Configuration management and upgrade orchestration moved from ad-hoc to Ansible. The expected switchover timings above are based on test runs of the Ansible rollout / backout playbooks.
Revision control moved from SCCS to git, including a history of over 20,000 changes dating back to 1990.
Operating system moved from Solaris to Linux, to make better use of our local knowledge and supporting infrastructure.
My aim is to do a forklift upgrade of our DNS servers - a tier 1 service - with negligible downtime, and with a backout plan in case of fuckups.
Our existing DNS service is based on Solaris Zones. The nice thing about this is that I can quickly and safely halt a zone - which stops the software and unconfigures the network interface - and if the replacement does not work I can restart the zone - which brings up the interfaces and the software.
Even better, the old servers have a couple of test zones which I can bounce up and down without a care. These give me enormous freedom to test my migration scripts without worrying about breaking things and with a high degree of confidence that my tests are very similar to the real thing.
Testability gives you confidence, and confidence gives you productivity.
Before I started setting up our new recursive DNS servers, I ran zoneadm -z testdns* halt on the old servers so that I could use the testdns addresses for developing and testing our keepalived setup.

So I had the testdns zones in reserve for developing and testing the rollout/backout scripts.
The authoritative and recursive parts of the new setup are quite different, so they require different rollout plans.
On the authoritative side we will have a virtual machine for each service address. I have not designed the new authoritative servers for any server-level or network-level high availability, since the DNS protocol should be able to cope well enough. This is similar in principle to our existing Solaris Zones setup. The vague rollout plan is to set up new authdns servers on standby addresses, then renumber them to take over from the old servers. This article is not about the authdns rollout plan.
On the recursive side, there are four physical servers any of which can host any of the recdns or testdns addresses, managed by keepalived. The vague rollout plan is to disable a zone on the old servers then enable its service address on the keepalived cluster.
So far I have been using Ansible in a simple way as a configuration management system, treating it as a fairly declarative language for stating what the configuration of my servers should be, and then being able to run the playbooks to find out and/or fix where reality differs from intention.
But Ansible can also do orchestration: scripting a co-ordinated sequence of actions across disparate sets of servers. Just what I need for my rollout plans!
The first thing I needed was a good way to drive zoneadm from Ansible. I have found that using Ansible as a glorified shell script driver is pretty unsatisfactory, because its shell and command modules are too general to provide proper support for its idempotence and check-mode features. Rather than messing around with shell commands, it is much more satisfactory (in terms of reward/effort) to write a custom module.
My zoneadm module does the bare minimum: it runs zoneadm list -pi to get the current state of the machine's zones, checks if the target state matches the current state, and if not it runs zoneadm boot or zoneadm halt as required. It can only handle zone states that are "installed" or "running". 60 lines of uncomplicated Python, nice.
After I had a good way to wrangle zoneadm it was time to do a quick hack to see if a trial rollout would work. I wrote the following playbook which does three things: move the testdns1 zone from running to installed, change the Ansible configuration to enable testdns1 on the keepalived cluster, then push the new keepalived configuration to the cluster.
---
- hosts: helen2.csi.cam.ac.uk
  tasks:
    - zoneadm: name=testdns1 state=installed
- hosts: localhost
  tasks:
    - command: bin/vrrp_toggle rollout testdns1
- hosts: rec
  roles:
    - keepalived
This is quick and dirty, hardcoded all the way, except for the vrrp_toggle command which is the main reality check.
The vrrp_toggle script just changes the value of an Ansible variable called vrrp_enable which lists which VRRP instances should be included in the keepalived configuration. The keepalived configuration is generated from a Jinja2 template, and each vrrp_instance (testdns1 etc.) is emitted if the instance name is not commented out of the vrrp_enable list.
Fail.
Ansible does not re-read variables if you change them in the middle of a playbook like this. Good. That is the right thing to do.
The other way in which this playbook is stupid is there are actually 8 of them: 2 recdns plus 2 testdns, rollout and backout. Writing them individually is begging for typos; repeated code that is similar but systematically different is one of the most common ways to introduce bugs.
So the right thing to do is tweak the variable then run the playbook.
And note the vrrp_toggle command arguments describe almost everything you need to know to generate the playbook! (The only thing missing is the mapping from instance name (like testdns1) to parent host (like helen2).)
So I changed the vrrp_toggle script into a rec-rollout / rec-backout script, which tweaks the vrrp_enable variable and generates the appropriate playbook. The playbook consists of just two tasks, whose order depends on whether we are doing rollout or backout, and which have a few straightforward place-holder substitutions.
The nice thing about this kind of templating is that if you screw it up (like I did at first), usually a large proportion of the cases fail, probably including your test cases; whereas with clone-and-hack there will be a nasty surprise in a case you didn't test.
In the playbook I quoted above I am using my keepalived role, so I can be absolutely sure that my rollout/backout plan remains consistent with my configuration management setup. Nice!
However the keepalived role does several configuration tasks, most of which are not necessary in this situation. In fact all I need to do is copy across the templated configuration file and tell keepalived to reload it if the file has changed.
Ansible tags are for just this kind of optimization. I added a line to my keepalived.conf task:
tags: quick
Only one task needed tagging because the keepalived.conf task has a handler to tell keepalived to reload its configuration when that changes, which is the other important action. So now I can run my rollout/backout playbooks with a --tags quick argument, so only the quick tasks (and if necessary their handlers) are run.
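So a quick configuration push is just a matter of (the playbook name is invented):

ansible-playbook --tags quick rollout.yml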
Once I had got all that working, I was able to easily flip testdns0 and testdns1 back and forth between the old and new setups. Each switchover takes about ten seconds, which is not bad - it is less than a typical DNS lookup timeout.
There are a couple more improvements to make before I do the rollout for real. I should improve the molly guard to make better use of ansible-playbook --check. And I should pre-populate the new servers' caches with the Alexa Top 1,000,000 list to reduce post-rollout latency. (If you have a similar UK-centric popular domains list, please tell me so I can feed that to the servers as well!)
There is still some final cleanup and robustifying to do, and checks to make sure I haven't missed anything. And I have to work out the exact process I will follow to put the new system into live service with minimum risk and disruption. But the end is tantalizingly within reach!
In the last couple of weeks I have also got several small patches into BIND.
Jan 7: documentation for named -L
This was a follow-up to a patch I submitted in April last year. The named -L option specifies a log file to use at startup for recording the BIND version banners and other startup information. Previously this information would always go to syslog regardless of your logging configuration.
This feature will be in BIND 9.11.
Jan 8: typo in comment
Trivial :-)
Jan 12: teach nsdiff to AXFR from non-standard ports
Not a BIND patch, but one of my own companion utilities. Our managed zone service runs a name server on a non-standard port, and our new setup will use nsdiff | nsupdate to implement bump-in-the-wire signing for the MZS.
Jan 13: document default DNSKEY TTL
Took me a while to work out where that value came from. Submitted on Jan 4. Included in 9.10 ARM.
Jan 13: automatically tune max-journal-size
Our old DNS build scripts have a couple of mechanisms for tuning BIND's max-journal-size setting. By default a zone's incremental update journal will grow without bound, which is not helpful. Having to set the parameter by hand is annoying, especially since it should be simple to automatically tune the limit based on the size of the zone.
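For reference, the manual version is a per-zone setting in named.conf, something like this sketch (the zone name and size are illustrative):

zone "example.cam.ac.uk" {
    type master;
    file "zones/example.cam.ac.uk";
    max-journal-size 512k;    # hand-tuned in proportion to the zone's size
};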
Rather than re-implementing some annoying plumbing for yet another setting, I thought I would try to automate it away. I have submitted this patch as RT#38324. In response I was told there is also RT#36279 which sounds like a request for this feature, and RT#25274 which sounds like another implementation of my patch. Based on the ticket number it dates from 2011.
I hope this gets into 9.11, or something like it. I suppose that rather than maintaining this patch I could do something equivalent in my build scripts...
Edited to add: this was eventually committed upstream for BIND 9.12.
Jan 14: doc: ignore and clean up isc-notes-html.xsl
I found some cruft in a supposedly-clean source tree.
This one actually got committed under my name, which I think is a first for me and BIND :-) (RT#38330)
Jan 14: close new zone file before renaming, for win32 compatibility
Jan 14: use a safe temporary new zone file name
These two arose from a problem report on the bind-users list. The conversation moved to private mail which I find a bit annoying - I tend to think it is more helpful for other users if problems are fixed in public.
But it turned out that BIND's error logging in this area is basically negligible, even when you turn on debug logging :-( But the Windows Process Explorer is able to monitor filesystem events, and it reported a 'SHARING VIOLATION' and 'NAME NOT FOUND'. This gave me the clue that it was a POSIX vs Windows portability bug.
So in the end this problem was more interesting than I expected.
Jan 16: critical: ratelimiter.c:151: REQUIRE(ev->ev_sender == ((void *)0)) failed
My build scripts are designed so that Ansible sets up the name servers with a static configuration which contains everything except for the zone {} clauses. The zone configuration is provisioned by the dynamic reconfiguration scripts. Ansible runs are triggered manually; dynamic reconfiguration runs from cron.
I discovered a number of problems with bootstrapping from a bare server with no zones to a fully-populated server with all the zones and their contents on the new hidden master.
The process is basically (a sketch follows the list):
- if there are any missing master files, initialise them as minimal zone files
- write the zone configuration file and run rndc reconfig
- run nsdiff | nsupdate for every zone to fill them with the correct contents
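A sketch of the bootstrap, using the nsdiff | nsupdate invocation from later in this post (the zone list, paths, and helper names are hypothetical placeholders):

for zone in $(cat zone-list); do
    # 1. initialise any missing master file as a minimal zone file
    [ -f "zones/$zone" ] || make-minimal-zone "$zone" > "zones/$zone"
done
# 2. write the zone configuration and get named to load the new zones
write-zone-config > /etc/named/zones.conf
rndc reconfig
# 3. fill each zone with the correct contents
for zone in $(cat zone-list); do
    nsdiff "$zone" "zones/$zone" | nsupdate -l
done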
When bootstrapping, the master server would load 123 new zones, then shortly after the nsdiff | nsupdate process started, named crashed with the assertion failure quoted above.
Mark Andrews replied overnight with the linked patch (he lives in Australia) which fixed the problem. Yay!
nsdiff is not very clever about the order in which it emits changes; in particular it does not ensure that hostnames exist before any NS or MX or SRV records are created to point to them. You can turn off most of the integrity checks, but not the NS record checks.
This causes trouble for us when bootstrapping the cam.ac.uk zone, which is the only zone we have with in-zone NS records. It also has lots of delegations which can also trip the checks.
My solution is to create a special bootstrap version of the zone, which contains the apex and delegation records (which are built from configuration stored in git) but not the bulk of the zone contents from the IP Register database. The zone can then be successfully loaded in two stages: first nsdiff cam.ac.uk DB.bootstrap | nsupdate -l, then nsdiff cam.ac.uk zones/cam.ac.uk | nsupdate -l.
Bootstrapping isn't something I expect to do very often, but I want to be sure it is easy to rebuild all the servers from scratch, including the hidden master, in case of major OS upgrades, VM vs hardware changes, disasters, etc.
No more special snowflake servers!
I have set up keepalived on testdns0.csi.cam.ac.uk and testdns1.csi.cam.ac.uk. I am quite pleased with the way it works.
It was difficult to get started because keepalived's documentation is TERRIBLE. More effort has been spent explaining how it is put together than explaining how to get it to work. The keepalived.conf man page is a barely-commented example configuration file which does not describe all the options. Some of the options are only mentioned in the examples in /usr/share/doc/keepalived/samples. Bah!
Edited to add: Oh good grief. I have found the keepalived configuration documentation hidden in keepalived.conf.SYNOPSIS.
The vital clue came from Graeme Fowler who told me about keepalived's vrrp_script feature which is "documented" in keepalived.conf.vrrp.localcheck which I never would have found without Graeme's help.
Keepalived is designed to run on a pair of load-balancing routers in front of a cluster of servers. It has two main parts. Its Linux Virtual Server daemon runs health checks on the back-end servers and configures the kernel's load balancing router as appropriate. The LVS stuff handles failover of the back-end servers. The other part of keepalived is its VRRP daemon which handles failover of the load-balancing routers themselves.
My DNS servers do not need the LVS load-balancing stuff, but they do need some kind of health check for named. I am running keepalived in VRRP-only mode and using its vrrp_script feature for health checks.
There is an SMTP client in keepalived which can notify you of state changes. It is too noisy for me, because I get messages from every server when anything changes. You can also tell keepalived to run scripts on state changes, so I am using that for notifications.
All my servers are configured as VRRP BACKUPs, and there is no MASTER. According to the VRRP RFC, the master is supposed to be the machine which owns the IP addresses. In my setup, no particular machine owns the service addresses.
I am using authentication mainly for additional protection against screwups (e.g. VRID collisions). VRRP password authentication doesn't provide any security: any attacker has to be on the local link so they can just sniff the password off the wire.
I am slightly surprised that it works when I set both IPv4 and IPv6 addresses on the same VRRP instance. The VRRP spec says you have to have separate vrouters for IPv4 and IPv6. Perhaps it works because keepalived doesn't implement real VRRP by default: it does not use a virtual MAC address but instead it just moves the virtual IP addresses and sends gratuitous ARPs to update the switches' forwarding tables. Keepalived has a use_vmac option but it seems rather fiddly to get working, so I am sticking with the default.
vrrp_instance testdns0 {
    virtual_router_id 210
    interface em1
    state BACKUP
    priority 50
    notify /etc/keepalived/notify
    authentication {
        auth_type PASS
        auth_pass XXXXXXXX
    }
    virtual_ipaddress {
        131.111.8.119/23
        2001:630:212:8::d:fff0
    }
    track_script {
        named_check_testdns0_1
        named_check_testdns0_2
        named_check_testdns0_3
        named_check_testdns0_4
    }
}
My notification script sends email when a server enters the MASTER state and takes over the IP addresses. It also sends email if the server dropped into the BACKUP state because named crashed.
#!/bin/sh
# this is /etc/keepalived/notify
instance=$2
state=$3
case $state in
(BACKUP)
    # do not notify if this server is working
    if /etc/keepalived/named_ok
    then exit 0
    else state=DEAD
    fi
esac
exim -t <<EOF
To: hostmaster@cam.ac.uk
Subject: $instance $state on $(hostname)
EOF
In the vrrp_instance snippet above, you can see that it specifies four vrrp_scripts to track. There is one vrrp_script for each possible priority, so that the four servers can have four different priorities for each vrrp_instance.
Each vrrp_script is specified using the Jinja macro below. (Four different vrrp_scripts for each of four different vrrp_instances is a lot of repetition!) The type argument is "recdns" or "testdns", the num is 0 or 1, and the prio is a number from 1 to 4.
Each script is run every "interval" seconds, and is allowed to run for up to "timeout" seconds. (My checking script should take at most 1 second.)
A positive "weight" setting is added to the vrrp_instance's priority to increase it when the script succeeds. (If the weight is negative it is added to the priority to decrease it when the script fails.)
{%- macro named_check(type,num,prio) -%}
vrrp_script named_check_{{type}}{{num}}_{{prio}} {
    script "/etc/keepalived/named_check {{type}} {{num}} {{prio}}"
    interval 1
    timeout 2
    weight {{ prio * 50 }}
}
{%- endmacro -%}
When keepalived runs the four tracking scripts for a vrrp_instance
on one of my servers, at most one of the scripts will succeed. The
priority is therefore adjusted to 250 for the server that should be
live, 200 for its main backup, 150 and 100 on the other servers, and
50 on any server which is broken or out of service.
The checking script finds the position of the host on which it is
running in a configuration file which lists the servers in priority
order. A server can be commented out to remove it from service. The
priority order for testdns1 is the opposite of the order for testdns0.
So the following contents of /etc/keepalived/priority.testdns
specifies that testdns1 is running on recdns-cnh, testdns0 is on
recdns-wcdc, recdns-rnb is disabled, and recdns-sby is a backup.
recdns-cnh
#recdns-rnb
recdns-sby
recdns-wcdc
I can update this priority configuration file to change which machines are in service, without having to restart or reconfigure keepalived.
The health check script is:
#!/bin/sh
set -e
type=$1
num=$2
check=$3
# Look for the position of our hostname in the priority listing
name=$(hostname --short)
# -F = fixed string not regex
# -x = match whole line
# -n = print line number
# A commented-out line will not match, so grep will fail
# and set -e will make the whole script fail.
grepout=$(grep -Fxn $name /etc/keepalived/priority.$type)
# Strip off everything but the line number. Do this separately
# so that grep's exit status is not lost in the pipeline.
prio=$(echo $grepout | sed 's/:.*//')
# for num=0 later is higher priority
# for num=1 later is lower priority
if [ $num = 1 ]
then prio=$((5 - $prio))
fi
# If our priority matches what keepalived is asking about, then our
# exit status depends on whether named is running, otherwise tell
# keepalived we are not running at the priority it is checking.
[ $check = $prio ] && /etc/keepalived/named_ok
The named_ok script just uses dig to verify that the server seems to be working OK. I originally queried for version.bind, but there are very strict rate limits on the server info view so it did not work very well! So now the script checks that this command produces the expected output:

dig @localhost +time=1 +tries=1 +short cam.ac.uk in txt
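So named_ok amounts to something like this sketch (the expected answer is a placeholder; the real script compares against whatever the cam.ac.uk TXT record actually contains):

#!/bin/sh
# exit status feeds keepalived's track_script and the notify script
expected='"placeholder TXT record"'
answer=$(dig @localhost +time=1 +tries=1 +short cam.ac.uk in txt)
[ "$answer" = "$expected" ]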
The current setup which I am replacing uses Solaris Zones (like FreeBSD Jails or Linux Containers) to host the various name server instances on three physical boxes. The new setup will use Ubuntu virtual machines on our shared VM service (should I call it a "private cloud"?) for the authoritative servers. I am making a couple of changes to the authoritative setup: changing to a hidden master, and eliminating differences in which zones are served by each server.
I have obtained dedicated hardware for the recursive servers. Our main concern is that they should be able to boot and work with no dependencies on other services beyond power and networking, because basically all the other services rely on the recursive DNS servers. The machines are Dell R320s, each with one Xeon E5-2420 (6 hyperthreaded cores, 2.2GHz), 32 GB RAM, and a Dell-branded Intel 160GB SSD.
The most important change to the recursive DNS service will be automatic failover. Whenever I need to loosen my bowels I just contemplate dealing with a failure of one of the current elderly machines, which involves a lengthy and delicate manual playbook described on our wiki...
Often when I mention DNS and failover, the immediate response is "Anycast?". We will not be doing anycast on the new servers, though that may change in the future. My current plan is to do failover with VRRP using keepalived. (Several people have told me they are successfully using keepalived, though its documentation is shockingly bad. I would like to know of any better alternatives.) There are a number of reasons for using VRRP rather than anycast:
The recursive DNS server addresses are 131.111.8.42 (aka recdns0) and 131.111.12.20 (aka recdns1). (They have IPv6 addresses too.) They are on different subnets which are actually VLANs on the same physical network. It is not feasible to change these addresses.
The 8 and 12 subnets are our general server subnets, used for a large proportion of our services, most of which use the recdns servers. So anycasting recdns[01] requires punching holes in the server network routing.
The server network routers do not provide proxy ARP and my colleagues in network systems do not want to change this. But our Cisco routers can't punch a /32 anycast hole in the server subnets without proxy ARP. So if we did do anycast we would also have to do VRRP to support failover for recdns clients on the server subnets.
The server network spans four sites, connected via our own city-wide fibre network. The sites are linked at layer 2: the same Ethernet VLANs are present at all four sites. So VRRP failover gives us pretty good resilience in the face of server, rack, or site failures.
VRRP will be a massive improvement over our current setup, and it should provide us a lot of the robustness that other places would normally need anycast for, but with significantly less complexity. And less complexity means less time before I can take the old machines out of service.
After the new setup is in place, it might make sense for us to revisit anycast. For instance, we could put recursive servers at other points of presence where our server network does not reach (e.g. the Addenbrooke's medical research site). But in practice there are not many situations when our server network is unreachable but the rest of the University data network is functioning, so it might not be worth it.
The old machines are special snowflake servers. The new setup is being managed by Ansible.
I first used Ansible in 2013 to set up the DHCP servers that were a crucial part of the network renumbering we did when moving our main office from the city centre to the West Cambridge site. I liked how easy it was to get started with Ansible. The way its --check mode prints a diff of remote config file changes is a killer feature for me. And it uses ssh rather than rolling its own crypto and host authentication like some other config management software.
I spent a lot of December working through the configuration of the new servers, starting with the hidden master and an authoritative server (a staging server which is a clone of the future live servers). It felt like quite a lot of elapsed time without much visible progress, though I was steadily knocking items off the list of things to get working.
The best bit was the last day before the xmas break. The new recdns hardware arrived on Monday 22nd, so I spent Tuesday racking them up and getting them running.
My Ansible setup already included most of the special cases required for the recdns servers, so I just uncommented their hostnames in the inventory file and told Ansible to run the playbook. It pretty much Just Worked, which was extremely pleasing :-) All that steady work paid off big time.
The main part of the recdns config which did not work was the network interface configuration, which was OK because I didn't expect it to work without fiddling.
The recdns servers are plugged into switch ports which present subnet 8 untagged (mainly to support initial bootstrap without requiring special setup of the machine's BIOS), and subnet 12 with VLAN tags (VLAN number 812). Each server has its own IPv4 and IPv6 addresses on subnet 8 and subnet 12.
The service addresses recdns0 (subnet 8) and recdns1 (subnet 12) will be additional (virtual) addresses which can be brought up on any of the four servers. They will usually be configured something like the sketch below.
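This is only an illustration; the subnet 12 prefix length is an assumption:

ip address add 131.111.8.42/23 dev em1        # recdns0, subnet 8
ip address add 131.111.12.20/23 dev em1.812   # recdns1, subnet 12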
And in case of multi-site failures, the recdns1 servers will act as additional backups for the recdns0 servers and vice versa.
There were two problems with my initial untested configuration.
The known problem was that I was likely to need policy routing, to ensure that packets with a subnet 12 source address were sent out with VLAN 812 tags. This turned out to be true for IPv4, whereas IPv6 does the Right Thing by default.
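The IPv4 fix amounts to source-based policy routing, along these lines (a sketch: the table number, gateway address, and prefix are assumptions, not our actual configuration):

# route lookups in table 812 send traffic out via the tagged interface
ip route add default via 131.111.12.1 dev em1.812 table 812
# use table 812 for packets with a subnet 12 source address
ip rule add from 131.111.12.0/24 table 812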
The unknown problem was that the VLAN 812 interface came up only half-configured: it was using SLAAC for IPv6 instead of the static address that I specified. This took a while to debug. The clue to the solution came from running ifup with the -v flag to get it to print out what it was doing:
# ip link delete em1.812
# ifup -v em1.812
This showed that interface configuration was failing when it tried to set up the default route on that interface. Because there can be only one default route, and there was already one on the main subnet 8 interface. D'oh!
Having got ifup to run to completion I was able to verify that the subnet 12 routing worked for IPv6 but not for IPv4, pretty much as expected. With advice from my colleagues David McBride and Anton Altaparmakov I added the necessary runes to the configuration.
My final /etc/network/interfaces files on the recdns servers are generated from a Jinja template you can see in the ipreg Ansible repository.
Edited to add:
The original minimal policy routing configuration turned out not to work sometimes, depending on which of the routers were active and how ECMP split traffic. Eventually, after a number of rounds of reachability bug fixes, I extended the configuration to repeat the policy routing setup, mutatis mutandis, for both IPv4 and IPv6 on both subnet 8 and subnet 12.
Ironically, to do this I have ended up spending lots of time working with SCCS and RCS, rather than Git. This was mainly developing analysis and conversion tools to get things into a fit state for Git.
If you find yourself in a similar situation, you might find these tools helpful.
Cambridge was allocated three Class B networks in the 1980s: first the Computer Lab got 128.232.0.0/16 in 1987; then the Department of Engineering got 129.169.0.0/16 in 1988; and eventually the Computing Service got 131.111.0.0/16 in 1989 for the University (and related institutions) as a whole.
The oldest records I have found date from September 1990, which list about 300 registrations. The next two departments to get connected were the Statistical Laboratory and Molecular Biology (I can't say in which order). The Statslab was allocated 131.111.20.0/24, which it has kept for 24 years! Things pick up in 1991, when the JANET IP Service was started and rapidly took over to replace X.25. (Last month I blogged about connectivity for Astronomy in Cambridge in 1991.)
I have found these historical nuggets in our ip-register directory tree. This contains the infrastructure and history of IP address and DNS registration in Cambridge going back a quarter century. But it isn't just an archive: it is a working system which has been in production that long. Because of this, converting the directory tree to Git presents certain challenges.
The ip-register directory tree contains a mixture of:
My aim was to preserve this all as faithfully as I could, while converting it to Git in a way that represents the history in a useful manner.
The rough strategy was:
Take a copy of the ip-register directory tree, preserving modification times. (There is no need to preserve owners because any useful ownership information was lost when the directory tree moved off the Central Unix Service before that shut down in 2008.)
Convert from SCCS to RCS file-by-file. Converting between these formats is a simple one-to-one mapping.
Files without SCCS history will have very short artificial RCS histories created from their modification times and editor backup files.
Convert the RCS tree to CVS. This is basically just moving files around, because a CVS repository is little more than a directory tree of RCS files.
Convert the CVS repository to Git using git cvsimport. This is the only phase that needs to do cross-file history analysis, and other people have already produced a satisfactory solution.
Simples! ... Not.
I first tried ESR's sccs2rcs Python script. Unfortunately I rapidly ran into a number of showstoppers.
I fixed a bug or two but very soon concluded the program was entirely the wrong shape.
(In the end, the Solaris incompatibility became moot when I installed GNU CSSC on my FreeBSD workstation to do the conversion. But the other problems with sccs2rcs remained.)
So I wrote a small script called sccs2rcs1 which just converts one SCCS file to one RCS file, and gives you control over where the RCS and temporary files are placed. This meant that I would not have to shuffle RCS files around: I could just create them directly in the target CVS repository. Also, sccs2rcs1 uses RCS options to avoid the need to fiddle with checkout locks, which is a significant simplification.
The main regression compared to sccs2rcs is that sccs2rcs1 does not support branches, because I didn't have any files with branches.
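The core of such a per-file conversion can be sketched with standard SCCS and RCS commands (heavily simplified: sccs2rcs1 also takes care over dates, authors, and comments, and this loop assumes the SIDs sort correctly):

for sid in $(prs -e -d:I: s.hosts | sort -t. -k1,1n -k2,2n); do
    get -r"$sid" -p s.hosts > hosts      # check out one SCCS revision
    ci -f -r"$sid" -t-imported -m"imported from SCCS" hosts
done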
At this point I needed to work out how I was going to co-ordinate the invocations of sccs2rcs1 to convert the whole tree. What was in there?!
I wrote a fairly quick-and-dirty script called sccscheck which analyses a directory tree and prints out notes on various features and anomalies. A significant proportion of the code exists to work out the relationship between working files, backup files, and SCCS files.
I could then start work on determining what fix-ups were necessary before the SCCS-to-CVS conversion.
One notable part of the ip-register directory tree was the archive subdirectory, which contained lots of gzipped SCCS files with date stamps. What relationship did they have to each other? My first guess was that they might be successive snapshots of a growing history, and that the corresponding SCCS files in the working part of the tree would contain the whole history.
I wrote sccsprefix to verify if one SCCS file is a prefix of another, i.e. that it records the same history up to a certain point.
This proved that the files were NOT snapshots! In fact, the working SCCS files had been periodically moved to the archive, and new working SCCS files started from scratch. I guess this was to cope with the files getting uncomfortably large and slow for 1990s hardware.
So to represent the history properly in Git, I needed to combine a series of SCCS files into a linear history. It turns out to be easier to construct commits with artificial metadata (usernames, dates) with RCS than with SCCS, so I wrote rcsappend to add the commits from a newer RCS file as successors of commits in an older file.
Converting the archived SCCS files was then a combination of sccs2rcs1 and rcsappend. Unfortunately this was VERY slow, because RCS takes a long time to check out old revisions. This is because an RCS file contains a verbatim copy of the latest revision and a series of diffs going back one revision at a time. The SCCS format is more clever and so takes about the same time to check out any revision.
So I changed sccs2rcs1 to incorporate an append mode, and used that to convert and combine the archived SCCS files, as you can see in the ipreg-archive-uplift script. This still takes ages to convert and linearize nearly 20,000 revisions in the history of the hosts.131.111 file - an RCS checkin rewrites the entire RCS file, so checkins get slower as the number of revisions grows. Fortunately I don't need to run it many times.
There are a lot of files in the ip-register tree without SCCS histories, which I wanted to preserve. Many of them have old editor backup ~ files, which could be used to construct a wee bit of history (in the absence of anything better). So I wrote files2rcs to build an RCS file from this kind of miscellanea.
At this point I need to moan a bit.
Why does RCS object to file names that start with a comma. Why.
I tried running these scripts on my Mac at home. It mostly worked, except for the directories which contained files like DB.cam (source file) and db.cam (generated file). I added a bit of support in the scripts to cope with case-insensitive filesystems, so I can use my Macs for testing. But the bulk conversion runs very slowly, I think because it generates too much churn in the Spotlight indexes.
One significant problem is dealing with SCCS files whose working files have been deleted. In some SCCS workflows this is a normal state of affairs - see for instance the SCCS support in the POSIX make XSI extensions.
However, in the ip-register directory tree this corresponds to files that are no longer needed. Unfortunately the SCCS history generally does not record when the file was deleted. It might be possible to make a plausible guess from manual analysis, but perhaps it is more truthful to record an artificial revision saying the file was not present at the time of conversion.
Like SCCS, RCS does not have a way to represent a deleted file. CVS uses a convention on top of RCS: when a file is deleted it puts the RCS file in an "Attic" subdirectory and adds a revision with a "dead" status. The rcsdeadify script applies this convention to an RCS file.
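In outline the convention looks like this (a sketch; rcsdeadify does the job properly):

mkdir -p Attic
mv hosts,v Attic/hosts,v
co -l Attic/hosts,v                           # check out and lock the head revision
ci -f -sdead -m'file deleted' Attic/hosts,v   # new head revision with state "dead"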
There are situations where it is possible to identify a meaningful committer and deletion time. Where a .tar.gz archive exists, it records the original file owners. The tar2usermap script records the file owners from the tar files. The contents can then be unpacked and converted as if they were part of the main directory, using the usermap file to provide the correct committer IDs. After that the files can be marked as deleted at the time the tarfile was created.
The main conversion script is sccs2cvs, which evacuates an SCCS working tree into a CVS repository, leaving behind a tree of (mostly) empty directories. It is based on a simplified version of the analysis done by sccscheck, with more careful error checking of the commands it invokes. It uses sccs2rcs1, files2rcs, and rcsappend to handle each file.
The rcsappend case occurs when there is an editor backup ~ file which is older than the oldest SCCS revision, in which case sccs2cvs uses rcsappend to combine the output of sccs2rcs1 and files2rcs. This could be done more efficiently with sccs2rcs1's append mode, but for the ip-register tree it doesn't cause a big slowdown.
To cope with the varying semantics of missing working files, sccs2cvs leaves behind a tombstone where it expected to find a working file. This takes the form of a symlink pointing to 'Attic'. Another script can then deal with these tombstones as appropriate.
Before sccs2cvs can run, the SCCS working tree should be reasonably clean. So the overall uplift process goes through several phases:
- sccs2cvs;
- git cvsimport or cvs-fast-export | git fast-import;
For the ip-register directory tree, the pre-uplift phase also includes ipreg-archive-uplift which I described earlier. Then in the mid-uplift phase the combined histories are moved into the proper place in the CVS repository so that their history is recorded in the right place.
Similarly, for the tarballs, the pre-uplift phase unpacks them in place, and moves the tar files aside. Then the mid-uplift phase rcsdeadifies the tree that was inside the tarball.
I have not stuck to my guidelines very strictly: my scripts delete quite a lot of cruft in the pre-uplift phase. In particular, they delete duplicated SCCS history files from the archives, and working files which are generated by scripts.
SCCS/RCS/CVS all record committers by simple user IDs, whereas git uses names and email addresses. So git-cvsimport and cvs-fast-export can be given an authors file containing the translation. The sccscommitters script produces a list of user IDs as a starting point for an authors file.
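The authors file is a simple map from user ID to name and address, one entry per line, for example (entries illustrative):

ajms = Tony Stoneley <ajms@cam.ac.uk>
fanf2 = Tony Finch <fanf2@cam.ac.uk>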
At first I tried git cvsimport, since I have successfully used it before. In this case it turned out not to be the path to swift enlightenment - it was taking about 3s per commit. This is mainly because it checks out files from oldest to newest, so it falls foul of the same performance problem that my rcsappend program did, as I described above.
So I compiled cvs-fast-export and fairly soon I had a populated repository: nearly 30,000 commits at 35 commits per second, so about 100 times faster. The fast-import/export format allows you to provide file contents in any order, independent of the order they appear in commits. The fastest way to get the contents of each revision out of an RCS file is from newest to oldest, so that is what cvs-fast-export does.
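The basic invocation is a single pipeline, run from inside a freshly created git repository (paths and file names illustrative):

git init ipreg-converted && cd ipreg-converted
(cd /path/to/cvsrepo && find . -name '*,v' | cvs-fast-export -A /path/to/authors) | git fast-import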
There are a couple of niggles with cvs-fast-export, so I have a patch which fixes them in a fairly dumb manner (without adding command-line switches to control the behaviour):
- cvs-fast-export replaces empty commit messages with " empty log message ", whereas I want it to leave them empty.
- cvs-fast-export makes a special effort to translate CVS's ignored file behaviour into git by synthesizing a .gitignore file into every commit. This is wrong for the ip-register tree.
- The hosts.131.111 file takes a long time, during which cvs-fast-export appears to stall. I added a really bad progress meter to indicate that work was being performed.
Overall this has taken more programming than I expected, and more time, very much following the pattern that the last 10% takes the same time as the first 90%. And I think the initial investigations - before I got stuck in to the conversion work - probably took the same time again.
There is one area where the conversion could perhaps be improved: the archived dumps of various subdirectories have been converted in the location that the tar files were stored. I have not tried to incorporate them as part of the history of the directories from which the tar files were made. On the whole I think combining them, coping with renames and so on, would take too much time for too little benefit. The multiple copies of various ancient scripts are a bit weird, but it is fairly clear from the git history what was going on.
So, let us declare the job DONE, and move on to building new DNS servers!
I found an interesting document from one of the oldest parts of the archive, which provides a good snapshot of academic computer networking in the UK in 1991. It was written by Tony Stoneley, aka <ajms@cam.ac.uk>. AJMS is mentioned in RFC 1117 as the contact for Cambridge's IP address allocation. He was my manager when I started work at Cambridge in 2002, though he retired later that year.
The document is an email discussing IP connectivity for Cambridge's Institute of Astronomy. There are a number of abbreviations which might not be familiar...
Edited to correct the expansion of RA and to add Starlink
Connection of IoA/RGO to IP world
---------------------------------

This note is a statement of where I believe we have got to and an initial review of the options now open.

What we have achieved so far
----------------------------

All the Suns are properly connected at the lower levels to the Cambridge IP network, to the national IP network (JIPS) and to the international IP network (the Internet). This includes all the basic infrastructure such as routing and name service, and allows the Suns to use all the usual native Unix communications facilities (telnet, ftp, rlogin etc) except mail, which is discussed below. Possibly the most valuable end-user function thus delivered is the ability to fetch files directly from the USA. This also provides the basic infrastructure for other machines such as the VMS hosts when they need it.

VMS nodes
---------

Nothing has yet been done about the VMS nodes. CAMV0 needs its address changing, and both IOA0 and CAMV0 need routing set for extra-site communication. The immediate intention is to route through cast0. This will be transparent to all parties and impose negligible load on cast0, but requires the "doit" bit to be set in cast0's kernel. We understand that PSH is going to do all this [check], but we remain available to assist as required.

Further action on the VMS front is stalled pending the arrival of the new release (6.6) of the CMU TCP/IP package. This is so imminent that it seems foolish not to await it, and we believe IoA/RGO agree [check].

Access from Suns to Coloured Book world
---------------------------------------

There are basically two options for connecting the Suns to the JANET Coloured Book world. We can either set up one or more of the Suns as full-blown independent JANET hosts or we can set them up to use CS gateway facilities. The former provides the full range of facilities expected of any JANET host, but is cumbersome, takes significant local resources, is complicated and long-winded to arrange, incurs a small licence fee, is platform-specific, and adds significant complexity to the system managers' maintenance and planning load. The latter in contrast is light-weight, free, easy to install, and can be provided for any reasonable Unix host, but limits functionality to outbound pad and file transfer either way initiated from the local (IoA/RGO) end. The two options are not exclusive.

We suspect that the latter option ("spad/cpf") will provide adequate functionality and is preferable, but would welcome IoA/RGO opinion. Direct login to the Suns from a (possibly) remote JANET/CUDN terminal would currently require the full Coloured Book package, but the CS will shortly be providing X.29-telnet gateway facilities as part of the general infrastructure, and can in any case provide this functionality indirectly through login accounts on Central Unix facilities. For that matter, AST-STAR or WEST.AST could be used in this fashion.

Mail
----

Mail is a complicated and difficult subject, and I believe that a small group of experts from IoA/RGO and the CS should meet to discuss the requirements and options. The rest of this section is merely a fleeting summary of some of the issues.

Firstly, a political point must be clarified. At the time of writing it is absolutely forbidden to emit smtp (ie Unix/Internet style) mail into JIPS. This prohibition is national, and none of Cambridge's doing. We expect that the embargo will shortly be lifted somewhat, but there are certain to remain very strict rules about how smtp is to be used.

Within Cambridge we are making best guesses as to the likely future rules and adopting those as current working practice. It must be understood however that the situation is highly volatile and that today's decisions may turn out to be wrong. The current rulings are (inter alia)

- Mail to/from outside Cambridge may only be grey (Ie. JANET style).

- Mail within Cambridge may be grey or smtp BUT the reply address MUST be valid in BOTH the Internet AND Janet (modulo reversal). Thus a workstation emitting smtp mail must ensure that the reply address contained is that of a current JANET mail host.

- Except that - Consenting machines in a closed workgroup in Cambridge are permitted to use smtp between themselves, though there is no support from the CS and the practice is discouraged. They must remember not to contravene the previous two rulings, on pain of disconnection.

The good news is that a central mail hub/distributer will become available as a network service for the whole University within a few months, and will provide sufficient gateway function that ordinary smtp Unix workstations, with some careful configuration, can have full mail connectivity. In essence the workstation and the distributer will form one of those "closed workgroups", the workstation will send all its outbound mail to the distributer and receive all its inbound mail from the distributer, and the distributer will handle the forwarding to and from the rest of Cambridge, UK and the world.

There is no prospect of DECnet mail being supported generally either nationally or within Cambridge, but I imagine Starlink/IoA/RGO will continue to use it for the time being, and whatever gateway function there is now will need preserving. This will have to be largely IoA/RGO's own responsibility, but the planning exercise may have to take account of any further constraints thus imposed. Input from IoA/RGO as to the requirements is needed.

In the longer term there will probably be a general UK and worldwide shift to X.400 mail, but that horizon is probably too hazy to rate more than a nod at present. The central mail switch should in any case hide the initial impact from most users.

The times are therefore a'changing rather rapidly, and some pragmatism is needed in deciding what to do. If mail to/from the IP machines is not an urgent requirement, and since they will be able to log in to the VMS nodes it may not be, then the best thing may well be to await the mail distributer service. If more direct mail is needed more urgently then we probably need to set up a private mail distributer service within IoA/RGO. This would entail setting up (probably) a Sun as a full JANET host and using it as the one and only (mail) route in or out of IoA/RGO. Something rather similar has been done in Molecular Biology and is thus known to work, but setting it up is no mean task. A further fall-back option might be to arrange to use Central Unix facilities as a mail gateway in similar vein. The less effort spent on interim facilities the better, however.

Broken mail
-----------

We discovered late in the day that smtp mail was in fact being used between IoA and RA, and the name changing broke this. We regret having thus trodden on existing facilities, and are willing to help try to recover any required functionality, but we believe that IoA/RGO/RA in fact have this in hand. We consider the activity to fall under the third rule above. If help is needed, please let us know.

We should also report sideline problem we encountered and which will probably be a continuing cause of grief. CAVAD, and indeed any similar VMS system, emits mail with reply addresses of the form "CAVAD::user"@.... This is quite legal, but the quotes are syntactically significant, and must be returned in any reply. Unfortunately the great majority of Unix systems strip such quotes during emission of mail, so the reply address fails. Such stripping can occur at several levels, notably the sendmail (ie system) processing and the one of the most popular user-level mailers. The CS is fixing its own systems, but the problem is replicated in something like half a million independent Internet hosts, and little can be done about it.

Other requirements
------------------

There may well be other requirements that have not been noticed or, perish the thought, we have inadvertently broken. Please let us know of these.

Bandwidth improvements
----------------------

At present all IP communications between IoA/RGO and the rest of the world go down a rather slow (64Kb/sec) link. This should improve substantially when it is replaced with a GBN link, and to most of Cambridge the bandwidth will probably become 1-2Mb/sec. For comparison, the basic ethernet bandwidth is 10Mb/sec. The timescale is unclear, but sometime in 1992 is expected. The bandwidth of the national backbone facilities is of the order of 1Mb/sec, but of course this is shared with many institutions in a manner hard to predict or assess.

For Computing Service,
Tony Stoneley, ajms@cam.cus
29/11/91
can now be slaved from the authdns*.csx servers by hosts within the CUDN, and they have been added to the sample BIND configuration at
https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
Those who feel they must slave enough reverse zones to cover the whole CUDN may want to include them. These zones are not yet signed, but we expect them to be within a week or two.
A number of cosmetic changes to the comments in the sample configuration have also been made, mostly bringing up to date matters like the versions of BIND still being actively supported by ISC.
Those who use an explicit root hints file may want to note that a new version was issued in early June, adding an IPv6 address to B.ROOT-SERVERS.NET. The copy at https://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache was updated.
There is now a service_ops web page available which allows authorised users to create service (SRV) records in the DNS for names in the domains to which they have access. See the service_ops help page for more details.
More information can be found on the IP Register SSHFP documentation page
We have recently introduced mac and dhcp_group fields on the single_ops page, as well as related changes visible via the table_ops page.
These were intended in the first instance for maintaining a DHCP service for internal use in the UCS. It was perhaps unwise of us to make them visible outside and raise users' expectations prematurely. It remains a work in progress, and we have had to make changes of detail that affected some of those who had set these fields. The notes here describe the current state.
Although the single_ops page doesn't make this obvious, the mac and dhcp_group fields are properties of the IP address rather than the box. If a box or vbox has multiple IP addresses, each one can have its own values for them. The fields are cleared automatically when the IP address is rescinded.
MAC addresses can be entered in any of the usual formats but are displayed as colon-separated. Because the intent is to support DHCP servers, MAC addresses (if set) are required to be unique within any particular mzone/lan combination. A non-null dhcp_group value is intended to indicate non-default DHCP options. To support automated processing, it must correspond to a registered dhcp_group object for the given mzone/lan, which can be created, modified or deleted via table_ops. The values should contain only alphanumeric, hyphen and underline characters.
The degree to which any of this is of use to users outside the UIS is currently very limited. We do intend to add more usability features, though.
There are now four "anames" used instead of three:
janet-filter.net.private.cam.ac.uk
    for exceptions at the CUDN border routers, often permitting some network traffic that would otherwise be blocked. This is essentially the same as the old janet-acl.net.private.cam.ac.uk which is temporarily an alias.

cudn-filter.net.private.cam.ac.uk
    for exceptions at internal CUDN routers. This includes the old high-numbered port blocking, where it is still in use, but also many other sorts of exception which were previously not represented. The old name cudn-acl.net.private.cam.ac.uk is temporarily an alias.

cudn-blocklist.net.private.cam.ac.uk
    for addresses for which all IP traffic is completely blocked, usually as the result of a security incident. This is essentially the same as the old block-list.net.private.cam.ac.uk which is temporarily an alias.

cudn-config.net.private.cam.ac.uk
    for addresses that are referred to in the CUDN routing infrastructure. This is completely new.
Both IPv4 and IPv6 addresses may appear in these lists (although at the moment only cudn-config has any IPv6 addresses).
Requests for the creation or removal of network access control exceptions, or explanations of existing ones, should in most cases be sent to network-support@uis.cam.ac.uk in the first instance, who will redirect them if necessary. However, the CERT team at cert@cam.ac.uk are solely responsible for the cudn-blocklist contents in particular.
In particular, the name in the answer section of a response may now have a different case from that in the question section (which will always be identical to that in the original query). Previously they would (after decompression) have been identical. Resolvers are meant to use case- insensitive comparisons themselves, but this change could expose non- conformance in this area.
However, experiments we have performed so far, and information from the DNS community at large, suggests that such non-conformance is quite rare. We are therefore planning to upgrade the CUDN central nameservers (both authoritative and recursive) to BIND 9.9.5 over the next few days. Please keep an eye out for any problems that might be caused by the change, and let us (hostmaster at ucs.cam.ac.uk) know as soon as possible, while we still have the option of backing off.
As regards DNSSEC validation, cl.cam.ac.uk now has a chain of trust from the root zone. We expect that 232.128.in-addr.arpa will also have one before long.
This last happened earlier in January. That made it sensible to sign the "consolidated reverse zone" in-addr.arpa.cam.ac.uk which provides reverse lookup results for IPv4 addresses in the range 128.232.[128-255].x. This has now been done, and the results of such reverse lookup can be fully validated using chains of trust from the root zone.
There is more information at
https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-signed.html
(moved to https://www.dns.cam.ac.uk/domains/signed.html) and
https://jackdaw.cam.ac.uk/ipreg/nsconfig/consolidated-reverse-zones.html
(moved to https://www.dns.cam.ac.uk/domains/reverse/),
which have been brought up to date.
The sample nameserver configuration
https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
lists the zones that can be slaved stealthily within the CUDN. Some of the commentary in that file has also been brought up to date.
The main news is that the zones are now all signed. They are therefore much larger than before, and have larger and more frequent incremental updates. Those who are slaving them may need to be aware of that.
As regards DNSSEC validation, cl.cam.ac.uk now has a chain of trust from the root zone. We expect that 232.128.in-addr.arpa will also have one before long. The IP reverse zone has DS (delegation signer) records in 1.2.0.0.3.6.0.1.0.0.2.ip6.arpa, but that itself can be validated only via the dlv.isc.org lookaside zone, as JANET have not yet signed its parent zone 0.3.6.0.1.0.0.2.ip6.arpa (despite an 18-month-old promise on their part).
There is now an xlist_ops page that provides more general list operations on the IP registration database than does the list_ops page. In particular it allows downloads of lists of boxes, vboxes, cnames or anames, and uploads to perform bulk operations on multihomed boxes, vboxes, cnames or (for registrars only) anames. See the xlist_ops help page for details.
The opportunity has been taken to make a number of other small modifications. The order of links in the standard page header has been altered, and multihome_ops has been relegated to a link from the preferred box_ops page.
It will soon be the second anniversary of the date on which the root zone was signed, 15 July 2010. By now, everyone seriously into the business of DNSSEC validation should be using a trust anchor for the root zone, whether or not they also use lookaside validation via a trust anchor for dlv.isc.org. The latter facility was always meant to be an aid to early deployment of DNSSEC, not a permanent solution. While it remains useful to cover the many unsigned gaps in the tree of DNS zones, it no longer seems appropriate to have dlv.isc.org entries for DNS zones that can be validated via a chain of trust from the root zone.
Therefore, on or about 15 July 2012, we shall be dropping the entries for the two zones cam.ac.uk and 111.131.in-addr.arpa from the dlv.isc.org zone, as these have now had chains of trust from the root zone for well over a year. We will be retaining the entries for a number of our signed reverse zones whose parent zones are not yet signed - for details see
https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-signed.html
The following changes have been made to the sample configuration at
https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
The old block has been dropped from the definition of the "camnets" ACL.
The reverse zone 0.0.2.0.0.3.6.0.1.0.0.2.ip6.arpa has been dropped from the list of those which may be slaved.
We advise you to make the corresponding changes in your nameserver configurations if they are relevant.
http://www.isc.org/software/bind/advisories/cve-2011-4313
It concerns a bug, thought to be remotely exploitable, that crashes recursive nameservers, and they have provided new BIND versions (9.4-ESV-R5-P1, 9.6-ESV-R5-P1, 9.7.4-P1, 9.8.1-P1) which are proof against crashing from this cause, although the precise sequence of events that leads to it remains obscure.
Although we are not aware of any local nameservers that have been affected by this problem, several other sites have been badly affected in the last 24 hours.
The CUDN central recursive nameservers at 131.111.8.42 & 131.111.12.20 are now running BIND 9.8.1-P1.
               Old IPv6 address          New IPv6 address
authdns0.csx   2001:630:200:8080::d:a0   2001:630:212:8::d:a0
authdns1.csx   2001:630:200:8120::d:a1   2001:630:212:12::d:a1
recdns0.csx    2001:630:200:8080::d:0    2001:630:212:8::d:0
recdns1.csx    2001:630:200:8120::d:1    2001:630:212:12::d:1
The new addresses are working now, and the old addresses will continue to work as well until Monday 5 September, when they will be removed. If you are using them (e.g. in nameserver or stub resolver configuration files) you should switch to the new addresses (or the IPv4 ones) before then.
The comments in the sample configuration file
https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
about using IPv6 addresses to address the nameservers have been modified appropriately.
http://www.isc.org/software/bind/advisories/cve-2011-2464
This affects most still-supported versions of BIND. A suitably crafted UPDATE packet can trigger an assertion failure. Apparently not yet seen in the wild...
http://www.isc.org/software/bind/advisories/cve-2011-2465
This affects only users of Response Policy Zones in 9.8.x.
Fixed versions are 9.6-ESV-R4-P3, 9.7.3-P3 and 9.8.0-P4.
As immediate consequences, the following changes have been made to
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf:
The "camnets" ACL has 2001:630:210::/44 added to it.
The reverse zone "1.2.0.0.3.6.0.1.0.0.2.ip6.arpa" is listed as available for (stealth) slaving.
Of course, the reverse zone has nothing significant in it yet! But if you are slaving the existing IPv6 reverse zone, you should probably start slaving the new one as well.
There will of course be other changes during the transition that may affect local nameserver administrators. In particular the IPv6 addresses
of the CUDN central authoritative and recursive nameservers will change at some point: this list will be informed before that happens.
A few minor issues while I have your attention:
The zone amtp.cam.ac.uk (old name for damtp.cam.ac.uk) is no longer delegated, and is about to vanish entirely. If you are still slaving it even after the message here on 9 March, now is the time to stop.
There has been another small change to the official root hints file ftp://ftp.internic.net/domain/named.cache, and the copy at http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache has been updated accordingly. The change is the addition of an IPv6 address for d.root-servers.net, and rather appropriately it was made on "IPv6 day".
My description of the BIND vulnerability CVE-2011-1910 was defective in two directions:
It isn't necessary to have DNSSEC validation turned on to be vulnerable to it.
On the other hand, only moderately recent versions of BIND are vulnerable: old enough ones are not.
The information at
http://www.isc.org/software/bind/advisories/cve-2011-1910
about which versions are affected is accurate (bearing in mind that some OS vendors make their own changes without altering the version number). If you are compiling from source, I can advise you on the code fragment to look for.
http://www.isc.org/software/bind/advisories/cve-2011-1910
and several fixed BIND versions are now available (9.4-ESV-R4-P1, 9.6-ESV-R4-P1, 9.7.3-P1, 9.8.0-P2).
This bug can only be exercised if DNSSEC validation is turned on, but that is increasingly becoming the default setup these days.
There is now a box_ops page which can be used as an alternative to the multihome_ops page to manipulate the registrations of hosts ("boxes" in the terminology of the IP registration database) with more than one IP address.
Its functions and display are simpler than those of multihome_ops and more in line with those of the other web pages. Unlike multihome_ops it supports the addition or removal of IPv6 addresses (if any are assigned to the user's management zones) as well as IPv4 ones. However, it is lacking some of the facilities available with multihome_ops, such as: using wildcards with display, selecting by address, and displaying detailed properties of the associated IP address objects.
We hope to add at least some of these facilities to box_ops (and to other pages, such as vbox_ops) in due course, and to eliminate the necessity to keep multihome_ops in its current form. The main reason for releasing box_ops now in this somewhat undeveloped state is its support for IPv6 addresses.
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
The following zones, which were not previously delegated, have been added:
The following zone, which is being phased out, has been removed:
There are no other changes.
- If all goes well, during the first week in March we will get the delegations of the 32 zones replaced by DNAMEs in the parent zone 232.128.in-addr.arpa.
This has now been done, with thanks to the Computer Lab hostmaster for his co-operation. We have no reports of any problems at this stage.
The sample nameserver configuration
https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
has been updated to remove the 32 zones [224-255].232.128.in-addr.arpa from the list that may be slaved. (Apart from some modifications to the comments before "in-addr.arpa.cam.ac.uk", that is the only change.)
If you are slaving any or all of these 32 reverse zones, you should stop doing so now. Sometime next week we will start logging such slaving activity, and alert the administrators of any hosts involved.
The target date for step (3), the complete removal of these 32 reverse zones, remains Monday 14 March.
- On Monday 21 February we will replace the 32 zones [224-255].232.128.in-addr.arpa by versions using DNAMEs that indirect into in-addr.arpa.cam.ac.uk.
We carried out this step on Monday as planned, but had to back off for 12 out of the 32 zones (those covering PWF subnets) because of a problem with a local script used in the PWF re-imaging process. This has now been fixed, and all 32 zones are using indirecting DNAMEs again.
At present we do not think that this delay will significantly affect the schedule for steps (2) and (3). If you are experiencing any problems which you think might be related to these changes, please contact hostmaster at ucs.cam.ac.uk as soon as possible.
We will shortly be extending the consolidated reverse zone in-addr.arpa.cam.ac.uk, described here last November, to include 128.232.[224-255].x.
The web page
http://jackdaw.cam.ac.uk/ipreg/nsconfig/consolidated-reverse-zones.html
has been updated with the planned schedule and some new advice for users of Windows DNS Server.
To summarise:
On Monday 21 February we will replace the 32 zones [224-255].232.128.in-addr.arpa by versions using DNAMEs that indirect into in-addr.arpa.cam.ac.uk.
If all goes well, during the first week in March we will get the delegations of the 32 zones replaced by DNAMEs in the parent zone 232.128.in-addr.arpa.
If all still goes well, we plan to remove the 32 zones [224-255].232.128.in-addr.arpa completely on Monday 14 March.
The schedule is rather tight because we want to complete this work during full term if possible. If there have to be substantial delays, some of the later steps will be postponed until after Easter.
BIND users who want to slave zones providing reverse lookup for substantially the whole CUDN should slave "in-addr.arpa.cam.ac.uk" and "232.128.in-addr.arpa" (the latter from the CL nameservers) if they are not already doing so, and they should cease slaving the 32 zones [224-255].232.128.in-addr.arpa after step (2) but before step (3). [There will be a further announcement here when step (2) has been completed.]
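For BIND users, each of those is a slave zone stanza of this general shape (a sketch: the masters address shown is authdns0.csx's, taken from another announcement on this list; see sample.named.conf for the recommended configuration):

zone "in-addr.arpa.cam.ac.uk" {
    type slave;
    masters { 2001:630:212:8::d:a0; };   # authdns0.csx
    file "slave/in-addr.arpa.cam.ac.uk";
};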
Windows DNS Server users should note that we no longer recommend that they should stealth slave any zones, see
http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/dnssec.html
If you do feel you must continue such stealth slaving, the earlier link contains advice about which versions support zones containing DNAMEs and which do not. In particular, those using Windows 2003 or 2003R2 should cease slaving any of the zones [224-255].232.128.in-addr.arpa as soon as possible, before step (1).
The sample nameserver configuration
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
has been updated to include the zone in-addr.arpa.cam.ac.uk. Until recently this was contained within the cam.ac.uk zone, but it is now a separate (unsigned) delegated zone. It currently provides the reverse lookup records for IP addresses in the range 128.232.[128-223].x but we hope to extend that to cover the whole of 128.232.[128-255].x eventually.
A description of the zone and our plans for it can be found at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/consolidated-reverse-zones.html
Please be reassured that there will be further announcements here (and probably elsewhere) before the extension to cover 128.232.[224-255].x is implemented.
A trust anchor for the root zone has now been added to the configuration of the CUDN central recursive nameservers (at 131.111.8.42 & 131.111.12.20), in addition to the existing one for dlv.isc.org used for "lookaside validation". There is no immediate prospect of being able to drop the latter, as there are still huge gaps in the signed delegation tree (the "ac.uk" zone, for example).
For those running their own validating recursive nameservers, the pages
https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-validation.html
https://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-testing.html
have been updated with some relevant information.
A new version of the root hints file has been published, and the local copy at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache
has been updated with a copy. The substantive change is the addition of an IPv6 address for i.root-servers.net. As usual with such changes, there is little urgency to update your copies.
The rest of this posting is about validating DNSSEC-signed zones.
ICANN have held their first "key signing ceremony" and appear to be on target to sign the root zone on Thursday 15 July. See http://www.root-dnssec.org/ for details. We expect to be including a trust anchor for the signed root zone on the CUDN central recursive nameservers (131.111.8.42 and 131.111.12.20) shortly after it is available.
If you are operating a validating nameserver, there are issues about the supported signing algorithms. There are currently three important ones:
Mnemonic       Code   Supported by          Can be used with which
                      BIND versions[1]      negative responses
RSASHA1        5      9.4                   Only zones using NSEC
NSEC3RSASHA1   7      9.6                   Zones using NSEC or NSEC3[2]
RSASHA256      8      9.6.2 or 9.7          Zones using NSEC or NSEC3

[1] or later.
[2] but as NSEC3RSASHA1 is otherwise identical to RSASHA1, it is almost invariably used with zones using NSEC3 records.
Software that does not support any of the algorithms used to sign a zone will treat that zone as unsigned.
Only RSASHA1 is officially mandatory to support according to current IETF standards, but as the intention is to sign the root zone with RSASHA256, it will become effectively mandatory as well. (Other organisations are already assuming this. For example, Nominet have signed the "uk" top-level domain using RSASHA256, although they do not intend to publish a trust anchor for it other than by having a signed delegation in the root zone.)
Therefore, if you want to be able to use a trust anchor for the
root zone you will need software that supports the RSASHA256
algorithm, e.g. BIND versions 9.6.2 / 9.7 or later. As an aid
for checking this, the test zone dnssec-test.csi.cam.ac.uk is
now signed using RSASHA256. For details on how to test, see
http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-testing.html
There are no immediate plans to change the algorithm used to sign our production DNS zones from RSASHA1.
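One quick way to check which algorithm a zone is signed with (an illustrative command; the output below is invented apart from its format) is to look at the third field of its DNSKEY records, which is the algorithm code from the table above:

  $ dig +short DNSKEY dnssec-test.csi.cam.ac.uk
  257 3 8 AwEAAb...
  256 3 8 AwEAAc...

Here "8" indicates RSASHA256.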
It has been the practice to post copies of messages posted to the cs-nameservers-announce mailing list to the local newsgroup ucam.comp.tcp-ip. Some time ago we proposed here:
"The local newsgroups ucam.* are expected to be phased out before long, so I propose that we discontinue this practice. If anyone feels differently, please let us know."
At that time, we received pleas to continue the copies to the ucam.comp.tcp-ip newsgroup for as long as it remained in existence (which has in fact been much longer than was then anticipated). However, its demise now really is imminent, see e.g.
http://ucsnews.csx.cam.ac.uk/articles/2010/03/30/newsgroups-and-bulletin-boards
Therefore I have removed the references to ucam.comp.tcp-ip from the mailing list description and from the sample.named.conf file, and this message will be the last one copied to the newsgroup.
A new version of the sample nameserver configuration
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
has been installed. The following changes have been made:
Some references relating to DNSSEC validation have been added. For more details, though, consult as before
http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-validation.html
A recommended setting for "max-journal-size" is included. Without this, the journal files for incrementally updated zones will grow indefinitely, and for signed zones in particular they can become extremely large.
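For instance, a fragment along these lines caps journal growth (the size shown here is purely illustrative; see the sample file for the recommended value):

  options {
          // limit growth of .jnl files for incrementally updated zones
          max-journal-size 512k;
  };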
The most significant change concerns the zone private.cam.ac.uk. Previously, there was no delegation for this zone in cam.ac.uk. However, we have found that with the most recent versions of BIND, defining private.cam.ac.uk as either "type stub" or "type forward", in combination with using DNSSEC validation, led to validation failures due to BIND's inability to prove private.cam.ac.uk unsigned while cam.ac.uk is signed.
On consideration, we have decided to create a delegation for private.cam.ac.uk after all. (The only effect for users outside the CUDN should be that they will consistently get a REFUSED response for names in that zone, instead of sometimes getting NXDOMAIN.) This will also allow us to increase the number of official nameservers for private.cam.ac.uk (within the CUDN, obviously), and perhaps to sign it without having to advertise a trust anchor for it by special means.
Nameservers on the CUDN should therefore either slave private.cam.ac.uk, or not define it at all in their configuration. (Using "type stub" or "type forward" will continue to work for non-validating servers, but should be phased out.)
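A minimal sketch of the slave arrangement (the file name is illustrative; the master addresses are the authoritative servers mentioned previously on this list):

  zone "private.cam.ac.uk" {
          type slave;
          masters { 131.111.8.37; 131.111.12.37; };
          file "db.private.cam.ac.uk";
  };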
However, our corresponding reverse zones 16.172.in-addr.arpa through 30.172.in-addr.arpa cannot be delegated from the parent zone "172.in-addr.arpa". Luckily there are delegations there to the IANA "black hole" (AS112) servers, and this suffices to make the zones provably unsigned. Any of "type slave", "type stub" or "type forward" can be used for these zones (with or without validation), and one of them must be used or reverse lookups will fail.
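For example, one of these reverse zones could be handled by forwarding like this (a sketch only; the forwarder addresses are the CUDN central recursive nameservers, and "type slave" or "type stub" stanzas would be analogous):

  zone "16.172.in-addr.arpa" {
          type forward;
          forward only;
          forwarders { 131.111.8.42; 131.111.12.20; };
  };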
A message was sent to the ucam-itsupport mailing list last Friday announcing new recommendations for Active Directory and Windows DNS Server configurations within the CUDN, described more fully at
http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/ad_dns_config_info.html
These were the result of discussions between our PC Support group and our Hostmaster group. This message gives part of the background to our thinking, and some points may be relevant to institutions not using Windows DNS Server at all.
It will be no surprise that the advice not to ("stealth") slave zones from the CUDN central (authoritative) nameservers was motivated by the deficiencies of the various versions of Windows DNS Server when slaving signed zones (not to mention other defects in its treatment of unknown DNS record types and SOA serial number handling). Not slaving zones such as cam.ac.uk does have the disadvantage that resolving of names and addresses of hosts local to the institution may fail if it is cut off from the rest of the CUDN, but we think this should be tolerated because of the other advantages.
The advice to forward requests not resolved locally to the CUDN central (recursive) nameservers may seem contrary to advice given in https://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf and in previous messages to this list. In the case of Windows DNS Server configurations the primary intent was to make sure that queries for names in private.cam.ac.uk and corresponding reverse lookups worked correctly. (Configured laboriously via GUI, many zones were omitted from Windows DNS Server setups in the past.) However, there is the more general point that the central servers provide DNSSEC validation for the increasing proportion of names for which it is available, and forwarding requests to them takes advantage of that if validation is not being performed locally. We should admit, though, that the communication path between the institution and the CUDN central nameservers is not yet secured cryptographically. (If there is a fully functional validating recursive nameserver local to the institution, that could of course be used instead of the CUDN central nameservers.)
Another issue is the likelihood that we will be changing the set of reverse zones available for slaving during the next year. In particular we are likely to want to extend the scheme described at http://people.pwf.cam.ac.uk/cet1/prune-reverse-zones which we are already using for reverse lookup of the 128.232.[128-223].x range to cover 128.232.[224-255].x as well, eliminating the 32 individual zones used for the latter range at present.
When the Jackdaw Oracle server was moved to new hardware and upgraded to Oracle 10 earlier this year, the opportunity was taken to change the encoding it uses for character data from ISO Latin 1 to UTF-8. However, little change will have been apparent at the user interfaces, because translations from and to ISO Latin 1 were made on input and output.
This has now been changed so that all interfaces use UTF-8. In particular, the IP registration web pages now use UTF-8 encoding, and so do files downloaded from the list_ops page. Files uploaded should also be in UTF-8: invalid encodings (such as might be caused by using the ISO Latin 1 encoding instead) will be replaced by Unicode replacement characters '�' (U+FFFD).

Only those fields that are allowed to contain arbitrary text (such as equipment, location, owner, sysadmin, end_user, remarks) are affected by this change. Values (the great majority) that are in 7-bit ASCII will not be affected because it is the common subset of ISO Latin 1 and UTF-8.
We have identified a few values in the IP registration data which have suffered the unfortunate fate of being converted from ISO Latin 1 to UTF-8 twice. We will be contacting the relevant institutional COs about them.
All zones which are copied from any of the UCS servers (cam.ac.uk, private.cam.ac.uk, and the reverse zones) need to be refreshed so that they have a serial number which starts 125... rather than 346... The serial number can be found in the Start of Authority tab of the zone's properties.
To refresh the zones, try the following methods:
1. In a DNS MMC, select the DNS server, right click and select "Clear Cache". For any zone you copy, right click and select "Transfer from Master". Check the serial number for the zone once it has loaded. If the serial number hasn't been updated you may have tried too soon; wait a couple more minutes and try again.
2. If after ten minutes the serial number still hasn't been updated, delete the zone, clear the cache and re-create the zone. Check the serial number once it has fully loaded.
3. As a final resort: delete the zone, clear the cache, delete the files from C:\Windows\System32\DNS, then re-create the zone.
In most cases methods 1 or 2 will work.
If you have older copies of notes from the Active Directory course, do not use them as a reference. Instead, check your configuration information at the following locations:
http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/configureserver.html
http://www-tus.csx.cam.ac.uk/windows_support/Current/activedirectory/dns/dnssec.html
Incidentally, Windows 2008 DNS Server is not immune to the problem (but method 1 above should normally work for it).
As previously announced, we recently changed the SOA serial numbers of the zones the Computing Service maintains from the date-based format to "seconds since 1900-01-01" (similar to a Unix time_t value). We had made sure that this was an increase in RFC 1982 (published August 1996) terms. No version of BIND has any problem with this.
Unfortunately, we did not foresee that many versions of Windows DNS Server (apparently even those as late as Windows 2003 R2) cannot cope with this change, repeatedly attempting to transfer the zone at short intervals and discarding the result. We are seeing a great deal of churning on our authoritative nameservers as a result. (This affects servers that are fetching from 131.111.12.73 [fakedns.csx.cam.ac.uk] as well.)
It is too late for us to undo this change. If you are running Windows DNS Server and are failing to fetch cam.ac.uk and similar DNS zones, you should discard your existing copy of the zone(s). Andy Judd advises us that you "need to delete the zone in a DNS MMC and then delete the zone files from C:\Windows\System32\dns and C:\Windows\System32\dns\backup, then re-create the zone". Please ask Hostmaster and/or PC Support for assistance if necessary.
We shall be contacting the administrators of the hosts that are causing the most continuous zone-fetching activity on our servers.
We expect the zones
111.131.in-addr.arpa
0.0.2.0.0.3.6.0.1.0.0.2.ip6.arpa
to be signed on the morning of next Tuesday, 29 September 2009.
For those who stealth slave either or both of these zones, but cannot cope with signed zones, unsigned versions will remain available from fakedns.csx.cam.ac.uk [131.111.12.73]. Other relevant information may be found via the DNSSEC-related links on
https://jackdaw.cam.ac.uk/ipreg/nsconfig/
In future, we may not always announce when particular zones are expected to become signed.
Any problems should be referred to hostmaster@ucs.cam.ac.uk
We have some notes on slaving DNSSEC-signed zones with Windows DNS Server at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-windows.html
which we will update in the light of experience.
Only Windows 2008 R2 is practically trouble-free in this context. Earlier versions will generate very large numbers of messages in the system log about unknown record types, and may not result in a usable copy of the zone.
However, with Windows 2003 R2 or Windows 2008 you can use the registry option described at
(using the 0x2 setting) and this should allow you to slave a signed zone, although not actually to use the signatures.
For other versions, or in any case if problems arise, you can slave the zone from 131.111.12.73 [fakedns.csx.cam.ac.uk] instead of from 131.111.8.37 and/or 131.111.12.37. This server provides unsigned versions of all the zones described as available for slaving from the latter addresses in
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
for transfer to clients within the CUDN. It should not be used for any other purpose.
Any problems should be referred to hostmaster@ucs.cam.ac.uk.
ISC have announced a security vulnerability in BIND; see
https://www.isc.org/node/474
This is a high-severity denial-of-service bug which is being exploited in the wild. Nameservers are vulnerable if both of the following hold:
- They have any zone of "type master" whose name is known to the attacker. Note that this includes zones such as "localhost" (but apparently not BIND's generated "automatic empty zones").
- The attacker can get a DNS update request through to the server. For example, those with a port 53 block at the CUDN border router can be attacked (directly) only from within the CUDN. Access controls within BIND cannot protect against the vulnerability.
Those who use versions of BIND supplied with their operating system should look for advisories from their respective suppliers.
Historically, SOA serial numbers for the zones the Computing Service maintains have used the
<4-digit-year><2-digit-month><2-digit-day><2-more-digits>
format. We are about to switch to using "seconds since 1900-01-01" (not 1970-01-01, because we need the change to be an increase, in RFC 1982 terms). This is part of the preparations for using DNSSEC-signed zones, where some SOA serial increases are imposed by BIND as part of the re-signing operations.
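A rough worked illustration of the arithmetic (approximate figures, not taken from any actual zone): a date-format serial such as 2009072120 compared with seconds since 1900 (about 3,457,000,000 in mid-2009) differs by about 1,448,000,000, which is less than 2^31 = 2,147,483,648, so RFC 1982 serial arithmetic treats the new value as greater. Seconds since 1970 (about 1,248,000,000) would have compared as smaller, which is why that epoch could not be used.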
All of our zones now contain an HINFO record at the apex which contains version information in the old format; e.g.
$ dig +short hinfo cam.ac.uk
"SERIAL" "2009072120"
We expect these to remain a human-readable version indication, although not necessarily in exactly this format.
Updated information about DNSSEC validation is available at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-validation.html
We continue to make progress towards signing cam.ac.uk. The previous signed near-clone "cam.test" will be removed at the end of this week. Instead we have a new such zone "dnssec-test.csi.cam.ac.uk" which is properly delegated and registered at dlv.isc.org. Instructions on how to slave it or validate against it are at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/dnssec-testing.html
We have had almost no feedback so far. We would like to hear from anyone who has successfully slaved it, but even more from those who tried and failed. We believe that much old nameserver software will be unable to cope, and expect to have to provide "dumbed-down" unsigned versions of the signed zones for such clients. We need to estimate how large the demand will be for such a service.
DNSSEC validation has been enabled on the CUDN central recursive nameservers
131.111.8.42 or 2001:630:200:8080::d:0
131.111.12.20 or 2001:630:200:8120::d:1
since the morning of Tuesday 9 June, and no significant problems have arisen. We now expect this state to persist indefinitely.
Therefore, will all those who kindly assisted us by pointing their resolvers at the testing validating nameservers please switch back to using the regular ones. We shall be monitoring the use of the testing addresses and in due course contacting those who are still using them. Eventually they will be reused for other testing purposes.
Our plans for DNSSEC validation on the CUDN central nameservers are described at
http://people.pwf.cam.ac.uk/cet1/dnssec-validation.html
As a separate but related exercise, we plan to sign our own zones, starting with cam.ac.uk, as soon as we can. To investigate the problems involved, we have set up a signed almost-clone of cam.ac.uk, called cam.test, and made it available in various ways within the CUDN. Some of the things you could try doing with it are described here:
http://people.pwf.cam.ac.uk/cet1/signed-cam.html
[The fact that these web pages are in a personal space rather than in, say, http://jackdaw.cam.ac.uk/ipreg/ emphasizes their temporary and provisional nature. Please don't let that stop you reading them!]
http://people.pwf.cam.ac.uk/cet1/dnssec-testing.html
We are using a scheme to consolidate reverse DNS lookup into fewer zones, described at
http://people.pwf.cam.ac.uk/cet1/prune-reverse-zones
At the moment we are using this method for these address ranges:
192.84.5.*
128.232.[128-223].*
Some nameserver software (especially Windows DNS Server) may be unable to cope with zones containing DNAMEs: they will have to avoid stealth slaving (for example) 232.128.in-addr.arpa. We don't believe that any stub resolvers fail to cope with the "synthesised CNAMEs" generated from DNAMEs, although at least some versions of the glibc resolver log warning messages about the DNAME (but give the right answer anyway). If anyone experiences problems as a result of what we are doing, please let us know.
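For illustration, the synthesised CNAME looks something like this to a client (the host and the PTR target here are made up; the address is in the 128.232.[128-223].x range):

  $ dig +noall +answer -x 128.232.132.1
  1.132.232.128.in-addr.arpa. 86400 IN CNAME 1.132.232.128.in-addr.arpa.cam.ac.uk.
  1.132.232.128.in-addr.arpa.cam.ac.uk. 86400 IN PTR host-1.example.cam.ac.uk.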
In the light of experience, we may later extend this scheme to other address ranges, e.g. 128.232.[224-255].* which is currently covered by 32 separate reverse zones. However, we will give plenty of warning before making such a change.
Historically, there were compelling reasons to favour 131.111.8.42 over 131.111.12.20, and therefore to list them in that order in resolver configurations. The machine servicing 131.111.12.20 was severely overloaded and often had much poorer response times.
For the last two years, this has not been the case. The two services run on machines with equal power and for nearly all locations within the CUDN there is no reason to prefer one over the other. Since last September, one of them has been in our main machine room on the New Museums Site, and one at Redstone, providing improved physical redundancy.
However, we observe that the load on 131.111.8.42 is still several times that on 131.111.12.20, presumably as a result of the historical situation. For a while now we have been randomising the order in which the two addresses appear in the "nameservers:" line generated when the "register" or "infofor*" functions are used on the ipreg/single_ops web page, but we suspect that COs rarely act on that when actually setting up resolver configurations.
We would like to encourage you to do a bit of randomising yourselves, or even to deliberately prefer 131.111.12.20 to redress the current imbalance. If you have resolvers which support it, and you are configuring only these two addresses as nameservers, then you could sensibly use "options rotate" to randomise the order they are tried within a single host. (Unfortunately, this doesn't work well if you have a preferred local resolver and want to use the two CS nameservers only as backups.)
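For example, a resolv.conf along these lines (illustrative only; adjust for your own hosts) makes the resolver rotate between the two addresses:

  nameserver 131.111.8.42
  nameserver 131.111.12.20
  options rotate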
This message, like the earlier ones referred to, was sent to ucam-itsupport at lists because it is of concern to all IPreg database updaters, not just to stealth slave administrators. However, it has been plausibly suggested that they ought to have been sent to cs-nameservers-announce at lists as well, if only so that they appear in its archives. Therefore, this one is being so sent!
Subsequent to the changes of schedule to every 12 hours (September) and every 6 hours (November), we have now made a further increase in the number of (potential) updates to our DNS zones. Currently the regular update job runs at approximately
01:00, 09:00, 13:00, 17:00 and 21:00
each day (the exact times are subject to variation and should not be relied upon). We are reserving the 05:00 slot, at which actual changes would be very rare, for other maintenance activity.
The "refresh" parameter for these zones has also been reduced from 6 hours to 4 hours: this is the amount by which stealth slaves may be out of date (in the absence of network problems). The TTL values for individual records remains 24 hours: this is how long they can remain in caches across the Internet.
(Updated 2009-01-13)
Various exceptions to the general network access controls are applied at CUDN routers for some individual IP addresses. Some of these are at the border routers between the CUDN and JANET, and others at the individual CUDN routers interfacing to institutional networks.
We have implemented a scheme which we hope will enable us to keep better control over these exceptions. When an exception is created for a registered IP address, that address is added to one of the following anames:

janet-acl.net.private.cam.ac.uk
  for exceptions at the border routers, usually permitting some network traffic that would otherwise be blocked;

cudn-acl.net.private.cam.ac.uk
  for exceptions at the local CUDN routers, usually allowing some use of high-numbered ports for those vlans for which such a restriction is imposed;

block-list.net.private.cam.ac.uk
  for addresses for which all IP traffic is completely blocked, usually as the result of a security incident.
As long as the attachment to the aname remains, it prevents the main registration from being rescinded. The intent is that this will result in the institutional COs requesting removal of the exception at that point.
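As an illustration (a hypothetical query; the record type, and whether the answer is visible to a given client, are assumptions on our part, and the addresses shown are documentation examples):

  $ dig +short a block-list.net.private.cam.ac.uk @131.111.8.42
  192.0.2.17
  192.0.2.42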
If the IP address is not registered, then it is first registered as reserved.net.cam.ac.uk or reserved.net.private.cam.ac.uk as appropriate, and then processed as above. This prevents it being reused while the exception still exists. (Some of these cases are due to the fact that we did not have the scheme in the past, and there are several now-unregistered IP addresses whose exceptions were never removed.)
Note that this apparatus only deals with exceptions for individual IP addresses, not those for whole subnets.
Requests for the creation or removal of network access control exceptions should be sent to cert@cam.ac.uk.
It has been the practice to post copies of messages posted to the cs-nameservers-announce mailing list to the local newsgroup ucam.comp.tcp-ip. This is promised both in the descriptive text for the mailing list, and in the initial comments in sample.named.conf.
The local newsgroups ucam.* are expected to be phased out before long, so I propose that we discontinue this practice. If anyone feels differently, please let us know.
The archives of the mailing list are accessible to non-members, and there is no intention to change that.
The zone pmms.cam.ac.uk has been removed from the list of zones that may be slaved given in
http://jackdaw.cam.ac.uk/ipreg/nsconfig/sample.named.conf
Historically, this zone was a clone of "dpmms.cam.ac.uk", but it is now essentially empty and will soon be removed entirely. If your nameserver currently slaves pmms.cam.ac.uk, you should remove it from its configuration file as soon as is convenient.
Independently, some comments have been added to the sample configuration file about IPv6 addresses that can be used as alternative to the IPv4 ones for fetching zones or forwarding requests, for those whose nameservers themselves have IPv6 connectivity.
You will probably have heard of the recently announced DNS cache-poisoning vulnerability; see the ISC pages referenced below for an authoritative account. The remainder of this note refers specifically to what to do if you are running a recursive nameserver using BIND. (Authoritative-only servers have [almost] no cache and are not affected.)
For full details, see http://www.isc.org/ , especially the links under "Hot Topics" - "Upgrade Now!". In summary, ISC have released the following new versions:
if you are using           upgrade to    or, if you are prepared to
                                         use a "beta" version
BIND 9.5.x                 9.5.0-P1      9.5.1b1
BIND 9.4.x                 9.4.2-P1      9.4.3b2
BIND 9.3.x                 9.3.5-P1      -
BIND 9.2.x (or earlier)    no fix available - time to move!
Note that the earlier round of changes in July last year (versions 9.2.8-P1, 9.3.4-P1, 9.4.1-P1, 9.5.0a6), that improve defences against cache poisoning by randomising query ids, are no longer considered adequate. The new fixes rework the randomisation of query ids and also randomise the UDP port numbers used to make queries. Note that if you specify a specific port in the "query-source" setting, e.g. to work your way through a recalcitrant firewall, you will lose much of the advantage of the new fixes.
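That is, configurations of the following kind (an illustrative fragment, not a recommendation) defeat the port randomisation and should be avoided if at all possible:

  options {
          // pinning the source port like this loses the benefit
          // of the new port-randomising fixes
          query-source address * port 53;
  };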
If you are not in a position to upgrade, you can forward all requests to other recursive nameservers that you trust. The recursive nameservers provided by the Computing Service, at IP addresses 131.111.8.42 and 131.111.12.20, are now running BIND 9.4.2-P1 and can be used in this way by hosts on the CUDN.
If you need advice about this, please contact hostmaster@ucs.cam.ac.uk.
As before, all names in the database have an associated domain whose value must be in a predefined table and is used to control user access. However this can now be any suffix part of the name following a dot (or it can be the whole name). If a CO has access to the domain dept.cam.ac.uk, then they can register names such as foobar.dept.cam.ac.uk (as previously) or foo.bar.dept.cam.ac.uk, or even dept.cam.ac.uk alone (although this last may be inadvisable).

Such names can be used for "boxes" as registered and rescinded via the single_ops page, and also (to the rather limited extent that COs have delegated control over them) for vboxes and anames.
There are cases when one already registered domain name is a suffix of another, e.g. sub.dept.cam.ac.uk and dept.cam.ac.uk. Often these are in the same management zone and the longer name is present only to satisfy the previously enforced constraints. In these cases we shall phase out the now unnecessary domain. However, in a few cases they are in different management zones, with different sets of COs having access to them. It is possible for a CO with access only to dept.cam.ac.uk to register a name such as foobar.sub.dept.cam.ac.uk, but its domain part will be taken as dept.cam.ac.uk and not sub.dept.cam.ac.uk. This is likely to cause confusion, and we will be relying on the good sense of COs to avoid such situations.
For CNAMEs, the mechanism using strip_components described in the previous article still exists at the moment, but it will soon be replaced by a cname_ops web page in which the domain part is deduced automatically, as for the other database object types mentioned above, rather than having to be specified explicitly. (Now implemented, 2008-06-05.)
We advise that COs should not use sub-domains too profligately, and plan their naming schemes carefully. Any questions about the new facilities should be emailed to us.
IPv6 addresses have been published for some of the root nameservers. There is also a new root hints file with these addresses added, and the copy at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache
has been updated.
Of course, the IPv6 addresses are not useful if your nameserver does not (yet) have IPv6 connectivity, but they should do no harm, and on general principles it's inadvisable to let one's root hints file get too out of date.
Several changes have been made to the nameserver configuration files available at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/
None of these require urgent action.
First, the set of locally defined empty reverse zones, intended to stop queries for the corresponding IP addresses being sent to the Internet's root nameservers, has been brought into line with those created automatically by BIND 9.4 and later. Some of the IP address ranges covered are larger than before, while some are smaller. If you are actually running BIND 9.4 or later, then you can omit most of these zone definitions, but note that "0.in-addr.arpa" should not yet be omitted (as of BIND 9.4.2), and nor should those for the RFC1918 institution-wide private addresses.
There are new versions of the zone files db.null, db.localhost, and db.localhost-rev. The first has been made identical to that which BIND 9.4 generates internally, except that the SOA.mname value is "localhost" rather than a copy of the zone name (this avoids a warning message from BIND when it is loaded). The other two, intended to provide forward and reverse lookup for the name "localhost", have been modified in a similar way. These files no longer have "sample" in their name, because they no longer require any local modification before being used by BIND.
Some changes to sample.named.conf have been made in support of IPv6. The CUDN IPv6 range 2001:630:200::/48 has been added to the "camnets" ACL definition: this becomes relevant if you are running a nameserver providing service over IPv6. The corresponding reverse zone "0.0.2.0.0.3.6.0.1.0.0.2.ip6.arpa" has been added to the list that can be slaved from 131.111.8.37 and 131.111.12.37: it may be desirable to do that if your nameserver is providing a lookup service to clients on IPv6-enabled networks, whether it uses IPv6 itself or not.
In addition, a number of comments have been corrected or clarified. Note in particular that BIND does not require a "controls" statement in the configuration file to make run-time control via the "rndc" command work. See the comments for more details. It should only rarely be necessary to actually restart a BIND daemon due to a change in its configuration.
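For example (both subcommands are standard rndc operations; the zone name is illustrative):

  $ rndc reconfig           # re-read named.conf, pick up added/removed zones
  $ rndc reload cam.ac.uk   # reload one zone after its file changes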
The IPv4 address of one of the root nameservers, l.root-servers.net, has changed from 198.32.64.12 to 199.7.83.42. (Such changes are rare: the last one was in January 2004.)
If you are running a nameserver with a root hints zone file, that should be updated. There are a number of ways of generating a new version, but the official with-comments one is at
ftp://ftp.internic.net/domain/named.root
and there is a copy of that locally at
http://jackdaw.cam.ac.uk/ipreg/nsconfig/db.cache
Modern versions of BIND 9 have a compiled-in version of the root hints zone to use if none is defined in the configuration file. As a result of this change, the compiled-in version will be out of date for existing BIND versions: a corrected version has been promised for the next versions of BIND 9.3.x, 9.4.x and 9.5.x.
Using a slightly out-of-date root hints zone is unlikely to cause serious problems, but it is something that should not be allowed to persist indefinitely.
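If you prefer to generate a hints file yourself rather than fetch a copy, the traditional recipe (a sketch; any root server can be queried) is to ask a root server for the root NS records and save the output:

  $ dig +norecurse @a.root-servers.net . NS > db.cache.new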
Historically, the IP registration database has required each registered name to consist of a single component in front of a registered domain, so that one had to use names like www-foo.dept.cam.ac.uk rather than www.foo.dept.cam.ac.uk.
We have tentative plans to restructure the database to liberalise this constraint everywhere, but this is a major undertaking and will not happen soon. However, we have been able to provide partial relief in the special case of CNAMEs.
In the table_ops page under object type cname there is now a field strip_components. This can be set to a number which controls how many leading components are stripped from the name value to convert it to a domain. (Note that it has no effect on the treatment of target_name.) For example, setting it to 2 for www.foo.dept.cam.ac.uk associates it with the domain dept.cam.ac.uk rather than the (probably non-existent) domain foo.dept.cam.ac.uk. Leaving the field null is equivalent to setting it to 1. (0 is an allowed value, but note that creating a CNAME dept.cam.ac.uk is disallowed if there is a mail domain with that name.)
First, the configuration files have been moved from
ftp://ftp.cus.cam.ac.uk/pub/IP/Cambridge
to
http://jackdaw.cam.ac.uk/ipreg/nsconfig
and internal references have been adjusted to match. The old location will contain copies of the updated files only for a very limited overlap period.
Second, the sample.named.conf file now recommends use of
notify no;
in the "options" statement. BIND is by default profligate with its use of notify messages, and a purely stealth nameserver can and should dispense with them. See the comments in the file for what to do if you also master or officially slave other DNS zones.
Third, comments in the file previously suggested that one could use a "type forward" zone for private.cam.ac.uk. Although this does work for the corresponding private reverse zones, it does not for the forward zone if cam.ac.uk itself is being slaved. In that case, if you don't want to slave the whole of private.cam.ac.uk, then you should use a "type stub" zone instead. See the new comments for details.
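A minimal sketch of the stub arrangement (the file name is illustrative, and the master addresses shown are the authoritative nameservers mentioned elsewhere on this list; see the sample file itself for the authoritative details):

  zone "private.cam.ac.uk" {
          type stub;
          masters { 131.111.8.37; 131.111.12.37; };
          file "stub.private.cam.ac.uk";
  };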
Unfortunately, our recovery procedure was flawed, and introduced creeping corruption into the filing system. The relevant machine became unusable at about 14:45 today (Monday 16 July). In order to get the most important services functional again,
- the recursive nameserver at 131.111.12.20 [recdns1.csx] was moved to a new Solaris 10 zone on the machine already hosting authdns0 & recdns0: this was functional from about 15:45 (although there were some short interruptions later);
- the non-recursive authoritative nameserver at 131.111.12.37 [authdns1.csx] had its address added to those being serviced by the authdns0 nameserver at about 20:10 this evening.
Of course, we hope to get the failed machine operational again as soon as possible, and authdns1 & recdns1 will then be moved back to it.
Please send any queries about these incidents or their consequences to hostmaster@ucs.cam.ac.uk.
We currently plan to lock down the recursive nameservers at 131.111.8.42 and 131.111.12.20, so that they do not respond to queries from outside the CUDN and also do not allow zone transfers, during the first week of the Easter term (23-27 April). We will update you on this closer to the time.
We now intend to make these changes, at least insofar as zone transfers are concerned, early on Thursday 26 April.
We would like to thank all those who made changes to nameservers in their jurisdiction to fetch DNS zones from 131.111.8.37 / 131.111.12.37 instead. Logging has shown that the number of hosts still fetching from 131.111.8.42 / 131.111.12.20 is now quite small. Some final reminders will be sent to those who still have not made the change.
Two further changes have been made to the sample nameserver configuration file at
ftp://ftp.cus.cam.ac.uk/pub/IP/Cambridge/sample.named.conf
Firstly, one of the MRC-CBU subnets was incorrectly omitted from the "camnets" ACL, and has been added.
Secondly, questions were asked about the setting of "forwarders" in the "options" statement, and so I have added some comments about that. We used to recommend its use, but have not done so for some time now, except in situations where the nameserver doing the forwarding does not have full access to the Internet. However, if query forwarding is used, it should always be to recursive nameservers, hence to 131.111.8.42 and 131.111.12.20 rather than to the authoritative but non-recursive nameservers at 131.111.8.37 and 131.111.12.37.
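A corresponding fragment (illustrative only) would be:

  options {
          // forward to the recursive nameservers, not the
          // authoritative-only ones at 131.111.8.37 / 131.111.12.37
          forwarders { 131.111.8.42; 131.111.12.20; };
  };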
We are now logging all outgoing zone transfers from 131.111.8.42 and 131.111.12.20, and will be contacting users who have not made the change to fetch from 131.111.8.37 and 131.111.12.37 instead, as time and effort permit. Help us by making the change before we get around to you!
We currently plan to lock down the recursive nameservers at 131.111.8.42 and 131.111.12.20, so that they do not respond to queries from outside the CUDN and also do not allow zone transfers, during the first week of the Easter term (23-27 April). We will update you on this closer to the time.
A new version of the sample nameserver configuration file has been installed at
ftp://ftp.cus.cam.ac.uk/pub/IP/Cambridge/sample.named.conf
This is a major revision, which includes new reverse zones, advice on access control settings, and several other changes. However the most important, and one which anyone managing such a slave nameserver should act on as soon as possible, is that the zones which were previously being fetched from
masters { 131.111.8.42; 131.111.12.20; };
should now be fetched from
masters { 131.111.8.37; 131.111.12.37; };
instead. The background to this is described below.
We are in the process of separating the authoritative nameservers for the Cambridge University DNS zones from those providing a recursive DNS lookup service for clients on the CUDN. To minimise the pain, it is the latter which have to retain the existing IP addresses. When the transformation is complete we will have
authdns0.csx.cam.ac.uk [131.111.8.37]
authdns1.csx.cam.ac.uk [131.111.12.37]
providing non-recursive authoritative access to our zones (and zone transfer for appropriate zones to clients on the CUDN) while
recdns0.csx.cam.ac.uk [131.111.8.42]
recdns1.csx.cam.ac.uk [131.111.12.20]
will provide a recursive lookup service to CUDN clients (but not zone transfers), and no service at all outside the CUDN.
The mailing list cs-nameservers-announce@lists.cam.ac.uk has been converted from an old-style list to a Mailman list. (See https://www.lists.cam.ac.uk for background information.)
The list options attempt to match the previous state of affairs. The list is moderated, and subscription requires administrator action (but you can now request it via the web pages as well as by message). On the other hand, unsubscription by end-user is enabled.
Digests are not available. Archives will be kept and can be read even by non-members.
See the download-cookie page on Jackdaw for a more complete description of the scheme. At the moment only the list_ops page can be used with downloaded cookies for the ipreg realm, and it requires a certain amount of reverse engineering to be used with a non-interactive tool. Pages more suitable for this sort of use may be provided later in the light of experience. The current state is quite experimental and we would ask anyone planning to use it in production to let us know.
Some departments and colleges are using firewall software written by Ben McKeegan at Netservers Ltd., which interacts with the IP registration database using the old method of authentication via an Oracle account password. A version of this software that uses downloaded cookies as described above is under development and we hope it will be available soon.
For several reasons we want to restrict the number of people who have SQL-level access to the underlying Oracle database, and there has been a recent purge of unused Oracle accounts. If you have good reason to need such access to the IP registration part of the database, please let us know.
Such operations can be done using the table_ops page after selecting object type cname, in ways that will be familiar to those who have performed modifications to existing CNAMEs in the past. We recognise that this interface is somewhat clunky, and a tailored cname_ops web page may be made available in the future.

There is a maximum number of CNAMEs associated with each management zone in the database, which can be altered only by us. These limits have been set high enough that we do not expect sensible use of CNAMEs to reach them very often. Users will be expected to review their existing CNAMEs before asking for an increase.