IP Register schema wishlist

2017-12-19 - Future - Tony Finch

Here are some criticisms of the IP Register database schema and some thoughts on how we might change it.

There is a lot of infrastructure work to do before I am in a position to make changes - principally, porting from Oracle to PostgreSQL, and developing a test suite so I can make changes with confidence.

Still, it's worth writing down my thoughts so far, so colleagues can see what I have in mind, and so we have some concrete ideas to discuss.

I expect to add to this list as thoughts arise.

Existing context on Jackdaw

It's worth unpacking "Jackdaw" a bit.

There is Jackdaw the service, consisting of a number of Linux servers that host an Oracle DBMS and Apache mod_perl application framework.

In the Oracle DBMS there are a number of databases: a production database that is also called jackdaw, which is occasionally cloned for operational reasons to make jdawdev, jdawtest, etc.

The original user of the Jackdaw database was the user administration system; the IP Register database is a separate schema within the same database.

Future: Better use of SQL schemas outside Jackdaw

The plan is to make the IP Register database its own self-contained system hosted on PostgreSQL.

Instead of multiple databases within a DBMS, there will be just one. The dev and test databases will be on separate Linux systems isolated from the production servers. (I'm currently working on a dev system using a cluster of VMs hosted on my workstation.)

Breaking away from the user-admin database makes it possible to use multiple schemas within the IP Register database. There's an obvious place where this can make a big improvement.

At the moment, the Perl IP Register front-ends pass around a $prefix which privileged users can set to choose whether they are working at their normal privilege level (my_) or in read-all mode (ra_) or in access-all-areas mode (all_). This means that most of the SQL queries are full of string interpolations, which means the code is not trivially obviously free of SQL injection bugs.

If each set of access control views is moved to its own schema, then the database front end can can set PostgreSQL's schema search path to choose between my, ra, and all schemas, and use static SQL instead of interpolating $prefix.

Further future: row-level security

Dean Rasheed tells me I should look at row-level security as a better access control mechanism than IP Register's existing updatable views. In particular, it would allow us to give users read access to objects which they can't update.

I like the sound of this a lot, but it will require a lot more thought to work out how to make good use of it.

Coupling between IP Register and User-Admin

There are a few cases where the IP Register database makes use of the fact that it is just a schema within the wider Jackdaw database.

Firstly, the mzone_co, areader, and registrar tables refer to people by their CRSID; there are constraints tying these tables to the pers table in the user-admin database. I will have more to say about this below.

Secondly, the Gossamer system which handles the provisioning of Falcon web sites (and used to handle MWS sites) has feet in both camps: the authoritative details of the web sites are held in the user-admin database, and the DNS-specific information is copied to the IP Register database. The current provisioning scripts take advantage of the fact that these are actually the same database, and use transactions that span both user-admin and IP Register databases when provisioning sites.

Those are the only cases I am currently aware of...

Future: ACLs from Lookup

There's a lot of unnecessary bureaucracy maintaining the IP Register access control lists.

Instead of manual maintenance by UIS staff, the access control lists should come from Lookup groups via LDAP, so each institution can maintain their own access list in the same way as they maintain other groups.

Currently I think that it should require a quick manual process for an existing mzone_co to mirror a new ACL from Lookup to the IP Register database. This is because control over the DNS gives you huge amounts of privilege, and I don't want a compromise of Lookup to automatically lead to compromise of the DNS.

Future: two factor authentication

We currently have a rough prototype implementation of TOTP for registrar access to the IP Register database.

This should be made available as an option for all mzone_cos.

The TOTP secrets need to be added to the database, with a mechanism allowing IT staff within an institution to reset each others TOTP secrets wihtout having to talk to the UIS.

Future: Gossamer, MWS, etc.

There's no real need or benefit from Gossamer using transactions that span the user-admin and IP Register databases - all the other provisioning systems that hang off Jackdaw are loosely coupled and eventually consistent, and this is true for the other web provisioning side of Gossamer.

We have a number of special interfaces to IP Register that (like Gossamer) need special privilege: the MWS3 API; the IP filter interlock; the RPZ pruner; ... These should all be refactored to follow a common set of design principles to be determined.

DNS backend vs asset register

In every-day usage the IP Register database falls awkwardly between two stools: it isn't just a DNS backend, because each entry in the database requires certain metadata which isn't exported to the DNS; but the schema for this metadata is frequently too impoverished to be helpful for much of the information we want to record about DNS registrations.

As far as I can tell, this aspect of the IP Register database came from the manual / semi-automated practices of the IP Register team in the 1990s, and it has not been re-worked since then.

Future: arbitrary metadata

The PostgreSQL jsonb data type allows us to attach schemaless data to a row. I would like to replace all the IP Register metadata fields that aren't used by the database with a jsonb column.

All tables have a remarks column, which is the basic metadata to be replaced by jsonb; other fields to fold into it include purpose, equipment, location, sysadmin, etc. usw.

Questions of what metadata to ask for then become a matter for the user interface.

Future: box vs vbox

The differences between boxes and vboxes are:

  • boxes and vboxes have different un-constrained metadata fields;

  • vboxes are hosted on some other box or vbox, as recorded in the vbox_box table.

After the jsonb metadata change, the first difference goes away.

The second difference also becomes moot, because a box is equivalent to a vbox without an entry in the vbox_box table.

So this distinction should be eliminated.

LANs and subnets

Another hangover from the 1990s is the IP Register representation of LANs and subnets. It seems to be designed to make it convenient, when registering a box, to provide manual configuration details (router, subnet mask, name servers) to the admin of the box.

In practice an IP Register LAN typically corresponds to a VLAN as configured into the relevant network equipment - but the database does not record VLAN numbers. It seems to be a half-finished feature.

Future: DHCP support

The IP Register database currently has bare minimum support for exporting static registrations to our DHCP servers: you can tie an IP address to a MAC address and a DHCP group, and that's it.

I would like to be able to represent the rest of our DHCP configuration in the database too: DHCP-enabled subnets, dynamic pools, etc.

I would like an IP Register LAN to correspond to a VLAN from the network equipment point of view and a shared-network from the DHCP point of view, with a flag determining whether it is exported to the DHCP servers. We should also be able to match up the contents of the IP Register database to the router configurations more automatically.

The LAN rubric stuff then becomes useful as DHCP options.

I would also like to be able to represent DHCP pools in the database. In this area we need to be thinking about replacing old ISC DHCP with ISC Kea - the latter is database-backed which might or might not mean we will benefit from tying it directly to the IP Register database.

IPv6

There are a couple of annoying limitations with the way the IP Register database models IPv6 subnets.

  • It requires the subnet prefix length to be 64 bits. We generally use long prefixes for link subnets, so we want to be able to represent them as such in the database.

  • We have a convention for IPv6 service identifiers, using addresses of the form prefix::service:instance where prefix is 64 bits, and service and instance are 16 bits. We currently use a wiki page for keeping track of service identifiers; this should be in the database.

  • In IPv4, LAN names are associated with v4_address objects, so a subnet can be split between multiple logical LANs, and LANs can be updated by mzone_cos. In IPv6, the LAN is associated with the v6_prefix and they require privileged access to update. This should be delegated.

  • Ideally it should be possible to associate a LAN with a pair of a v6_prefix and a service identifier, so that static IPv6 addresses can be allocated automatically in a sensible way.

CNAMEs and anames

The only delegated mechanism we have for making DNS aliases is with CNAMEs.

Any interesting alias setups have to be done with aname objects, which can only be created by privileged IP Register users. This includes things like service aliases that combine multiple vboxes (which we use for the ppsw and recdns services), bare www-less web site names, and aliases for off-site services.

We also use anames as interlocks, to prevent records from being deleted when there is some network configuration that refers to them. (This is the ipfilter mechanism.)

Future: CNAME and aname refinement

CNAME quotas should be abolished.

We should have an alias mechanism that is published in the DNS as the addresses of the target, to avoid the restrictions on CNAME records. We have something like this for domains hosted on the MZS; in the future it should be based on standard ANAME records - which are not the same thing as IP Register anames.

The IP Register aname table should be renamed to avoid confusion with standard ANAMEs. (Renamed to what, though?)

I would like to delegate the ability to create IP Register anames, though this may hit a snag with the current mutable view acess control mechanism.

Future: unrestricted aliases

Our process for setting up off-site servers is currently very bureaucratic. To a large extent this is because it requires awkward work-arounds for limitations in the IP Register database.

To fix this, we should allow staff to set up arbitrary CNAMEs, arbitrary standard ANAMEs, and arbitrary address records.

Future: cross-mzone objects

This is a knotty area, and I'm not entirely sure what the right approach is.

At the moment, staff who have access to multiple mzones can create objects that span mzones without restriction - e.g. a name from one mzone pointing at an address in another mzone.

I'm inclined to think this is a bad idea, because it has awkward edge cases where not all staff in a group have access to the same mzones, and when mzones split or merge.

Instead, ordinary users should only be able to create objects that are entirely within an mzone. If they need to create mzone-spanning links they can use unrestricted aliases.

Privileged cross-mzone objects might still be used for ipfilter interlocks, and maybe for Falcon / MWS aliases.

TXT records

A large proportion of the non-database records in the cam.ac.uk zone are TXT records. These fall into roughly three areas:

  • SPF records, which are managed by a special-purpose script;

  • DKIM / DMARC records;

  • Microsoft / Google / etc. domain authentication records.

Future: SPF

SPF records are closely tied to mail domains. They should be moved into the database, hooked into the maildom table similarly to the mx table, to replace the current ad-hoc configuration file.

Future: other TXT

Staff should be able to create arbitrary TXT records, which would cover the other cases listed above. I'm not sure if it's worth implementing special cases for DMARC and DKIM - maybe syntax checks in the user interface will be enough.

IP addresses and subnets

The current IP Register database has a number of subroutines for converting IP addresses between packed octet strings (for storage in Oracle) and textual form (for presentation to users and export to the DNS).

Future: ip4r

The ip4r PostgreSQL extension provides native data types for IP addresses and subnets. Using this would allow us to eliminate a lot of awkward custom code.

Anything else?

So far this is just one person's thoughts, and I have a fairly limited and unusual perspective on the database. Because of this I'm very keen to hear other people's ideas and feedback: Do you like what I wrote above? Did I miss anything out? Do you have any suggestions?

Please send email to ip-register@uis.cam.ac.uk!