The future of the IP address register

17 Dec 1999 - Tony Stoneley

Those who do not learn from CAPSA are doomed to repeat it.

This was originally written as an informal internal Computing Service document, and "we", "us", etc refer to the CS. It is now offered up for comment, equally informal and preferably by email to ajms@cam.ac.uk, from anyone potentially affected. It is still early days, however, and progress is unlikely to be rapid.

Introduction

The human effort needed to support ip-register as currently conceived has been growing steadily over the years, and the growth is likely to continue for several years to come. The style differs significantly between Departments and Colleges, but there is comparable growth in both areas. In Colleges there is presumably a natural plateau of some sort to this growth, when every student has a private computer, but at present we are only about one third of the way there, and in any case we already have students with more than one computer. Likewise there is presumably some comparable plateau to departmental requirements, but again that is quite some way off, and there is no reason to suppose an actual ceiling. The overall workload for ip-register nowadays amounts to roughly one FTE, and that only when those in the hot seat have long familiarity with the job. This autumn, for example, ip-register has been dealing with a sustained 40-50 emails every day, some containing substantial batches of separate requests. Nor is it easy to see how the work as currently organised could be split much more than it already is, interlock being mainly by human serialisation. It seems clear to me that some kind of crisis is drawing closer, and will certainly overtake us before any natural ceiling to the work is reached.

On the other hand, there is growing pressure for, and growing sense in, local COs being able to update the records directly, thus cutting out the tedium, irritation and delay, however small, of having to do this indirectly by human-to-human email request. Many requests are in fact dealt with on autopilot, as it were, in the knowledge that the requesting CO is competent and trustworthy. Some checking and constraint is clearly necessary, but in many transactions there is nothing that could not in principle be automated.

It seems to me, therefore, that it is time to set about redesigning the whole ip-register organisation, automating and delegating wherever possible. This will not be an easy operation, but if we don't make a start soon we will collapse in crisis with heaven knows what horrors forced on us. For some time now I have been ruminating about what could be done and have consulted with various people, but it is time to set some kind of plan in black and white so that all who are interested can chip in.

General requirements and consequential strategy

We have learnt the painful way that we cannot safely devolve custody of the data. The data have to be held by us and remain ours, even though we may devolve access rights. However, devolving access is precisely what we want to do, and this means devolving to several hundred individuals. Clearly the access has to be constrained so that only certain individuals can access particular parts of the data, the records relating to their bailiwicks. Equally clearly even then there needs to be control on the form of the data that they can specify. We need controlled and safe multiple update of the overall database. It seems obvious that the current loosely termed database must be transformed into a database in the strong technical sense.

The only widespread standard for database access is SQL, and the relational model is the only widely deployed one. There is one leading commercial vendor, Oracle, and a small number of freeware offerings. Sadly, none of the freeware offerings seems to have industrial strength. Perhaps the most credible are PostgreSQL and MySQL. The former does indeed have some attractive features, such as the IP address datatype and subclassing, but it seems to lack any significant attention to access control, lacks write-through views, and is said to perform poorly under the simultaneity of access that we require. MySQL takes a more realistic view of security but currently lacks views entirely. Without write-through views SQL seems singularly useless for the purpose at hand: they are essential, not just a luxury, as for that matter are stored privileged procedures. The business-world clumsiness of Oracle fills me with depression, but it appears to be the clear winner. If there's anything better, please please tell me about it!
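
To make that concrete, the sort of thing write-through views would buy us might look like the following sketch, in Oracle SQL; the table, view, column and account names are merely illustrative, not a design:

    -- A per-institution updatable view: a college CO sees, and can
    -- change, only the rows in his own bailiwick.
    CREATE VIEW trin_boxes AS
        SELECT name, address, owner, location
        FROM   boxes
        WHERE  mgmt_domain = 'TRIN'
        WITH CHECK OPTION;   -- updates through the view cannot escape the domain

    GRANT SELECT, UPDATE ON trin_boxes TO trin_co;

Anything subtler than this row-and-column filtering would have to go through privileged stored procedures rather than raw table access.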

It should be clearly understood that the size of the data is not an issue. In commercial database terms ours is titchy. In its entirety it would fit in main memory, and could thereby gain much freedom of expression. What we do need, however, is secure, interlocked, controlled, multiple remote access with non-parochial protocols and apparatus, and in practice these are found only in packages intended for much larger databases, to which we are thus apparently forced.

Some kind of automatic audit trail is clearly needed. We currently have four principal trails: the overt preservation of tombstones, the SCCS trail, the automatically archived copies of all email, and the selective manual classified mail archive. These collectively guard against ip-register blunders, confusion on the part of clients, disputes about who did or did not do what, and loss of background information with the passage of time and COs. They have all been invaluable, and inasmuch as they cannot simply be extended into the new regime, some alternative apparatus is needed. This clearly needs to be inboard and automatic, and whilst Oracle does not directly provide such facilities it does have apparatus for building them. In particular, procedures can be called automatically at any update, and can be written to maintain audit trails in whatever detail you like.
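
As a sketch of what such apparatus might look like (assuming Oracle, and with all names illustrative):

    -- A side table recording every change, with who and when.
    CREATE TABLE box_audit (
        changed_at  DATE,
        changed_by  VARCHAR2(30),
        old_name    VARCHAR2(255),
        new_name    VARCHAR2(255),
        address     RAW(16)
    );

    -- Fired automatically on any change to the main table; no client,
    -- however devolved, can bypass it.
    CREATE TRIGGER box_audit_trg
        AFTER INSERT OR UPDATE OR DELETE ON boxes
        FOR EACH ROW
    BEGIN
        INSERT INTO box_audit
        VALUES (SYSDATE, USER, :OLD.name, :NEW.name,
                NVL(:OLD.address, :NEW.address));   -- :OLD is null on insert
    END;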

We also have to tie in with other databases. The IP address register associates addresses with people. It also has role concepts like Computer Officers. It lists locations, affiliations, and so forth. Although the University is weak (to say the least) in its personnel databases, we should try to work towards integration (in some sense) with them rather than trying to build yet another one. Notwithstanding the uncertainties about the future of personnel databases around here, there seems to be some virtue in running with the general stream, which is Oracle, and probably in being in close contact with the CS administrative database.

The database is only one end of the story, being useless without client interfaces, and it seems clear that at least two client interfaces are needed: a programmable batch interface and a GUI. On the one hand we have colleges requiring bulk registrations counted in hundreds if not thousands, the data usually coming out of their own private databases, and for such people a GUI is utterly useless. On the other hand we have small, technically semi-literate departments which will be quite unable to manage without the fully prompting manual interface that constitutes a GUI. Moreover this interface has to be available to someone who has only a PC and doesn't really understand the PC. Even some of the larger departments may in fact prefer to work with a GUI, the transactions coming in dribs and drabs and without any local database. Of course these alternatives should not be exclusive.

The obvious batch interface, if we do indeed use standard database technology, is SQL, in harness with routines implemented in the server and accessible via SQL. Many programming languages have interfaces to SQL, and that wide choice should prove comfortable whatever the user's programming prejudices.
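
By way of illustration, one registration out of a college's bulk run might then be nothing more than a call on a server-side procedure; the procedure name and signature here are invented for the purpose, not a proposal:

    BEGIN
        register_box(p_name     => 'wombat.trin.cam.ac.uk',
                     p_owner    => 'A.N. Other',
                     p_location => 'Room 42');
    END;

The procedure itself would do the checking, the address bookkeeping and the audit trail, with the loop around it written in whatever language the college's own database speaks.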

The obvious vehicle for the GUI is the web. The GUI needs to be accessible without hassle to the least technical people, and the ubiquity of web browsers offsets all its limitations. In any case the requirement is largely transactional, matching the web architecture, and the obvious problem of security has well-established (if distasteful) solutions.

In either case, possibly the hardest primitive to provide is "find me a suitable new address". The diversity of mandatory and advisory constraints currently guiding humans in this function may not be so easy to codify, but it will have to be done, possibly even at the expense of some present-day friendliness and flexibility.

There also needs to be a back-end interface, out of which the DNS files are generated, out of which other network management functions can extract data, and into which perhaps some data can be returned from such functions. If we do adopt standard database technology, we can (and must) ride on SQL, again with suitable and suitably privileged in-built procedures, so in the grand sense we do not really have to forge a new interface mechanism after all, though the devil in the detail remains.
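
For instance, the forward records for the simple cases could in principle be extracted by nothing grander than a query of this shape, spooled into the zone file by a suitably privileged script; the names, and the textual-conversion routine addr_to_text, are illustrative assumptions:

    SELECT name || '. IN A ' || addr_to_text(address)
    FROM   boxes
    WHERE  zone = 'trin.cam.ac.uk'
    ORDER  BY name;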

Finally, we have to recognise that we need phased introduction rather than big bang, and also that there will doubtless remain a myriad of special-case requirements that are, and will remain, best dealt with by hand. We should aim for a system that takes care of the common easy cases in short order, allowing the complex cases to be left to humans in the first instance and possibly for automation at some later stage.

Other network management data

I have thus far over-simplified the picture, quite deliberately, in order to concentrate on one particular problem, which I see as big, distinct and urgent. There are however two other interests which need to be reviewed. One is the current use of ip-register's file to record CUDN routing structure, and the other is the network mapping project. In both cases I am hampered by my limited knowledge and would welcome any comment, but I do want to avoid miring the attack on my principal target, the great multitude of simple address assignments.

The routers immediately present an escalation of complexity, inherently having multiple interfaces and supporting multiple routes on individual interfaces. There is clearly no bar to simply recording the use of router addresses along with all other addresses, but representing the full network structure seems to demand a database complexity that is top-heavy in respect of the common cases. It also seems to fit less comfortably onto the relational model, though not impossibly. Nor does it require the devolved multiple update that is necessary for ordinary address management. My feeling is that we should set this problem aside for separate treatment, perhaps in the same database but at some later stage. I hope I make it clear elsewhere that the transition to my proposed world does not imply demolition of the old, merely depopulation, and the old apparatus can certainly continue in respect of routing records.

The network mapping database as I understand it differs from the administrative one I envisage in several ways. It represents complex structure in a more direct, detailed and convenient way than a relational database can. It is a synthesis from several sources, including active probing of the network. It is not designed for multiple devolved update by minimally trained people. I think we agree that the way forward is for this database and ip-register's to communicate with each other on an enquiry basis, but actually to be distinct and unfettered by each other.

What do other people do?

It might reasonably be thought that Cambridge is not unique in having this problem. Is there any ready made solution? Well, Cambridge is not unique, though it does have some peculiarities, and there does not appear to be a good ready made solution. The problem is frequently posed in the newsgroups, but answers are few.

Many people see this as a DNS management problem, and approach it from that side. To them, devolving address management is synonymous with devolving a DNS zone. There are (at least) two objections to this. Firstly, it implies precisely what we do not want to do, that is, to devolve custody and control of the data. Secondly, it requires that every devolved establishment has its own DNS management specialist, not to mention its own actual servers. In Cambridge, this expertise is singularly lacking in precisely the quarters where it would most be needed. We have a skills shortage and a budgetary impossibility.

At least one establishment has taken our view, but rolled its own technology. The one I know about is Ganymede, which on the face of it looks like the right sort of thing but which in reality leaves me uneasy. For example, we seem to be on the borderline of the envisaged scale, which appears to be that of a large department rather than a whole university. On the other hand there is generality in the design, and hardware may well keep pace with growth requirements. There also seem to be various idiosyncrasies which suggest that this is still a parochial development project, for all its admitted generality. It is, specifically, still only in beta test of its first full release. Oh, and it's Java-based, which may turn you on or off.

There are a number of commercial offerings, but these either don't really address our particular needs or are at breathtaking prices or both. Many are toys in our context, but those that are serious all concentrate on the problems we don't actually have, the generation of DNS and NIS(+) etc files, and have a corporate model utterly alien to any university.

I believe the truth is that most universities do not do anything like what we would consider proper network management, and that no university, even in the USA, could afford the resource that larger corporations consider appropriate and essential.

Finally, we come up against the fact that this is boring administrative computing. No one wants to have anything to do with it. It is also tied in with parochial things like people and rooms, and no two company personnel databases are alike. (The total absence of a unified personnel database may be a little extreme, but I feel sure that comparable shambles are not unusual.)

I don't think there is a viable ready made answer, but I'd love to hear about one.

Database technicalities

There is a systems analysis issue at the outset. On the one hand, the data to be represented potentially have complex structure. A box can support several addresses, either on multiple interfaces or on just one. It can have several associated names, and the forward and reverse mappings between names and addresses can be complicated. These names and addresses may simply represent different routes to the same thing, or different services on the same entity, or completely unrelated virtual machines. The addresses may even float between physical machines for fail-over. At least in principle we need to allow for the full generality of all possible combinations, and this obviously requires a number of separate SQL tables to represent the multiple relationships between the names, addresses and machines.
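
In outline, and with every name illustrative, the fully general representation might run along these lines:

    CREATE TABLE machines (
        machine_id  NUMBER PRIMARY KEY,
        location    VARCHAR2(64)
    );

    CREATE TABLE addresses (
        address     RAW(16) PRIMARY KEY,
        machine_id  NUMBER REFERENCES machines   -- which box answers here
    );

    CREATE TABLE dns_names (
        name        VARCHAR2(255) PRIMARY KEY
    );

    -- The many-to-many heart of the matter: which names map to which
    -- addresses, and in which direction(s) the DNS says so.
    CREATE TABLE name_addr (
        name        VARCHAR2(255) REFERENCES dns_names,
        address     RAW(16)       REFERENCES addresses,
        forward     CHAR(1),   -- publish name -> address?
        reverse     CHAR(1)    -- publish address -> name?
    );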

On the other hand, overwhelmingly the commonest case is a simple machine with a single address and a single name, and this is therefore the case that we particularly want to cover. It would be perfectly possible to represent only this case in the database, lifting all more complicated stuff out for separate treatment, probably manual and much as at present. All that would be needed then would be a single SQL table, each row representing just one machine, address and name, and this simplification holds much attraction. The essentially cost-free enhancement of a flag warning of complications would allow complicated cases to retain some representation in the simple database. Simplicity is always desirable of itself, but in this case it maximises the ratio of benefit to effort and also minimises the actual implementation time.
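
The single-table world, by contrast, might be as little as this (columns again illustrative):

    CREATE TABLE boxes (
        address      RAW(16)       PRIMARY KEY,
        name         VARCHAR2(255) UNIQUE,
        mgmt_domain  VARCHAR2(16)  NOT NULL,  -- whose bailiwick it is in
        owner        VARCHAR2(64),
        location     VARCHAR2(64),
        complicated  CHAR(1) DEFAULT 'N'      -- 'Y': fuller story held elsewhere
    );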

There is also a third way which adds a further dimension of thought. Rather than allowing complete generality it would be possible to build into the structure of the database the structure of the (network) world as we intend it and which our actual use of the DNS etc is supposed to mirror. For example, one could have overt conceptual types like virtual host, multi-homed host, service address and simple alias. In this world many required constraints fall out naturally from the structure, whereas they have to be programmed into the update code explicitly if the full generality is available in the data representation. On the other hand, this world design can then be a piece of concrete standing in the way of tomorrow's as yet unknown developments.
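
A small example of the difference: the rule that a simple alias carries no address of its own has to be policed explicitly in the general model, whereas in the structured model it cannot be broken in the first place. Both fragments are illustrative only:

    -- General model: the rule is an explicit, programmed constraint.
    CREATE TABLE names_general (
        name     VARCHAR2(255) PRIMARY KEY,
        kind     VARCHAR2(8),              -- 'HOST', 'ALIAS', ...
        address  RAW(16)
    );
    ALTER TABLE names_general ADD CONSTRAINT alias_has_no_address
        CHECK (kind <> 'ALIAS' OR address IS NULL);

    -- Structured model: aliases have a table of their own with no
    -- address column at all, so the rule cannot be violated.
    CREATE TABLE aliases (
        alias   VARCHAR2(255) PRIMARY KEY,
        target  VARCHAR2(255) NOT NULL     -- the canonical name
    );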

At the time of writing I remain doubtful about which of these three lines to take, though I have swung from favouring the quick and simple to favouring the fully general, not least as a result of time spent pondering the systems analysis. Comment from experienced database designers would be most welcome.

At all events, I am convinced that the addresses should be represented in a single table that contains all of them: all, that is, that are in use anywhere in the management domain, and also all that are available for use. In the v4 world that simply means all addresses in the CS-managed ranges. In the v6 world, where the supply of addresses is so much larger, it might rather have to mean the stock currently released (by the CS) for use, that stock being replenished centrally in bulk as necessary. Each such address would also carry a status indicator, showing whether it was in use, available for use, or perhaps not to be used. Pre-populating the table with all possible addresses has several advantages over allowing generation (by ordinary COs) of new entries. In particular, universally denying the ability to add new rows demolishes the problem of policing who can add which rows. It also helps with the get-new-address function, which now reduces to a search within the range of authority for an address marked as unused. It may even have the psycho-social effect of encouraging harvesting of disused addresses, an ethic which a generator of new numbers might not so clearly foster.
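
With the table pre-populated, the get-new-address primitive reduces to something of this shape (names illustrative; the FOR UPDATE is what stops two COs being handed the same address):

    SELECT address
      FROM addresses
     WHERE mgmt_domain = 'TRIN'
       AND status = 'FREE'
       AND ROWNUM = 1
       FOR UPDATE;   -- lock the row until the claim is committed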

The representation of addresses gives us pause. Clearly we should be able to represent full IPv6 addresses, and at root these are arbitrary 128-bit bitstrings. There are established conventions for ways in which these can be represented textually, but nothing that could really be described as the unique or single canonical representation, sufficiently canonical that one could rely on character-string matching to establish equivalence. In any case, masking operations are needed. I propose to represent addresses internally as the Oracle RAW type. Unfortunately Oracle provides no masking operations in SQL itself, but these can be synthesized straightforwardly, if rather horribly, by packaged routines, and likewise the translation between the internal RAW form and the standard textual representations.
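
As a sketch, and assuming Oracle's UTL_RAW package for the underlying bitwise primitive (the function name is mine, purely illustrative):

    -- Mask out the host part of an address, e.g. to find its subnet.
    CREATE OR REPLACE FUNCTION addr_mask(addr IN RAW, mask IN RAW)
        RETURN RAW
    IS
    BEGIN
        RETURN UTL_RAW.BIT_AND(addr, mask);
    END addr_mask;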

Transition

We currently have a system which, though onerous, is working well, and whose good working is crucially important. Numerous true stories from elsewhere make it abundantly plain that phased transition is essential and that big-bang cutover would be asking for trouble. Fortunately, as it happens, I don't see any great difficulty about phased transition. Presumably we start by migrating just one departmental or college network, quite possibly the CS staff network, and do not actually devolve anything, rather using the new interfaces ourselves. Having built up some experience with that, we can assess what more needs to be done before migrating a few more, trying some more complex cases, and trying out devolution with just a few guinea pigs. Gradually rather than suddenly, the run-of-the-mill transactions will shift to the new world, leaving as planned just the complex cases for manual action by us as now.

That said, it will be a long time before we are even ready to start, a much longer time before we have shed any significant workload, and a very long time before the transition is in any sense complete.