Unicode in the IP registration database

2009-12-09 - News - Chris Thompson

(These changes first went into service on 2009-12-07, but had to be withdrawn because of problems, with uploads in particular. They are now back in service.)

When the Jackdaw Oracle server was moved to new hardware and upgraded to Oracle 10 earlier this year, the opportunity was taken to change the encoding it uses for character data from ISO Latin 1 to UTF-8. However, little change will have been apparent at the user interfaces, because input and output were still translated from and to ISO Latin 1.

This has now been changed so that all interfaces use UTF-8. In particular, the IP registration web pages now use UTF-8 encoding, and so do files downloaded from the list_ops page. Uploaded files should also be in UTF-8: invalid byte sequences (such as might be caused by using the ISO Latin 1 encoding instead) will be replaced by Unicode replacement characters '�' (U+FFFD).
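As an illustration of the replacement behaviour, here is a minimal Python sketch. It is not the code used by the upload interface; the function name decode_upload is hypothetical, and it simply relies on Python's standard "replace" error handler, which substitutes U+FFFD for bytes that are not valid UTF-8.

```python
def decode_upload(raw: bytes) -> str:
    """Decode uploaded bytes as UTF-8, replacing invalid sequences with U+FFFD.

    Hypothetical helper for illustration only.
    """
    return raw.decode("utf-8", errors="replace")


# "Zürich" saved as ISO Latin 1: the u-umlaut is the single byte 0xFC,
# which can never appear in valid UTF-8.
latin1_bytes = "Z\u00fcrich".encode("latin-1")
print(ascii(decode_upload(latin1_bytes)))   # 'Z\ufffdrich'

# The same text saved as UTF-8 decodes cleanly.
utf8_bytes = "Z\u00fcrich".encode("utf-8")
print(ascii(decode_upload(utf8_bytes)))     # 'Z\xfcrich'
```

Checking a file for U+FFFD after decoding it this way is one way to spot a file that was saved in the wrong encoding before uploading it.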

Only those fields that are allowed to contain arbitrary text (such as equipment, location, owner, sysadmin, end_user, remarks) are affected by this change. Values consisting entirely of 7-bit ASCII characters (the great majority) will not be affected, because ASCII is the common subset of ISO Latin 1 and UTF-8: such values are represented by the same bytes in both encodings.
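The point about 7-bit ASCII can be checked directly. The following Python sketch uses a hypothetical helper, encodings_agree, and made-up sample values; it compares the bytes produced by the two encodings.

```python
def encodings_agree(text: str) -> bool:
    """Return True if text is encoded identically in ISO Latin 1 and UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        return text.encode("latin-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains characters that ISO Latin 1 cannot represent.
        return False


# A pure ASCII value: the same bytes either way, so it is unaffected.
print(encodings_agree("Machine room, rack 12"))   # True

# A value with a non-ASCII character: one byte in ISO Latin 1, two in UTF-8.
print("Z\u00fcrich".encode("latin-1"))            # b'Z\xfcrich'
print("Z\u00fcrich".encode("utf-8"))              # b'Z\xc3\xbcrich'
print(encodings_agree("Z\u00fcrich"))             # False
```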

We have identified a few values in the IP registration data which have suffered the unfortunate fate of being converted from ISO Latin 1 to UTF-8 twice. We will be contacting the relevant institutional COs about them.
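For readers curious what a doubly converted value looks like, the Python sketch below reproduces the effect and shows one heuristic for reversing it. It is an assumption about the general mechanism, not a description of how the affected values were found; the function name undo_double_encoding is hypothetical, and the heuristic can misfire on text that legitimately contains such character pairs.

```python
from typing import Optional


def undo_double_encoding(value: str) -> Optional[str]:
    """Try to reverse one spurious ISO Latin 1 -> UTF-8 conversion.

    Returns the repaired string, or None if the value does not look
    doubly converted. Hypothetical helper; a heuristic only.
    """
    try:
        # Map each character back to the byte it came from, then read
        # those bytes as the UTF-8 they originally were.
        repaired = value.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None
    return repaired if repaired != value else None


# Simulate the fault: UTF-8 bytes misread as ISO Latin 1 and converted again.
mangled = "Z\u00fcrich".encode("utf-8").decode("latin-1")
print(ascii(mangled))                                  # 'Z\xc3\xbcrich'
print(ascii(undo_double_encoding(mangled)))            # 'Z\xfcrich'

# A correctly stored value is left alone.
print(undo_double_encoding("Z\u00fcrich"))             # None
```

In the mangled form the single character ü appears as the two characters Ã¼, which is the usual visible symptom of this fault.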