this is a presentation within a presentation.

outer presentation is about the real question: should we be hacking DNS to add support for non-US-ASCII character sets {at this time, ever}?

inner presentation is about how to do it if we decide that it's a good idea.

the outer presentation is the real issue, the inner is a small matter of protocol design.

soundbite summary: we need to decide whether we should do this before worrying about how we should do it.

the non-question: should we evolve the internet towards a state where people can name machines in character sets other than us-ascii?

yes, obviously.

the big question: should we do this in the dns?

pro:
- dns is what we have now, it's what the world knows about, and there may not be anything we can do to prevent non-us-ascii dns. we know that iso-latin-1 dns is already in use in at least one european country.
- do we really want to create a whole new battleground for the name wars? perhaps it would be better to reuse the battleground we already have...
- it's not very hard to add other character sets to the dns itself from a technical standpoint.

con:
- dns is already being used for things it's not particularly good at, like white pages service. perhaps it's time to stop enhancing dns and move on to something better, or at least different.
- normalization (glyph => code mapping) is hard in the non-us-ascii space, and none of the solutions are entirely satisfactory. how complicated (and slow) are we willing to make the process of mapping the glyphs on a business card into an IP address?
- adding non-us-ascii support to dns is the least of the technical problems. updating applications to do something reasonable is a much bigger job.

this list is almost certainly not exhaustive.

deploying without serious examination of the big question would be a very bad idea.

there really seem to be three choices here:

a) stick with us-ascii dns and address the problem elsewhere
b) transition dns to us-ascii + unicode.
c) transition dns to the mime model: support ALL the character sets, and tag them so that we can figure out which one(s) we're looking at.

if we say "stick with us-ascii", some people will do non-us-ascii anyway. it probably won't interoperate, and in any case will be beyond our control.

if we say "transition to unicode", utf-8 seems like the way to go. utf-5 doesn't add much beyond the ability to deploy without updating any software, but the value of deploying without updating software to make use of the new names seems dubious at best.

if we say "transition to mime model with tagged character sets", we can do that. it's a relatively minor change to the dns protocols.

in the following, don't forget that we haven't answered the big question yet, so while this is all good fun for us bitheads, we may decide that this is worse than what we have now.

dns myths:
- dns labels are limited to alphanumerics plus "-"
- dns labels are limited to us-ascii

don't reinvent the wheel:
- steal design from mime wherever possible

[insert existing kakameymi.example slide here]

how to display these names on screens without support for the specified character set:
- at worst, there's always mime encoding: =?ISO-8859-8?Q?=E9=EE=E9=E9=EE=E0=F7=E0=F7?=.example
- perhaps someone will think of a less ugly way.

normalization:
- this is hard. it's hard just within unicode; having multiple charsets makes it harder.
- it probably would be possible to encode some kind of "search list" as a new RR type in the DNS, so that any given level in the tree can have its own normalization order, at least as far as trying different character sets goes. even assuming that this is a good idea (unclear), it needs to be examined carefully for circular dependencies and bootstrapping problems. e.g., do charset search lists need additional section processing (i.e., is there a "glue" problem here)?
- us-ascii almost certainly should remain first on all normalization lists for the foreseeable future, because of the installed base.
- it's possible that, within particular cultures, leaving unicode out of the search list would make this mechanism simpler than any mechanism involving unicode could ever be. i could be wrong.

none of this is relevant until we answer the big question.
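
for the bitheads: the mime encoding fallback mentioned under "how to display these names" can be sketched in a few lines of python. this is only an illustration, not a proposed dns mechanism. the function names are hypothetical, the name is split naively on ".", and the encoder hex-escapes every octet (legal in "Q" encoding, just not compact). it uses the standard library's email.header.decode_header to read the charset tag.

```python
from email.header import decode_header


def decode_tagged_name(name: str) -> str:
    """Decode every dot-separated label that is a MIME encoded-word (RFC 2047)."""
    labels = []
    # Naive split: assumes no "." occurs inside an encoded-word.
    for label in name.split("."):
        text = ""
        for chunk, charset in decode_header(label):
            if isinstance(chunk, bytes):
                # Encoded-word: raw octets plus the charset tag they carry.
                text += chunk.decode(charset or "us-ascii")
            else:
                # Plain label: decode_header returns it unchanged as str.
                text += chunk
        labels.append(text)
    return ".".join(labels)


def encode_label_for_display(label: str, charset: str) -> str:
    """Fallback display form: us-ascii labels pass through, others are Q-encoded."""
    if label.isascii():
        return label
    octets = label.encode(charset)
    hex_escaped = "".join("=%02X" % b for b in octets)
    return "=?%s?Q?%s?=" % (charset.upper(), hex_escaped)


name = "=?ISO-8859-8?Q?=E9=EE=E9=E9=EE=E0=F7=E0=F7?=.example"
decoded = decode_tagged_name(name)  # first label becomes hebrew text

# Round trip back to the ugly-but-displayable us-ascii form.
first, rest = decoded.split(".", 1)
print(encode_label_for_display(first, "iso-8859-8") + "." + rest)
```

the decoded form needs a screen that can render hebrew. the re-encoded form is pure us-ascii, which is the whole point of the fallback.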