this is a presentation within a presentation

outer presentation is about the real question: should we be hacking
DNS to add support for non-US-ASCII character sets {at this time, ever}?

inner presentation is about how to do it if we decide that it's a good idea.

the outer presentation is the real issue, the inner is a small matter
of protocol design.

soundbite summary: we need to decide whether we should do this before
worrying about how we should do it.

the non-question: should we evolve the internet towards a state where
people can name machines in character sets other than us-ascii?  yes,
obviously.

the big question: should we do this in the dns?

  pro:	dns is what we have now, it's what the world knows about, and there may
	not be anything we can do to prevent non-us-ascii dns.  we know that
	iso-latin-1 dns is already in use in at least one european country.

	do we really want to create a whole new battleground for the name wars?
	perhaps it would be better to reuse the battleground we already have...

	it's not very hard to add other character sets to the dns itself from
	a technical standpoint.

  con:	dns is already being used for things it's not particularly good
	at, like white pages service.  perhaps it's time to stop enhancing
	dns and move on to something better, or at least different.

	normalization (glyph => code mapping) is hard in the non-us-ascii space,
	and none of the solutions are entirely satisfactory.  how complicated
	(and slow) are we willing to make the process of mapping the glyphs on
	a business card into an IP address?

	adding non-us-ascii support to dns is the least of the technical problems.
	updating applications to do something reasonable is a much bigger job.

  this list is almost certainly not exhaustive.  deploying without serious
  examination of the big question would be a very bad idea.

there really seem to be three choices here:

  a) stick with us-ascii dns and address the problem elsewhere

  b) transition dns to us-ascii + unicode.

  c) transition dns to the mime model: support ALL the character sets,
     and tag them so that we can figure out which one(s) we're looking at.

if we say "stick with us-ascii", some people will do non-us-ascii
anyway.  it probably won't interoperate, and in any case will be
beyond our control.

if we say "transition to unicode", utf-8 seems like the way to go.
utf-5 adds little beyond the ability to deploy without updating any
software, and the value of deploying new names that no software has
been updated to use seems dubious at best.
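
one property worth keeping in mind when weighing utf-8: a us-ascii name
encodes to exactly the same octets, so existing names are untouched,
while each non-ascii character costs two or more octets against the
63-octet label limit.  a small python sketch (utf8_label is a made-up
helper, and it deliberately ignores normalization):

```python
MAX_LABEL = 63   # octets per label, per rfc 1035

def utf8_label(text):
    """encode one label as utf-8, enforcing the dns label length limit."""
    raw = text.encode("utf-8")
    if not 1 <= len(raw) <= MAX_LABEL:
        raise ValueError("label must be 1..63 octets after encoding")
    return raw

ascii_label = utf8_label("example")        # same octets as plain us-ascii
hebrew_label = utf8_label("\u05d9\u05de")  # yod, mem: two octets each
```

note that the limit is counted in octets, not characters, so a label of
32 hebrew letters (64 octets) is already too long.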

if we say "transition to mime model with tagged character sets", we
can do that.  it's a relatively minor change to the dns protocols.  in
the following, don't forget that we haven't answered the big question
yet, so while this is all good fun for us bitheads, we may decide that
this is worse than what we have now.

dns myths:
  - dns labels are limited to alphanumerics plus "-"
  - dns labels are limited to us-ascii
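
both are myths because, on the wire, a label is just a length octet
followed by up to 63 arbitrary octets (rfc 1035; rfc 2181 says outright
that any binary string can be a label).  the letters-digits-hyphen rule
is a convention for host names, not a protocol limit.  a minimal python
sketch; the name to_wire is made up for illustration:

```python
MAX_LABEL = 63    # octets per label (rfc 1035)
MAX_NAME = 255    # octets per encoded name (rfc 1035)

def to_wire(labels):
    """encode a list of byte-string labels as an uncompressed dns name."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL:
            raise ValueError("label must be 1..63 octets")
        out.append(len(label))   # length octet
        out += label             # the octets themselves, unexamined
    out.append(0)                # zero-length root label ends the name
    if len(out) > MAX_NAME:
        raise ValueError("name longer than 255 octets")
    return bytes(out)

# an iso-8859-8 label goes on the wire as easily as a us-ascii one
wire = to_wire([bytes.fromhex("e9eee9e9eee0f7e0f7"), b"example"])
```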

don't reinvent the wheel:

  - steal design from mime wherever possible

[insert existing kakameymi.example slide here]

how to display these names on screens without support for the
specified character set:

  - at worst, there's always mime encoding:
	=?ISO-8859-8?Q?=E9=EE=E9=E9=EE=E0=F7=E0=F7?=.example

  - perhaps someone will think of a less ugly way.
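
the mime fallback is mechanical enough to sketch.  the python below
q-encodes (rfc 2047 style) each label that isn't plain us-ascii and
leaves the rest alone.  the helper names are made up, and encoding
label by label rather than the whole name is an assumption read off
the example above:

```python
import string

# characters shown as-is; everything else becomes =XX
SAFE = set(string.ascii_letters + string.digits + "-")

def display_label(label, charset):
    """return a label unchanged if it is plain, else as an encoded-word."""
    if all(ch in SAFE for ch in label):
        return label
    raw = label.encode(charset)
    body = "".join(chr(b) if chr(b) in SAFE else "=%02X" % b for b in raw)
    return "=?%s?Q?%s?=" % (charset.upper(), body)

def display_name(name, charset):
    """apply display_label to every dot-separated label of a name."""
    return ".".join(display_label(l, charset) for l in name.split("."))

# rebuild the label from the example above and render it again
label = bytes.fromhex("e9eee9e9eee0f7e0f7").decode("iso-8859-8")
shown = display_name(label + ".example", "iso-8859-8")
```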

normalization:

  - this is hard.  it's hard just within unicode; having multiple
    charsets makes it harder.

  - it probably would be possible to encode some kind of "search list"
    as a new RR type in the DNS, so that any given level in the tree
    can have its own normalization order, at least as far as trying
    different character sets goes.  even assuming that this is a good
    idea (unclear), it needs to be examined carefully for circular
    dependencies and bootstrapping problems.  eg, do charset search
    lists need additional section processing (ie, is there a "glue"
    problem here)?

  - us-ascii almost certainly should remain first on all normalization
    lists for the foreseeable future, because of the installed base.

  - it's possible that, within particular cultures, leaving unicode out
    of the search list would make this mechanism simpler than any
    mechanism involving unicode could ever be.  i could be wrong.
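
to make the search-list idea concrete, here is a rough python sketch of
the client side only: try each charset in the list, in order, and keep
the ones that can represent the name at all.  the list itself would
come from the hypothetical new RR type, which is not modeled here;
neither are the glue and bootstrapping questions.  all names in the
code are made up:

```python
def candidate_labels(text, search_list):
    """return (charset, octets) pairs to try, in search-list order."""
    candidates = []
    for charset in search_list:
        try:
            candidates.append((charset, text.encode(charset)))
        except UnicodeEncodeError:
            continue   # this charset can't represent the name; skip it
    return candidates

# us-ascii first, per the installed-base argument above
SEARCH_LIST = ["us-ascii", "iso-8859-1", "iso-8859-8"]

plain = candidate_labels("example", SEARCH_LIST)
accented = candidate_labels("caf\u00e9", SEARCH_LIST)
```

a plain us-ascii name matches on the first try, while a name with an
accented letter falls through to the first charset that can hold it.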

none of this is relevant until we answer the big question.