On Tue, 11 Oct 2005, Paul Eggert wrote:
> 
> >  - any _raw_byte_ is in the 0x80-0x9f range (it might not be UTF-8)
> 
> Why quote the raw bytes?  Is this for terminal escapes on older xterm
> (or xterm-like) implementations that don't understand UTF-8?

It's not about "understanding" UTF-8.

Even a perfectly modern xterm may simply not be in UTF-8 mode: if it 
wasn't started in a UTF-8 locale, then it won't do UTF-8 decoding.

> If so, I'm not sure I'd bother, as it would introduce a lot of annoying
> quoting with perfectly reasonable UTF-8, and (if we assume the world
> is moving to UTF-8) it addresses a problem that is going away.

UTF-8 is only _now_ getting really widespread, and I think it's because 
Red Hat bit the bullet and made UTF-8 the default locale a few years ago.

These things take _decades_.

I don't know if you realize it, but it's only within the last couple of 
years that the old 7-bit "finnish ASCII" went away. Finnish and Swedish 
have three extra characters: åäö (latin1) and Ã¥Ã¤Ã¶ (utf-8). But only
within the last few years has the really _old_ ASCII representation really 
gone away so much that I don't see it at all (the characters '{' '}' and 
'|' were taken over, so that if you had a Finnish ASCII font, programming 
in C was really funky - but it was common enough that I could do it 
without thinking much about it ;)

So lots of people still use the byte-wide encodings, whether really old 
ASCII only or some special locale-dependent one (of which latin1 and the 
"win-latin1" thing are obviously the most common by far). And in those 
locales, it's not the UTF-8 control characters that matter, it's the _byte_ 
control characters that do.

So if you want to support any locale other than UTF-8, you need to escape 
them. Assuming you want to escape control characters at all, of course (I 
still think it's perfectly fine to just let the raw mess through and 
depend on escaping at higher levels).
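
To make the byte-level check concrete, something like the following would 
do. It's only a sketch: the function names and the \ooo octal quoting 
style are made up for illustration and not taken from any actual patch, 
and a real version would probably want to let tab and newline through.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the raw byte would act as a
 * control character on a terminal in a byte-wide (non-UTF-8) mode.
 */
static int byte_needs_quoting(unsigned char c)
{
	if (c < 0x20)
		return 1;	/* C0 controls, including ESC (0x1b) */
	if (c == 0x7f)
		return 1;	/* DEL */
	if (c >= 0x80 && c <= 0x9f)
		return 1;	/* C1 controls, e.g. 0x9b is a one-byte CSI */
	return 0;
}

/*
 * Copy 'len' bytes from 'in' to 'out', replacing every byte that needs
 * quoting with a four-character \ooo octal escape. The result is
 * NUL-terminated. Returns the length written (not counting the NUL),
 * or -1 if 'out' is too small.
 */
static int quote_bytes(const unsigned char *in, size_t len,
		       char *out, size_t outsz)
{
	size_t n = 0;
	size_t i;

	if (outsz == 0)
		return -1;
	for (i = 0; i < len; i++) {
		if (byte_needs_quoting(in[i])) {
			/* need room for 4 characters plus the final NUL */
			if (n + 4 >= outsz)
				return -1;
			n += snprintf(out + n, outsz - n, "\\%03o",
				      (unsigned int)in[i]);
		} else {
			/* need room for 1 character plus the final NUL */
			if (n + 1 >= outsz)
				return -1;
			out[n++] = (char)in[i];
		}
	}
	out[n] = '\0';
	return (int)n;
}
```

Note that this works purely on bytes, so it does exactly the annoying 
thing Paul mentions: a valid UTF-8 sequence like 0xc3 0x9c has its second 
byte quoted, because a terminal in latin1 mode would treat that byte as a 
control character.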

			Linus