On Tue, 11 Oct 2005, Paul Eggert wrote:
>
> > - any _raw_byte_ is in the 0x80-0x9f range (it might not be UTF-8)
>
> Why quote the raw bytes?  Is this for terminal escapes on older xterm
> (or xterm-like) implementations that don't understand UTF-8?

It's not about "understanding" UTF-8. Even a perfectly modern xterm may
simply not be in UTF-8 mode: if it isn't in a UTF-8 locale, then it won't
do UTF-8 decoding.

> If so, I'm not sure I'd bother, as it would introduce a lot of annoying
> quoting with perfectly reasonable UTF-8, and (if we assume the world
> is moving to UTF-8) it addresses a problem that is going away.

UTF-8 is only _now_ getting really widespread, and I think it's because
Red Hat bit the bullet and made UTF-8 the default locale a few years ago.

These things take _decades_. I don't know if you realize it, but it's only
within the last couple of years that the old 7-bit "Finnish ASCII" went
away.

Finnish and Swedish have three extra characters: åäö (latin1) and åäö
(utf-8). But only within the last few years has the really _old_ ASCII
representation gone away so much that I don't see it at all (the
characters '{' '}' and '|' were taken over, so that if you had a Finnish
ASCII font, programming in C was really funky - but it was common enough
that I could do it without thinking much about it ;)

So lots of people still use the byte-wide encodings, whether really old
ASCII only or some special locale-dependent one (of which latin1 and the
"win-latin1" thing are obviously the most common by far).

And in such a locale, it's not the UTF-8 control characters that matter,
it's the _byte_ control characters that do. So if you want to support any
locale other than UTF-8, you need to escape them.

Assuming you want to escape control characters at all, of course (I still
think it's perfectly fine to just let the raw mess through and depend on
escaping at higher levels).

		Linus
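
For illustration only, the byte-level escaping argued for above could look
roughly like the sketch below. It is not taken from any real code base: the
names is_byte_control() and quote_control_bytes() and the \xNN output
format are assumptions made up for this example, and it does not escape the
backslash itself, so the output is not meant to be unambiguous. It treats
the input purely as bytes, with no attempt to decode UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if the byte is a control character when
 * the terminal interprets input one byte at a time (e.g. latin1).
 * Covers C0 (0x00-0x1f), DEL (0x7f) and C1 (0x80-0x9f). */
static int is_byte_control(unsigned char c)
{
    return c < 0x20 || c == 0x7f || (c >= 0x80 && c <= 0x9f);
}

/* Hypothetical helper: copy src into dst, replacing each control byte
 * with a four-character \xNN escape. dst is always NUL-terminated on
 * success. Returns the number of bytes written (not counting the NUL),
 * or (size_t)-1 if dst is too small. */
static size_t quote_control_bytes(const unsigned char *src, size_t len,
                                  char *dst, size_t cap)
{
    size_t out = 0;

    assert(dst != NULL && cap > 0);
    for (size_t i = 0; i < len; i++) {
        size_t need = is_byte_control(src[i]) ? 4 : 1;

        /* Leave room for this piece plus the trailing NUL. */
        if (out + need >= cap)
            return (size_t)-1;
        if (need == 4)
            snprintf(dst + out, cap - out, "\\x%02x", (unsigned)src[i]);
        else
            dst[out] = (char)src[i];
        out += need;
    }
    dst[out] = '\0';
    return out;
}
```

With this sketch, the latin1 bytes for åäö (0xe5 0xe4 0xf6) pass through
untouched, while a raw 0x9b (CSI on a terminal in 8-bit mode) is quoted.
The cost is the one raised in the quoted text: the UTF-8 encoding of "Ä" is
0xc3 0x84, and its second byte falls in the 0x80-0x9f range, so it gets
quoted too.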