Git Mailing List Archive on lore.kernel.org
From: Linus Torvalds <torvalds@osdl.org>
To: Paul Eggert <eggert@CS.UCLA.EDU>
Cc: Junio C Hamano <junkio@cox.net>,
	Robert Fitzsimons <robfitz@273k.net>,
	Alex Riesen <raa.lkml@gmail.com>,
	git@vger.kernel.org, Kai Ruemmler <kai.ruemmler@gmx.net>
Subject: Re: [PATCH] Try URI quoting for embedded TAB and LF in pathnames
Date: Wed, 12 Oct 2005 07:59:55 -0700 (PDT)
Message-ID: <Pine.LNX.4.64.0510120749230.14597@g5.osdl.org>
In-Reply-To: <877jcjmdmq.fsf@penguin.cs.ucla.edu>

On Tue, 11 Oct 2005, Paul Eggert wrote:
> 
> >  - any _raw_byte_ is in the 0x80-0x9f range (it might not be UTF-8)
> 
> Why quote the raw bytes?  Is this for terminal escapes on older xterm
> (or xterm-like) implementations that don't understand UTF-8?

It's not about "understanding" UTF-8.

Even a perfectly modern xterm may simply not be in UTF-8 mode: if it 
isn't running in a UTF-8 locale, it won't do UTF-8 decoding.

> If so, I'm not sure I'd bother, as it would introduce a lot of annoying
> quoting with perfectly reasonable UTF-8, and (if we assume the world
> is moving to UTF-8) it addresses a problem that is going away.

UTF-8 is only _now_ getting really widespread, and I think it's because 
Red Hat bit the bullet and made UTF-8 the default locale a few years ago.

These things take _decades_.

I don't know if you realize it, but it's only within the last couple of 
years that the old 7-bit "Finnish ASCII" went away. Finnish and Swedish 
have three extra characters: åäö (one byte each in latin1, two bytes each 
in UTF-8). But only within the last few years has the really _old_ ASCII 
representation gone away so completely that I no longer see it at all. 
(The characters '{', '}' and '|' were taken over, so if you had a Finnish 
ASCII font, programming in C was really funky - but it was common enough 
that I could do it without thinking much about it ;)
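The latin1/UTF-8 difference is only visible at the byte level. A short illustration (Python is an editorial choice here, not something used in the thread) of the same three letters in both encodings, and of what a byte-wide latin1 terminal shows when handed the UTF-8 bytes:

```python
# The same three Finnish/Swedish letters in two encodings.
letters = "åäö"

latin1 = letters.encode("latin-1")  # one byte per letter
utf8 = letters.encode("utf-8")      # two bytes per letter

print(latin1.hex(" "))  # e5 e4 f6
print(utf8.hex(" "))    # c3 a5 c3 a4 c3 b6

# A byte-wide latin1 terminal shown the UTF-8 bytes treats every byte as a
# separate character, producing the familiar mojibake.
print(utf8.decode("latin-1"))  # Ã¥Ã¤Ã¶
```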

So lots of people still use the byte-wide encodings, whether the really 
old 7-bit ASCII variants or some locale-dependent 8-bit one (of which 
latin1 and the "win-latin1" thing are obviously by far the most common). 
And in such a locale, it's not the UTF-8 control characters that matter, 
it's the _byte_ control characters that do: in latin1, for example, every 
raw byte in the 0x80-0x9f range is a C1 control character.
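A minimal sketch of the byte-wise rule, assuming a latin1-style locale where C0 controls (0x00-0x1f), DEL (0x7f) and C1 controls (0x80-0x9f) are the bytes to worry about. The names `is_byte_control` and `control_bytes` are hypothetical, not taken from git. The example also shows the cost Paul raises: perfectly valid UTF-8 can contain bytes in 0x80-0x9f.

```python
def is_byte_control(b: int) -> bool:
    """True if byte b is a control character when every byte is one
    character, as in latin1: C0 (0x00-0x1f), DEL (0x7f) or C1 (0x80-0x9f)."""
    return b < 0x20 or b == 0x7F or 0x80 <= b <= 0x9F


def control_bytes(name: bytes) -> list:
    """Offsets of the bytes in name that a byte-wide terminal would treat
    as control characters."""
    return [i for i, b in enumerate(name) if is_byte_control(b)]


# UTF-8 "åäö" (c3 a5 c3 a4 c3 b6) happens to avoid 0x80-0x9f entirely...
print(control_bytes("åäö".encode("utf-8")))  # []
# ...but the euro sign U+20AC is e2 82 ac, and 0x82 falls in the C1 range.
print(control_bytes("€".encode("utf-8")))    # [1]
# A literal TAB is a C0 control character.
print(control_bytes(b"a\tb"))                # [1]
```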

So if you want to support any locale other than UTF-8, you need to escape 
them. That assumes you want to escape control characters at all, of course 
(I still think it's perfectly fine to just let the raw mess through and 
depend on escaping at higher levels).
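If you do escape, one possible scheme is C-style quoting with three-digit octal escapes, applied only to names that need it. This is a hypothetical sketch: `quote_path` is an illustrative name, and using octal escapes rather than the URI-style %XX form named in the patch subject is an assumption, not git's actual implementation.

```python
def quote_path(name: bytes) -> bytes:
    """Hypothetical C-style quoting for a raw pathname.

    Names with no special bytes are returned unchanged. Otherwise the name is
    wrapped in double quotes, '"' and '\\' are backslash-escaped, and each
    byte that is a control character in a byte-wide locale (C0, DEL, C1
    0x80-0x9f) becomes a three-digit octal escape. All other bytes pass
    through untouched.
    """
    def is_control(b: int) -> bool:
        return b < 0x20 or b == 0x7F or 0x80 <= b <= 0x9F

    special = (ord('"'), ord("\\"))
    if not any(is_control(b) or b in special for b in name):
        return name  # nothing to escape: emit the raw bytes

    out = bytearray(b'"')
    for b in name:
        if b in special:
            out += b"\\" + bytes([b])   # \" or \\
        elif is_control(b):
            out += b"\\%03o" % b        # e.g. TAB (0x09) -> \011
        else:
            out.append(b)               # ordinary byte, copied as-is
    out += b'"'
    return bytes(out)


print(quote_path(b"hello.c"))  # b'hello.c'
print(quote_path(b"a\tb"))     # b'"a\\011b"'
print(quote_path("€.txt".encode("utf-8")))
```

With this scheme the euro sign's middle byte 0x82 becomes \202 while its e2 and ac bytes pass through, which is exactly the mangling of valid UTF-8 that Paul objects to; the payoff is that a byte-wide terminal never receives a raw C1 byte.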

			Linus

