From: Paul Eggert <eggert@CS.UCLA.EDU>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Junio C Hamano <junkio@cox.net>,
	Robert Fitzsimons <robfitz@273k.net>,
	Alex Riesen <raa.lkml@gmail.com>,
	git@vger.kernel.org, Kai Ruemmler <kai.ruemmler@gmx.net>
Subject: Re: [PATCH] Try URI quoting for embedded TAB and LF in pathnames
Date: Tue, 11 Oct 2005 23:51:41 -0700
Message-ID: <877jcjmdmq.fsf@penguin.cs.ucla.edu>
In-Reply-To: <Pine.LNX.4.64.0510111346220.14597@g5.osdl.org> (Linus Torvalds's message of "Tue, 11 Oct 2005 13:56:12 -0700 (PDT)")

Linus Torvalds <torvalds@osdl.org> writes:

> you can read it as a UTF-8 stream, but then quote things at a byte
> level (ie if you quote one "character", you quote _all_ bytes in
> that character).

Yes, that's what I had in mind.

> And you quote if:
>
>  - the UTF-8 _character_ is in the 0x80-0x9f control range

Yes.  Or more generally, if it's any UTF-8 control character.

>  - any _raw_byte_ is in the 0x80-0x9f range (it might not be UTF-8)

Why quote the raw bytes?  Is this for terminal escapes on older xterm
(or xterm-like) implementations that don't understand UTF-8?  If so,
I'm not sure I'd bother, as it would introduce a lot of annoying
quoting with perfectly reasonable UTF-8, and (if we assume the world
is moving to UTF-8) it addresses a problem that is going away.

>  - any _raw_byte_ is 0xfe-0xff (illegal UTF-8 character)
>  - misformed UTF-8 (non-shortest sequence, or just generally invalid 
>    sequences with missing or wrong high bits)

Yes, that makes sense.

> quite frankly, that's a pretty painful thing to write.

It's not trivially short, yes.  But it shouldn't be that hard.
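
As a rough illustration of the selective rules discussed above, here is a
minimal sketch in Python. It is not code from git or from any patch in this
thread. The function name quote_path, the octal escape style, and the use of
Python's "surrogateescape" error handler to flag malformed UTF-8 are all
assumptions made for the example. It quotes every byte of a control character
and every byte that is not part of valid UTF-8, and it leaves other
characters alone. It does not apply the separate raw-byte 0x80-0x9f rule.

```python
import unicodedata


def quote_path(raw: bytes) -> str:
    """Quote a pathname, leaving valid printable UTF-8 characters alone.

    Hypothetical helper for illustration only.
    """
    # surrogateescape maps each undecodable byte 0xNN to the lone surrogate
    # U+DCNN, so malformed input (0xfe, 0xff, non-shortest forms, stray
    # continuation bytes) survives decoding and can be spotted below.
    text = raw.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        if ch in ('"', "\\"):
            # Always escape the delimiter and the escape character.
            out.append("\\" + ch)
        elif "\udc80" <= ch <= "\udcff" or unicodedata.category(ch) == "Cc":
            # Invalid byte or control character: quote all of its bytes.
            encoded = ch.encode("utf-8", errors="surrogateescape")
            out.extend("\\%03o" % b for b in encoded)
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'
```

With this sketch, a TAB becomes \011, the C1 control U+0085 becomes
\302\205, a stray 0xff becomes \377, and an ordinary accented letter is
passed through unchanged.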

Also, I guess we don't have to write it, at least not at first.  As
long as we specify something like the C quoted-string format mentioned
earlier, we can encode into that format using a naive algorithm (e.g.,
quote any non-ASCII byte or ASCII control character), and beautify the
encoding method later.
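
The naive algorithm could look something like the following sketch, assuming
three-digit octal escapes in the style of C string literals. The name
quote_naive is hypothetical, and a real encoder might prefer short forms
such as \t and \n where C provides them.

```python
def quote_naive(raw: bytes) -> str:
    """Quote any non-ASCII byte or ASCII control character as \\ooo.

    Hypothetical helper for illustration only.
    """
    out = []
    for b in raw:
        if b in (0x22, 0x5C):
            # Double quote and backslash get a backslash prefix.
            out.append("\\" + chr(b))
        elif b < 0x20 or b >= 0x7F:
            # ASCII controls (including DEL) and every byte >= 0x80.
            out.append("\\%03o" % b)
        else:
            out.append(chr(b))
    return '"' + "".join(out) + '"'
```

The output is pure printable ASCII, at the cost of quoting perfectly
reasonable UTF-8: a two-byte character such as U+00E9 comes out as \303\251.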

> The upside is that it's easy to decode: you can _unquote_ it just as
> a byte stream.

Yes, that's the idea.
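
A byte-level decoder along these lines might be sketched as follows. The
name unquote and the exact set of accepted escapes (octal, \n, \t, \" and
\\) are assumptions for the example, not a specification of the format. The
point is that the decoder never needs to know whether the unescaped bytes
are UTF-8.

```python
def unquote(quoted: bytes) -> bytes:
    """Decode a C-style quoted string purely as a byte stream.

    Hypothetical helper for illustration only.
    """
    if len(quoted) < 2 or quoted[:1] != b'"' or quoted[-1:] != b'"':
        raise ValueError("not a quoted string")
    body = quoted[1:-1]
    simple = {ord("n"): 0x0A, ord("t"): 0x09, ord('"'): 0x22, ord("\\"): 0x5C}
    out = bytearray()
    i = 0
    while i < len(body):
        b = body[i]
        if b != 0x5C:
            # Unescaped bytes, UTF-8 or not, are copied through untouched.
            out.append(b)
            i += 1
            continue
        i += 1
        if i >= len(body):
            raise ValueError("dangling backslash")
        c = body[i]
        if c in simple:
            out.append(simple[c])
            i += 1
        elif 0x30 <= c <= 0x37:
            # Up to three octal digits, as in a C string literal.
            j = i
            while j < len(body) and j < i + 3 and 0x30 <= body[j] <= 0x37:
                j += 1
            value = int(body[i:j], 8)
            if value > 0xFF:
                raise ValueError("octal escape out of range")
            out.append(value)
            i = j
        else:
            raise ValueError("unknown escape")
    return bytes(out)
```

Because the same decoder accepts both fully escaped output and output that
leaves valid UTF-8 unescaped, the encoder can be improved later without
changing the format.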

Also, the interchange format is the most important thing.  We have to
decode anything that is in the format, and we must encode into the
format.  Encoding prettily is nice, but not necessary.
