From: Paul Eggert <eggert@CS.UCLA.EDU>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Junio C Hamano <junkio@cox.net>,
	Robert Fitzsimons <robfitz@273k.net>,
	Alex Riesen <raa.lkml@gmail.com>,
	git@vger.kernel.org, Kai Ruemmler <kai.ruemmler@gmx.net>
Subject: Re: [PATCH] Try URI quoting for embedded TAB and LF in pathnames
Date: Tue, 11 Oct 2005 12:42:12 -0700
Message-ID: <87slv7zvqj.fsf@penguin.cs.ucla.edu>
In-Reply-To: <Pine.LNX.4.64.0510111121030.14597@g5.osdl.org> (Linus Torvalds's message of "Tue, 11 Oct 2005 11:37:46 -0700 (PDT)")

Linus Torvalds <torvalds@osdl.org> writes:

> the simplest question to ask is "what are we protecting against?"

I'd like to protect against:

  1.  File names that cannot be handled correctly with the current
      formats.  Newline is the obvious problem here, along with
      (arguably) tab and space.

  2.  Common transliterations of patches.  Many programs (and mailers,
      alas) expand tabs to spaces, append CR to lines, prepend spaces
      to lines, break lines at spaces, etc.  'patch' already deals
      with this to some extent, but it'd be nice if the format
      resisted these transliterations better.

  3.  Humans misreading patches.  The patch format is intended to be
      human-readable, after all.

  4.  Reencoded patches.  Programs like Emacs can and will convert
      patches from UTF-8 to EUC-JP, for example.

You convinced me that (4) is not worth the hassle, but I'd still like
to address (1)-(3) when it's easy.

> invalid UTF-8 [is] invalid UTF-8

Yes, but (2) and (3) can lose information about invalid UTF-8 if we
don't suitably protect the encoding errors.  I daresay that many
mailers will mishandle invalid UTF-8, for example.
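
A small sketch of the information loss described above. It assumes,
purely for illustration, that a mishandling mailer behaves like a lossy
UTF-8 decode that substitutes U+FFFD; real mailers vary. The two file
names are hypothetical Latin-1 names, not taken from this thread.

```python
# Two distinct Latin-1 file names; neither is valid UTF-8.
name_a = b"caf\xe9"  # Latin-1 e-acute
name_b = b"caf\xe8"  # Latin-1 e-grave

# Assumption: the mailer replaces undecodable bytes with U+FFFD.
lossy_a = name_a.decode("utf-8", errors="replace")
lossy_b = name_b.decode("utf-8", errors="replace")

# Escaping the offending bytes instead keeps the names distinct.
safe_a = name_a.decode("utf-8", errors="backslashreplace")
safe_b = name_b.decode("utf-8", errors="backslashreplace")

print(lossy_a == lossy_b)  # the two names have collapsed into one
print(safe_a, safe_b)
```

After the lossy decode the two names are indistinguishable, whereas the
escaped forms still differ and can be mapped back to the original bytes.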

> There _is_ something you may want to quote, namely the standard CSI
> terminal escapes.

If I understand you aright, we could do that by modifying my previous
proposal to escape all bytes in the UTF-8 representation of a control
character.  In Unicode, U+0080 through U+009F are the C1 control
characters (CSI is U+009B), so that should suffice to quote the
terminal escapes you mentioned.  (Perhaps we should also escape
unassigned Unicode characters, on the theory that they might become
control characters in the future.)
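
The rule above could be sketched roughly as follows. This is only an
illustration, not the scheme the thread settled on: the function names
quote_path and escape_bytes are hypothetical, and the backslash-octal
escape syntax is an assumption. Bytes that are not valid UTF-8 and the
backslash itself are also escaped here so the output stays unambiguous.
Unassigned characters are left alone.

```python
import unicodedata


def escape_bytes(chunk: bytes) -> str:
    """Render each byte as a backslash followed by three octal digits."""
    return "".join("\\%03o" % b for b in chunk)


def quote_path(raw: bytes) -> str:
    """Escape every byte of each control character in a pathname."""
    out = []
    i = 0
    while i < len(raw):
        ch, width = None, 1
        # Find the UTF-8 character starting at i (1 to 4 bytes long).
        for n in range(1, 5):
            try:
                ch = raw[i:i + n].decode("utf-8")
                width = n
                break
            except UnicodeDecodeError:
                pass
        chunk = raw[i:i + width]
        # Category "Cc" covers U+0000-U+001F, U+007F and U+0080-U+009F.
        # ch is None means the byte at i is not part of valid UTF-8.
        if ch is None or ch == "\\" or unicodedata.category(ch) == "Cc":
            out.append(escape_bytes(chunk))
        else:
            out.append(ch)
        i += width
    return "".join(out)


print(quote_path(b"a\tb"))                     # a\011b
print(quote_path("x\u009by".encode("utf-8")))  # x\302\233y
```

Under these assumptions CSI (U+009B, encoded as the bytes C2 9B) comes
out as \302\233, while ordinary non-ASCII text in valid UTF-8 passes
through unchanged.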

> For any UTF-8 quoting scheme you come up with, I'll point out
> something that it does wrong or looks horrible for a Latin1 filename
> ;)

Yes, quite true.  But we don't have to come up with something that's
perfect in all cases, just something that's good enough to handle
cases that we expect will be common in practice, in a world where
UTF-8 is the preferred encoding for non-ASCII characters.
