From: Junio C Hamano <junkio@cox.net>
To: Paul Eggert <eggert@CS.UCLA.EDU>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] Try URI quoting for embedded TAB and LF in pathnames
Date: Tue, 11 Oct 2005 00:37:57 -0700
Message-ID: <7vy850v6zu.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <87mzlgh8xa.fsf@penguin.cs.ucla.edu> (Paul Eggert's message of "Mon, 10 Oct 2005 23:20:01 -0700")

Paul Eggert <eggert@CS.UCLA.EDU> writes:

> The convention I had been thinking of adding is to have GNU diff
> use shell-quoting style, e.g.,
>
> 'three
> o'\''clock'
>
> to represent a file name with a newline and an apostrophe in it.
> This sort of file name can be cut and pasted into the shell.
> The quoting could be used with any file name containing a
> troublesome character.
>
> Perhaps another quoting style would be better.

A patch header (both the "diff --git" line and the ---/+++ lines)
that I've been considering, and have in the proposed updates
branch, looks something like this:

    diff --git a/def\nghi/pqr b/dee/pqr
    similarity index 72%
    rename from def\nghi/pqr
    rename to dee/pqr
    index 9ee055c..243fbbc 100644
    --- a/def\nghi/pqr
    +++ b/dee/pqr
    @@ -1 +1,3 @@
     Fri Oct  7 23:19:04 PDT 2005
    +foo
    +foo

If we can keep things on one line, parsing the stuff becomes very
simple, but more importantly, it is easier to see what's
happening.  The pattern is the same whether you have funny
pathnames or not, and that helps the human consumer.
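To make the one-line convention concrete, here is a minimal sketch of an encoder and decoder for the "def\nghi/pqr" style shown above.  It is an illustration, not the code in the proposed updates branch: the helper names quote_path/unquote_path are hypothetical, and escaping a literal backslash as "\\" is an assumption added so that the encoding stays reversible (the message itself only shows "\n", and later "\t").

```python
# Hypothetical sketch of one-line pathname escaping.
# Assumption: a literal backslash is written as "\\" so decoding is unambiguous.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\t": "\\t"}
_UNESCAPES = {"\\": "\\", "n": "\n", "t": "\t"}


def quote_path(path: str) -> str:
    """Escape backslash, LF and TAB so the pathname fits on one line."""
    return "".join(_ESCAPES.get(ch, ch) for ch in path)


def unquote_path(quoted: str) -> str:
    """Invert quote_path(); raise ValueError on an unknown escape."""
    out = []
    chars = iter(quoted)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # character after the backslash, if any
        if nxt not in _UNESCAPES:
            raise ValueError(f"bad escape in {quoted!r}")
        out.append(_UNESCAPES[nxt])
    return "".join(out)


if __name__ == "__main__":
    old, new = "def\nghi/pqr", "dee/pqr"
    # Prints the header on a single line, with a visible backslash-n.
    print("diff --git a/%s b/%s" % (quote_path(old), quote_path(new)))
```

Because neither LF nor TAB survives in the encoded form, a "/^diff --git" search still matches exactly one line per file pair.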

Adjusting the "git diff" output to the style GNU diff would
produce with your shell quoting gives something like this:

    diff --git 'a/def
    ghi/pqr' b/dee/pqr
    similarity index 72%
    rename from 'def
    ghi/pqr'
    rename to dee/pqr
    index 9ee055c..243fbbc 100644
    --- 'a/def
    ghi/pqr'
    +++ b/dee/pqr
    @@ -1 +1,3 @@
     Fri Oct  7 23:19:04 PDT 2005
    +foo
    +foo

While it is possible to make tools parse this, it is very
distracting for humans to read and review.  Yes, the LF is quoted,
but it still breaks the line, disrupting the pattern we are used
to seeing.  If you are talking about a funny file whose name is
"a\ndiff --git a/b/c", your diff would look like this:

    diff --git 'a/
    diff --git a/b/c' 'b/
    diff --git a/b/c'
    index 9ee055c..243fbbc 100644
    --- 'a/
    diff --git a/b/c'
    +++ 'b/
    diff --git a/b/c'
    @@ -1 +1,3 @@
     Fri Oct  7 23:19:04 PDT 2005
    +foo
    +foo

We are used to telling the "less" command to search with
"/^diff --git .*" while reviewing patches.  The shell quoting,
while I admit I learned its beauty from you, is a disaster for
human consumption.

For diff output quoting purposes, LF is the only thing that
matters, as you mentioned in another message to me.  Our parsing
side (the "GNU patch" counterpart) checks the two pathnames on
the "diff --git" line and makes sure what follows a/ and b/ is
consistent (that is, the two should be identical, or each should
match the "rename from" and "rename to" paths respectively), so
there is no ambiguity.  But again, for human consumption, we
cannot easily tell SP and TAB apart just by reading, and TAB is
such an unusual character to have in a pathname (as opposed to
SP, which is not that uncommon) that we may be better off making
it visible.
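The consistency check on the "diff --git" line is what makes SP in pathnames unambiguous.  A minimal sketch of the non-rename case follows, assuming the text after "diff --git " has the form "a/NAME b/NAME" with NAME already one-line quoted.  The function split_git_header is hypothetical and is not git's actual parser; renames would instead be resolved against the "rename from"/"rename to" lines, which this sketch does not cover.

```python
from typing import Optional


def split_git_header(rest: str) -> Optional[str]:
    """Return NAME if rest == "a/NAME b/NAME", else None.

    When both names are identical, "a/" + NAME + " " + "b/" + NAME has
    length 2 * len(NAME) + 5, so the only valid split point is fixed by
    the length alone, even if NAME itself contains spaces.
    """
    if len(rest) < 7 or (len(rest) - 5) % 2:
        return None  # too short for a non-empty NAME, or odd length
    n = (len(rest) - 5) // 2
    a_part, sep, b_part = rest[: n + 2], rest[n + 2], rest[n + 3 :]
    if a_part.startswith("a/") and sep == " " and b_part == "b/" + a_part[2:]:
        return a_part[2:]
    return None


if __name__ == "__main__":
    print(split_git_header("a/x y b/x y"))  # a name containing a space
```

A line whose halves disagree (for example a rename such as "a/def b/dee") yields None, and the caller would then fall back to the rename headers.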

Quoting TAB has an incidental added benefit, which you as a GNU
diff/patch person would probably not care too much about.  Our
other tools sometimes need to show two paths in one record, and
TAB is used as the field separator between the two paths (LF is
the record separator).  The tools do have a '-z' mode that lets
us use anything but NUL in a pathname, and carefully written
scripts tend to run them with the '-z' flag and use Perl or
Python to parse the paths out, but it would be nicer if we did
not always have to.
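With TAB and LF quoted inside pathnames, a consumer of such two-path records no longer needs '-z'.  It can split on LF, then on TAB, and unquote each field.  The sketch below assumes a simplified, hypothetical record layout of just "OLD<TAB>NEW<LF>", with no status columns, and the backslash-escaping convention "\\", "\n", "\t".  The real tools' record formats contain more fields than this.

```python
from typing import List, Tuple

_UNESCAPES = {"\\": "\\", "n": "\n", "t": "\t"}


def unquote_path(quoted: str) -> str:
    """Decode "\\\\", "\\n" and "\\t" escapes; raise ValueError otherwise."""
    out, chars = [], iter(quoted)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt not in _UNESCAPES:
            raise ValueError(f"bad escape in {quoted!r}")
        out.append(_UNESCAPES[nxt])
    return "".join(out)


def parse_two_path_records(output: str) -> List[Tuple[str, str]]:
    """Split LF-terminated records, each holding two TAB-separated paths."""
    pairs = []
    for record in output.split("\n"):  # LF is the record separator
        if not record:
            continue  # skip the empty string after the final LF
        src, dst = record.split("\t")  # safe: real TABs only separate fields
        pairs.append((unquote_path(src), unquote_path(dst)))
    return pairs


if __name__ == "__main__":
    text = "def\\nghi/pqr\tdee/pqr\n"
    for old, new in parse_two_path_records(text):
        print(repr(old), "->", repr(new))
```

The design point is that quoting keeps the separators unambiguous, so a plain line-oriented reader stays correct even for the "funny" names.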

For example, the 'git commit' command seeds the editor with a log
message template containing status information about the changes
being committed, and needs to mention paths.  This is purely for
human consumption, and showing something like:

    # Type commit message to this file.  Lines that start
    # with '#' are ignored.
    #
    # Updated but not checked in:
    #   (will commit)
    #
    #	new file: ab\n\tc/mno
    #	modified: abc/mno
    #	renamed: def\nghi/pqr -> dee/pqr
    ...

is perfectly readable for human users.  It can be produced without
running the tool in '-z' mode if the tool output is quoted with
the '\n' and '\t' convention: the parsing and formatting side can
just split the fields at TAB and show them, without worrying
about an embedded LF making the rest of the pathname spill over
to the next line.  And once we start teaching users that we
represent funny characters in their paths this way, it becomes
nicer to be consistent in the diff output as well.
