* [PATCH 0/2] (experimental) diff --quote-path-with-sp @ 2021-09-15 22:33 Junio C Hamano 2021-09-15 22:33 ` [PATCH 1/2] diff: simplify quote_two() Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Junio C Hamano @ 2021-09-15 22:33 UTC (permalink / raw) To: git Long time ago, we had a discussion with GNU patch/diff maintainer and agreed that pathnames with certain "difficult" bytes needs to be quoted to ensure the resulting patch is machine parseable in an unambiguous way [*1*]. Recently, we saw a report that found that GNU patch is unhappy with our diff output for a path with SP in it [*2*]. With this experimental option, the beginning part of the patch output will have pathnames with SP in them enclosed inside a pair of double quotes, like so: diff --git "a/A Name" "b/A Name" --- "a/A Name" +++ "b/A Name" We have always done this enclosing inside dq when the pathname had an even more difficult byte (like LF, double quotes, and HT), so we know Git tools can handle such a patch output just fine already. *1* https://lore.kernel.org/git/87ek6s0w34.fsf@penguin.cs.ucla.edu/ *2* https://lore.kernel.org/git/YR9Iaj%2FFqAyCMade@tilde.club/ Junio C Hamano (2): diff: simplify quote_two() diff: --quote-path-with-sp Documentation/diff-options.txt | 5 +++++ diff.c | 26 ++++++++++---------------- diff.h | 1 + quote.c | 23 ++++++++++++++++++----- quote.h | 3 ++- t/t3902-quoted.sh | 24 +++++++++++++++++++++++- 6 files changed, 59 insertions(+), 23 deletions(-) -- 2.33.0-649-gd4f4107c2b ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] diff: simplify quote_two() 2021-09-15 22:33 [PATCH 0/2] (experimental) diff --quote-path-with-sp Junio C Hamano @ 2021-09-15 22:33 ` Junio C Hamano 2021-09-15 22:33 ` [PATCH 2/2] diff: --quote-path-with-sp Junio C Hamano 2021-09-16 3:33 ` [PATCH 0/2] (experimental) diff --quote-path-with-sp Gwyneth Morgan 2 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2021-09-15 22:33 UTC (permalink / raw) To: git Reimplement the function as a mere thin wrapper around quote_two_c_style(). Signed-off-by: Junio C Hamano <gitster@pobox.com> --- diff.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) diff --git a/diff.c b/diff.c index a8113f1707..8143b737b7 100644 --- a/diff.c +++ b/diff.c @@ -482,19 +482,9 @@ int git_diff_basic_config(const char *var, const char *value, void *cb) static char *quote_two(const char *one, const char *two) { - int need_one = quote_c_style(one, NULL, NULL, CQUOTE_NODQ); - int need_two = quote_c_style(two, NULL, NULL, CQUOTE_NODQ); struct strbuf res = STRBUF_INIT; - if (need_one + need_two) { - strbuf_addch(&res, '"'); - quote_c_style(one, &res, NULL, CQUOTE_NODQ); - quote_c_style(two, &res, NULL, CQUOTE_NODQ); - strbuf_addch(&res, '"'); - } else { - strbuf_addstr(&res, one); - strbuf_addstr(&res, two); - } + quote_two_c_style(&res, one, two, 0); return strbuf_detach(&res, NULL); } -- 2.33.0-649-gd4f4107c2b ^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] diff: --quote-path-with-sp 2021-09-15 22:33 [PATCH 0/2] (experimental) diff --quote-path-with-sp Junio C Hamano 2021-09-15 22:33 ` [PATCH 1/2] diff: simplify quote_two() Junio C Hamano @ 2021-09-15 22:33 ` Junio C Hamano 2021-09-16 9:06 ` Ævar Arnfjörð Bjarmason 2021-09-16 3:33 ` [PATCH 0/2] (experimental) diff --quote-path-with-sp Gwyneth Morgan 2 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2021-09-15 22:33 UTC (permalink / raw) To: git Long time ago, we had a discussion with GNU patch/diff maintainer and agreed that pathnames with certain "difficult" bytes needs to be quoted to ensure the resulting patch is machine parseable in an unambiguous way [*1*]. Recently, we saw a report that found that GNU patch is unhappy with our diff output for a path with SP in it [*2*]. Teach "git diff" and friends the "--quote-path-with-sp" option, that encloses a pathname with SP in it inside a pair of double-quotes, even though there is otherwise no byte in the pathname that need to be encoded in the octal. As an earlier parts of t/t3902 (outside the patch context) shows, output from "ls-files", "ls-tree", and "diff --name-only" all follow the same rule to decide paths with what bytes in them need quoting and how they are quoted. This experimental option deliberately refrains from touching these output and affects ONLY the paths that appear in the patch header, i.e. "diff --git", "--- a/path" and "+++ b/path" lines, that GNU patch may care. This is to minimize potential damage this change may cause to tools and scripts the users have been relying on. *1* https://lore.kernel.org/git/87ek6s0w34.fsf@penguin.cs.ucla.edu/ *2* https://lore.kernel.org/git/YR9Iaj%2FFqAyCMade@tilde.club/ Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Documentation/diff-options.txt | 5 +++++ diff.c | 16 ++++++++++------ diff.h | 1 + quote.c | 23 ++++++++++++++++++----- quote.h | 3 ++- t/t3902-quoted.sh | 24 +++++++++++++++++++++++- 6 files changed, 59 insertions(+), 13 deletions(-) diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index c89d530d3d..dd9b35c806 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -507,6 +507,11 @@ ifndef::git-format-patch[] Implies `--patch`. endif::git-format-patch[] +--quote-path-with-sp:: + (experimental) Quote a pathname that has SP in it inside a + pair of double quotes. Such a quoted pathname should be + handled correctly by both Git and "GNU patch". + --abbrev[=<n>]:: Instead of showing the full 40-byte hexadecimal object name in diff-raw format output and diff-tree header diff --git a/diff.c b/diff.c index 8143b737b7..73e42a9e18 100644 --- a/diff.c +++ b/diff.c @@ -480,11 +480,12 @@ int git_diff_basic_config(const char *var, const char *value, void *cb) return git_default_config(var, value, cb); } -static char *quote_two(const char *one, const char *two) +static char *quote_two(const char *one, const char *two, const struct diff_options *o) { struct strbuf res = STRBUF_INIT; + int flags = o->flags.quote_path_with_sp ? CQUOTE_QUOTE_SP : 0; - quote_two_c_style(&res, one, two, 0); + quote_two_c_style(&res, one, two, flags); return strbuf_detach(&res, NULL); } @@ -1784,6 +1785,7 @@ static void emit_rewrite_diff(const char *name_a, size_t size_one, size_two; struct emit_callback ecbdata; struct strbuf out = STRBUF_INIT; + int flags = o->flags.quote_path_with_sp ? CQUOTE_QUOTE_SP : 0; if (diff_mnemonic_prefix && o->flags.reverse_diff) { a_prefix = o->b_prefix; @@ -1798,8 +1800,8 @@ static void emit_rewrite_diff(const char *name_a, strbuf_reset(&a_name); strbuf_reset(&b_name); - quote_two_c_style(&a_name, a_prefix, name_a, 0); - quote_two_c_style(&b_name, b_prefix, name_b, 0); + quote_two_c_style(&a_name, a_prefix, name_a, flags); + quote_two_c_style(&b_name, b_prefix, name_b, flags); size_one = fill_textconv(o->repo, textconv_one, one, &data_one); size_two = fill_textconv(o->repo, textconv_two, two, &data_two); @@ -3449,8 +3451,8 @@ static void builtin_diff(const char *name_a, name_a = DIFF_FILE_VALID(one) ? name_a : name_b; name_b = DIFF_FILE_VALID(two) ? name_b : name_a; - a_one = quote_two(a_prefix, name_a + (*name_a == '/')); - b_two = quote_two(b_prefix, name_b + (*name_b == '/')); + a_one = quote_two(a_prefix, name_a + (*name_a == '/'), o); + b_two = quote_two(b_prefix, name_b + (*name_b == '/'), o); lbl[0] = DIFF_FILE_VALID(one) ? a_one : "/dev/null"; lbl[1] = DIFF_FILE_VALID(two) ? b_two : "/dev/null"; strbuf_addf(&header, "%s%sdiff --git %s %s%s\n", line_prefix, meta, a_one, b_two, reset); @@ -5445,6 +5447,8 @@ static void prep_parse_options(struct diff_options *options) PARSE_OPT_NONEG | PARSE_OPT_NOARG, diff_opt_binary), OPT_BOOL(0, "full-index", &options->flags.full_index, N_("show full pre- and post-image object names on the \"index\" lines")), + OPT_BOOL(0, "quote-path-with-sp", &options->flags.quote_path_with_sp, + N_("quote path with SP in it")), OPT_COLOR_FLAG(0, "color", &options->use_color, N_("show colored diff")), OPT_CALLBACK_F(0, "ws-error-highlight", options, N_("<kind>"), diff --git a/diff.h b/diff.h index 8ba85c5e60..f90f07bb72 100644 --- a/diff.h +++ b/diff.h @@ -198,6 +198,7 @@ struct diff_flags { unsigned suppress_diff_headers; unsigned dual_color_diffed_diffs; unsigned suppress_hunk_header_line_count; + unsigned quote_path_with_sp; }; static inline void diff_flags_or(struct diff_flags *a, diff --git a/quote.c b/quote.c index 8a3a5e39eb..292ad76de5 100644 --- a/quote.c +++ b/quote.c @@ -233,9 +233,11 @@ static inline int cq_must_quote(char c) return cq_lookup[(unsigned char)c] + quote_path_fully > 0; } -/* returns the longest prefix not needing a quote up to maxlen if positive. - This stops at the first \0 because it's marked as a character needing an - escape */ +/* + * Return the longest prefix not needing a quote up to maxlen if positive. + * This stops at the first \0 because it's marked as a character needing an + * escape. + */ static size_t next_quote_pos(const char *s, ssize_t maxlen) { size_t len; @@ -320,11 +322,22 @@ size_t quote_c_style(const char *name, struct strbuf *sb, FILE *fp, unsigned fla return quote_c_style_counted(name, -1, sb, fp, flags); } +/* + * Quote concatenation of prefix and path (nothing in between); when + * even one of the two needs quoting, the whole thing needs to be + * quoted. This is done by concatenating the result of quoting prefix + * and path without surrounding dq next to each other and then enclosing + * the whole thing inside a dq pair if needed. + */ void quote_two_c_style(struct strbuf *sb, const char *prefix, const char *path, unsigned flags) { - int nodq = !!(flags & CQUOTE_NODQ); - if (quote_c_style(prefix, NULL, NULL, 0) || + int nodq = (flags & CQUOTE_NODQ); + int force_dq = ((flags & CQUOTE_QUOTE_SP) && + (strchr(prefix, ' ') || strchr(path, ' '))); + + if (force_dq || + quote_c_style(prefix, NULL, NULL, 0) || quote_c_style(path, NULL, NULL, 0)) { if (!nodq) strbuf_addch(sb, '"'); diff --git a/quote.h b/quote.h index 049d8dd0b3..3e027d28c2 100644 --- a/quote.h +++ b/quote.h @@ -81,7 +81,8 @@ int sq_dequote_to_strvec(char *arg, struct strvec *); int unquote_c_style(struct strbuf *, const char *quoted, const char **endp); /* Bits in the flags parameter to quote_c_style() */ -#define CQUOTE_NODQ 01 +#define CQUOTE_NODQ 01 /* do not enclose the result in dq pair */ +#define CQUOTE_QUOTE_SP 02 /* string with SP needs quoting */ size_t quote_c_style(const char *name, struct strbuf *, FILE *, unsigned); void quote_two_c_style(struct strbuf *, const char *, const char *, unsigned); diff --git a/t/t3902-quoted.sh b/t/t3902-quoted.sh index f528008c36..c76cde435c 100755 --- a/t/t3902-quoted.sh +++ b/t/t3902-quoted.sh @@ -27,7 +27,7 @@ for_each_name () { "$FN$HT$GN" "$FN$LF$GN" "$FN $GN" "$FN$GN" "$FN$DQ$GN" \ "With SP in it" "$FN/file" do - eval "$1" + eval "$1" || return 1 done } @@ -45,6 +45,14 @@ test_expect_success 'setup' ' ' +# With core.quotepath=true (default), bytes with hi-bit set, in addition to +# controls like \n and \t, are written in octal and the path is enclosed in +# a pair of double-quotes. +# +# With core.quotepath=false, the special case for bytes with hi-bit set is +# disabled. +# +# A SP is treated just like any other bytes, nothing special. test_expect_success 'setup expected files' ' cat >expect.quoted <<\EOF && Name @@ -149,4 +157,18 @@ test_expect_success 'check fully quoted output from ls-tree' ' ' +test_expect_success 'diff --quote-path-with-sp' ' + git diff --quote-path-with-sp HEAD^ HEAD -- "With*" >actual && + sed -e "s/Z$//" >expect <<-\EOF && + diff --git "a/With SP in it" "b/With SP in it" + index e79c5e8..e019be0 100644 + --- "a/With SP in it" Z + +++ "b/With SP in it" Z + @@ -1 +1 @@ + -initial + +second + EOF + test_cmp expect actual +' + test_done -- 2.33.0-649-gd4f4107c2b ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] diff: --quote-path-with-sp 2021-09-15 22:33 ` [PATCH 2/2] diff: --quote-path-with-sp Junio C Hamano @ 2021-09-16 9:06 ` Ævar Arnfjörð Bjarmason 2021-09-16 20:59 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2021-09-16 9:06 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Paul Eggert, Andreas Gruenbacher, Gwyneth Morgan On Wed, Sep 15 2021, Junio C Hamano wrote: [CC-ing Andreas Gruenbacher who's currently active in GNU patch development, and Paul Eggert at his current address, in case he's curious about this blast from the past. Both: The full context for this proposed change in Git is at https://lore.kernel.org/git/20210915223316.1653443-1-gitster@pobox.com/] > Long time ago, we had a discussion with GNU patch/diff maintainer > and agreed that pathnames with certain "difficult" bytes needs to be > quoted to ensure the resulting patch is machine parseable in an > unambiguous way [*1*]. Recently, we saw a report that found that > GNU patch is unhappy with our diff output for a path with SP in it > [*2*]. It would also be good to mention 4f6fbcdcf96 (Functions to quote and unquote pathnames in C-style., 2005-10-14) here, which is what came out of the [*1*] discussion you're citing, and that also link to the later: https://lore.kernel.org/git/7vll0wvb2a.fsf@assigned-by-dhcp.cox.net/ Where you quote a message of Paul Eggert's that didn't make it into the archive, but which AFAICT accurately summarizes the behavior in 4f6fbcdcf96. Whereas the [*1*] you linked to is still the early proposal of handling UTF-8 specially (not quoting it), which doesn't appear to be what either GNU patch or Git went for in the end (both fully quote some fairly vanilla (also in latin1) UTF-8 when I tested it). It's still not clear to me if what was agreed upon was accurately implemented by Git at the time, but that GNU patch had a bug vis-a-vis the desired discussed behavior, if the bug is Git's, or both etc. Does a fix still need to be made in GNU patch? There's also a mention of busybox's interaction with this behavior in https://lore.kernel.org/git/YUK7Bl9uzNE1YErg@tilde.club/; has anyone (you or Gwyneth) sent them an FYI about this in case they'd like to adjust the behavior of their patch tool? > Teach "git diff" and friends the "--quote-path-with-sp" option, that > encloses a pathname with SP in it inside a pair of double-quotes, > even though there is otherwise no byte in the pathname that need to > be encoded in the octal. > > As an earlier parts of t/t3902 (outside the patch context) shows, > output from "ls-files", "ls-tree", and "diff --name-only" all follow > the same rule to decide paths with what bytes in them need quoting > and how they are quoted. > > This experimental option deliberately refrains from touching these > output and affects ONLY the paths that appear in the patch header, > i.e. "diff --git", "--- a/path" and "+++ b/path" lines, that GNU > patch may care. This is to minimize potential damage this change > may cause to tools and scripts the users have been relying on. > > *1* https://lore.kernel.org/git/87ek6s0w34.fsf@penguin.cs.ucla.edu/ > *2* https://lore.kernel.org/git/YR9Iaj%2FFqAyCMade@tilde.club/ > [...patch omitted...] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] diff: --quote-path-with-sp 2021-09-16 9:06 ` Ævar Arnfjörð Bjarmason @ 2021-09-16 20:59 ` Junio C Hamano 0 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2021-09-16 20:59 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Paul Eggert, Andreas Gruenbacher, Gwyneth Morgan Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Wed, Sep 15 2021, Junio C Hamano wrote: > > [CC-ing Andreas Gruenbacher who's currently active in GNU patch > development, and Paul Eggert at his current address, in case he's > curious about this blast from the past. > > Both: The full context for this proposed change in Git is at > https://lore.kernel.org/git/20210915223316.1653443-1-gitster@pobox.com/] > >> Long time ago, we had a discussion with GNU patch/diff maintainer >> and agreed that pathnames with certain "difficult" bytes needs to be >> quoted to ensure the resulting patch is machine parseable in an >> unambiguous way [*1*]. Recently, we saw a report that found that >> GNU patch is unhappy with our diff output for a path with SP in it >> [*2*]. > > It would also be good to mention 4f6fbcdcf96 ... Ah, I didn't mean "this is the single message to understand the issue". The reference was meant as "you need to read the discussion that starts around here", which you apparently did ;-) There is another message that is much more relevant that does not appear in the thread (and I do not know its message ID), where Linus came up with a way to parse the "diff --git a/... b/..." line, or the first "--- a/..." and "+++ b/..." lines, and learn the pathnames on the preimage side and on the postimage side unambiguously. I think that discovery led to the current implementation. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] (experimental) diff --quote-path-with-sp 2021-09-15 22:33 [PATCH 0/2] (experimental) diff --quote-path-with-sp Junio C Hamano 2021-09-15 22:33 ` [PATCH 1/2] diff: simplify quote_two() Junio C Hamano 2021-09-15 22:33 ` [PATCH 2/2] diff: --quote-path-with-sp Junio C Hamano @ 2021-09-16 3:33 ` Gwyneth Morgan 2021-09-16 5:31 ` Junio C Hamano 2 siblings, 1 reply; 7+ messages in thread From: Gwyneth Morgan @ 2021-09-16 3:33 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 2021-09-15 15:33:14-0700, Junio C Hamano wrote: > Long time ago, we had a discussion with GNU patch/diff maintainer > and agreed that pathnames with certain "difficult" bytes needs to be > quoted to ensure the resulting patch is machine parseable in an > unambiguous way [*1*]. Recently, we saw a report that found that > GNU patch is unhappy with our diff output for a path with SP in it > [*2*]. > > With this experimental option, the beginning part of the patch > output will have pathnames with SP in them enclosed inside a pair of > double quotes, like so: > > diff --git "a/A Name" "b/A Name" > --- "a/A Name" > +++ "b/A Name" I believe GNU patch is fine with unquoted spaces in the "--- a/path" and "+++ b/path", and only has an issue with unquoted spaces in the "diff --git" line. busybox patch does seem to have an issue with quoted filenames in the "---" and "+++" lines but is fine if those lines are unquoted. Maybe we could leave spaces unquoted in those lines, only quoted if there's some other character that needs it. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] (experimental) diff --quote-path-with-sp 2021-09-16 3:33 ` [PATCH 0/2] (experimental) diff --quote-path-with-sp Gwyneth Morgan @ 2021-09-16 5:31 ` Junio C Hamano 0 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2021-09-16 5:31 UTC (permalink / raw) To: Gwyneth Morgan; +Cc: git Gwyneth Morgan <gwymor@tilde.club> writes: > On 2021-09-15 15:33:14-0700, Junio C Hamano wrote: >> Long time ago, we had a discussion with GNU patch/diff maintainer >> and agreed that pathnames with certain "difficult" bytes needs to be >> quoted to ensure the resulting patch is machine parseable in an >> unambiguous way [*1*]. Recently, we saw a report that found that >> GNU patch is unhappy with our diff output for a path with SP in it >> [*2*]. >> >> With this experimental option, the beginning part of the patch >> output will have pathnames with SP in them enclosed inside a pair of >> double quotes, like so: >> >> diff --git "a/A Name" "b/A Name" >> --- "a/A Name" >> +++ "b/A Name" > > I believe GNU patch is fine with unquoted spaces in the "--- a/path" > and "+++ b/path", and only has an issue with unquoted spaces in the > "diff --git" line. busybox patch does seem to have an issue with quoted > filenames in the "---" and "+++" lines but is fine if those lines are > unquoted. Maybe we could leave spaces unquoted in those lines, only > quoted if there's some other character that needs it. Well, if they are so inconsistent, perhaps we shouldn't bend over backwards to cater to them, I would have to say. I do not think we want to break the convention that these a/ & b/ labeled paths are spelled exactly the same way, only to please external tools. Thanks for a quick report. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2021-09-16 20:59 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-09-15 22:33 [PATCH 0/2] (experimental) diff --quote-path-with-sp Junio C Hamano 2021-09-15 22:33 ` [PATCH 1/2] diff: simplify quote_two() Junio C Hamano 2021-09-15 22:33 ` [PATCH 2/2] diff: --quote-path-with-sp Junio C Hamano 2021-09-16 9:06 ` Ævar Arnfjörð Bjarmason 2021-09-16 20:59 ` Junio C Hamano 2021-09-16 3:33 ` [PATCH 0/2] (experimental) diff --quote-path-with-sp Gwyneth Morgan 2021-09-16 5:31 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).