git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git 2.40.1 tree-diff crashes with (glob) magic on pathspecs
@ 2023-05-27 13:53 Jim Pryor
  2023-05-27 17:39 ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Jim Pryor @ 2023-05-27 13:53 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)

$ mkdir test2
$ cd test2
$ git init
Initialized empty Git repository in /Users/jim/repo/test2/.git/
$ mkdir -p foo/bar
$ touch foo/bar/test
$ git add foo/bar/test
$ git commit -m "first commit" -u
[master (root-commit) c6eb1c1] first commit
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 foo/bar/test
$ git mv foo/bar/test foo/bar/test2
$ git commit -m "second commit" -u
[master 2bacc0c] second commit
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename foo/bar/{test => test2} (100%)
$ echo "contents" > foo/bar/test2
$ git commit -m "third commit" -a
[master 13a1b11] third commit
 1 file changed, 1 insertion(+)

$ git --no-pager log --oneline -- ':(glob)**/test2'

What did you expect to happen? (Expected behavior)

13a1b11 (HEAD -> master) third commit
2bacc0c second commit
c6eb1c1 first commit

What happened instead? (Actual behavior)

13a1b11 (HEAD -> master) third commit
BUG: tree-diff.c:596: unsupported magic 8
error: git died of signal 6

What's different between what you expected and what actually happened?

git tree-diff crashed

Anything else you want to add:

* Also get the crashes with ':(glob)*/bar/test2' and ':(glob)foo/*/test2'.
* The `git mv` does not seem to be necessary to reproduce.

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.40.1
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
uname: Darwin 19.6.0 Darwin Kernel Version 19.6.0: Tue Jun 21 21:18:39 PDT 2022; root:xnu-6153.141.66~1/RELEASE_X86_64 x86_64
compiler info: clang: 12.0.0 (clang-1200.0.32.29)
libc info: no libc information available
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]

-- 
dubiousjim@gmail.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git 2.40.1 tree-diff crashes with (glob) magic on pathspecs
  2023-05-27 13:53 git 2.40.1 tree-diff crashes with (glob) magic on pathspecs Jim Pryor
@ 2023-05-27 17:39 ` Jeff King
  2023-05-27 19:00   ` Jim Pryor
  2023-06-01  5:38   ` git 2.40.1 tree-diff crashes with (glob) magic on pathspecs Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2023-05-27 17:39 UTC (permalink / raw)
  To: Jim Pryor; +Cc: git

On Sat, May 27, 2023 at 03:53:31PM +0200, Jim Pryor wrote:

> $ git --no-pager log --oneline -- ':(glob)**/test2'
> 
> What did you expect to happen? (Expected behavior)
> 
> 13a1b11 (HEAD -> master) third commit
> 2bacc0c second commit
> c6eb1c1 first commit
> 
> What happened instead? (Actual behavior)
> 
> 13a1b11 (HEAD -> master) third commit
> BUG: tree-diff.c:596: unsupported magic 8
> error: git died of signal 6

Thanks for your report. I had trouble reproducing at first, but there is
one extra twist. By default, the command you gave produces:

  $ git --no-pager log --oneline -- ':(glob)**/test2'
  48eab09 (HEAD -> main) third commit
  89a47e5 second commit

which makes sense; we did not ask about "test", which is the path the
original commit touched. So I guessed you might have log.follow set, and
indeed:

  $ git --no-pager log --oneline --follow -- ':(glob)**/test2'
  48eab09 (HEAD -> main) third commit
  BUG: tree-diff.c:596: unsupported magic 8
  Aborted

I _think_ in this case that ":(glob)" is not actually doing much. It is
already the default that we will glob pathspecs; it is only needed to
override "git --literal" (which changes the default to not glob).

But this also seems to be wading into a spot that is not actually
supported by --follow. The code near where the BUG() is triggered looks
like this:

static void try_to_follow_renames([...])
{
[...]
        /*
         * follow-rename code is very specific, we need exactly one
         * path. Magic that matches more than one path is not
         * supported.
         */
        GUARD_PATHSPEC(&opt->pathspec, PATHSPEC_FROMTOP | PATHSPEC_LITERAL);
#if 0
        /*
         * We should reject wildcards as well. Unfortunately we
         * haven't got a reliable way to detect that 'foo\*bar' in
         * fact has no wildcards. nowildcard_len is merely a hint for
         * optimization. Let it slip for now until wildmatch is taught
         * about dry-run mode and returns wildcard info.
         */
        if (opt->pathspec.has_wildcard)
                BUG("wildcards are not supported");
#endif

So follow-mode does not actually work with wildcards, but we err on the
side of not rejecting them outright. In that sense, the simplest "fix"
here would be to allow :(glob) to match the '#if 0' section, like:

diff --git a/tree-diff.c b/tree-diff.c
index 69031d7cba..d287079253 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -593,7 +593,8 @@ static void try_to_follow_renames(const struct object_id *old_oid,
 	 * path. Magic that matches more than one path is not
 	 * supported.
 	 */
-	GUARD_PATHSPEC(&opt->pathspec, PATHSPEC_FROMTOP | PATHSPEC_LITERAL);
+	GUARD_PATHSPEC(&opt->pathspec,
+		       PATHSPEC_FROMTOP | PATHSPEC_LITERAL | PATHSPEC_GLOB);
 #if 0
 	/*
 	 * We should reject wildcards as well. Unfortunately we

That would avoid the BUG() and make things work consistently. But it
still would not produce the output you expect, because --follow simply
does not work with wildcard pathspecs. And fixing that would presumably
be a much bigger change (I didn't dig into it).

Complicating this slightly is the fact that you didn't explicitly ask
for --follow, but just had log.follow set. I think we automatically
disable log.follow when there is more than one pathspec, and in theory
we ought to do the same for actual wildcards. But maybe it would be
enough for follow-mode to kick in, but to end up mostly ignored (which
is what the patch above would do).

(I say "mostly" because I suspect you could construct a history where
the glob matched a literal filename, too, and the follow code gets
confused. But outside of intentionally-constructed pathological cases,
that seems unlikely).

-Peff

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: git 2.40.1 tree-diff crashes with (glob) magic on pathspecs
  2023-05-27 17:39 ` Jeff King
@ 2023-05-27 19:00   ` Jim Pryor
  2023-06-01 17:37     ` [PATCH 0/3] handling pathspec magic with --follow Jeff King
  2023-06-01  5:38   ` git 2.40.1 tree-diff crashes with (glob) magic on pathspecs Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Jim Pryor @ 2023-05-27 19:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Sat, May 27, 2023, at 7:39 PM, Jeff King wrote:
> On Sat, May 27, 2023 at 03:53:31PM +0200, Jim Pryor wrote:
>
>> $ git --no-pager log --oneline -- ':(glob)**/test2'
>> 
>> What did you expect to happen? (Expected behavior)
>> 
>> 13a1b11 (HEAD -> master) third commit
>> 2bacc0c second commit
>> c6eb1c1 first commit
>> 
>> What happened instead? (Actual behavior)
>> 
>> 13a1b11 (HEAD -> master) third commit
>> BUG: tree-diff.c:596: unsupported magic 8
>> error: git died of signal 6
>
> Thanks for your report. I had trouble reproducing at first, but there is
> one extra twist. By default, the command you gave produces:
>
>   $ git --no-pager log --oneline -- ':(glob)**/test2'
>   48eab09 (HEAD -> main) third commit
>   89a47e5 second commit
>
> which makes sense; we did not ask about "test", which is the path the
> original commit touched. So I guessed you might have log.follow set, and
> indeed:
>
>   $ git --no-pager log --oneline --follow -- ':(glob)**/test2'
>   48eab09 (HEAD -> main) third commit
>   BUG: tree-diff.c:596: unsupported magic 8
>   Aborted
>
> I _think_ in this case that ":(glob)" is not actually doing much. It is
> already the default that we will glob pathspecs; it is only needed to
> override "git --literal" (which changes the default to not glob).
>
> But this also seems to be wading into a spot that is not actually
> supported by --follow. The code near where the BUG() is triggered looks
> like this:
>
> static void try_to_follow_renames([...])
> {
> [...]
>         /*
>          * follow-rename code is very specific, we need exactly one
>          * path. Magic that matches more than one path is not
>          * supported.
>          */
>         GUARD_PATHSPEC(&opt->pathspec, PATHSPEC_FROMTOP | PATHSPEC_LITERAL);
> #if 0
>         /*
>          * We should reject wildcards as well. Unfortunately we
>          * haven't got a reliable way to detect that 'foo\*bar' in
>          * fact has no wildcards. nowildcard_len is merely a hint for
>          * optimization. Let it slip for now until wildmatch is taught
>          * about dry-run mode and returns wildcard info.
>          */
>         if (opt->pathspec.has_wildcard)
>                 BUG("wildcards are not supported");
> #endif
>
> So follow-mode does not actually work with wildcards, but we err on the
> side of not rejecting them outright. In that sense, the simplest "fix"
> here would be to allow :(glob) to match the '#if 0' section, like:
>
> diff --git a/tree-diff.c b/tree-diff.c
> index 69031d7cba..d287079253 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -593,7 +593,8 @@ static void try_to_follow_renames(const struct 
> object_id *old_oid,
>  	 * path. Magic that matches more than one path is not
>  	 * supported.
>  	 */
> -	GUARD_PATHSPEC(&opt->pathspec, PATHSPEC_FROMTOP | PATHSPEC_LITERAL);
> +	GUARD_PATHSPEC(&opt->pathspec,
> +		       PATHSPEC_FROMTOP | PATHSPEC_LITERAL | PATHSPEC_GLOB);
>  #if 0
>  	/*
>  	 * We should reject wildcards as well. Unfortunately we
>
> That would avoid the BUG() and make things work consistently. But it
> still would not produce the output you expect, because --follow simply
> does not work with wildcard pathspecs. And fixing that would presumably
> be a much bigger change (I didn't dig into it).
>
> Complicating this slightly is the fact that you didn't explicitly ask
> for --follow, but just had log.follow set. I think we automatically
> disable log.follow when there is more than one pathspec, and in theory
> we ought to do the same for actual wildcards. But maybe it would be
> enough for follow-mode to kick in, but to end up mostly ignored (which
> is what the patch above would do).
>
> (I say "mostly" because I suspect you could construct a history where
> the glob matched a literal filename, too, and the follow code gets
> confused. But outside of intentionally-constructed pathological cases,
> that seems unlikely).
>
> -Peff

Confirmed I have log.follow = true, and also GIT_NOGLOB_PATHSPECS set, thus the explicit `(glob)` magic.

I'd be happy with --follow not working with the wildcard pathspecs, but with the crash avoided.

-- 
dubiousjim@gmail.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git 2.40.1 tree-diff crashes with (glob) magic on pathspecs
  2023-05-27 17:39 ` Jeff King
  2023-05-27 19:00   ` Jim Pryor
@ 2023-06-01  5:38   ` Junio C Hamano
  2023-06-01 13:30     ` Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2023-06-01  5:38 UTC (permalink / raw)
  To: Jeff King; +Cc: Jim Pryor, git

Jeff King <peff@peff.net> writes:

>> 13a1b11 (HEAD -> master) third commit
>> BUG: tree-diff.c:596: unsupported magic 8
>> error: git died of signal 6
> ...
> Thanks for your report. I had trouble reproducing at first, but there is
> one extra twist. By default, the command you gave produces:
> But this also seems to be wading into a spot that is not actually
> supported by --follow. The code near where the BUG() is triggered looks
> like this:
>
> static void try_to_follow_renames([...])
> {
> [...]
>         /*
>          * follow-rename code is very specific, we need exactly one
>          * path. Magic that matches more than one path is not
>          * supported.
>          */
>         GUARD_PATHSPEC(&opt->pathspec, PATHSPEC_FROMTOP | PATHSPEC_LITERAL);
> #if 0
>         /*
>          * We should reject wildcards as well. Unfortunately we
>          * haven't got a reliable way to detect that 'foo\*bar' in
>          * fact has no wildcards. nowildcard_len is merely a hint for
>          * optimization. Let it slip for now until wildmatch is taught
>          * about dry-run mode and returns wildcard info.
>          */
>         if (opt->pathspec.has_wildcard)
>                 BUG("wildcards are not supported");
> #endif
>
> So follow-mode does not actually work with wildcards, but we err on the
> side of not rejecting them outright. In that sense, the simplest "fix"
> here would be to allow :(glob) to match the '#if 0' section, like:

Is it "fix" or widening the wound, though?  The runtime BUG is very
unpleasant to see, but silently giving a possibly wrong result would
be even worse, I am afraid.

> That would avoid the BUG() and make things work consistently. But it
> still would not produce the output you expect, because --follow simply
> does not work with wildcard pathspecs. And fixing that would presumably
> be a much bigger change (I didn't dig into it).

True.  And wildcard is not the only thing it is broken for; the
biggest one is that it only follows one path at the time across all
the traversal paths, which has interesting implications like it can
give inconsistent result depending on the traversal order in a mergy
history.

If somebody cares about the "--follow" mode very deeply, the "upon
finding that the path disappears, run the rename detection with the
parent and switch the (single) path to follow to the old path in the
parent" logic must be updated to keep track of these pathspecs per
traversal paths.  If false positives are allowed, an approximation
that may be easier to implement is to add paths to the set of paths
to be followed every time such a rename is found.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git 2.40.1 tree-diff crashes with (glob) magic on pathspecs
  2023-06-01  5:38   ` git 2.40.1 tree-diff crashes with (glob) magic on pathspecs Junio C Hamano
@ 2023-06-01 13:30     ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2023-06-01 13:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jim Pryor, git

On Thu, Jun 01, 2023 at 02:38:05PM +0900, Junio C Hamano wrote:

> > So follow-mode does not actually work with wildcards, but we err on the
> > side of not rejecting them outright. In that sense, the simplest "fix"
> > here would be to allow :(glob) to match the '#if 0' section, like:
> 
> Is it "fix" or widening the wound, though?  The runtime BUG is very
> unpleasant to see, but silently giving a possibly wrong result would
> be even worse, I am afraid.

I think it's the same wrong result we'd give in other circumstances,
which in some ways makes it orthogonal. I.e., this is already wrong:

  git log --follow 'foo*'

Doing:

  git --noglob-pathspecs --follow 'foo*'

makes it right (assuming you really have a literal '*' in your
pathname). And then doing:

  git --noglob-pathspecs --follow ':(glob)foo*'

makes it wrong again, but in exactly the same way as the first case.
Which is why I said "orthogonal" above, because it is really just
twiddling options to cancel each other out and reach the same broken
state. ;)

All of that said, I do think the patch I showed earlier is pretty ugly,
and it is not even fixing all of the problems (you can trigger the same
BUG() for ":(icase)", etc). I have a series which moves us in a better
direction, but I need to put a few finishing touches on it. I'll
hopefully send it later today.

> If somebody cares about the "--follow" mode very deeply, the "upon
> finding that the path disappears, run the rename detection with the
> parent and switch the (single) path to follow to the old path in the
> parent" logic must be updated to keep track of these pathspecs per
> traversal paths.  If false positives are allowed, an approximation
> that may be easier to implement is to add paths to the set of paths
> to be followed every time such a rename is found.

Yes, it would be nice to fix all of these --follow bugs once and for
all. But that's a pretty big task. I think in the meantime, it is not
too hard to make Jim's case behave more sensibly.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 0/3] handling pathspec magic with --follow
  2023-05-27 19:00   ` Jim Pryor
@ 2023-06-01 17:37     ` Jeff King
  2023-06-01 17:38       ` [PATCH 1/3] pathspec: factor out magic-to-name function Jeff King
                         ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jeff King @ 2023-06-01 17:37 UTC (permalink / raw)
  To: Jim Pryor; +Cc: Junio C Hamano, git

On Sat, May 27, 2023 at 09:00:27PM +0200, Jim Pryor wrote:

> Confirmed I have log.follow = true, and also GIT_NOGLOB_PATHSPECS set,
> thus the explicit `(glob)` magic.

OK, all of that makes sense. The same problem hits ":(icase)" and other
pathspec magic, too. So talking about :(glob) too much is kind of a red
herring, anyway.

> I'd be happy with --follow not working with the wildcard pathspecs,
> but with the crash avoided.

Here's a series that I think should improve things for you. For an
explicit --follow, it finds the forbidden magic sooner (which gives us a
nice error message instead of the BUG()), and for log.follow, it avoids
follow-mode when forbidden pathspecs are used rather than enabling it
and then complaining.

The interesting bits are in the third patch. The other two are
preparatory.

  [1/3]: pathspec: factor out magic-to-name function
  [2/3]: diff: factor out --follow pathspec check
  [3/3]: diff: detect pathspec magic not supported by --follow

 builtin/log.c  |  2 +-
 diff.c         | 29 +++++++++++++++++++++++++++--
 diff.h         |  7 +++++++
 pathspec.c     | 19 ++++++++++++-------
 pathspec.h     |  8 ++++++++
 t/t4202-log.sh | 15 +++++++++++++++
 6 files changed, 70 insertions(+), 10 deletions(-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] pathspec: factor out magic-to-name function
  2023-06-01 17:37     ` [PATCH 0/3] handling pathspec magic with --follow Jeff King
@ 2023-06-01 17:38       ` Jeff King
  2023-06-01 17:41       ` [PATCH 2/3] diff: factor out --follow pathspec check Jeff King
  2023-06-01 17:43       ` [PATCH 3/3] diff: detect pathspec magic not supported by --follow Jeff King
  2 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2023-06-01 17:38 UTC (permalink / raw)
  To: Jim Pryor; +Cc: Junio C Hamano, git

When we have unsupported magic in a pathspec (because a command or code
path does not support particular items), we list the unsupported ones in
an error message.

Let's factor out the code here that converts the bits back into their
human-readable names, so that it can be used from other callers, which
may want to provide more flexible error messages.

Signed-off-by: Jeff King <peff@peff.net>
---
 pathspec.c | 19 ++++++++++++-------
 pathspec.h |  8 ++++++++
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/pathspec.c b/pathspec.c
index 6966b265d3..5049dbb528 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -531,24 +531,29 @@ static int pathspec_item_cmp(const void *a_, const void *b_)
 	return strcmp(a->match, b->match);
 }
 
-static void NORETURN unsupported_magic(const char *pattern,
-				       unsigned magic)
+void pathspec_magic_names(unsigned magic, struct strbuf *out)
 {
-	struct strbuf sb = STRBUF_INIT;
 	int i;
 	for (i = 0; i < ARRAY_SIZE(pathspec_magic); i++) {
 		const struct pathspec_magic *m = pathspec_magic + i;
 		if (!(magic & m->bit))
 			continue;
-		if (sb.len)
-			strbuf_addstr(&sb, ", ");
+		if (out->len)
+			strbuf_addstr(out, ", ");
 
 		if (m->mnemonic)
-			strbuf_addf(&sb, _("'%s' (mnemonic: '%c')"),
+			strbuf_addf(out, _("'%s' (mnemonic: '%c')"),
 				    m->name, m->mnemonic);
 		else
-			strbuf_addf(&sb, "'%s'", m->name);
+			strbuf_addf(out, "'%s'", m->name);
 	}
+}
+
+static void NORETURN unsupported_magic(const char *pattern,
+				       unsigned magic)
+{
+	struct strbuf sb = STRBUF_INIT;
+	pathspec_magic_names(magic, &sb);
 	/*
 	 * We may want to substitute "this command" with a command
 	 * name. E.g. when "git add -p" or "git add -i" dies when running
diff --git a/pathspec.h b/pathspec.h
index a5b38e0907..fec4399bbc 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -130,6 +130,14 @@ void parse_pathspec_file(struct pathspec *pathspec,
 void copy_pathspec(struct pathspec *dst, const struct pathspec *src);
 void clear_pathspec(struct pathspec *);
 
+/*
+ * Add a human-readable string to "out" representing the PATHSPEC_* flags set
+ * in "magic". The result is suitable for error messages, but not for
+ * parsing as pathspec magic itself (you get 'icase' with quotes, not
+ * :(icase)).
+ */
+void pathspec_magic_names(unsigned magic, struct strbuf *out);
+
 static inline int ps_strncmp(const struct pathspec_item *item,
 			     const char *s1, const char *s2, size_t n)
 {
-- 
2.41.0.346.g8d12207a4f


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] diff: factor out --follow pathspec check
  2023-06-01 17:37     ` [PATCH 0/3] handling pathspec magic with --follow Jeff King
  2023-06-01 17:38       ` [PATCH 1/3] pathspec: factor out magic-to-name function Jeff King
@ 2023-06-01 17:41       ` Jeff King
  2023-06-01 17:43       ` [PATCH 3/3] diff: detect pathspec magic not supported by --follow Jeff King
  2 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2023-06-01 17:41 UTC (permalink / raw)
  To: Jim Pryor; +Cc: Junio C Hamano, git

In --follow mode, we require exactly one pathspec. We check this
condition in two places:

  - in diff_setup_done(), we complain if --follow is used with an
    inapropriate pathspec

  - in git-log's revision "tweak" function, we enable log.follow only if
    the pathspec allows it

The duplication isn't a big deal right now, since the logic is so
simple. But in preparation for it becoming more complex, let's pull it
into a shared function.

Signed-off-by: Jeff King <peff@peff.net>
---
I don't love the die_on_error pattern here, but the obvious alternative
of having the caller die when the boolean is false means it won't be
able to give as descriptive a message. And I didn't think it was worth
getting into passing out an err strbuf, given that there are only these
two callers.

 builtin/log.c |  2 +-
 diff.c        | 14 ++++++++++++--
 diff.h        |  7 +++++++
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 676de107d6..83d9b69dba 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -866,7 +866,7 @@ static void log_setup_revisions_tweak(struct rev_info *rev,
 				      struct setup_revision_opt *opt)
 {
 	if (rev->diffopt.flags.default_follow_renames &&
-	    rev->prune_data.nr == 1)
+	    diff_check_follow_pathspec(&rev->prune_data, 0))
 		rev->diffopt.flags.follow_renames = 1;
 
 	if (rev->first_parent_only)
diff --git a/diff.c b/diff.c
index 3c88c37908..9f548f3471 100644
--- a/diff.c
+++ b/diff.c
@@ -4751,6 +4751,16 @@ unsigned diff_filter_bit(char status)
 	return filter_bit[(int) status];
 }
 
+int diff_check_follow_pathspec(struct pathspec *ps, int die_on_error)
+{
+	if (ps->nr != 1) {
+		if (die_on_error)
+			die(_("--follow requires exactly one pathspec"));
+		return 0;
+	}
+	return 1;
+}
+
 void diff_setup_done(struct diff_options *options)
 {
 	unsigned check_mask = DIFF_FORMAT_NAME |
@@ -4858,8 +4868,8 @@ void diff_setup_done(struct diff_options *options)
 
 	options->diff_path_counter = 0;
 
-	if (options->flags.follow_renames && options->pathspec.nr != 1)
-		die(_("--follow requires exactly one pathspec"));
+	if (options->flags.follow_renames)
+		diff_check_follow_pathspec(&options->pathspec, 1);
 
 	if (!options->use_color || external_diff())
 		options->color_moved = 0;
diff --git a/diff.h b/diff.h
index 3a7a9e8b88..6c10ce289d 100644
--- a/diff.h
+++ b/diff.h
@@ -539,6 +539,13 @@ void repo_diff_setup(struct repository *, struct diff_options *);
 struct option *add_diff_options(const struct option *, struct diff_options *);
 int diff_opt_parse(struct diff_options *, const char **, int, const char *);
 void diff_setup_done(struct diff_options *);
+
+/*
+ * Returns true if the pathspec can work with --follow mode. If die_on_error is
+ * set, die() with a specific error message rather than returning false.
+ */
+int diff_check_follow_pathspec(struct pathspec *ps, int die_on_error);
+
 int git_config_rename(const char *var, const char *value);
 
 #define DIFF_DETECT_RENAME	1
-- 
2.41.0.346.g8d12207a4f


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] diff: detect pathspec magic not supported by --follow
  2023-06-01 17:37     ` [PATCH 0/3] handling pathspec magic with --follow Jeff King
  2023-06-01 17:38       ` [PATCH 1/3] pathspec: factor out magic-to-name function Jeff King
  2023-06-01 17:41       ` [PATCH 2/3] diff: factor out --follow pathspec check Jeff King
@ 2023-06-01 17:43       ` Jeff King
  2023-06-02  7:27         ` Junio C Hamano
  2 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2023-06-01 17:43 UTC (permalink / raw)
  To: Jim Pryor; +Cc: Junio C Hamano, git

The --follow code doesn't handle most forms of pathspec magic. We check
that no unexpected ones have made it to try_to_follow_renames() with a
runtime GUARD_PATHSPEC() check, which gives behavior like this:

  $ git log --follow ':(icase)makefile' >/dev/null
  BUG: tree-diff.c:596: unsupported magic 10
  Aborted

The same is true of ":(glob)", ":(attr)", and so on. It's good that we
notice the problem rather than continuing and producing a wrong answer.
But there are two non-ideal things:

  1. The idea of GUARD_PATHSPEC() is to catch programming errors where
     low-level code gets unexpected pathspecs. We'd usually try to catch
     unsupported pathspecs by passing a magic_mask to parse_pathspec(),
     which would give the user a much better message like:

       pathspec magic not supported by this command: 'icase'

     That doesn't happen here because git-log usually _does_ support
     all types of pathspec magic, and so it passes "0" for the mask
     (this call actually happens in setup_revisions()). It needs to
     distinguish the normal case from the "--follow" one but currently
     doesn't.

  2. In addition to --follow, we have the log.follow config option. When
     that is set, we try to turn on --follow mode only when there is a
     single pathspec (since --follow doesn't handle anything else). But
     really, that ought to be expanded to "use --follow when the
     pathspec supports it". Otherwise, we'd complain any time you use an
     exotic pathspec:

       $ git config log.follow true
       $ git log ':(icase)makefile' >/dev/null
       BUG: tree-diff.c:596: unsupported magic 10
       Aborted

     We should instead just avoid enabling follow mode if it's not
     supported by this particular invocation.

This patch expands our diff_check_follow_pathspec() function to cover
pathspec magic, solving both problems.

A few final notes:

  - we could also solve (1) by passing the appropriate mask to
    parse_pathspec(). But that's not great for two reasons. One is that
    the error message is less precise. It says "magic not supported by
    this command", but really it is not the command, but rather the
    --follow option which is the problem. The second is that it always
    calls die(). But for our log.follow code, we want to speculatively
    ask "is this pathspec OK?" and just get a boolean result.

  - This is obviously the right thing to do for ':(icase)' and most
    other magic options. But ':(glob)' is a bit odd here. The --follow
    code doesn't support wildcards, but we allow them anyway. From
    try_to_follow_renames():

	#if 0
	        /*
	         * We should reject wildcards as well. Unfortunately we
	         * haven't got a reliable way to detect that 'foo\*bar' in
	         * fact has no wildcards. nowildcard_len is merely a hint for
	         * optimization. Let it slip for now until wildmatch is taught
	         * about dry-run mode and returns wildcard info.
	         */
	        if (opt->pathspec.has_wildcard)
	                BUG("wildcards are not supported");
	#endif

    So something like "git log --follow 'Make*'" is already doing the
    wrong thing, since ":(glob)" behavior is already the default (it is
    used only to countermand an earlier --noglob-pathspecs).

    So we _could_ loosen the guard to allow :(glob), since it just
    behaves the same as pathspecs do by default. But it seems like a
    backwards step to do so. It already doesn't work (it hits the BUG()
    case currently), and given that the user took an explicit step to
    say "this pathspec should glob", it is reasonable for us to say "no,
    --follow does not support globbing" (or in the case of log.follow,
    avoid turning on follow mode). Which is what happens after this
    patch.

  - The set of allowed pathspec magic is obviously the same as in
    GUARD_PATHSPEC(). We could perhaps factor these out to avoid
    repetition. The point of having separate masks and GUARD calls is
    that we don't necessarily know which parsed pathspecs will be used
    where. But in this case, the two are heavily correlated. Still,
    there may be some value in keeping them separate; it would make
    anyone think twice about adding new magic to the list in
    diff_check_follow_pathspec(). They'd need to touch
    try_to_follow_renames() as well, which is the code that would
    actually need to be updated to handle more exotic pathspecs.

  - The documentation for log.follow says that it enables --follow
    "...when a single <path> is given". We could possibly expand that to
    say "with no unsupported pathspec magic", but that raises the
    question of documenting which magic is supported. I think the
    existing wording of "single <path>" sufficiently encompasses the
    idea (the forbidden magic is stuff that might match multiple
    entries), and the spirit remains the same.

Reported-by: Jim Pryor <dubiousjim@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 diff.c         | 15 +++++++++++++++
 t/t4202-log.sh | 15 +++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/diff.c b/diff.c
index 9f548f3471..8ca16fefe7 100644
--- a/diff.c
+++ b/diff.c
@@ -4753,11 +4753,26 @@ unsigned diff_filter_bit(char status)
 
 int diff_check_follow_pathspec(struct pathspec *ps, int die_on_error)
 {
+	unsigned forbidden_magic;
+
 	if (ps->nr != 1) {
 		if (die_on_error)
 			die(_("--follow requires exactly one pathspec"));
 		return 0;
 	}
+
+	forbidden_magic = ps->items[0].magic & ~(PATHSPEC_FROMTOP |
+						 PATHSPEC_LITERAL);
+	if (forbidden_magic) {
+		if (die_on_error) {
+			struct strbuf sb = STRBUF_INIT;
+			pathspec_magic_names(forbidden_magic, &sb);
+			die(_("pathspec magic not supported by --follow: %s"),
+			    sb.buf);
+		}
+		return 0;
+	}
+
 	return 1;
 }
 
diff --git a/t/t4202-log.sh b/t/t4202-log.sh
index ae73aef922..5b54d7928e 100755
--- a/t/t4202-log.sh
+++ b/t/t4202-log.sh
@@ -187,6 +187,21 @@ test_expect_success 'git config log.follow does not die with no paths' '
 	git log --
 '
 
+test_expect_success 'git log --follow rejects unsupported pathspec magic' '
+	test_must_fail git log --follow ":(top,glob,icase)ichi" 2>stderr &&
+	# check full error message; we want to be sure we mention both
+	# of the rejected types (glob,icase), but not the allowed one (top)
+	echo "fatal: pathspec magic not supported by --follow: ${SQ}glob${SQ}, ${SQ}icase${SQ}" >expect &&
+	test_cmp expect stderr
+'
+
+test_expect_success 'log.follow disabled with unsupported pathspec magic' '
+	test_config log.follow true &&
+	git log --format=%s ":(glob,icase)ichi" >actual &&
+	echo third >expect &&
+	test_cmp expect actual
+'
+
 test_expect_success 'git config log.follow is overridden by --no-follow' '
 	test_config log.follow true &&
 	git log --no-follow --pretty="format:%s" ichi >actual &&
-- 
2.41.0.346.g8d12207a4f

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] diff: detect pathspec magic not supported by --follow
  2023-06-01 17:43       ` [PATCH 3/3] diff: detect pathspec magic not supported by --follow Jeff King
@ 2023-06-02  7:27         ` Junio C Hamano
  2023-06-15  7:26           ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2023-06-02  7:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Jim Pryor, git

Jeff King <peff@peff.net> writes:

> The --follow code doesn't handle most forms of pathspec magic. We check
> that no unexpected ones have made it to try_to_follow_renames() with a
> runtime GUARD_PATHSPEC() check, which gives behavior like this:
>
>   $ git log --follow ':(icase)makefile' >/dev/null
>   BUG: tree-diff.c:596: unsupported magic 10
>   Aborted

Stepping back a bit, isn't the real reason why follow mode is
incompatible with the pathspec magic because it does not want to
deal with anything but a single literal pathname?

In other words, in "git log --follow ':(icase)makefile'", if the
given pathspec matches a single path in the starting commit, we
should be able to start from the commit inspecting that uniquely
found path, compare the commit with its parent(s) at the path, and
when the parent does not have the path, detect rename and continue
following that the name before the rename as the single path
starting from that parent.  Perhaps the commit we start at, HEAD,
has "Makefile" and not "makefile" (if we have two, that is an
error), so we start traversal, following the pathname "Makefile".
At some point, we may discover that the parent did not have
"Makefile", but has "GNUMakefile" with a similar contents, at which
point we swap the path we follow from "Makefile" to "GNUMakefile".

At each step, tree-diff does not need to and does not want to do
anything fancy with pathspec---the operation is always "this path in
the parent, does it exist in the parent? If not, did it come from a
different path?", and use of "literal pathspec" mode makes sense
there.

So why does try_to_follow_renames() even *need* to say anything
other than "We are in follow mode; what we have is a single path;
use it as a literal pathspec that matches the path literally without
any funky magic"?  GUARD_PATHSPEC() is "please inspect the pathspec
and see what kind of special pattern matching it requests---we
cannot allow it to ask for certain things but do allow it to ask for
other things", that sounds terribly misguided here.  The string is
literal, so we do not even want to look at it to see what it
asks---it is taken as nothing but a literal string and we do not let
it ask for anything special.

For example, if we started at HEAD to follow Makefile and in an
ancestor commit we find that the Makefile came from a funny path
":(icase)foo", instead of allowing the pathspec match code to
interprete it as a magic pathspec that matches any case permutations
of 'f/o/o', we want to force the pathspec match code to match
nothing but the funny path "colon parens icase close parens foo" by
using literal pathspec magic.

So perhaps having GUARD_PATHSPEC() there is the mistake, no?

For the above idea to work, I think we need to resolve the pathspec
early; "log --follow" should find its starting point, apply the
given pathspec to its treeish to grab the literal pathname, and use
that single literal pathname as the literal pathspec and continue,
which I do not think it does right now.  But with it done upfront,
the pathspec limiting done during the history traversal, including
try_to_follow_renames() bit, should be able to limit itself to "is
the path literally the string given as the pathspec"?

I dunno.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] diff: detect pathspec magic not supported by --follow
  2023-06-02  7:27         ` Junio C Hamano
@ 2023-06-15  7:26           ` Jeff King
  2023-06-15 19:00             ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2023-06-15  7:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jim Pryor, git

On Fri, Jun 02, 2023 at 04:27:05PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The --follow code doesn't handle most forms of pathspec magic. We check
> > that no unexpected ones have made it to try_to_follow_renames() with a
> > runtime GUARD_PATHSPEC() check, which gives behavior like this:
> >
> >   $ git log --follow ':(icase)makefile' >/dev/null
> >   BUG: tree-diff.c:596: unsupported magic 10
> >   Aborted
> 
> Stepping back a bit, isn't the real reason why follow mode is
> incompatible with the pathspec magic because it does not want to
> deal with anything but a single literal pathname?

Sorry, I'm a little behind on list reading; I know this topic has
advanced almost to 'master' in the meantime, but I had a few thoughts
worth getting out.

Basically, yes, I agree with everything in your email. My series is more
or less papering over deficiencies in the --follow code, and that code
would be much better off thinking of single literal paths that it is
following.

I was trying with my series not to go down the rabbit hole of --follow
deficiencies. So in that sense, I think it is still a good idea for the
short-term. And though we might perhaps revert some of it if we actually
improve how --follow works, I don't think it would be too hard to do so
later. But of course if we are going to improve --follow _now_, then it
could replace my series.

And what you outlined here:

> For the above idea to work, I think we need to resolve the pathspec
> early; "log --follow" should find its starting point, apply the
> given pathspec to its treeish to grab the literal pathname, and use
> that single literal pathname as the literal pathspec and continue,
> which I do not think it does right now.  But with it done upfront,
> the pathspec limiting done during the history traversal, including
> try_to_follow_renames() bit, should be able to limit itself to "is
> the path literally the string given as the pathspec"?

seems mostly sensible to me, except that the notion of "starting point"
is ambiguous. If I do:

  git log --follow branch1 branch2 -- foo*

then we have two treeishes. Do we look at one? Both? What happens if the
pathspec resolves differently?

Off the top of my head, I'd say that we should look at all starting
tips, resolve the pathspec in all of them, and complain if it doesn't
resolve to the same single path in each one. Because it otherwise would
produce garbled results.

To be fair, this same garbled case already happens when traversing a
merge sends us down two paths of history, as a followed rename might be
present on one but not the other, and we keep only a single path. But as
you know, that is a long-time deficiency of --follow. :) At least
detecting it from the starting tips is easy and something the user can
deal with by modifying the command invocation.

So I dunno. That doesn't sound _too_ hard to do, though it probably also
needs a lot of the same infrastructure I introduced in my series. Namely
that log.follow needs to ask "hey, is this pathspec usable for
following?" so that it can quietly turn the feature off, rather than
triggering an error.

So I'd suggest to let my series continue graduating, and then if anybody
wants to work on more sensible pathspec handling, to do so on top.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] diff: detect pathspec magic not supported by --follow
  2023-06-15  7:26           ` Jeff King
@ 2023-06-15 19:00             ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2023-06-15 19:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Jim Pryor, git

Jeff King <peff@peff.net> writes:

> ... except that the notion of "starting point"
> is ambiguous. If I do:
>
>   git log --follow branch1 branch2 -- foo*
>
> then we have two treeishes. Do we look at one? Both? What happens if the
> pathspec resolves differently?

If we consider a middle-ground where we do not keep a separate
pathname that each history traversal is following but only keep a
single pathname, then the answer is one of "ignore others and pick
one" (which is essentially what needs to happen in the current
implementation when a path gets renamed between a child commit and
its parent while the other parents do not have renames), "ignore
others and pick one, but give a warning that the result may be
incomplete", or "barf".  The latter two would need us to look at all
treeishes.

A false-positive-ridden "solution" that would slightly be better
than that would be to still keep a global, single, pathspec that has
more than one elements (all of which are literal pathspec elements).
Every time we discover a rename, we add the newly discovered old
name as an additional element, instead of replacing the single
element.  If we were to go that route, then starting from multiple
points, each following a pathname, would be a simple matter of
looking at all trees and finding which pathname each history
traversal wants to follow and creating a pathspec with all these
(literal) pathnames.

But once we fix the underlying problem, and start tracking which
path each history traversal trajectory is currently following, then
it is not a problem for each starting point to have a pathname that
is being followed.  Each commit would _know_ which pathname it is
interested in, and when a traversal from a commit to its parent sees
a rename, the pathname being followed for the parent commit will be
different from the child.

When this happens, the parent may have a pathname it is already
interested in, set by traversing from another child (i.e. one side
of a fork renamed the path one way, while the other side of a fork
kept the same name or renamed it to another way, and we traversed
both forks down and are now at the fork point).  It most likely
should be the same name we discovered from the other child, but in
weird cases it may be different (e.g. a child thinks its path A came
from parent's path X while another child thinks its path A came from
parent's path Y), in which case we may have to follow two pathnames
in the parent commit, as if the pathname we have been following in
the child had combined contents from multiple paths.

> So I'd suggest to let my series continue graduating, and then if anybody
> wants to work on more sensible pathspec handling, to do so on top.

I think we are in agreement on this---otherwise I wouldn't have
merged the patch as-is down to 'next' ;-)

As I said multiple times in the past, the current "log --follow" is
more of a checkbox item than a feature, and in that context, I think
your "fix", which may be papering over just one obvious breakage, is
what the codepath should be happy with, before it gets reworked more
seriously.

Thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-06-15 19:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-27 13:53 git 2.40.1 tree-diff crashes with (glob) magic on pathspecs Jim Pryor
2023-05-27 17:39 ` Jeff King
2023-05-27 19:00   ` Jim Pryor
2023-06-01 17:37     ` [PATCH 0/3] handling pathspec magic with --follow Jeff King
2023-06-01 17:38       ` [PATCH 1/3] pathspec: factor out magic-to-name function Jeff King
2023-06-01 17:41       ` [PATCH 2/3] diff: factor out --follow pathspec check Jeff King
2023-06-01 17:43       ` [PATCH 3/3] diff: detect pathspec magic not supported by --follow Jeff King
2023-06-02  7:27         ` Junio C Hamano
2023-06-15  7:26           ` Jeff King
2023-06-15 19:00             ` Junio C Hamano
2023-06-01  5:38   ` git 2.40.1 tree-diff crashes with (glob) magic on pathspecs Junio C Hamano
2023-06-01 13:30     ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).