All of lore.kernel.org
 help / color / mirror / Atom feed
* Change on check-attr behavior
@ 2019-01-17 15:47 Sérgio Peixoto
  2019-01-17 16:07 ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Sérgio Peixoto @ 2019-01-17 15:47 UTC (permalink / raw)
  To: git

Hi,

I think there is a bug on the check-attr behavior of git when asking
for the "allowed-ext" attribute.  Check the logs below to see that
with version  2.20.1.windows.1 we get unspecified even the attribute
is there as you can see when asking for all the attributes.

=== OLD VERSION ===
> git version
git version 2.8.1.windows.1

> git check-attr -a test.py
test.py: text: set
test.py: allowed-ext: 100

> git check-attr allowed-ext test.py
test.py: allowed-ext: 100

=== NEW VERSION ===
> git --version
git version 2.20.1.windows.1

> git check-attr -a test.py
test.py: text: set
test.py: allowed-ext: 100

> git check-attr allowed-ext test.py
test.py: allowed-ext: unspecified

Cheers,
Sérgio Peixoto

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Change on check-attr behavior
  2019-01-17 15:47 Change on check-attr behavior Sérgio Peixoto
@ 2019-01-17 16:07 ` Jeff King
  2019-01-18  9:41   ` Sérgio Peixoto
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2019-01-17 16:07 UTC (permalink / raw)
  To: Sérgio Peixoto; +Cc: git

On Thu, Jan 17, 2019 at 03:47:09PM +0000, Sérgio Peixoto wrote:

> I think there is a bug on the check-attr behavior of git when asking
> for the "allowed-ext" attribute.  Check the logs below to see that
> with version  2.20.1.windows.1 we get unspecified even the attribute
> is there as you can see when asking for all the attributes.
> 
> === OLD VERSION ===
> > git version
> git version 2.8.1.windows.1
> 
> > git check-attr -a test.py
> test.py: text: set
> test.py: allowed-ext: 100
> 
> > git check-attr allowed-ext test.py
> test.py: allowed-ext: 100
> 
> === NEW VERSION ===
> > git --version
> git version 2.20.1.windows.1
> 
> > git check-attr -a test.py
> test.py: text: set
> test.py: allowed-ext: 100
> 
> > git check-attr allowed-ext test.py
> test.py: allowed-ext: unspecified

I can't reproduce here (on Linux, but I don't think the attr code is
particularly Windows-specific).  Can you show us what's in your
.gitattributes file(s)?

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Change on check-attr behavior
  2019-01-17 16:07 ` Jeff King
@ 2019-01-18  9:41   ` Sérgio Peixoto
  2019-01-18 16:58     ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Sérgio Peixoto @ 2019-01-18  9:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Create a file  .git/info/attributes with contents

If the contents are:
[attr]allowed-ext
*.py allowed-ext=100

then the problem occurs.

If contents are:
#[attr]allowed-ext
*.py allowed-ext=100

the problem is gone

On Thu, Jan 17, 2019 at 4:07 PM Jeff King <peff@peff.net> wrote:
>
> On Thu, Jan 17, 2019 at 03:47:09PM +0000, Sérgio Peixoto wrote:
>
> > I think there is a bug on the check-attr behavior of git when asking
> > for the "allowed-ext" attribute.  Check the logs below to see that
> > with version  2.20.1.windows.1 we get unspecified even the attribute
> > is there as you can see when asking for all the attributes.
> >
> > === OLD VERSION ===
> > > git version
> > git version 2.8.1.windows.1
> >
> > > git check-attr -a test.py
> > test.py: text: set
> > test.py: allowed-ext: 100
> >
> > > git check-attr allowed-ext test.py
> > test.py: allowed-ext: 100
> >
> > === NEW VERSION ===
> > > git --version
> > git version 2.20.1.windows.1
> >
> > > git check-attr -a test.py
> > test.py: text: set
> > test.py: allowed-ext: 100
> >
> > > git check-attr allowed-ext test.py
> > test.py: allowed-ext: unspecified
>
> I can't reproduce here (on Linux, but I don't think the attr code is
> particularly Windows-specific).  Can you show us what's in your
> .gitattributes file(s)?
>
> -Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Change on check-attr behavior
  2019-01-18  9:41   ` Sérgio Peixoto
@ 2019-01-18 16:58     ` Jeff King
  2019-01-18 21:34       ` [PATCH] attr: do not mark queried macros as unset Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2019-01-18 16:58 UTC (permalink / raw)
  To: Sérgio Peixoto; +Cc: Brandon Williams, git

On Fri, Jan 18, 2019 at 09:41:03AM +0000, Sérgio Peixoto wrote:

> Create a file  .git/info/attributes with contents
> 
> If the contents are:
> [attr]allowed-ext
> *.py allowed-ext=100
> 
> then the problem occurs.
> 
> If contents are:
> #[attr]allowed-ext
> *.py allowed-ext=100
> 
> the problem is gone

Ah, thanks, that's the secret sauce: it only affects macros.

I don't know how representative that attributes file is of your real
repo, but there's possibly one immediate workaround: there's no need for
the [attr] line here. You are free to define your own attributes, and
only need "[attr]" if you're defining a macro that expands to other
attributes.

Now, on to the actual bug. The simplest reproduction is:

  (echo "[attr]foo bar"; echo "* foo") >.gitattributes
  git check-attr foo file

which should report "foo" as set. This bisects to 60a12722ac (attr:
remove maybe-real, maybe-macro from git_attr, 2017-01-27), and it seems
like an unintentional regression there. I haven't yet poked into that
commit to see what the fix will look like.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] attr: do not mark queried macros as unset
  2019-01-18 16:58     ` Jeff King
@ 2019-01-18 21:34       ` Jeff King
  2019-01-18 21:46         ` Jeff King
                           ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Jeff King @ 2019-01-18 21:34 UTC (permalink / raw)
  To: Sérgio Peixoto
  Cc: Nguyễn Thái Ngọc Duy, Brandon Williams, git

On Fri, Jan 18, 2019 at 11:58:01AM -0500, Jeff King wrote:

> Now, on to the actual bug. The simplest reproduction is:
> 
>   (echo "[attr]foo bar"; echo "* foo") >.gitattributes
>   git check-attr foo file

Actually, even simpler is to just "binary", which is pre-defined as a
macro. :)

> which should report "foo" as set. This bisects to 60a12722ac (attr:
> remove maybe-real, maybe-macro from git_attr, 2017-01-27), and it seems
> like an unintentional regression there. I haven't yet poked into that
> commit to see what the fix will look like.

So here's the fix I came up with. +cc Duy, as this is really tangled
with his older 06a604e670.

-- >8 --
Subject: [PATCH] attr: do not mark queried macros as unset

Since 60a12722ac (attr: remove maybe-real, maybe-macro from git_attr,
2017-01-27), we will always mark an attribute macro (e.g., "binary")
that is specifically queried for as "unspecified", even though listing
_all_ attributes would display it at set. E.g.:

  $ echo "* binary" >.gitattributes

  $ git check-attr -a file
  file: binary: set
  file: diff: unset
  file: merge: unset
  file: text: unset

  $ git check-attr binary file
  file: binary: unspecified

The problem stems from an incorrect conversion of the optimization from
06a604e670 (attr: avoid heavy work when we know the specified attr is
not defined, 2014-12-28). There we tried in collect_some_attrs() to
avoid even looking at the attr_stack when the user has asked for "foo"
and we know that "foo" did not ever appear in any .gitattributes file.

It used a flag "maybe_real" in each attribute struct, where "real" meant
that the attribute appeared in an actual file (we have to make this
distinction because we also create an attribute struct for any names
that are being queried). But as explained in that commit message, the
meaning of "real" was tangled with some special cases around macros.

When 06a604e670 later refactored the macro code, it dropped maybe_real
entirely. This missed the fact that "maybe_real" could be unset for two
reasons: because of a macro, or because it was never found during
parsing. This had two results:

  - the optimization in collect_some_attrs() ceased doing anything
    meaningful, since it no longer kept track of "was it found during
    parsing"

  - worse, it actually kicked in when the caller _did_ ask about a macro
    by name, causing us to mark it as unspecified

It should be possible to salvage this optimization, but let's start with
just removing the remnants. It hasn't been doing anything (except
creating bugs) since 60a12722ac, and nobody seems to have noticed the
performance regression. It's more important to fix the correctness
problem clearly first.

I've added two tests here. The second one actually shows off the bug.
The test of "check-attr -a" is not strictly necessary, but we currently
do not test attribute macros much, and the builtin "binary" not at all.
So this increases our general test coverage, as well as making sure we
didn't mess up this related case.

Signed-off-by: Jeff King <peff@peff.net>
---
 attr.c                | 16 +---------------
 t/t0003-attributes.sh | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/attr.c b/attr.c
index eaece6658d..57ced792f8 100644
--- a/attr.c
+++ b/attr.c
@@ -1092,7 +1092,7 @@ static void collect_some_attrs(const struct index_state *istate,
 			       const char *path,
 			       struct attr_check *check)
 {
-	int i, pathlen, rem, dirlen;
+	int pathlen, rem, dirlen;
 	const char *cp, *last_slash = NULL;
 	int basename_offset;
 
@@ -1113,20 +1113,6 @@ static void collect_some_attrs(const struct index_state *istate,
 	all_attrs_init(&g_attr_hashmap, check);
 	determine_macros(check->all_attrs, check->stack);
 
-	if (check->nr) {
-		rem = 0;
-		for (i = 0; i < check->nr; i++) {
-			int n = check->items[i].attr->attr_nr;
-			struct all_attrs_item *item = &check->all_attrs[n];
-			if (item->macro) {
-				item->value = ATTR__UNSET;
-				rem++;
-			}
-		}
-		if (rem == check->nr)
-			return;
-	}
-
 	rem = check->all_attrs_nr;
 	fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem);
 }
diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
index 22499bce5f..71e63d8b50 100755
--- a/t/t0003-attributes.sh
+++ b/t/t0003-attributes.sh
@@ -322,4 +322,24 @@ test_expect_success 'bare repository: test info/attributes' '
 	)
 '
 
+test_expect_success 'binary macro expanded by -a' '
+	echo "file binary" >.gitattributes &&
+	cat >expect <<-\EOF &&
+	file: binary: set
+	file: diff: unset
+	file: merge: unset
+	file: text: unset
+	EOF
+	git check-attr -a file >actual &&
+	test_cmp expect actual
+'
+
+
+test_expect_success 'query binary macro directly' '
+	echo "file binary" >.gitattributes &&
+	echo file: binary: set >expect &&
+	git check-attr binary file >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.20.1.691.ge06e0a624f


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] attr: do not mark queried macros as unset
  2019-01-18 21:34       ` [PATCH] attr: do not mark queried macros as unset Jeff King
@ 2019-01-18 21:46         ` Jeff King
  2019-01-18 22:19           ` Stefan Beller
  2019-01-22 22:00           ` Junio C Hamano
  2019-01-21 10:05         ` Duy Nguyen
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2019-01-18 21:46 UTC (permalink / raw)
  To: Sérgio Peixoto
  Cc: Nguyễn Thái Ngọc Duy, Brandon Williams, git

On Fri, Jan 18, 2019 at 04:34:58PM -0500, Jeff King wrote:

> When 06a604e670 later refactored the macro code, it dropped maybe_real
> entirely. This missed the fact that "maybe_real" could be unset for two
> reasons: because of a macro, or because it was never found during
> parsing. This had two results:
> 
>   - the optimization in collect_some_attrs() ceased doing anything
>     meaningful, since it no longer kept track of "was it found during
>     parsing"
> 
>   - worse, it actually kicked in when the caller _did_ ask about a macro
>     by name, causing us to mark it as unspecified
> 
> It should be possible to salvage this optimization, but let's start with
> just removing the remnants. It hasn't been doing anything (except
> creating bugs) since 60a12722ac, and nobody seems to have noticed the
> performance regression. It's more important to fix the correctness
> problem clearly first.

And here's a resurrection of the optimization that _seems_ to work, but
I'm not 100% confident in.

In particular, it does not care about macros at all. It simply asks: is
this queried attribute a thing which was ever mentioned in the
attributes files (either as a path match or as a possible macro
expansion). If not, then we know we do not need to look further for it.

But that leaves me unsure why the original optimization needed to care
about macros at all. Has something changed since then with respect to
the way we expand macros since then? Or am I totally missing some case
that will cause problems?

I guess maybe what I'm missing is that asking for "diff" means that we
need to care about:

  - whether "diff" was mentioned in the stack

  - whether "binary" was mentioned in the stack

But just "binary" mentioning "diff" is not interesting without somebody
actually mentioning "binary". I.e., I don't think the patch here will
produce wrong results, but it will not kick in as often as we might
like.

I'm not sure how to do it robustly without being able to reverse-map all
of the macros after we've resolved them (i.e., to know that "diff" gets
mentioned by "binary", and then check if "binary" is actually
mentioned). I think that would be possible now, as we should know that
after determine_macros(). But I also wonder if we are hitting
diminishing returns (after all, determine_macros() is already walking
the attr stack).

I dunno. This is why I submitted the initial patch as the simplest fix. ;)

---
diff --git a/attr.c b/attr.c
index 57ced792f8..c3cbfa6501 100644
--- a/attr.c
+++ b/attr.c
@@ -31,6 +31,7 @@ static const char git_attr__unknown[] = "(builtin)unknown";
 
 struct git_attr {
 	int attr_nr; /* unique attribute number */
+	int in_stack; /* actually found in some attribute stack */
 	char name[FLEX_ARRAY]; /* attribute name */
 };
 
@@ -220,7 +221,8 @@ static void report_invalid_attr(const char *name, size_t len,
  * dictionary.  If no entry is found, create a new attribute and store it in
  * the dictionary.
  */
-static const struct git_attr *git_attr_internal(const char *name, int namelen)
+static const struct git_attr *git_attr_internal(const char *name, int namelen,
+						int in_stack)
 {
 	struct git_attr *a;
 
@@ -240,6 +242,8 @@ static const struct git_attr *git_attr_internal(const char *name, int namelen)
 		       (hashmap_get_size(&g_attr_hashmap.map) - 1));
 	}
 
+	a->in_stack |= in_stack;
+
 	hashmap_unlock(&g_attr_hashmap);
 
 	return a;
@@ -247,7 +251,7 @@ static const struct git_attr *git_attr_internal(const char *name, int namelen)
 
 const struct git_attr *git_attr(const char *name)
 {
-	return git_attr_internal(name, strlen(name));
+	return git_attr_internal(name, strlen(name), 0);
 }
 
 /* What does a matched pattern decide? */
@@ -335,7 +339,7 @@ static const char *parse_attr(const char *src, int lineno, const char *cp,
 		else {
 			e->setto = xmemdupz(equals + 1, ep - equals - 1);
 		}
-		e->attr = git_attr_internal(cp, len);
+		e->attr = git_attr_internal(cp, len, 1);
 	}
 	return ep + strspn(ep, blank);
 }
@@ -396,7 +400,7 @@ static struct match_attr *parse_attr_line(const char *line, const char *src,
 		      sizeof(struct attr_state) * num_attr +
 		      (is_macro ? 0 : namelen + 1));
 	if (is_macro) {
-		res->u.attr = git_attr_internal(name, namelen);
+		res->u.attr = git_attr_internal(name, namelen, 1);
 	} else {
 		char *p = (char *)&(res->state[num_attr]);
 		memcpy(p, name, namelen);
@@ -1093,6 +1097,7 @@ static void collect_some_attrs(const struct index_state *istate,
 			       struct attr_check *check)
 {
 	int pathlen, rem, dirlen;
+	int i;
 	const char *cp, *last_slash = NULL;
 	int basename_offset;
 
@@ -1111,6 +1116,21 @@ static void collect_some_attrs(const struct index_state *istate,
 
 	prepare_attr_stack(istate, path, dirlen, &check->stack);
 	all_attrs_init(&g_attr_hashmap, check);
+
+	if (check->nr) {
+		rem = 0;
+		for (i = 0; i < check->nr; i++) {
+			int n = check->items[i].attr->attr_nr;
+			struct all_attrs_item *item = &check->all_attrs[n];
+			if (!item->attr->in_stack) {
+				item->value = ATTR__UNSET;
+				rem++;
+			}
+		}
+		if (rem == check->nr)
+			return;
+	}
+
 	determine_macros(check->all_attrs, check->stack);
 
 	rem = check->all_attrs_nr;

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] attr: do not mark queried macros as unset
  2019-01-18 21:46         ` Jeff King
@ 2019-01-18 22:19           ` Stefan Beller
  2019-01-22  7:19             ` Jeff King
  2019-01-22 22:00           ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Stefan Beller @ 2019-01-18 22:19 UTC (permalink / raw)
  To: Jeff King
  Cc: Sérgio Peixoto, Nguyễn Thái Ngọc Duy,
	Brandon Williams, git

> I dunno. This is why I submitted the initial patch as the simplest fix. ;)
>

The first patch is
Reviewed-by: Stefan Beller <sbeller@google.com>

Diffing across both patches, this seems to be the relevant part:

---8<---
@@ -1111,14 +1116,13 @@ static void collect_some_attrs(const struct
index_state *istate,

        prepare_attr_stack(istate, path, dirlen, &check->stack);
        all_attrs_init(&g_attr_hashmap, check);
-       determine_macros(check->all_attrs, check->stack);

        if (check->nr) {
                rem = 0;
                for (i = 0; i < check->nr; i++) {
                        int n = check->items[i].attr->attr_nr;
                        struct all_attrs_item *item = &check->all_attrs[n];
-                       if (item->macro) {
+                       if (!item->attr->in_stack) {
                                item->value = ATTR__UNSET;
                                rem++;
                        }
@@ -1127,6 +1131,8 @@ static void collect_some_attrs(const struct
index_state *istate,
                        return;
        }

+       determine_macros(check->all_attrs, check->stack);
+
        rem = check->all_attrs_nr;
        fill(path, pathlen, basename_offset, check->stack,
check->all_attrs, rem);
 }
---8<---

which I think is correct.

Maybe we could refactor the big condition (if (check->nr)) to be
its own function and have

    if (!check_overlaps_all_attrs(check))
        return;

instead. The function would allow for a natural place to put a comment
convincing us why the optimisation works as expected. :-)

And after rereading that code, the optimisation checks
if any of the requested attributes in 'check' are touched in
all_attrs, which sounds like a natural optimisation when we assume
that filling in the actual values take a lot of time as the stack
of attribute files might be large.

I think this patch is correct, too.

Stefan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] attr: do not mark queried macros as unset
  2019-01-18 21:34       ` [PATCH] attr: do not mark queried macros as unset Jeff King
  2019-01-18 21:46         ` Jeff King
@ 2019-01-21 10:05         ` Duy Nguyen
  2019-01-22  7:21           ` Jeff King
  2019-01-22  9:34         ` Duy Nguyen
  2019-01-22 21:48         ` Junio C Hamano
  3 siblings, 1 reply; 15+ messages in thread
From: Duy Nguyen @ 2019-01-21 10:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Sérgio Peixoto, Brandon Williams, Git Mailing List

On Sat, Jan 19, 2019 at 4:35 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Jan 18, 2019 at 11:58:01AM -0500, Jeff King wrote:
>
> > Now, on to the actual bug. The simplest reproduction is:
> >
> >   (echo "[attr]foo bar"; echo "* foo") >.gitattributes
> >   git check-attr foo file
>
> Actually, even simpler is to just "binary", which is pre-defined as a
> macro. :)
>
> > which should report "foo" as set. This bisects to 60a12722ac (attr:
> > remove maybe-real, maybe-macro from git_attr, 2017-01-27), and it seems
> > like an unintentional regression there. I haven't yet poked into that
> > commit to see what the fix will look like.
>
> So here's the fix I came up with. +cc Duy, as this is really tangled
> with his older 06a604e670.
>
> -- >8 --
> Subject: [PATCH] attr: do not mark queried macros as unset
>
> Since 60a12722ac (attr: remove maybe-real, maybe-macro from git_attr,
> 2017-01-27), we will always mark an attribute macro (e.g., "binary")
> that is specifically queried for as "unspecified", even though listing
> _all_ attributes would display it at set. E.g.:
>
>   $ echo "* binary" >.gitattributes
>
>   $ git check-attr -a file
>   file: binary: set
>   file: diff: unset
>   file: merge: unset
>   file: text: unset
>
>   $ git check-attr binary file
>   file: binary: unspecified
>
> The problem stems from an incorrect conversion of the optimization from
> 06a604e670 (attr: avoid heavy work when we know the specified attr is
> not defined, 2014-12-28). There we tried in collect_some_attrs() to
> avoid even looking at the attr_stack when the user has asked for "foo"
> and we know that "foo" did not ever appear in any .gitattributes file.
>
> It used a flag "maybe_real" in each attribute struct, where "real" meant
> that the attribute appeared in an actual file (we have to make this
> distinction because we also create an attribute struct for any names
> that are being queried). But as explained in that commit message, the
> meaning of "real" was tangled with some special cases around macros.
>
> When 06a604e670 later refactored the macro code, it dropped maybe_real
> entirely. This missed the fact that "maybe_real" could be unset for two
> reasons: because of a macro, or because it was never found during
> parsing. This had two results:
>
>   - the optimization in collect_some_attrs() ceased doing anything
>     meaningful, since it no longer kept track of "was it found during
>     parsing"
>
>   - worse, it actually kicked in when the caller _did_ ask about a macro
>     by name, causing us to mark it as unspecified
>
> It should be possible to salvage this optimization, but let's start with
> just removing the remnants. It hasn't been doing anything (except
> creating bugs) since 60a12722ac, and nobody seems to have noticed the
> performance regression. It's more important to fix the correctness
> problem clearly first.

But muh optimization!!! You're right of course, correctness comes
first. I did try to look at this code but it's been a while and I'm
afraid I don't have anything valuable to say. I'll dig in more in the
next couple days.
-- 
Duy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] attr: do not mark queried macros as unset
  2019-01-18 22:19           ` Stefan Beller
@ 2019-01-22  7:19             ` Jeff King
  2019-01-22  9:50               ` Duy Nguyen
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2019-01-22  7:19 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Sérgio Peixoto, Nguyễn Thái Ngọc Duy,
	Brandon Williams, git

On Fri, Jan 18, 2019 at 02:19:55PM -0800, Stefan Beller wrote:

> > I dunno. This is why I submitted the initial patch as the simplest fix. ;)
> >
> 
> The first patch is
> Reviewed-by: Stefan Beller <sbeller@google.com>

Thanks!

> Diffing across both patches, this seems to be the relevant part:
> [...]
> 
> ---8<---
> @@ -1111,14 +1116,13 @@ static void collect_some_attrs(const struct
> index_state *istate,
> 
>         prepare_attr_stack(istate, path, dirlen, &check->stack);
>         all_attrs_init(&g_attr_hashmap, check);
> -       determine_macros(check->all_attrs, check->stack);
> 
>         if (check->nr) {
>                 rem = 0;
>                 for (i = 0; i < check->nr; i++) {
>                         int n = check->items[i].attr->attr_nr;
>                         struct all_attrs_item *item = &check->all_attrs[n];
> -                       if (item->macro) {
> +                       if (!item->attr->in_stack) {
>                                 item->value = ATTR__UNSET;
>                                 rem++;
>                         }
> @@ -1127,6 +1131,8 @@ static void collect_some_attrs(const struct
> index_state *istate,
>                         return;
>         }
> 
> +       determine_macros(check->all_attrs, check->stack);
> +
>         rem = check->all_attrs_nr;
>         fill(path, pathlen, basename_offset, check->stack,
> check->all_attrs, rem);
>  }
> ---8<---
> 
> which I think is correct.

Yes, that's the interesting part. I think I've convinced myself, too,
that it doesn't do the _wrong_ thing ever. But I think it misses the
point of the original, which is that you want common ones like "diff"
not to trigger in_stack if nobody has actually used them. And doing that
really does mean marking in_stack not just when a macro mentions it
(because clearly "binary" is going to mention it for every repo), but
waiting to see if anybody mentions that macro.

Which means we must call determine_macros(), and then propagate the
macro's in_stack to its expansion (if it's indeed called at all).

I don't think that would be _too_ hard to do. But I also wonder if
there's much point. We are trying to avoid fill(), but I think that
determine_macros() is of roughly the same complexity (look at all
matches of all stacks). I guess it does avoid path_matches(), which is a
bit more expensive. And in theory it could be cached for a particular
stack top, so the work is amortized across many path lookups (though I
think that gets even more tricky).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] attr: do not mark queried macros as unset
  2019-01-21 10:05         ` Duy Nguyen
@ 2019-01-22  7:21           ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2019-01-22  7:21 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Sérgio Peixoto, Brandon Williams, Git Mailing List

On Mon, Jan 21, 2019 at 05:05:56PM +0700, Duy Nguyen wrote:

> > It should be possible to salvage this optimization, but let's start with
> > just removing the remnants. It hasn't been doing anything (except
> > creating bugs) since 60a12722ac, and nobody seems to have noticed the
> > performance regression. It's more important to fix the correctness
> > problem clearly first.
> 
> But muh optimization!!! You're right of course, correctness comes
> first. I did try to look at this code but it's been a while and I'm
> afraid I don't have anything valuable to say. I'll dig in more in the
> next couple days.

:) See the side-thread, and the response I just wrote to Stefan. At this
point I do think it's possible, but my hope is that I could call it
quits with the bugfix portion and hand it off to somebody interested in
this area. I know you're juggling quite a few other series, though.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] attr: do not mark queried macros as unset
  2019-01-18 21:34       ` [PATCH] attr: do not mark queried macros as unset Jeff King
  2019-01-18 21:46         ` Jeff King
  2019-01-21 10:05         ` Duy Nguyen
@ 2019-01-22  9:34         ` Duy Nguyen
  2019-01-22 21:48         ` Junio C Hamano
  3 siblings, 0 replies; 15+ messages in thread
From: Duy Nguyen @ 2019-01-22  9:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Sérgio Peixoto, Brandon Williams, Git Mailing List

On Sat, Jan 19, 2019 at 4:35 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Jan 18, 2019 at 11:58:01AM -0500, Jeff King wrote:
>
> > Now, on to the actual bug. The simplest reproduction is:
> >
> >   (echo "[attr]foo bar"; echo "* foo") >.gitattributes
> >   git check-attr foo file
>
> Actually, even simpler is to just "binary", which is pre-defined as a
> macro. :)
>
> > which should report "foo" as set. This bisects to 60a12722ac (attr:
> > remove maybe-real, maybe-macro from git_attr, 2017-01-27), and it seems
> > like an unintentional regression there. I haven't yet poked into that
> > commit to see what the fix will look like.
>
> So here's the fix I came up with. +cc Duy, as this is really tangled
> with his older 06a604e670.
>
> -- >8 --
> Subject: [PATCH] attr: do not mark queried macros as unset
>
> Since 60a12722ac (attr: remove maybe-real, maybe-macro from git_attr,
> 2017-01-27), we will always mark an attribute macro (e.g., "binary")
> that is specifically queried for as "unspecified", even though listing
> _all_ attributes would display it at set. E.g.:
>
>   $ echo "* binary" >.gitattributes
>
>   $ git check-attr -a file
>   file: binary: set
>   file: diff: unset
>   file: merge: unset
>   file: text: unset
>
>   $ git check-attr binary file
>   file: binary: unspecified
>
> The problem stems from an incorrect conversion of the optimization from
> 06a604e670 (attr: avoid heavy work when we know the specified attr is
> not defined, 2014-12-28). There we tried in collect_some_attrs() to
> avoid even looking at the attr_stack when the user has asked for "foo"
> and we know that "foo" did not ever appear in any .gitattributes file.
>
> It used a flag "maybe_real" in each attribute struct, where "real" meant
> that the attribute appeared in an actual file (we have to make this
> distinction because we also create an attribute struct for any names
> that are being queried). But as explained in that commit message, the
> meaning of "real" was tangled with some special cases around macros.
>
> When 06a604e670 later refactored the macro code, it dropped maybe_real

60a12722ac or 06a604e670? I'm guessing the former.

> entirely. This missed the fact that "maybe_real" could be unset for two
> reasons: because of a macro, or because it was never found during
> parsing. This had two results:
>
>   - the optimization in collect_some_attrs() ceased doing anything
>     meaningful, since it no longer kept track of "was it found during
>     parsing"
>
>   - worse, it actually kicked in when the caller _did_ ask about a macro
>     by name, causing us to mark it as unspecified
>
> It should be possible to salvage this optimization, but let's start with
> just removing the remnants. It hasn't been doing anything (except
> creating bugs) since 60a12722ac, and nobody seems to have noticed the
> performance regression. It's more important to fix the correctness
> problem clearly first.
>
> I've added two tests here. The second one actually shows off the bug.
> The test of "check-attr -a" is not strictly necessary, but we currently
> do not test attribute macros much, and the builtin "binary" not at all.
> So this increases our general test coverage, as well as making sure we
> didn't mess up this related case.
>
> Signed-off-by: Jeff King <peff@peff.net>

Reviewed-by: me.

> ---
>  attr.c                | 16 +---------------
>  t/t0003-attributes.sh | 20 ++++++++++++++++++++
>  2 files changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/attr.c b/attr.c
> index eaece6658d..57ced792f8 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -1092,7 +1092,7 @@ static void collect_some_attrs(const struct index_state *istate,
>                                const char *path,
>                                struct attr_check *check)
>  {
> -       int i, pathlen, rem, dirlen;
> +       int pathlen, rem, dirlen;
>         const char *cp, *last_slash = NULL;
>         int basename_offset;
>
> @@ -1113,20 +1113,6 @@ static void collect_some_attrs(const struct index_state *istate,
>         all_attrs_init(&g_attr_hashmap, check);
>         determine_macros(check->all_attrs, check->stack);
>
> -       if (check->nr) {
> -               rem = 0;
> -               for (i = 0; i < check->nr; i++) {
> -                       int n = check->items[i].attr->attr_nr;
> -                       struct all_attrs_item *item = &check->all_attrs[n];
> -                       if (item->macro) {
> -                               item->value = ATTR__UNSET;
> -                               rem++;
> -                       }
> -               }
> -               if (rem == check->nr)
> -                       return;
> -       }
> -
>         rem = check->all_attrs_nr;
>         fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem);
>  }
> diff --git a/t/t0003-attributes.sh b/t/t0003-attributes.sh
> index 22499bce5f..71e63d8b50 100755
> --- a/t/t0003-attributes.sh
> +++ b/t/t0003-attributes.sh
> @@ -322,4 +322,24 @@ test_expect_success 'bare repository: test info/attributes' '
>         )
>  '
>
> +test_expect_success 'binary macro expanded by -a' '
> +       echo "file binary" >.gitattributes &&
> +       cat >expect <<-\EOF &&
> +       file: binary: set
> +       file: diff: unset
> +       file: merge: unset
> +       file: text: unset
> +       EOF
> +       git check-attr -a file >actual &&
> +       test_cmp expect actual
> +'
> +
> +
> +test_expect_success 'query binary macro directly' '
> +       echo "file binary" >.gitattributes &&
> +       echo file: binary: set >expect &&
> +       git check-attr binary file >actual &&
> +       test_cmp expect actual
> +'
> +
>  test_done
> --
> 2.20.1.691.ge06e0a624f
>


-- 
Duy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] attr: do not mark queried macros as unset
  2019-01-22  7:19             ` Jeff King
@ 2019-01-22  9:50               ` Duy Nguyen
  0 siblings, 0 replies; 15+ messages in thread
From: Duy Nguyen @ 2019-01-22  9:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Sérgio Peixoto, Brandon Williams, git

On Tue, Jan 22, 2019 at 2:19 PM Jeff King <peff@peff.net> wrote:
> Yes, that's the interesting part. I think I've convinced myself, too,
> that it doesn't do the _wrong_ thing ever. But I think it misses the
> point of the original, which is that you want common ones like "diff"
> not to trigger in_stack if nobody has actually used them.

Yes. I don't think it matters much when you don't have a lot of
attributes, but if you do, the cost of lookup will be proportional to
the stack's depth even whenever you look up some attribute, even
though you don't use it. This makes code that uses attributes just a
tiny bit slower over time because I think we still add more and more
attributes.

> And doing that
> really does mean marking in_stack not just when a macro mentions it
> (because clearly "binary" is going to mention it for every repo), but
> waiting to see if anybody mentions that macro.
>
> Which means we must call determine_macros(), and then propagate the
> macro's in_stack to its expansion (if it's indeed called at all).
>
> I don't think that would be _too_ hard to do. But I also wonder if
> there's much point. We are trying to avoid fill(), but I think that
> determine_macros() is of roughly the same complexity (look at all
> matches of all stacks). I guess it does avoid path_matches(), which is a
> bit more expensive. And in theory it could be cached for a particular
> stack top, so the work is amortized across many path lookups (though I
> think that gets even more tricky).

There is a comment that got eventually removed in bw/attr, especially
the second to last sentence.

-/*
- * NEEDSWORK: maybe-real, maybe-macro are not property of
- * an attribute, as it depends on what .gitattributes are
- * read.  Once we introduce per git_attr_check attr_stack
- * and check_all_attr, the optimization based on them will
- * become unnecessary and can go away.  So is this variable.
- */
-static int cannot_trust_maybe_real;

The promise here is, after we have moved away from global attribute
stack, we can build custom stacks containing only queried attributes.
This makes attribute stacks short (in the best case, empty, which is
what my optimization is for) which means fill time (I think it's
path_matches() would dominate) becomes shorter in the _general_ case,
so this optimization "will become unnecessary". More importantly the
total number of attributes will not matter since we only look at what
we are interested. This makes attribute lookup scale much better in
the long run.

This part, building custom stacks, has not come true yet. But if we
optimize this code again, I think this is the way forward. Perhaps
this could be one of the mini projects for Matthey's students. The
scope is relatively small, and optimization is always fun.
-- 
Duy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] attr: do not mark queried macros as unset
  2019-01-18 21:34       ` [PATCH] attr: do not mark queried macros as unset Jeff King
                           ` (2 preceding siblings ...)
  2019-01-22  9:34         ` Duy Nguyen
@ 2019-01-22 21:48         ` Junio C Hamano
  2019-01-23  5:40           ` Jeff King
  3 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2019-01-22 21:48 UTC (permalink / raw)
  To: Jeff King
  Cc: Sérgio Peixoto, Nguyễn Thái Ngọc Duy,
	Brandon Williams, git

Jeff King <peff@peff.net> writes:

> On Fri, Jan 18, 2019 at 11:58:01AM -0500, Jeff King wrote:
>
>> Now, on to the actual bug. The simplest reproduction is:
>> 
>>   (echo "[attr]foo bar"; echo "* foo") >.gitattributes
>>   git check-attr foo file
>
> Actually, even simpler is to just "binary", which is pre-defined as a
> macro. :)
>
>> which should report "foo" as set. This bisects to 60a12722ac (attr:
>> remove maybe-real, maybe-macro from git_attr, 2017-01-27), and it seems
>> like an unintentional regression there. I haven't yet poked into that
>> commit to see what the fix will look like.
>
> So here's the fix I came up with. +cc Duy, as this is really tangled
> with his older 06a604e670.
>
> -- >8 --
> Subject: [PATCH] attr: do not mark queried macros as unset
>
> Since 60a12722ac (attr: remove maybe-real, maybe-macro from git_attr,
> 2017-01-27), we will always mark an attribute macro (e.g., "binary")
> that is specifically queried for as "unspecified", even though listing
> _all_ attributes would display it at set. E.g.:
>
>   $ echo "* binary" >.gitattributes
>
>   $ git check-attr -a file
>   file: binary: set
>   file: diff: unset
>   file: merge: unset
>   file: text: unset
>
>   $ git check-attr binary file
>   file: binary: unspecified
>
> The problem stems from an incorrect conversion of the optimization from
> 06a604e670 (attr: avoid heavy work when we know the specified attr is
> not defined, 2014-12-28). There we tried in collect_some_attrs() to
> avoid even looking at the attr_stack when the user has asked for "foo"
> and we know that "foo" did not ever appear in any .gitattributes file.
>
> It used a flag "maybe_real" in each attribute struct, where "real" meant
> that the attribute appeared in an actual file (we have to make this
> distinction because we also create an attribute struct for any names
> that are being queried). But as explained in that commit message, the
> meaning of "real" was tangled with some special cases around macros.
>
> When 06a604e670 later refactored the macro code, it dropped maybe_real

I think 60a12722ac is what you meant here.

> entirely. This missed the fact that "maybe_real" could be unset for two
> reasons: because of a macro, or because it was never found during
> parsing. This had two results:
>
>   - the optimization in collect_some_attrs() ceased doing anything
>     meaningful, since it no longer kept track of "was it found during
>     parsing"
>
>   - worse, it actually kicked in when the caller _did_ ask about a macro
>     by name, causing us to mark it as unspecified


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] attr: do not mark queried macros as unset
  2019-01-18 21:46         ` Jeff King
  2019-01-18 22:19           ` Stefan Beller
@ 2019-01-22 22:00           ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2019-01-22 22:00 UTC (permalink / raw)
  To: Jeff King
  Cc: Sérgio Peixoto, Nguyễn Thái Ngọc Duy,
	Brandon Williams, git

Jeff King <peff@peff.net> writes:

> And here's a resurrection of the optimization that _seems_ to work, but
> I'm not 100% confident in.
>
> In particular, it does not care about macros at all. It simply asks: is
> this queried attribute a thing which was ever mentioned in the
> attributes files (either as a path match or as a possible macro
> expansion). If not, then we know we do not need to look further for it.

So, if we are looking for 'diff' and we know no .gitattributes (or
$GIT_DIR/info/attributes) entry for 'diff' or any macro that expands
to touch 'diff' (e.g. 'binary') is in use, we know for any path
governed by the current attr-stack 'diff' attribute is unspecified.
But if we see an entry, say, "*.exe binary", then we do need to be
aware of the possibility that 'diff' may be unset for some paths.

Makes sense.

> I guess maybe what I'm missing is that asking for "diff" means that we
> need to care about:
>
>   - whether "diff" was mentioned in the stack
>
>   - whether "binary" was mentioned in the stack
>
> But just "binary" mentioning "diff" is not interesting without somebody
> actually mentioning "binary".

Yeah, that matches my understanding (which mostly comes from the
original design before even Duy's optimization).

> I'm not sure how to do it robustly without being able to reverse-map all
> of the macros after we've resolved them (i.e., to know that "diff" gets
> mentioned by "binary", and then check if "binary" is actually
> mentioned). I think that would be possible now, as we should know that
> after determine_macros(). But I also wonder if we are hitting
> diminishing returns (after all, determine_macros() is already walking
> the attr stack).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] attr: do not mark queried macros as unset
  2019-01-22 21:48         ` Junio C Hamano
@ 2019-01-23  5:40           ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2019-01-23  5:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Sérgio Peixoto, Nguyễn Thái Ngọc Duy,
	Brandon Williams, git

On Tue, Jan 22, 2019 at 01:48:10PM -0800, Junio C Hamano wrote:

> > When 06a604e670 later refactored the macro code, it dropped maybe_real
> 
> I think 60a12722ac is what you meant here.

Yes indeed, too many 6's, 0's, and a's. :) I see you marked it up when
you picked up the patch. Thanks.

I should also have added:

  Reported-by: Sérgio Peixoto <sergio.peixoto@gmail.com>

to the original, if it's not too late.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-01-23  5:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-17 15:47 Change on check-attr behavior Sérgio Peixoto
2019-01-17 16:07 ` Jeff King
2019-01-18  9:41   ` Sérgio Peixoto
2019-01-18 16:58     ` Jeff King
2019-01-18 21:34       ` [PATCH] attr: do not mark queried macros as unset Jeff King
2019-01-18 21:46         ` Jeff King
2019-01-18 22:19           ` Stefan Beller
2019-01-22  7:19             ` Jeff King
2019-01-22  9:50               ` Duy Nguyen
2019-01-22 22:00           ` Junio C Hamano
2019-01-21 10:05         ` Duy Nguyen
2019-01-22  7:21           ` Jeff King
2019-01-22  9:34         ` Duy Nguyen
2019-01-22 21:48         ` Junio C Hamano
2019-01-23  5:40           ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.