git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Bug report: Strange behavior with `git gc` and `reference-transaction` hook
@ 2021-11-18  0:41 Waleed Khan
  2021-11-18  0:52 ` Bryan Turner
  0 siblings, 1 reply; 5+ messages in thread
From: Waleed Khan @ 2021-11-18  0:41 UTC (permalink / raw)
  To: git

Hi all,

I'm seeing unusual behavior on Git built from source at
cd3e606211bb1cf8bc57f7d76bab98cc17a150bc, but which appears to extend
back to Git v2.29.

This is a repro script:

```
#!/bin/sh

temp_dir=$(mktemp -d)
echo "git dir is: $temp_dir"
mkdir -p "$temp_dir"
trap "rm -rf $temp_dir" EXIT

cd "$temp_dir" || exit 1
git init -q
date='Thu, 07 Apr 2005 22:13:13 +0200'
GIT_AUTHOR_DATE="$date" GIT_COMMITTER_DATE="$date" git commit -q
--allow-empty -m 'Initial commit'

mkdir -p '.git/hooks'
cat >.git/hooks/reference-transaction <<'EOT'
#!/bin/sh

[[ "$1" != 'committed' ]] && exit 0
echo 'New reference-transaction invocation'

while read old new ref; do
  echo "  old: $old, new: $new, ref: $ref"
done
EOT
chmod +x .git/hooks/reference-transaction

git gc --prune=now -q
git show-ref refs/heads/master
```

And this is the output it produces:

```
git dir is: /var/folders/gn/gdp9z_g968b9nx7c9lvgy8y00000gp/T/tmp.b3Jc6qnb
New reference-transaction invocation
  old: 0000000000000000000000000000000000000000, new:
e197d18c017d4038418be8b1cd38f4503e289165, ref: refs/heads/master
New reference-transaction invocation
  old: e197d18c017d4038418be8b1cd38f4503e289165, new:
0000000000000000000000000000000000000000, ref: refs/heads/master
e197d18c017d4038418be8b1cd38f4503e289165 refs/heads/master
```

These hooks are invoked a few milliseconds one after another.

The expected behavior would be that the latest reference transaction
hook refers to the state of the references on disk. That is, either
`master` should point to 0 (be deleted), or it should have said that
`master` pointed to `e197d1`.

But if we actually examine `master`, it's set to `e197d1`, just as you
would expect. The GC should have been a no-op overall.

Best,
Waleed

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug report: Strange behavior with `git gc` and `reference-transaction` hook
  2021-11-18  0:41 Bug report: Strange behavior with `git gc` and `reference-transaction` hook Waleed Khan
@ 2021-11-18  0:52 ` Bryan Turner
       [not found]   ` <CAKjfCeD1C2DTnT0cLj4hC9Nq90TOFLiPnd_SLQi0HuMQHZcCaw@mail.gmail.com>
  2021-11-18 18:08   ` Jeff King
  0 siblings, 2 replies; 5+ messages in thread
From: Bryan Turner @ 2021-11-18  0:52 UTC (permalink / raw)
  To: Waleed Khan; +Cc: git

On Wed, Nov 17, 2021 at 4:42 PM Waleed Khan <me@waleedkhan.name> wrote:
>
> Hi all,
>
> I'm seeing unusual behavior on Git built from source at
> cd3e606211bb1cf8bc57f7d76bab98cc17a150bc, but which appears to extend
> back to Git v2.29.
>
> This is a repro script:
>
> ```
> #!/bin/sh
>
> temp_dir=$(mktemp -d)
> echo "git dir is: $temp_dir"
> mkdir -p "$temp_dir"
> trap "rm -rf $temp_dir" EXIT
>
> cd "$temp_dir" || exit 1
> git init -q
> date='Thu, 07 Apr 2005 22:13:13 +0200'
> GIT_AUTHOR_DATE="$date" GIT_COMMITTER_DATE="$date" git commit -q
> --allow-empty -m 'Initial commit'
>
> mkdir -p '.git/hooks'
> cat >.git/hooks/reference-transaction <<'EOT'
> #!/bin/sh
>
> [[ "$1" != 'committed' ]] && exit 0
> echo 'New reference-transaction invocation'
>
> while read old new ref; do
>   echo "  old: $old, new: $new, ref: $ref"
> done
> EOT
> chmod +x .git/hooks/reference-transaction
>
> git gc --prune=now -q
> git show-ref refs/heads/master
> ```
>
> And this is the output it produces:
>
> ```
> git dir is: /var/folders/gn/gdp9z_g968b9nx7c9lvgy8y00000gp/T/tmp.b3Jc6qnb
> New reference-transaction invocation
>   old: 0000000000000000000000000000000000000000, new:
> e197d18c017d4038418be8b1cd38f4503e289165, ref: refs/heads/master
> New reference-transaction invocation
>   old: e197d18c017d4038418be8b1cd38f4503e289165, new:
> 0000000000000000000000000000000000000000, ref: refs/heads/master
> e197d18c017d4038418be8b1cd38f4503e289165 refs/heads/master
> ```
>
> These hooks are invoked a few milliseconds one after another.

There are two built-in "ref backends" that Git uses out of the box: A
packed backend, which manages the "packed-refs" file, and a loose
backend, which manages other files under "$GIT_DIR/refs". What you're
seeing is a reference transaction for the packed backend which is
adding the new value for "refs/heads/master" to "packed-refs" (packed
backend reference-transactions rarely, if ever, include an actual old
hash, as far as I can tell), and then a second reference transaction
for the loose backend to delete the loose "refs/heads/master" file,
now that it's packed.

>
> The expected behavior would be that the latest reference transaction
> hook refers to the state of the references on disk. That is, either
> `master` should point to 0 (be deleted), or it should have said that
> `master` pointed to `e197d1`.
>
> But if we actually examine `master`, it's set to `e197d1`, just as you
> would expect. The GC should have been a no-op overall.

One of the subtasks of "git gc" is "git pack-refs". If you inspect in
more detail, I suspect you'll find that "refs/heads/master" was loose
before "git gc" ran (as in, there was a file
"$GIT_DIR/refs/heads/master") and "packed-refs" either didn't have a
"refs/heads/master" entry or had a different hash. (Loose refs always
"win" over packed, since ref updates only write loose refs.)

Hope this helps,
Bryan Turner

>
> Best,
> Waleed

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug report: Strange behavior with `git gc` and `reference-transaction` hook
       [not found]   ` <CAKjfCeD1C2DTnT0cLj4hC9Nq90TOFLiPnd_SLQi0HuMQHZcCaw@mail.gmail.com>
@ 2021-11-18  1:28     ` Bryan Turner
  0 siblings, 0 replies; 5+ messages in thread
From: Bryan Turner @ 2021-11-18  1:28 UTC (permalink / raw)
  To: Waleed Khan; +Cc: git

(Please don't top-post on the list)

On Wed, Nov 17, 2021 at 5:01 PM Waleed Khan <me@waleedkhan.name> wrote:
>
> So what should users of the `reference-transaction` hook do in general? It sounds like we can never actually rely on the new value corresponding to the state of the reference on disk. Is that correct?

I wish I had a good answer here, but unfortunately I don't. My
comments are coming from a similar place: The end result of my own
experiences and learnings trying to use "reference-transaction" for
something. The dual backends, and the fact that
"reference-transaction" receives _no hints_, as far as I'm aware, of
what backend a callback is for, makes some use cases quite difficult
in practice. It would be nice if there was an environment variable, or
something, to indicate what backend a transaction was for.

But for cases like this, where a ref is being "moved" between
backends, the split nature of the backends, and the fact that each has
its own separate transaction, makes working in a "simple" script,
which has no more state than its current invocation, extremely
difficult. Of course, since Git doesn't really have an ACID mechanism
for atomically updating both "packed-refs" and loose ref files, trying
to have a single reference transaction wouldn't really work either.

>
> I suppose if we want the real new value of the reference, we should always look it up freshly? If so, the githooks documentation should be updated. Currently, it has a remark on the validity when the old reference is zero, but not on the validity of the new reference.

I'm not sure the "validity" of the new reference is the problem here.
Both reference transactions _are_ giving you the correct new value;
"packed-refs" is inserting "refs/heads/master" at e197d18, and the
loose ref for it is being deleted. To me the difficulty isn't that the
new value is _wrong_ (it's actually right), it's that, because there
are two reference transactions and no shared context between them or
hints as to what backend they're for, it's not really possible to put
2+2 together and realize that 4 is a ref being moved from loose to
packed--but otherwise unchanged.

Best regards,
Bryan Turner

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug report: Strange behavior with `git gc` and `reference-transaction` hook
  2021-11-18  0:52 ` Bryan Turner
       [not found]   ` <CAKjfCeD1C2DTnT0cLj4hC9Nq90TOFLiPnd_SLQi0HuMQHZcCaw@mail.gmail.com>
@ 2021-11-18 18:08   ` Jeff King
  2021-12-07  8:24     ` Patrick Steinhardt
  1 sibling, 1 reply; 5+ messages in thread
From: Jeff King @ 2021-11-18 18:08 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Patrick Steinhardt, Waleed Khan, git

[+cc pks]

On Wed, Nov 17, 2021 at 04:52:46PM -0800, Bryan Turner wrote:

> > The expected behavior would be that the latest reference transaction
> > hook refers to the state of the references on disk. That is, either
> > `master` should point to 0 (be deleted), or it should have said that
> > `master` pointed to `e197d1`.
> >
> > But if we actually examine `master`, it's set to `e197d1`, just as you
> > would expect. The GC should have been a no-op overall.
> 
> One of the subtasks of "git gc" is "git pack-refs". If you inspect in
> more detail, I suspect you'll find that "refs/heads/master" was loose
> before "git gc" ran (as in, there was a file
> "$GIT_DIR/refs/heads/master") and "packed-refs" either didn't have a
> "refs/heads/master" entry or had a different hash. (Loose refs always
> "win" over packed, since ref updates only write loose refs.)

It seems totally broken to me that we'd trigger the
reference-transaction hook for ref packing. The point of the hook is to
track logical updates to the refs. But during ref packing that does not
change at all; the value remains the same. So I don't think we should be
triggering the hook at all, let alone with confusing values.

This snippet shows a simple case that I think is wrong:

-- >8 --
git init -q repo
cd repo

cat >.git/hooks/reference-transaction <<\EOF
#!/bin/sh
echo >&2 "==> reference-transaction $*"
sed 's/^/  /'
EOF
chmod +x .git/hooks/reference-transaction

echo >&2 "running commit..."
git commit --allow-empty -qm foo
echo >&2 "running pack-refs..."
git pack-refs --all --prune
-- >8 --

It produces:

  running commit...
  ==> reference-transaction prepared
    0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b HEAD
    0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main
  ==> reference-transaction committed
    0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b HEAD
    0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main
  running pack-refs...
  ==> reference-transaction prepared
    0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main
  ==> reference-transaction committed
    0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main
  ==> reference-transaction prepared
    77bcab0d950aee3021e8aa13a15d40e7a9a5f71b 0000000000000000000000000000000000000000 refs/heads/main
  ==> reference-transaction committed
    77bcab0d950aee3021e8aa13a15d40e7a9a5f71b 0000000000000000000000000000000000000000 refs/heads/main

I think the final four invocations should be skipped entirely. They're
pointless at best (nothing actually changed), and extremely misleading
at worst (they look like the ref ended up deleted!).

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Bug report: Strange behavior with `git gc` and `reference-transaction` hook
  2021-11-18 18:08   ` Jeff King
@ 2021-12-07  8:24     ` Patrick Steinhardt
  0 siblings, 0 replies; 5+ messages in thread
From: Patrick Steinhardt @ 2021-12-07  8:24 UTC (permalink / raw)
  To: Jeff King; +Cc: Bryan Turner, Waleed Khan, git

[-- Attachment #1: Type: text/plain, Size: 4393 bytes --]

On Thu, Nov 18, 2021 at 01:08:41PM -0500, Jeff King wrote:
> [+cc pks]
> 
> On Wed, Nov 17, 2021 at 04:52:46PM -0800, Bryan Turner wrote:
> 
> > > The expected behavior would be that the latest reference transaction
> > > hook refers to the state of the references on disk. That is, either
> > > `master` should point to 0 (be deleted), or it should have said that
> > > `master` pointed to `e197d1`.
> > >
> > > But if we actually examine `master`, it's set to `e197d1`, just as you
> > > would expect. The GC should have been a no-op overall.
> > 
> > One of the subtasks of "git gc" is "git pack-refs". If you inspect in
> > more detail, I suspect you'll find that "refs/heads/master" was loose
> > before "git gc" ran (as in, there was a file
> > "$GIT_DIR/refs/heads/master") and "packed-refs" either didn't have a
> > "refs/heads/master" entry or had a different hash. (Loose refs always
> > "win" over packed, since ref updates only write loose refs.)
> 
> It seems totally broken to me that we'd trigger the
> reference-transaction hook for ref packing. The point of the hook is to
> track logical updates to the refs. But during ref packing that does not
> change at all; the value remains the same. So I don't think we should be
> triggering the hook at all, let alone with confusing values.
> 
> This snippet shows a simple case that I think is wrong:
> 
> -- >8 --
> git init -q repo
> cd repo
> 
> cat >.git/hooks/reference-transaction <<\EOF
> #!/bin/sh
> echo >&2 "==> reference-transaction $*"
> sed 's/^/  /'
> EOF
> chmod +x .git/hooks/reference-transaction
> 
> echo >&2 "running commit..."
> git commit --allow-empty -qm foo
> echo >&2 "running pack-refs..."
> git pack-refs --all --prune
> -- >8 --
> 
> It produces:
> 
>   running commit...
>   ==> reference-transaction prepared
>     0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b HEAD
>     0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main
>   ==> reference-transaction committed
>     0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b HEAD
>     0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main
>   running pack-refs...
>   ==> reference-transaction prepared
>     0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main
>   ==> reference-transaction committed
>     0000000000000000000000000000000000000000 77bcab0d950aee3021e8aa13a15d40e7a9a5f71b refs/heads/main
>   ==> reference-transaction prepared
>     77bcab0d950aee3021e8aa13a15d40e7a9a5f71b 0000000000000000000000000000000000000000 refs/heads/main
>   ==> reference-transaction committed
>     77bcab0d950aee3021e8aa13a15d40e7a9a5f71b 0000000000000000000000000000000000000000 refs/heads/main
> 
> I think the final four invocations should be skipped entirely. They're
> pointless at best (nothing actually changed), and extremely misleading
> at worst (they look like the ref ended up deleted!).
> 
> -Peff

Yeah, I agree that this is something that is totally misleading.

For what it's worth, we also hit a similar case in production at GitLab,
where we use the hook to do voting on ref updates across different
nodes. Sometimes we observed different votes on a subset of nodes, and
it took me quite some time to figure out that this was dependent on
whether a ref was packed or not. We're now filtering out transactions
which consist only of force-deletions [1], which are _likely_ to be
cleanups of such packed refs. But this is very clearly a hack, and I
agree that calling the hook for 

In the end, these really are special cases of how the "files" backend
works and thus are implementation details which shouldn't be exposed to
the user at all. With these implementation details exposed, we'll start
to see different behaviour of when the hook is executed depending on
which ref backend you use, which is even worse compared to the current
state where it's at least consistently misleading.

As Peff said, the hook should really only track logical changes and not
expose any implementation details.

Patrick

[1]: https://gitlab.com/gitlab-org/gitaly/-/blob/3ef55853e9e161204464868390d97d1a1577042d/internal/gitaly/hook/referencetransaction.go#L58

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-12-07  8:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-18  0:41 Bug report: Strange behavior with `git gc` and `reference-transaction` hook Waleed Khan
2021-11-18  0:52 ` Bryan Turner
     [not found]   ` <CAKjfCeD1C2DTnT0cLj4hC9Nq90TOFLiPnd_SLQi0HuMQHZcCaw@mail.gmail.com>
2021-11-18  1:28     ` Bryan Turner
2021-11-18 18:08   ` Jeff King
2021-12-07  8:24     ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).