All of lore.kernel.org
 help / color / mirror / Atom feed
* `git fetch` with protocol.version=1 misses tags that point to the fetched history
@ 2024-01-23 21:08 Josh Steadmon
  2024-01-24  1:00 ` [PATCH] transport-helper: re-examine object dir after fetching Jeff King
  0 siblings, 1 reply; 3+ messages in thread
From: Josh Steadmon @ 2024-01-23 21:08 UTC (permalink / raw)
  To: git; +Cc: bcmills, peff

At $DAYJOB, we got a bug report that Git 2.21.0 breaks Go's CI due to
not fetching all tags in the history. The bug reporter (Bryan, CCed) was
kind enough to bisect this failure down to 61c7711cfe (sha1-file: use
loose object cache for quick existence check, 2018-11-12). This was only
recently discovered because Go's CI was using Git v2.17.6.

More details can be found in the original bug report [1] and Go's issue
for the CI breakage [2].

[1] https://git.g-issues.gerritcodereview.com/issues/320678525
[2] https://github.com/golang/go/issues/56881

A log of the failing Go test follows:


  vcs-test.golang.org rerouted to http://127.0.0.1:36865
  https://vcs-test.golang.org rerouted to https://127.0.0.1:43597
  === RUN   TestStat
  === PAUSE TestStat
  === CONT  TestStat
  === RUN   TestStat/gitrepo1/v1.2.4-annotated
  === PAUSE TestStat/gitrepo1/v1.2.4-annotated
  === CONT  TestStat/gitrepo1/v1.2.4-annotated
      git_test.go:166: mkdir -p /tmp/gitrepo-test-767558581/modcache/cache/vcs # git3 http://127.0.0.1:36865/git/gitrepo1
      git_test.go:166: # lock /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0.lock
      git_test.go:166: mkdir -p /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0 # git3 http://127.0.0.1:36865/git/gitrepo1
      git_test.go:166: cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git init --bare
      git_test.go:166: 0.011s # cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git init --bare
      git_test.go:166: cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git remote add origin -- http://127.0.0.1:36865/git/gitrepo1
      git_test.go:166: 0.008s # cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git remote add origin -- http://127.0.0.1:36865/git/gitrepo1
      git_test.go:166: cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git tag -l
      git_test.go:166: 0.008s # cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git tag -l
      git_test.go:166: cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git ls-remote -q origin
  2024/01/17 14:11:49 serving /git/gitrepo1/info/refs?service=git-upload-pack
  2024/01/17 14:11:49 gitrepo1.txt:
  > handle git
  > env GIT_AUTHOR_NAME='Russ Cox'
  > env GIT_AUTHOR_EMAIL='rsc@golang.org'
  > env GIT_COMMITTER_NAME=$GIT_AUTHOR_NAME
  > env GIT_COMMITTER_EMAIL=$GIT_AUTHOR_EMAIL
  > git init
  [stdout]
  Initialized empty Git repository in /tmp/vcstest1128851619/git/gitrepo1/.git/
  > at 2018-04-17T15:43:22-04:00
  > unquote ''
  > cp stdout README
  > git add README
  > git commit -a -m 'empty README'
  [stdout]
  [main (root-commit) ede458d] empty README
   1 file changed, 0 insertions(+), 0 deletions(-)
   create mode 100644 README
  > git branch -m master
  > git tag v1.2.3
  > at 2018-04-17T15:45:48-04:00
  > git branch v2
  > git checkout v2
  [stderr]
  Switched to branch 'v2'
  > echo 'v2'
  [stdout]
  v2
  > cp stdout v2
  > git add v2
  > git commit -a -m 'v2'
  [stdout]
  [v2 76a00fb] v2
   1 file changed, 1 insertion(+)
   create mode 100644 v2
  > git tag v2.3
  > git tag v2.0.1
  > git branch v2.3.4
  > at 2018-04-17T16:00:19-04:00
  > echo 'intermediate'
  [stdout]
  intermediate
  > cp stdout foo.txt
  > git add foo.txt
  > git commit -a -m 'intermediate'
  [stdout]
  [v2 97f6aa5] intermediate
   1 file changed, 1 insertion(+)
   create mode 100644 foo.txt
  > at 2018-04-17T16:00:32-04:00
  > echo 'another'
  [stdout]
  another
  > cp stdout another.txt
  > git add another.txt
  > git commit -a -m 'another'
  [stdout]
  [v2 9d02800] another
   1 file changed, 1 insertion(+)
   create mode 100644 another.txt
  > git tag v2.0.2
  > at 2018-04-17T16:16:52-04:00
  > git checkout master
  [stderr]
  Switched to branch 'master'
  > git branch v3
  > git checkout v3
  [stderr]
  Switched to branch 'v3'
  > mkdir v3/sub/dir
  > echo 'v3/sub/dir/file'
  [stdout]
  v3/sub/dir/file
  > cp stdout v3/sub/dir/file.txt
  > git add v3
  > git commit -a -m 'add v3/sub/dir/file.txt'
  [stdout]
  [v3 a8205f8] add v3/sub/dir/file.txt
   1 file changed, 1 insertion(+)
   create mode 100644 v3/sub/dir/file.txt
  > at 2018-04-17T22:23:00-04:00
  > git checkout master
  [stderr]
  Switched to branch 'master'
  > git tag -a v1.2.4-annotated -m 'v1.2.4-annotated'
  > git show-ref --tags --heads
  [stdout]
  ede458df7cd0fdca520df19a33158086a8a68e81 refs/heads/master
  9d02800338b8a55be062c838d1f02e0c5780b9eb refs/heads/v2
  76a00fb249b7f93091bc2c89a789dab1fc1bc26f refs/heads/v2.3.4
  a8205f853c297ad2c3c502ba9a355b35b7dd3ca5 refs/heads/v3
  ede458df7cd0fdca520df19a33158086a8a68e81 refs/tags/v1.2.3
  b004e48a345a86ed7a2fb7debfa7e0b2f9b0dd91 refs/tags/v1.2.4-annotated
  76a00fb249b7f93091bc2c89a789dab1fc1bc26f refs/tags/v2.0.1
  9d02800338b8a55be062c838d1f02e0c5780b9eb refs/tags/v2.0.2
  76a00fb249b7f93091bc2c89a789dab1fc1bc26f refs/tags/v2.3
  > cmp stdout .git-refs

      git_test.go:166: 0.194s # cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git ls-remote -q origin
      git_test.go:166: cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git -c log.showsignature=false log --no-decorate -n1 '--format=format:%H %ct %D' ede458df7cd0fdca520df19a33158086a8a68e81 --
      git_test.go:166: 0.006s # cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git -c log.showsignature=false log --no-decorate -n1 '--format=format:%H %ct %D' ede458df7cd0fdca520df19a33158086a8a68e81 --
      git_test.go:166: cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git fetch -f --depth=1 origin refs/tags/v1.2.4-annotated:refs/tags/v1.2.4-annotated
  2024/01/17 14:11:49 serving /git/gitrepo1/info/refs?service=git-upload-pack
  2024/01/17 14:11:49 serving /git/gitrepo1/git-upload-pack
  2024/01/17 14:11:49 serving /git/gitrepo1/git-upload-pack
      git_test.go:166: 0.113s # cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git fetch -f --depth=1 origin refs/tags/v1.2.4-annotated:refs/tags/v1.2.4-annotated
      git_test.go:166: cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git -c log.showsignature=false log --no-decorate -n1 '--format=format:%H %ct %D' refs/tags/v1.2.4-annotated --
      git_test.go:166: 0.007s # cd /tmp/gitrepo-test-767558581/modcache/cache/vcs/72d702b2b3eb7cb41c516773b2d675c9ee1480e2be1059cbf3bd15f9cfac2bf0; git -c log.showsignature=false log --no-decorate -n1 '--format=format:%H %ct %D' refs/tags/v1.2.4-annotated --
      git_test.go:661: Stat: incorrect info
          have {Origin:<nil> Name:ede458df7cd0fdca520df19a33158086a8a68e81 Short:ede458df7cd0 Version:v1.2.4-annotated Time:2018-04-17 19:43:22 +0000 UTC Tags:[v1.2.4-annotated]}
          want {Origin:<nil> Name:ede458df7cd0fdca520df19a33158086a8a68e81 Short:ede458df7cd0 Version:v1.2.4-annotated Time:2018-04-17 19:43:22 +0000 UTC Tags:[v1.2.3 v1.2.4-annotated]}
  --- FAIL: TestStat (0.00s)
      --- FAIL: TestStat/gitrepo1/v1.2.4-annotated (0.35s)
  FAIL
  FAIL	cmd/go/internal/modfetch/codehost	0.398s
  FAIL

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH] transport-helper: re-examine object dir after fetching
  2024-01-23 21:08 `git fetch` with protocol.version=1 misses tags that point to the fetched history Josh Steadmon
@ 2024-01-24  1:00 ` Jeff King
  2024-01-24 19:31   ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff King @ 2024-01-24  1:00 UTC (permalink / raw)
  To: Josh Steadmon; +Cc: git, bcmills

On Tue, Jan 23, 2024 at 01:08:01PM -0800, Josh Steadmon wrote:

> At $DAYJOB, we got a bug report that Git 2.21.0 breaks Go's CI due to
> not fetching all tags in the history. The bug reporter (Bryan, CCed) was
> kind enough to bisect this failure down to 61c7711cfe (sha1-file: use
> loose object cache for quick existence check, 2018-11-12). This was only
> recently discovered because Go's CI was using Git v2.17.6.
> 
> More details can be found in the original bug report [1] and Go's issue
> for the CI breakage [2].

Thanks, I was able to reproduce it. Besides using the v0 protocol, two
key elements are that the server is http and the use of --depth.

The patch below explains what's going on and should fix it. I prepared
the patch on top of 'master', but it can also be applied directly on
61c7711cfe or on v2.21.0, modulo some minor textual conflicts in the
test script (modern t5551 has some more tests, and no longer calls
stop_httpd manually).

-- >8 --
Subject: transport-helper: re-examine object dir after fetching

This patch fixes a bug where fetch over http (or any helper) using the
v0 protocol may sometimes fail to auto-follow tags. The bug comes from
61c7711cfe (sha1-file: use loose object cache for quick existence check,
2018-11-12). But to explain why (and why this is the right fix), let's
take a step back.

After fetching a pack, the object database has changed, but we may still
hold in-memory caches that are now out of date. Traditionally this was
just the packed_git list, but 61c7711cfe started using a loose-object
cache, as well.

Usually these caches are invalidated automatically. When an expected
object cannot be found, the low-level object lookup routines call
reprepare_packed_git(), which re-scans the set of packs (and thanks to
some preparatory patches ahead of 61c7711cfe, throws away the loose
object cache). But not all calls do this! In some cases we expect that
the object might not exist, and pass OBJECT_INFO_QUICK to tell the
low-level routines not to bother re-scanning. And the tag auto-following
code is one such caller, since we are asking about oids that the other
side has (but we might not have locally).

To deal with this, we explicitly call reprepare_packed_git() ourselves
after fetching a pack; this goes all the way back to 48ec3e5c07
(Incorporate fetched packs in future object traversal, 2008-06-15). But
that only helps if we call fetch_pack() in the main fetch process. When
we're using a transport helper, it happens in a separate sub-process,
and the parent process is left with old values. So this is only a
problem with protocols which require a separate helper process (like
http).

This patch fixes it by teaching the parent process in the transport
helper relationship to make that same reprepare call after the helper
finishes fetching.

You might be left with some lingering questions, like:

  1. Why only the v0 protocol, and not v2? It's because in v2 the child
     helper doesn't actually run fetch_pack(); it merely establishes a
     tunnel over which the main process can talk to the remote side (so
     the fetch_pack() and reprepare happen in the main process).

  2. Wouldn't we have the same bug even before the 61c7711cfe added
     the loose object cache? For example, when we store the fetch as a
     pack locally, wouldn't our packed_git list still be out of date?

     If we store a pack, everything works because other parts of the
     fetch process happen to trigger a call to reprepare_packed_git().
     In particular, before storing whatever ref was originally
     requested, we'll make sure we have the pointed-to object, and that
     call happens without the QUICK flag. So in that case we'll see that
     we don't know about it, reprepare, and then repeat our lookup. And
     now we _do_ know about the pack, and further calls with QUICK will
     find its contents.

     Whereas when we unpack the result into loose objects, we never get
     that same invalidation trigger. We didn't have packs before, and we
     don't after. But when we do the loose object lookup, we find the
     object. There's no way to realize that we didn't have the object
     before the pack, and that having it now means things have changed
     (in theory we could do a superfluous cache lookup to see that it
     was missing from the old cache; but depending on the tags the other
     side showed us, we might not even have filled in that part of the
     cache earlier).

  3. Why does the included test use "--depth 1"? This is important
     because without it, we happen to invalidate the cache as a side
     effect of other parts of the fetch process. What happens in a
     non-shallow fetch is something like this:

        1. we call find_non_local_tags() once before actually getting the
           pack, to see if there are any tags we can fill in from what we
           already have. This fills in the cache (which is obviously
           missing objects we're about to fetch).

        2. before fetching the actual pack, fetch_and_consume_refs()
           calls check_exist_and_connected(), to see if we even need to
           fetch a pack at all. This doesn't use QUICK (though arguably
           it could, as it's purely an optimization). And since it sees
           there are objects we are indeed missing, that triggers a
           reprepare_packed_git() call, which throws out the loose object
           cache.

        3. after fetching, now we call find_non_local_tags() again. And
           since step (2) invalidated our loose object cache, we find
           the new objects and create the tags.

     So everything works, but mostly due to luck. Whereas in a fetch
     with --depth, we skip step 2 entirely, and thus the out-of-date
     cache is still in place for step 3, giving us the wrong answer.

So the test works with a small "--depth 1" fetch, which makes sure that
we don't store the pack from the other side, and that we don't trigger
the accidental cache invalidation. And of course it forces the use of
v0 along with using the http protocol.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5551-http-fetch-smart.sh | 18 ++++++++++++++++++
 transport-helper.c          |  3 +++
 2 files changed, 21 insertions(+)

diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index e069737b80..a623a1058c 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -733,4 +733,22 @@ test_expect_success 'no empty path components' '
 	! grep "//" log
 '
 
+test_expect_success 'tag following always works over v0 http' '
+	upstream=$HTTPD_DOCUMENT_ROOT_PATH/tags &&
+	git init "$upstream" &&
+	(
+		cd "$upstream" &&
+		git commit --allow-empty -m base &&
+		git tag not-annotated &&
+		git tag -m foo annotated
+	) &&
+	git init tags &&
+	git -C tags -c protocol.version=0 \
+		fetch --depth 1 $HTTPD_URL/smart/tags \
+		refs/tags/annotated:refs/tags/annotated &&
+	git -C "$upstream" for-each-ref refs/tags >expect &&
+	git -C tags for-each-ref >actual &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/transport-helper.c b/transport-helper.c
index e34a8f47cf..07e42e239a 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -17,6 +17,7 @@
 #include "refspec.h"
 #include "transport-internal.h"
 #include "protocol.h"
+#include "packfile.h"
 
 static int debug;
 
@@ -432,6 +433,8 @@ static int fetch_with_fetch(struct transport *transport,
 			warning(_("%s unexpectedly said: '%s'"), data->name, buf.buf);
 	}
 	strbuf_release(&buf);
+
+	reprepare_packed_git(the_repository);
 	return 0;
 }
 
-- 
2.43.0.721.gf5abbd674f


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] transport-helper: re-examine object dir after fetching
  2024-01-24  1:00 ` [PATCH] transport-helper: re-examine object dir after fetching Jeff King
@ 2024-01-24 19:31   ` Junio C Hamano
  0 siblings, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2024-01-24 19:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Josh Steadmon, git, bcmills

Jeff King <peff@peff.net> writes:

> Thanks, I was able to reproduce it. Besides using the v0 protocol, two
> key elements are that the server is http and the use of --depth.
>
> The patch below explains what's going on and should fix it. I prepared
> the patch on top of 'master', but it can also be applied directly on
> 61c7711cfe or on v2.21.0, modulo some minor textual conflicts in the
> test script (modern t5551 has some more tests, and no longer calls
> stop_httpd manually).

Thanks.

The usual "one liner fix that requires two-page explanation"
that only you can produce ;-).

> ...
>      So everything works, but mostly due to luck. Whereas in a fetch
>      with --depth, we skip step 2 entirely, and thus the out-of-date
>      cache is still in place for step 3, giving us the wrong answer.
>
> So the test works with a small "--depth 1" fetch, which makes sure that
> we don't store the pack from the other side, and that we don't trigger
> the accidental cache invalidation. And of course it forces the use of
> v0 along with using the http protocol.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t5551-http-fetch-smart.sh | 18 ++++++++++++++++++
>  transport-helper.c          |  3 +++
>  2 files changed, 21 insertions(+)
>
> diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
> index e069737b80..a623a1058c 100755
> --- a/t/t5551-http-fetch-smart.sh
> +++ b/t/t5551-http-fetch-smart.sh
> @@ -733,4 +733,22 @@ test_expect_success 'no empty path components' '
>  	! grep "//" log
>  '
>  
> +test_expect_success 'tag following always works over v0 http' '
> +	upstream=$HTTPD_DOCUMENT_ROOT_PATH/tags &&
> +	git init "$upstream" &&
> +	(
> +		cd "$upstream" &&
> +		git commit --allow-empty -m base &&
> +		git tag not-annotated &&
> +		git tag -m foo annotated
> +	) &&
> +	git init tags &&
> +	git -C tags -c protocol.version=0 \
> +		fetch --depth 1 $HTTPD_URL/smart/tags \
> +		refs/tags/annotated:refs/tags/annotated &&
> +	git -C "$upstream" for-each-ref refs/tags >expect &&
> +	git -C tags for-each-ref >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done
> diff --git a/transport-helper.c b/transport-helper.c
> index e34a8f47cf..07e42e239a 100644
> --- a/transport-helper.c
> +++ b/transport-helper.c
> @@ -17,6 +17,7 @@
>  #include "refspec.h"
>  #include "transport-internal.h"
>  #include "protocol.h"
> +#include "packfile.h"
>  
>  static int debug;
>  
> @@ -432,6 +433,8 @@ static int fetch_with_fetch(struct transport *transport,
>  			warning(_("%s unexpectedly said: '%s'"), data->name, buf.buf);
>  	}
>  	strbuf_release(&buf);
> +
> +	reprepare_packed_git(the_repository);
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-01-24 19:31 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-23 21:08 `git fetch` with protocol.version=1 misses tags that point to the fetched history Josh Steadmon
2024-01-24  1:00 ` [PATCH] transport-helper: re-examine object dir after fetching Jeff King
2024-01-24 19:31   ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.