git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] bundle-uri: refresh packed_git if unbundle succeed
@ 2024-05-15  3:01 blanet via GitGitGadget
  2024-05-17  5:00 ` Patrick Steinhardt
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: blanet via GitGitGadget @ 2024-05-15  3:01 UTC (permalink / raw)
  To: git; +Cc: blanet, Xing Xin

From: Xing Xin <xingxin.xx@bytedance.com>

When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not being discovered, resulting in rather slow
clones. This was particularly problematic when employing the heuristic
`creationTokens`.

And this is easy to reproduce. Suppose we have a repository with a
single branch `main` pointing to commit `A`, firstly we create a base
bundle with

  git bundle create base.bundle main

Then let's add a new commit `B` on top of `A`, so that an incremental
bundle for `main` can be created with

  git bundle create incr.bundle A..main

Now we can generate a bundle list with the following content:

  [bundle]
      version = 1
      mode = all
      heuristic = creationToken

  [bundle "base"]
      uri = base.bundle
      creationToken = 1

  [bundle "incr"]
      uri = incr.bundle
      creationToken = 2

A fresh clone with the bundle list above would give the expected
`refs/bundles/main` pointing at `B` in new repository, in other words we
already had everything locally from the bundles, but git would still
download everything from server as if we got nothing.

So why the `refs/bundles/main` is not discovered? After some digging I
found that:

1. when unbundling a downloaded bundle, a `verify_bundle` is called to
   check its prerequisites if any. The verify procedure would find oids
   so `packed_git` is initialized.

2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
   during which `mark_complete_and_common_ref` and `mark_tips` would
   find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
   enlisted if `packed_git` has already initialized in 1.

Back to the example above, when unbunding `incr.bundle`, `base.pack` is
enlisted to `packed_git` bacause of the prerequisites to verify. Then we
can not find `B` for negotiation at a latter time bacause `B` exists in
`incr.pack` which is not enlisted in `packed_git`.

This commit fixes this by adding a `reprepare_packed_git` call for every
successfully unbundled bundle, which ensures to enlist all generated
packs from bundle uri. And a set of negotiation related tests are added.

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
    bundle-uri: refresh packed_git if unbundle succeed

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1730

 bundle-uri.c                |   3 +
 t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
 2 files changed, 132 insertions(+)

diff --git a/bundle-uri.c b/bundle-uri.c
index ca32050a78f..2b9d36cfd8e 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -7,6 +7,7 @@
 #include "refs.h"
 #include "run-command.h"
 #include "hashmap.h"
+#include "packfile.h"
 #include "pkt-line.h"
 #include "config.h"
 #include "remote.h"
@@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
 			       VERIFY_BUNDLE_QUIET)))
 		return 1;
 
+	reprepare_packed_git(r);
+
 	/*
 	 * Convert all refs/heads/ from the bundle into refs/bundles/
 	 * in the local repository.
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..a5b04d6f187 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' '
 test_expect_success 'create bundle' '
 	git init clone-from &&
 	git -C clone-from checkout -b topic &&
+
 	test_commit -C clone-from A &&
+	git -C clone-from bundle create A.bundle topic &&
+
 	test_commit -C clone-from B &&
 	git -C clone-from bundle create B.bundle topic
 '
@@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
 	! grep "refs/bundles/" refs
 '
 
+#########################################################################
+# Clone negotiation related tests begin here
+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+	test_when_finished rm -rf trace*.txt &&
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="clone-from/A.bundle" \
+		clone-from nego-bundle-part &&
+	git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/topic
+	EOF
+	test_cmp expect actual &&
+	# Ensure that refs/bundles/topic are sent as "have".
+	grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+	test_when_finished rm -rf trace*.txt &&
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --single-branch --branch=topic --no-tags \
+		--bundle-uri="clone-from/B.bundle" \
+		clone-from nego-bundle-all &&
+	git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/topic
+	EOF
+	test_cmp expect actual &&
+	# We already have all needed commits so no "want" needed.
+	! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-no-heuristic &&
+
+	git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-heuristic &&
+
+	git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --single-branch --branch=left --no-tags \
+		--bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-all &&
+
+	git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	# We already have all needed commits so no "want" needed.
+	! grep "clone> want " trace-packet.txt
+'
+
 #########################################################################
 # HTTP tests begin here
 

base-commit: 83f1add914c6b4682de1e944ec0d1ac043d53d78
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
  2024-05-15  3:01 [PATCH] bundle-uri: refresh packed_git if unbundle succeed blanet via GitGitGadget
@ 2024-05-17  5:00 ` Patrick Steinhardt
  2024-05-17 16:14   ` Junio C Hamano
  2024-05-20  9:41   ` Xing Xin
  2024-05-17  7:36 ` Karthik Nayak
  2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
  2 siblings, 2 replies; 24+ messages in thread
From: Patrick Steinhardt @ 2024-05-17  5:00 UTC (permalink / raw)
  To: blanet via GitGitGadget; +Cc: git, blanet, Xing Xin

[-- Attachment #1: Type: text/plain, Size: 4820 bytes --]

On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>

Long time no see :)

> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not being discovered, resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
> 
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
> 
>   git bundle create base.bundle main
> 
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
> 
>   git bundle create incr.bundle A..main
> 
> Now we can generate a bundle list with the following content:
> 
>   [bundle]
>       version = 1
>       mode = all
>       heuristic = creationToken
> 
>   [bundle "base"]
>       uri = base.bundle
>       creationToken = 1
> 
>   [bundle "incr"]
>       uri = incr.bundle
>       creationToken = 2
> 
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
> 
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
> 
> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
>    check its prerequisites if any. The verify procedure would find oids
>    so `packed_git` is initialized.
> 
> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
>    during which `mark_complete_and_common_ref` and `mark_tips` would
>    find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
>    enlisted if `packed_git` has already initialized in 1.

And I assume we do not want it to not use `OBJECT_INFO_QUICK`?

> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
> can not find `B` for negotiation at a latter time bacause `B` exists in
> `incr.pack` which is not enlisted in `packed_git`.

Okay, the explanation feels sensible.

> This commit fixes this by adding a `reprepare_packed_git` call for every
> successfully unbundled bundle, which ensures to enlist all generated
> packs from bundle uri. And a set of negotiation related tests are added.

This makes me wonder though. Do we really need to call
`reprepare_packed_git()` once for every bundle, or can't we instead call
it once at the end once we have fetched all bundles? Reading on.

> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
>     bundle-uri: refresh packed_git if unbundle succeed
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
> 
>  bundle-uri.c                |   3 +
>  t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 132 insertions(+)
> 
> diff --git a/bundle-uri.c b/bundle-uri.c
> index ca32050a78f..2b9d36cfd8e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -7,6 +7,7 @@
>  #include "refs.h"
>  #include "run-command.h"
>  #include "hashmap.h"
> +#include "packfile.h"
>  #include "pkt-line.h"
>  #include "config.h"
>  #include "remote.h"
> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  			       VERIFY_BUNDLE_QUIET)))
>  		return 1;
>  
> +	reprepare_packed_git(r);
> +

So what's hidden here is that `unbundle_from_file()` will also try to
access the bundle's refs right away. Surprisingly, we do so by calling
`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the
effect that we basically accept arbitrary object IDs here even if we do
not know those. That's why we didn't have to `reprepare_packed_git()`
before this change.

Now there are two conflicting thoughts here:

  - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
    should now be accessible.

  - Or we can avoid calling `reprepare_packed_git()` inside the loop and
    instead call it once after we have fetched all bundles.

The second one feels a bit like premature optimization to me. But the
first item does feel like it could help us to catch broken bundles
because we wouldn't end up creating refs for objects that neither we nor
the bundle have.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
  2024-05-15  3:01 [PATCH] bundle-uri: refresh packed_git if unbundle succeed blanet via GitGitGadget
  2024-05-17  5:00 ` Patrick Steinhardt
@ 2024-05-17  7:36 ` Karthik Nayak
  2024-05-20 10:19   ` Xing Xin
  2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
  2 siblings, 1 reply; 24+ messages in thread
From: Karthik Nayak @ 2024-05-17  7:36 UTC (permalink / raw)
  To: blanet via GitGitGadget, git; +Cc: blanet, Xing Xin

[-- Attachment #1: Type: text/plain, Size: 4706 bytes --]

"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Xing Xin <xingxin.xx@bytedance.com>>
> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not being discovered, resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
>
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
>
>   git bundle create base.bundle main
>
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
>
>   git bundle create incr.bundle A..main
>
> Now we can generate a bundle list with the following content:
>
>   [bundle]
>       version = 1
>       mode = all
>       heuristic = creationToken
>
>   [bundle "base"]
>       uri = base.bundle
>       creationToken = 1
>
>   [bundle "incr"]
>       uri = incr.bundle
>       creationToken = 2
>
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
>
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
>
> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to

s/a//

>    check its prerequisites if any. The verify procedure would find oids
>    so `packed_git` is initialized.
>

So the flow is:
1. `fetch_bundle_list` fetches all the bundles advertised via
`download_bundle_list` to local files.
2. It then calls `unbundle_all_bundles` to unbundle all the bundles.
3. Each bundle is then unbundled using `unbundle_from_file`.
4. Here, we first read the bundle header to get all the prerequisites
for the bundle, this is done in `read_bundle_header`.
5. Then we call `unbundle`, which calls `verify_bundle` to ensure that
the repository does indeed contain the prerequisites mentioned in the
bundle. Then it creates the index from the bundle file.

So because the objects are being checked, the `prepare_packed_git`
function is eventually called, which means that the
`raw_object_store->packed_git` data gets filled in and
`packed_git_initialized` is set.

This means consecutive calls to `prepare_packed_git` doesn't
re-initiate `raw_object_store->packed_git` since
`packed_git_initialized` already is set.

So your explanation makes sense, as a _nit_ I would perhaps add the part
about why consecutive calls to `prepare_packed_git` are ineffective.

> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,

s/unbundled/unbundling

>    during which `mark_complete_and_common_ref` and `mark_tips` would
>    find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
>    enlisted if `packed_git` has already initialized in 1.
> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
> can not find `B` for negotiation at a latter time bacause `B` exists in
> `incr.pack` which is not enlisted in `packed_git`.
>
> This commit fixes this by adding a `reprepare_packed_git` call for every
> successfully unbundled bundle, which ensures to enlist all generated
> packs from bundle uri. And a set of negotiation related tests are added.
>

The solution makes sense.

> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
>     bundle-uri: refresh packed_git if unbundle succeed
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
>
>  bundle-uri.c                |   3 +
>  t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 132 insertions(+)
>
> diff --git a/bundle-uri.c b/bundle-uri.c
> index ca32050a78f..2b9d36cfd8e 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -7,6 +7,7 @@
>  #include "refs.h"
>  #include "run-command.h"
>  #include "hashmap.h"
> +#include "packfile.h"
>  #include "pkt-line.h"
>  #include "config.h"
>  #include "remote.h"
> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  			       VERIFY_BUNDLE_QUIET)))
>  		return 1;
>
> +	reprepare_packed_git(r);
> +
>

Would it make sense to move this to `bundle.c:unbundle()`, since that is
also where the idx is created?

[snip]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
  2024-05-17  5:00 ` Patrick Steinhardt
@ 2024-05-17 16:14   ` Junio C Hamano
  2024-05-20 11:48     ` Xing Xin
  2024-05-20  9:41   ` Xing Xin
  1 sibling, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2024-05-17 16:14 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: blanet via GitGitGadget, git, blanet, Xing Xin

Patrick Steinhardt <ps@pks.im> writes:

> Now there are two conflicting thoughts here:
>
>   - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
>     should now be accessible.
>
>   - Or we can avoid calling `reprepare_packed_git()` inside the loop and
>     instead call it once after we have fetched all bundles.
>
> The second one feels a bit like premature optimization to me. But the
> first item does feel like it could help us to catch broken bundles
> because we wouldn't end up creating refs for objects that neither we nor
> the bundle have.

I like the way your thoughts are structured around here.

I do agree that the latter is a wrong approach---we shouldn't be
trusting what came from elsewhere over the network without first
checking.  We should probably be running the "index-pack --fix-thin"
the unbundling process runs with also the "--fsck-objects" option if
we are not doing so already, and even then, we should make sure that
the object we are making our ref point at have everything behind it.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
  2024-05-17  5:00 ` Patrick Steinhardt
  2024-05-17 16:14   ` Junio C Hamano
@ 2024-05-20  9:41   ` Xing Xin
  1 sibling, 0 replies; 24+ messages in thread
From: Xing Xin @ 2024-05-20  9:41 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: blanet via GitGitGadget, git, Xing Xin

At 2024-05-17 13:00:49, "Patrick Steinhardt" <ps@pks.im> wrote:
>On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote:
>> From: Xing Xin <xingxin.xx@bytedance.com>
>
>Long time no see :)

Glad to see you again here :)

>> 
>> So why the `refs/bundles/main` is not discovered? After some digging I
>> found that:
>> 
>> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
>>    check its prerequisites if any. The verify procedure would find oids
>>    so `packed_git` is initialized.
>> 
>> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
>>    during which `mark_complete_and_common_ref` and `mark_tips` would
>>    find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
>>    enlisted if `packed_git` has already initialized in 1.
>
>And I assume we do not want it to not use `OBJECT_INFO_QUICK`?

I think so. For clones or fetches without using bundle-uri, we can hardly
encounter the case that new packs are added during the negotiation process.
So using `OBJECT_INFO_QUICK` can get some performance gain.

>
>> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
>> enlisted to `packed_git` bacause of the prerequisites to verify. Then we
>> can not find `B` for negotiation at a latter time bacause `B` exists in
>> `incr.pack` which is not enlisted in `packed_git`.
>
>Okay, the explanation feels sensible.
>
>> This commit fixes this by adding a `reprepare_packed_git` call for every
>> successfully unbundled bundle, which ensures to enlist all generated
>> packs from bundle uri. And a set of negotiation related tests are added.
>
>This makes me wonder though. Do we really need to call
>`reprepare_packed_git()` once for every bundle, or can't we instead call
>it once at the end once we have fetched all bundles? Reading on.
>
>> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
>> ---
>>     bundle-uri: refresh packed_git if unbundle succeed
>> 
>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
>> Pull-Request: https://github.com/gitgitgadget/git/pull/1730
>> 
>>  bundle-uri.c                |   3 +
>>  t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
>>  2 files changed, 132 insertions(+)
>> 
>> diff --git a/bundle-uri.c b/bundle-uri.c
>> index ca32050a78f..2b9d36cfd8e 100644
>> --- a/bundle-uri.c
>> +++ b/bundle-uri.c
>> @@ -7,6 +7,7 @@
>>  #include "refs.h"
>>  #include "run-command.h"
>>  #include "hashmap.h"
>> +#include "packfile.h"
>>  #include "pkt-line.h"
>>  #include "config.h"
>>  #include "remote.h"
>> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>>  			       VERIFY_BUNDLE_QUIET)))
>>  		return 1;
>>  
>> +	reprepare_packed_git(r);
>> +
>
>So what's hidden here is that `unbundle_from_file()` will also try to
>access the bundle's refs right away. Surprisingly, we do so by calling
>`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the
>effect that we basically accept arbitrary object IDs here even if we do
>not know those. That's why we didn't have to `reprepare_packed_git()`
>before this change.

You are right! I tried dropping this `REF_SKIP_OID_VERIFICATION` flag and
the negotiation works as expected.

After some further digging I find that without `REF_SKIP_OID_VERIFICATION`,
both `write_ref_to_lockfile` for files backend and `reftable_be_transaction_prepare`
for reftable backend would call `parse_object` to check the oid. `parse_object`
can help refresh `packed_git` via `reprepare_packed_git`.

>
>Now there are two conflicting thoughts here:
>
>  - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
>    should now be accessible.
>
>  - Or we can avoid calling `reprepare_packed_git()` inside the loop and
>    instead call it once after we have fetched all bundles.
>
>The second one feels a bit like premature optimization to me. But the
>first item does feel like it could help us to catch broken bundles
>because we wouldn't end up creating refs for objects that neither we nor
>the bundle have.

I favor the first approach because a validation on the object IDs we are
writing is a safe guard . And the flag itself was designed to be used in
testing scenarios.

/*
 * Blindly write an object_id. This is useful for testing data corruption
 * scenarios.
 */
#define REF_SKIP_OID_VERIFICATION (1 << 10)



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
  2024-05-17  7:36 ` Karthik Nayak
@ 2024-05-20 10:19   ` Xing Xin
  0 siblings, 0 replies; 24+ messages in thread
From: Xing Xin @ 2024-05-20 10:19 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: blanet via GitGitGadget, git, Xing Xin

At 2024-05-17 15:36:53, "Karthik Nayak" <karthik.188@gmail.com> wrote:
>"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Xing Xin <xingxin.xx@bytedance.com>>
>> When using the bundle-uri mechanism with a bundle list containing
>> multiple interrelated bundles, we encountered a bug where tips from
>> downloaded bundles were not being discovered, resulting in rather slow
>> clones. This was particularly problematic when employing the heuristic
>> `creationTokens`.
>>
>> And this is easy to reproduce. Suppose we have a repository with a
>> single branch `main` pointing to commit `A`, firstly we create a base
>> bundle with
>>
>>   git bundle create base.bundle main
>>
>> Then let's add a new commit `B` on top of `A`, so that an incremental
>> bundle for `main` can be created with
>>
>>   git bundle create incr.bundle A..main
>>
>> Now we can generate a bundle list with the following content:
>>
>>   [bundle]
>>       version = 1
>>       mode = all
>>       heuristic = creationToken
>>
>>   [bundle "base"]
>>       uri = base.bundle
>>       creationToken = 1
>>
>>   [bundle "incr"]
>>       uri = incr.bundle
>>       creationToken = 2
>>
>> A fresh clone with the bundle list above would give the expected
>> `refs/bundles/main` pointing at `B` in new repository, in other words we
>> already had everything locally from the bundles, but git would still
>> download everything from server as if we got nothing.
>>
>> So why the `refs/bundles/main` is not discovered? After some digging I
>> found that:
>>
>> 1. when unbundling a downloaded bundle, a `verify_bundle` is called to
>
>s/a//

Thanks!

>
>>    check its prerequisites if any. The verify procedure would find oids
>>    so `packed_git` is initialized.
>>
>
>So the flow is:
>1. `fetch_bundle_list` fetches all the bundles advertised via
>`download_bundle_list` to local files.
>2. It then calls `unbundle_all_bundles` to unbundle all the bundles.
>3. Each bundle is then unbundled using `unbundle_from_file`.
>4. Here, we first read the bundle header to get all the prerequisites
>for the bundle, this is done in `read_bundle_header`.
>5. Then we call `unbundle`, which calls `verify_bundle` to ensure that
>the repository does indeed contain the prerequisites mentioned in the
>bundle. Then it creates the index from the bundle file.
>
>So because the objects are being checked, the `prepare_packed_git`
>function is eventually called, which means that the
>`raw_object_store->packed_git` data gets filled in and
>`packed_git_initialized` is set.
>
>This means consecutive calls to `prepare_packed_git` doesn't
>re-initiate `raw_object_store->packed_git` since
>`packed_git_initialized` already is set.
>
>So your explanation makes sense, as a _nit_ I would perhaps add the part
>about why consecutive calls to `prepare_packed_git` are ineffective.

Thanks my friend, you have expressed this issue more clearly. I will
post a new description based on your explanation with the creationToken
case covered.

>
>> 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
>
>s/unbundled/unbundling

Copy that.

>
>> +#include "packfile.h"
>>  #include "pkt-line.h"
>>  #include "config.h"
>>  #include "remote.h"
>> @@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
>>  			       VERIFY_BUNDLE_QUIET)))
>>  		return 1;
>>
>> +	reprepare_packed_git(r);
>> +
>>
>
>Would it make sense to move this to `bundle.c:unbundle()`, since that is
>also where the idx is created?
>

I wonder if we need a mental model that we should `reprepare_packed_git`
that when a new pack and its corresponding idx is generated? Currently
whether to call `reprepare_packed_git` is determined by the caller.

But within the scope of this bug, I tend to remove the
`REF_SKIP_OID_VERIFICATION` flag when writing refs as Patrick suggested.

Xing Xin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
  2024-05-17 16:14   ` Junio C Hamano
@ 2024-05-20 11:48     ` Xing Xin
  2024-05-20 17:19       ` Junio C Hamano
  0 siblings, 1 reply; 24+ messages in thread
From: Xing Xin @ 2024-05-20 11:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, blanet via GitGitGadget, git, Xing Xin

At 2024-05-18 00:14:47, "Junio C Hamano" <gitster@pobox.com> wrote:
>Patrick Steinhardt <ps@pks.im> writes:
>
>> Now there are two conflicting thoughts here:
>>
>>   - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
>>     should now be accessible.
>>
>>   - Or we can avoid calling `reprepare_packed_git()` inside the loop and
>>     instead call it once after we have fetched all bundles.
>>
>> The second one feels a bit like premature optimization to me. But the
>> first item does feel like it could help us to catch broken bundles
>> because we wouldn't end up creating refs for objects that neither we nor
>> the bundle have.
>
>I like the way your thoughts are structured around here.
>
>I do agree that the latter is a wrong approach---we shouldn't be
>trusting what came from elsewhere over the network without first
>checking.  We should probably be running the "index-pack --fix-thin"
>the unbundling process runs with also the "--fsck-objects" option if
>we are not doing so already, and even then, we should make sure that
>the object we are making our ref point at have everything behind it.

Currently `unbundle` and all its callers are not adding "--fsck-objects".
There is a param `extra_index_pack_args` for `unbundle` but it is
mainly used for progress related options.

Personally I think data from bundles and data received via network
should be treated equally. For "fetch-pack" we now have some configs
such as  "fetch.fsckobjects" and "transfer.fsckobjects" to decide the
behavior, these configs are invisible when we are fetching bundles.

So for bundles I think we have some different ways to go:

  - Acknowledge the configs mentioned above and behave like
    "fetch-pack".

  - Add "--fsck-objects" as a default in `unbundle`.

  - In `unbundle_from_file`, pass in "--fsck-objects" as
    `extra_index_pack_args` for `unbundle` so this only affect the
    bundle-uri related process.

What do you think?

Xing Xin

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2] bundle-uri: verify oid before writing refs
  2024-05-15  3:01 [PATCH] bundle-uri: refresh packed_git if unbundle succeed blanet via GitGitGadget
  2024-05-17  5:00 ` Patrick Steinhardt
  2024-05-17  7:36 ` Karthik Nayak
@ 2024-05-20 12:36 ` blanet via GitGitGadget
  2024-05-21 15:41   ` Karthik Nayak
  2024-05-27 15:41   ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
  2 siblings, 2 replies; 24+ messages in thread
From: blanet via GitGitGadget @ 2024-05-20 12:36 UTC (permalink / raw)
  To: git; +Cc: blanet, Xing Xin

From: Xing Xin <xingxin.xx@bytedance.com>

When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the heuristic
`creationTokens`.

And this is easy to reproduce. Suppose we have a repository with a
single branch `main` pointing to commit `A`, firstly we create a base
bundle with

  git bundle create base.bundle main

Then let's add a new commit `B` on top of `A`, so that an incremental
bundle for `main` can be created with

  git bundle create incr.bundle A..main

Now we can generate a bundle list with the following content:

  [bundle]
      version = 1
      mode = all
      heuristic = creationToken

  [bundle "base"]
      uri = base.bundle
      creationToken = 1

  [bundle "incr"]
      uri = incr.bundle
      creationToken = 2

A fresh clone with the bundle list above would give the expected
`refs/bundles/main` pointing at `B` in new repository, in other words we
already had everything locally from the bundles, but git would still
download everything from server as if we got nothing.

So why the `refs/bundles/main` is not discovered? After some digging I
found that:

1. Bundles in bundle list are downloaded to local files via
   `download_bundle_list` or via `fetch_bundles_by_token` for the
   creationToken heuristic case.
2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
   is called by `unbundle_all_bundles` or called within
   `fetch_bundles_by_token` for the creationToken heuristic case.
3. Here, we first read the bundle header to get all the prerequisites
   for the bundle, this is done in `read_bundle_header`.
4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
   the repository does indeed contain the prerequisites mentioned in the
   bundle.
5. The `verify_bundle` will call `parse_object`, within which the
   `prepare_packed_git` or `reprepare_packed_git` is eventually called,
   which means that the `raw_object_store->packed_git` data gets filled
   in and ``packed_git_initialized` is set. This also means consecutive
   calls to `prepare_packed_git` doesn't re-initiate
   `raw_object_store->packed_git` since `packed_git_initialized` already
   is set.
6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
   with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
   target arbitrary objects are written to the repository.
7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
   `mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
   tips. Here it won't call `reprepare_packed_git` anymore so it would
   fail to parse oids that only reside in the last bundle.

Back to the example above, when unbunding `incr.bundle`, `base.pack` is
enlisted to `packed_git` bacause of the prerequisites to verify. While
we can not find `B` for negotiation at a latter time because `B` exists
in `incr.pack` which is not enlisted in `packed_git`.

This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
flag when writing bundle refs, so we can:

1. Ensure that the bundle refs we are writing are pointing to valid
   objects.
2. Ensure all the tips from bundle refs can be correctly parsed.

And a set of negotiation related tests for bundle-uri are added.

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
    bundle-uri: refresh packed_git if unbundle succeed
    
    cc: Patrick Steinhardt ps@pks.im cc: Karthik Nayak karthik.188@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1730

Range-diff vs v1:

 1:  d4841eea556 ! 1:  8bdeacf1360 bundle-uri: refresh packed_git if unbundle succeed
     @@ Metadata
      Author: Xing Xin <xingxin.xx@bytedance.com>
      
       ## Commit message ##
     -    bundle-uri: refresh packed_git if unbundle succeed
     +    bundle-uri: verify oid before writing refs
      
          When using the bundle-uri mechanism with a bundle list containing
          multiple interrelated bundles, we encountered a bug where tips from
     -    downloaded bundles were not being discovered, resulting in rather slow
     +    downloaded bundles were not discovered, thus resulting in rather slow
          clones. This was particularly problematic when employing the heuristic
          `creationTokens`.
      
     @@ Commit message
          So why the `refs/bundles/main` is not discovered? After some digging I
          found that:
      
     -    1. when unbundling a downloaded bundle, a `verify_bundle` is called to
     -       check its prerequisites if any. The verify procedure would find oids
     -       so `packed_git` is initialized.
     -
     -    2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
     -       during which `mark_complete_and_common_ref` and `mark_tips` would
     -       find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
     -       enlisted if `packed_git` has already initialized in 1.
     +    1. Bundles in bundle list are downloaded to local files via
     +       `download_bundle_list` or via `fetch_bundles_by_token` for the
     +       creationToken heuristic case.
     +    2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
     +       is called by `unbundle_all_bundles` or called within
     +       `fetch_bundles_by_token` for the creationToken heuristic case.
     +    3. Here, we first read the bundle header to get all the prerequisites
     +       for the bundle, this is done in `read_bundle_header`.
     +    4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
     +       the repository does indeed contain the prerequisites mentioned in the
     +       bundle.
     +    5. The `verify_bundle` will call `parse_object`, within which the
     +       `prepare_packed_git` or `reprepare_packed_git` is eventually called,
     +       which means that the `raw_object_store->packed_git` data gets filled
     +       in and ``packed_git_initialized` is set. This also means consecutive
     +       calls to `prepare_packed_git` doesn't re-initiate
     +       `raw_object_store->packed_git` since `packed_git_initialized` already
     +       is set.
     +    6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
     +       with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
     +       target arbitrary objects are written to the repository.
     +    7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
     +       `mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
     +       tips. Here it won't call `reprepare_packed_git` anymore so it would
     +       fail to parse oids that only reside in the last bundle.
      
          Back to the example above, when unbunding `incr.bundle`, `base.pack` is
     -    enlisted to `packed_git` bacause of the prerequisites to verify. Then we
     -    can not find `B` for negotiation at a latter time bacause `B` exists in
     -    `incr.pack` which is not enlisted in `packed_git`.
     +    enlisted to `packed_git` bacause of the prerequisites to verify. While
     +    we can not find `B` for negotiation at a latter time because `B` exists
     +    in `incr.pack` which is not enlisted in `packed_git`.
     +
     +    This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
     +    flag when writing bundle refs, so we can:
      
     -    This commit fixes this by adding a `reprepare_packed_git` call for every
     -    successfully unbundled bundle, which ensures to enlist all generated
     -    packs from bundle uri. And a set of negotiation related tests are added.
     +    1. Ensure that the bundle refs we are writing are pointing to valid
     +       objects.
     +    2. Ensure all the tips from bundle refs can be correctly parsed.
     +
     +    And a set of negotiation related tests for bundle-uri are added.
      
          Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
      
       ## bundle-uri.c ##
     -@@
     - #include "refs.h"
     - #include "run-command.h"
     - #include "hashmap.h"
     -+#include "packfile.h"
     - #include "pkt-line.h"
     - #include "config.h"
     - #include "remote.h"
      @@ bundle-uri.c: static int unbundle_from_file(struct repository *r, const char *file)
     - 			       VERIFY_BUNDLE_QUIET)))
     - 		return 1;
     + 		refs_update_ref(get_main_ref_store(the_repository),
     + 				"fetched bundle", bundle_ref.buf, oid,
     + 				has_old ? &old_oid : NULL,
     +-				REF_SKIP_OID_VERIFICATION,
     +-				UPDATE_REFS_MSG_ON_ERR);
     ++				0, UPDATE_REFS_MSG_ON_ERR);
     + 	}
       
     -+	reprepare_packed_git(r);
     -+
     - 	/*
     - 	 * Convert all refs/heads/ from the bundle into refs/bundles/
     - 	 * in the local repository.
     + 	bundle_header_release(&header);
      
       ## t/t5558-clone-bundle-uri.sh ##
      @@ t/t5558-clone-bundle-uri.sh: test_expect_success 'fail to clone from non-bundle file' '


 bundle-uri.c                |   3 +-
 t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 91b3319a5c1..65666a11d9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 		refs_update_ref(get_main_ref_store(the_repository),
 				"fetched bundle", bundle_ref.buf, oid,
 				has_old ? &old_oid : NULL,
-				REF_SKIP_OID_VERIFICATION,
-				UPDATE_REFS_MSG_ON_ERR);
+				0, UPDATE_REFS_MSG_ON_ERR);
 	}
 
 	bundle_header_release(&header);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..a5b04d6f187 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' '
 test_expect_success 'create bundle' '
 	git init clone-from &&
 	git -C clone-from checkout -b topic &&
+
 	test_commit -C clone-from A &&
+	git -C clone-from bundle create A.bundle topic &&
+
 	test_commit -C clone-from B &&
 	git -C clone-from bundle create B.bundle topic
 '
@@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
 	! grep "refs/bundles/" refs
 '
 
+#########################################################################
+# Clone negotiation related tests begin here
+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+	test_when_finished rm -rf trace*.txt &&
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="clone-from/A.bundle" \
+		clone-from nego-bundle-part &&
+	git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/topic
+	EOF
+	test_cmp expect actual &&
+	# Ensure that refs/bundles/topic are sent as "have".
+	grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+	test_when_finished rm -rf trace*.txt &&
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --single-branch --branch=topic --no-tags \
+		--bundle-uri="clone-from/B.bundle" \
+		clone-from nego-bundle-all &&
+	git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/topic
+	EOF
+	test_cmp expect actual &&
+	# We already have all needed commits so no "want" needed.
+	! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-no-heuristic &&
+
+	git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-heuristic &&
+
+	git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --single-branch --branch=left --no-tags \
+		--bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-all &&
+
+	git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	# We already have all needed commits so no "want" needed.
+	! grep "clone> want " trace-packet.txt
+'
+
 #########################################################################
 # HTTP tests begin here
 

base-commit: d8ab1d464d07baa30e5a180eb33b3f9aa5c93adf
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
  2024-05-20 11:48     ` Xing Xin
@ 2024-05-20 17:19       ` Junio C Hamano
  2024-05-27 16:04         ` Xing Xin
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2024-05-20 17:19 UTC (permalink / raw)
  To: Xing Xin; +Cc: Patrick Steinhardt, blanet via GitGitGadget, git, Xing Xin

"Xing Xin" <bupt_xingxin@163.com> writes:

> Personally I think data from bundles and data received via network
> should be treated equally.

Yup, that is not personal ;-) but is universally accepted as a good
discipline.  In the case of bundle-uri, the bundle came over the
network so it is even more true that they should be treated the
same.

> For "fetch-pack" we now have some configs
> such as  "fetch.fsckobjects" and "transfer.fsckobjects" to decide the
> behavior, these configs are invisible when we are fetching bundles.

When fetching over network, transport.c:fetch_refs_via_pack() calls
fetch_pack.c:fetch_pack(), which eventually calls get_pack() and the
configuration variables are honored there.  It appears that the
transport layer is unaware of the .fsckobjects configuration knobs.

When fetching from a bundle, transport.c:fetch_refs_from_bundle()
calls bundle.c:unbundle().  This function has three callers, i.e.
"git bundle unbundle", normal fetching from a bundle, and more
recently added bundle-uri codepaths.  

I think one reasonable approach to take is to add an extra parameter
that takes one of three values: (never, use-config, always), and
conditionally add "--fsck-objects" to the command line of the
index-pack.  Teach "git bundle unbundle" the "--fsck-objects" option
so that it can pass 'never' or 'always' from the command line, and
pass 'use-config' from the code paths for normal fetching from a
budnle and bundle-uri.

To implement use-config, you'd probably need to refactor a small
part of fetch-pack.c:get_pack()

	if (fetch_fsck_objects >= 0
	    ? fetch_fsck_objects
	    : transfer_fsck_objects >= 0
	    ? transfer_fsck_objects
	    : 0)
		fsck_objects = 1;

into a public function (to support a caller like unbundle() that
comes from sideways, the new function may also need to call
fetch_pack_setup() to prime them).

A patch series may take a structure like so:

 * define enum { UNBUNDLE_FSCK_NEVER, UNBUNDLE_FSCK_ALWAYS } in
   bundle.h, have bundle.c:unbundle() accept a new parameter of that
   type, and conditionally add "--fsck-objects" to its call to
   "index-pack".  "git bundle unbundle" can pass 'never' to its
   invocation to unbundle() as an easy way to test it.  For the
   other two callers, we can start by passing 'always'.

 * (optional) teach "git bundle unbundle" a new "--fsck-objects"
   option to allow passing 'always' to its call to unbundle().  With
   that, add tests to feed it a bundle with questionable objects in
   it and make sure that unbundling notices.

 * refactor fetch-pack.c:get_pack() to make the fetch-then-transfer
   configuration logic available to external callers.

 * Add UNBUNDLE_FSCK_USE_CONFIG to the enum, enhance unbundle() to
   react to the value by calling the helper function you introduced
   in the previous step.

Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2] bundle-uri: verify oid before writing refs
  2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
@ 2024-05-21 15:41   ` Karthik Nayak
  2024-05-27 15:41   ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
  1 sibling, 0 replies; 24+ messages in thread
From: Karthik Nayak @ 2024-05-21 15:41 UTC (permalink / raw)
  To: blanet via GitGitGadget, git; +Cc: blanet, Xing Xin

[-- Attachment #1: Type: text/plain, Size: 8781 bytes --]

"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Xing Xin <xingxin.xx@bytedance.com>
>
> When using the bundle-uri mechanism with a bundle list containing
> multiple interrelated bundles, we encountered a bug where tips from
> downloaded bundles were not discovered, thus resulting in rather slow
> clones. This was particularly problematic when employing the heuristic
> `creationTokens`.
>
> And this is easy to reproduce. Suppose we have a repository with a
> single branch `main` pointing to commit `A`, firstly we create a base
> bundle with
>
>   git bundle create base.bundle main
>
> Then let's add a new commit `B` on top of `A`, so that an incremental
> bundle for `main` can be created with
>
>   git bundle create incr.bundle A..main
>
> Now we can generate a bundle list with the following content:
>
>   [bundle]
>       version = 1
>       mode = all
>       heuristic = creationToken
>
>   [bundle "base"]
>       uri = base.bundle
>       creationToken = 1
>
>   [bundle "incr"]
>       uri = incr.bundle
>       creationToken = 2
>
> A fresh clone with the bundle list above would give the expected
> `refs/bundles/main` pointing at `B` in new repository, in other words we
> already had everything locally from the bundles, but git would still
> download everything from server as if we got nothing.
>
> So why the `refs/bundles/main` is not discovered? After some digging I
> found that:
>
> 1. Bundles in bundle list are downloaded to local files via
>    `download_bundle_list` or via `fetch_bundles_by_token` for the
>    creationToken heuristic case.
> 2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
>    is called by `unbundle_all_bundles` or called within
>    `fetch_bundles_by_token` for the creationToken heuristic case.
> 3. Here, we first read the bundle header to get all the prerequisites
>    for the bundle, this is done in `read_bundle_header`.
> 4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
>    the repository does indeed contain the prerequisites mentioned in the
>    bundle.
> 5. The `verify_bundle` will call `parse_object`, within which the
>    `prepare_packed_git` or `reprepare_packed_git` is eventually called,
>    which means that the `raw_object_store->packed_git` data gets filled
>    in and ``packed_git_initialized` is set. This also means consecutive
>    calls to `prepare_packed_git` doesn't re-initiate
>    `raw_object_store->packed_git` since `packed_git_initialized` already
>    is set.
> 6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
>    with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
>    target arbitrary objects are written to the repository.
> 7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
>    `mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
>    tips. Here it won't call `reprepare_packed_git` anymore so it would
>    fail to parse oids that only reside in the last bundle.
>
> Back to the example above, when unbunding `incr.bundle`, `base.pack` is
> enlisted to `packed_git` bacause of the prerequisites to verify. While

s/bacause/because

[snip]

> diff --git a/bundle-uri.c b/bundle-uri.c
> index 91b3319a5c1..65666a11d9c 100644
> --- a/bundle-uri.c
> +++ b/bundle-uri.c
> @@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
>  		refs_update_ref(get_main_ref_store(the_repository),
>  				"fetched bundle", bundle_ref.buf, oid,
>  				has_old ? &old_oid : NULL,
> -				REF_SKIP_OID_VERIFICATION,
> -				UPDATE_REFS_MSG_ON_ERR);
> +				0, UPDATE_REFS_MSG_ON_ERR);
>  	}
>
>  	bundle_header_release(&header);
> diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
> index 1ca5f745e73..a5b04d6f187 100755
> --- a/t/t5558-clone-bundle-uri.sh
> +++ b/t/t5558-clone-bundle-uri.sh
> @@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' '
>  test_expect_success 'create bundle' '
>  	git init clone-from &&
>  	git -C clone-from checkout -b topic &&
> +
>  	test_commit -C clone-from A &&
> +	git -C clone-from bundle create A.bundle topic &&
> +
>  	test_commit -C clone-from B &&
>  	git -C clone-from bundle create B.bundle topic
>  '
> @@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
>  	! grep "refs/bundles/" refs
>  '
>
> +#########################################################################
> +# Clone negotiation related tests begin here
> +
> +test_expect_success 'negotiation: bundle with part of wanted commits' '
> +	test_when_finished rm -rf trace*.txt &&
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --bundle-uri="clone-from/A.bundle" \
> +		clone-from nego-bundle-part &&
> +	git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/topic
> +	EOF
> +	test_cmp expect actual &&
> +	# Ensure that refs/bundles/topic are sent as "have".
> +	grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle with all wanted commits' '
> +	test_when_finished rm -rf trace*.txt &&
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --single-branch --branch=topic --no-tags \
> +		--bundle-uri="clone-from/B.bundle" \
> +		clone-from nego-bundle-all &&
> +	git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/topic
> +	EOF
> +	test_cmp expect actual &&
> +	# We already have all needed commits so no "want" needed.
> +	! grep "clone> want " trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list (no heuristic)' '
> +	test_when_finished rm -f trace*.txt &&
> +	cat >bundle-list <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +
> +	[bundle "bundle-1"]
> +		uri = file://$(pwd)/clone-from/bundle-1.bundle
> +
> +	[bundle "bundle-2"]
> +		uri = file://$(pwd)/clone-from/bundle-2.bundle
> +	EOF
> +
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
> +		clone-from nego-bundle-list-no-heuristic &&
> +
> +	git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/base
> +	refs/bundles/left
> +	EOF
> +	test_cmp expect actual &&
> +	grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list (creationToken)' '
> +	test_when_finished rm -f trace*.txt &&
> +	cat >bundle-list <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +
> +	[bundle "bundle-1"]
> +		uri = file://$(pwd)/clone-from/bundle-1.bundle
> +		creationToken = 1
> +
> +	[bundle "bundle-2"]
> +		uri = file://$(pwd)/clone-from/bundle-2.bundle
> +		creationToken = 2
> +	EOF
> +
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
> +		clone-from nego-bundle-list-heuristic &&
> +
> +	git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/base
> +	refs/bundles/left
> +	EOF
> +	test_cmp expect actual &&
> +	grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
> +'
> +
> +test_expect_success 'negotiation: bundle list with all wanted commits' '
> +	test_when_finished rm -f trace*.txt &&
> +	cat >bundle-list <<-EOF &&
> +	[bundle]
> +		version = 1
> +		mode = all
> +		heuristic = creationToken
> +
> +	[bundle "bundle-1"]
> +		uri = file://$(pwd)/clone-from/bundle-1.bundle
> +		creationToken = 1
> +
> +	[bundle "bundle-2"]
> +		uri = file://$(pwd)/clone-from/bundle-2.bundle
> +		creationToken = 2
> +	EOF
> +
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --single-branch --branch=left --no-tags \
> +		--bundle-uri="file://$(pwd)/bundle-list" \
> +		clone-from nego-bundle-list-all &&
> +
> +	git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/base
> +	refs/bundles/left
> +	EOF
> +	test_cmp expect actual &&
> +	# We already have all needed commits so no "want" needed.
> +	! grep "clone> want " trace-packet.txt
> +'
> +
>  #########################################################################
>  # HTTP tests begin here
>
>
> base-commit: d8ab1d464d07baa30e5a180eb33b3f9aa5c93adf
> --
> gitgitgadget

This update looks good to me, Thanks.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches
  2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
  2024-05-21 15:41   ` Karthik Nayak
@ 2024-05-27 15:41   ` blanet via GitGitGadget
  2024-05-27 15:41     ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
                       ` (3 more replies)
  1 sibling, 4 replies; 24+ messages in thread
From: blanet via GitGitGadget @ 2024-05-27 15:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet

While attempting to fix a reference negotiation bug in bundle-uri, we
discovered that the fetch process are missing some helpful object validation
logic when processing bundles. The main issues are:

 * In the bundle-uri scenario, we did not validate object IDs before writing
   bundle references. This is the root cause of the original negotiation bug
   in bundle-uri, and can cause potential repository corruption.
 * The existing fetch.fsckObjects and transfer.fsckObjects are not detected
   when directly fetching bundles. In fact there is no object validation
   support for unbundle.

The first patch fixes the bundle-uri negotiation issue by dropping the
REF_SKIP_OID_VERIFICATION flag when writing bundle references.

Patch 2~4 extend bundle.c:unbundle with a unbundle_fsck_flags to control
object fscking in different scenarios, the implementation mainly follows
what Junio suggested on the mailing list.

Xing Xin (4):
  bundle-uri: verify oid before writing refs
  unbundle: introduce unbundle_fsck_flags for fsckobjects handling
  fetch-pack: expose fsckObjects configuration logic
  unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH

 builtin/bundle.c            |   2 +-
 bundle-uri.c                |   5 +-
 bundle.c                    |  20 ++++-
 bundle.h                    |   9 +-
 fetch-pack.c                |  18 ++--
 fetch-pack.h                |   2 +
 t/t5558-clone-bundle-uri.sh | 163 +++++++++++++++++++++++++++++++++++-
 t/t5607-clone-bundle.sh     |  23 +++++
 transport.c                 |   2 +-
 9 files changed, 227 insertions(+), 17 deletions(-)


base-commit: b9cfe4845cb2562584837bc0101c0ab76490a239
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/1730

Range-diff vs v2:

 1:  8bdeacf1360 = 1:  8f488a5eeaa bundle-uri: verify oid before writing refs
 -:  ----------- > 2:  057c697970f unbundle: introduce unbundle_fsck_flags for fsckobjects handling
 -:  ----------- > 3:  67401d4fbcb fetch-pack: expose fsckObjects configuration logic
 -:  ----------- > 4:  c19b8f633cb unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 1/4] bundle-uri: verify oid before writing refs
  2024-05-27 15:41   ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
@ 2024-05-27 15:41     ` Xing Xin via GitGitGadget
  2024-05-28 11:55       ` Patrick Steinhardt
  2024-05-27 15:41     ` [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling Xing Xin via GitGitGadget
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-27 15:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin

From: Xing Xin <xingxin.xx@bytedance.com>

When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not discovered, thus resulting in rather slow
clones. This was particularly problematic when employing the heuristic
`creationTokens`.

And this is easy to reproduce. Suppose we have a repository with a
single branch `main` pointing to commit `A`, firstly we create a base
bundle with

  git bundle create base.bundle main

Then let's add a new commit `B` on top of `A`, so that an incremental
bundle for `main` can be created with

  git bundle create incr.bundle A..main

Now we can generate a bundle list with the following content:

  [bundle]
      version = 1
      mode = all
      heuristic = creationToken

  [bundle "base"]
      uri = base.bundle
      creationToken = 1

  [bundle "incr"]
      uri = incr.bundle
      creationToken = 2

A fresh clone with the bundle list above would give the expected
`refs/bundles/main` pointing at `B` in new repository, in other words we
already had everything locally from the bundles, but git would still
download everything from server as if we got nothing.

So why the `refs/bundles/main` is not discovered? After some digging I
found that:

1. Bundles in bundle list are downloaded to local files via
   `download_bundle_list` or via `fetch_bundles_by_token` for the
   creationToken heuristic case.
2. Then it tries to unbundle each bundle via `unbundle_from_file`, which
   is called by `unbundle_all_bundles` or called within
   `fetch_bundles_by_token` for the creationToken heuristic case.
3. Here, we first read the bundle header to get all the prerequisites
   for the bundle, this is done in `read_bundle_header`.
4. Then we call `unbundle`, which calls `verify_bundle` to ensure that
   the repository does indeed contain the prerequisites mentioned in the
   bundle.
5. The `verify_bundle` will call `parse_object`, within which the
   `prepare_packed_git` or `reprepare_packed_git` is eventually called,
   which means that the `raw_object_store->packed_git` data gets filled
   in and ``packed_git_initialized` is set. This also means consecutive
   calls to `prepare_packed_git` doesn't re-initiate
   `raw_object_store->packed_git` since `packed_git_initialized` already
   is set.
6. If `unbundle` succeeds, it writes some refs via `refs_update_ref`
   with `REF_SKIP_OID_VERIFICATION` set. So the bundle refs which can
   target arbitrary objects are written to the repository.
7. Finally in `do_fetch_pack_v2`, `mark_complete_and_common_ref` and
   `mark_tips` are called with `OBJECT_INFO_QUICK` set to find local
   tips. Here it won't call `reprepare_packed_git` anymore so it would
   fail to parse oids that only reside in the last bundle.

Back to the example above, when unbunding `incr.bundle`, `base.pack` is
enlisted to `packed_git` bacause of the prerequisites to verify. While
we can not find `B` for negotiation at a latter time because `B` exists
in `incr.pack` which is not enlisted in `packed_git`.

This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
flag when writing bundle refs, so we can:

1. Ensure that the bundle refs we are writing are pointing to valid
   objects.
2. Ensure all the tips from bundle refs can be correctly parsed.

And a set of negotiation related tests for bundle-uri are added.

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
 bundle-uri.c                |   3 +-
 t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
 2 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 91b3319a5c1..65666a11d9c 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -400,8 +400,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 		refs_update_ref(get_main_ref_store(the_repository),
 				"fetched bundle", bundle_ref.buf, oid,
 				has_old ? &old_oid : NULL,
-				REF_SKIP_OID_VERIFICATION,
-				UPDATE_REFS_MSG_ON_ERR);
+				0, UPDATE_REFS_MSG_ON_ERR);
 	}
 
 	bundle_header_release(&header);
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index 1ca5f745e73..a5b04d6f187 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -20,7 +20,10 @@ test_expect_success 'fail to clone from non-bundle file' '
 test_expect_success 'create bundle' '
 	git init clone-from &&
 	git -C clone-from checkout -b topic &&
+
 	test_commit -C clone-from A &&
+	git -C clone-from bundle create A.bundle topic &&
+
 	test_commit -C clone-from B &&
 	git -C clone-from bundle create B.bundle topic
 '
@@ -259,6 +262,132 @@ test_expect_success 'clone bundle list (file, any mode, all failures)' '
 	! grep "refs/bundles/" refs
 '
 
+#########################################################################
+# Clone negotiation related tests begin here
+
+test_expect_success 'negotiation: bundle with part of wanted commits' '
+	test_when_finished rm -rf trace*.txt &&
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="clone-from/A.bundle" \
+		clone-from nego-bundle-part &&
+	git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/topic
+	EOF
+	test_cmp expect actual &&
+	# Ensure that refs/bundles/topic are sent as "have".
+	grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle with all wanted commits' '
+	test_when_finished rm -rf trace*.txt &&
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --single-branch --branch=topic --no-tags \
+		--bundle-uri="clone-from/B.bundle" \
+		clone-from nego-bundle-all &&
+	git -C nego-bundle-all for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/topic
+	EOF
+	test_cmp expect actual &&
+	# We already have all needed commits so no "want" needed.
+	! grep "clone> want " trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (no heuristic)' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-no-heuristic &&
+
+	git -C nego-bundle-list-no-heuristic for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	grep "clone> have $(git -C nego-bundle-list-no-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list (creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-heuristic &&
+
+	git -C nego-bundle-list-heuristic for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	grep "clone> have $(git -C nego-bundle-list-heuristic rev-parse refs/bundles/left)" trace-packet.txt
+'
+
+test_expect_success 'negotiation: bundle list with all wanted commits' '
+	test_when_finished rm -f trace*.txt &&
+	cat >bundle-list <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = file://$(pwd)/clone-from/bundle-1.bundle
+		creationToken = 1
+
+	[bundle "bundle-2"]
+		uri = file://$(pwd)/clone-from/bundle-2.bundle
+		creationToken = 2
+	EOF
+
+	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
+	git clone --no-local --single-branch --branch=left --no-tags \
+		--bundle-uri="file://$(pwd)/bundle-list" \
+		clone-from nego-bundle-list-all &&
+
+	git -C nego-bundle-list-all for-each-ref --format="%(refname)" >refs &&
+	grep "refs/bundles/" refs >actual &&
+	cat >expect <<-\EOF &&
+	refs/bundles/base
+	refs/bundles/left
+	EOF
+	test_cmp expect actual &&
+	# We already have all needed commits so no "want" needed.
+	! grep "clone> want " trace-packet.txt
+'
+
 #########################################################################
 # HTTP tests begin here
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling
  2024-05-27 15:41   ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
  2024-05-27 15:41     ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-05-27 15:41     ` Xing Xin via GitGitGadget
  2024-05-28 12:03       ` Patrick Steinhardt
  2024-05-27 15:41     ` [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
  2024-05-27 15:41     ` [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
  3 siblings, 1 reply; 24+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-27 15:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin

From: Xing Xin <xingxin.xx@bytedance.com>

This commit adds a new enum `unbundle_fsck_flags` which is designed to
control the fsck behavior when unbundling. `unbundle` can use this newly
passed in enum to further decide whether to enable `--fsck-objects` for
"git-index-pack".

Currently only `UNBUNDLE_FSCK_NEVER` and `UNBUNDLE_FSCK_ALWAYS` are
supported as the very basic options. Another interesting option would be
added in later commits.

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
 builtin/bundle.c |  2 +-
 bundle-uri.c     |  2 +-
 bundle.c         | 12 +++++++++++-
 bundle.h         |  8 +++++++-
 transport.c      |  2 +-
 5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/builtin/bundle.c b/builtin/bundle.c
index 3ad11dc5d05..6c10961c640 100644
--- a/builtin/bundle.c
+++ b/builtin/bundle.c
@@ -212,7 +212,7 @@ static int cmd_bundle_unbundle(int argc, const char **argv, const char *prefix)
 		strvec_pushl(&extra_index_pack_args, "-v", "--progress-title",
 			     _("Unbundling objects"), NULL);
 	ret = !!unbundle(the_repository, &header, bundle_fd,
-			 &extra_index_pack_args, 0) ||
+			 &extra_index_pack_args, 0, UNBUNDLE_FSCK_NEVER) ||
 		list_bundle_refs(&header, argc, argv);
 	bundle_header_release(&header);
 cleanup:
diff --git a/bundle-uri.c b/bundle-uri.c
index 65666a11d9c..80f02aac6f1 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	 * the prerequisite commits.
 	 */
 	if ((result = unbundle(r, &header, bundle_fd, NULL,
-			       VERIFY_BUNDLE_QUIET)))
+			       VERIFY_BUNDLE_QUIET, UNBUNDLE_FSCK_ALWAYS)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index 95367c2d0a0..a922d592782 100644
--- a/bundle.c
+++ b/bundle.c
@@ -612,7 +612,8 @@ int create_bundle(struct repository *r, const char *path,
 
 int unbundle(struct repository *r, struct bundle_header *header,
 	     int bundle_fd, struct strvec *extra_index_pack_args,
-	     enum verify_bundle_flags flags)
+	     enum verify_bundle_flags flags,
+	     enum unbundle_fsck_flags fsck_flags)
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
 
@@ -625,6 +626,15 @@ int unbundle(struct repository *r, struct bundle_header *header,
 	if (header->filter.choice)
 		strvec_push(&ip.args, "--promisor=from-bundle");
 
+	switch (fsck_flags) {
+	case UNBUNDLE_FSCK_ALWAYS:
+		strvec_push(&ip.args, "--fsck-objects");
+		break;
+	case UNBUNDLE_FSCK_NEVER:
+	default:
+		break;
+	}
+
 	if (extra_index_pack_args) {
 		strvec_pushv(&ip.args, extra_index_pack_args->v);
 		strvec_clear(extra_index_pack_args);
diff --git a/bundle.h b/bundle.h
index 021adbdcbb3..cfa9daddda6 100644
--- a/bundle.h
+++ b/bundle.h
@@ -30,6 +30,11 @@ int create_bundle(struct repository *r, const char *path,
 		  int argc, const char **argv, struct strvec *pack_options,
 		  int version);
 
+enum unbundle_fsck_flags {
+	UNBUNDLE_FSCK_NEVER = 0,
+	UNBUNDLE_FSCK_ALWAYS,
+};
+
 enum verify_bundle_flags {
 	VERIFY_BUNDLE_VERBOSE = (1 << 0),
 	VERIFY_BUNDLE_QUIET = (1 << 1),
@@ -53,7 +58,8 @@ int verify_bundle(struct repository *r, struct bundle_header *header,
  */
 int unbundle(struct repository *r, struct bundle_header *header,
 	     int bundle_fd, struct strvec *extra_index_pack_args,
-	     enum verify_bundle_flags flags);
+	     enum verify_bundle_flags flags,
+	     enum unbundle_fsck_flags fsck_flags);
 int list_bundle_refs(struct bundle_header *header,
 		int argc, const char **argv);
 
diff --git a/transport.c b/transport.c
index 0ad04b77fd2..6799988f10c 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
 	if (!data->get_refs_from_bundle_called)
 		get_refs_from_bundle_inner(transport);
 	ret = unbundle(the_repository, &data->header, data->fd,
-		       &extra_index_pack_args, 0);
+		       &extra_index_pack_args, 0, UNBUNDLE_FSCK_ALWAYS);
 	transport->hash_algo = data->header.hash_algo;
 	return ret;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
  2024-05-27 15:41   ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
  2024-05-27 15:41     ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
  2024-05-27 15:41     ` [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling Xing Xin via GitGitGadget
@ 2024-05-27 15:41     ` Xing Xin via GitGitGadget
  2024-05-28 12:03       ` Patrick Steinhardt
  2024-05-27 15:41     ` [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
  3 siblings, 1 reply; 24+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-27 15:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin

From: Xing Xin <xingxin.xx@bytedance.com>

Currently we can use "transfer.fsckObjects" or "fetch.fsckObjects" to
control whether to enable checks for broken objects during fetching. But
these configs are only acknowledged by `fetch-pack.c:get_pack` and do
not make sense when fetching from bundles or using bundle-uris.

This commit exposed the fetch-then-transfer configuration logic by
adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. In next
commit, this new function will be used by `unbundle` in fetching
scenarios.

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
 fetch-pack.c | 18 ++++++++++++------
 fetch-pack.h |  2 ++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 7d2aef21add..81a64be6951 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -954,12 +954,7 @@ static int get_pack(struct fetch_pack_args *args,
 		strvec_push(&cmd.args, alternate_shallow_file);
 	}
 
-	if (fetch_fsck_objects >= 0
-	    ? fetch_fsck_objects
-	    : transfer_fsck_objects >= 0
-	    ? transfer_fsck_objects
-	    : 0)
-		fsck_objects = 1;
+	fsck_objects = fetch_pack_fsck_objects();
 
 	if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
 		if (pack_lockfiles || fsck_objects)
@@ -2046,6 +2041,17 @@ static const struct object_id *iterate_ref_map(void *cb_data)
 	return &ref->old_oid;
 }
 
+int fetch_pack_fsck_objects(void)
+{
+	fetch_pack_setup();
+
+	return fetch_fsck_objects >= 0
+	       ? fetch_fsck_objects
+	       : transfer_fsck_objects >= 0
+	       ? transfer_fsck_objects
+	       : 0;
+}
+
 struct ref *fetch_pack(struct fetch_pack_args *args,
 		       int fd[],
 		       const struct ref *ref,
diff --git a/fetch-pack.h b/fetch-pack.h
index 6775d265175..38956d9b748 100644
--- a/fetch-pack.h
+++ b/fetch-pack.h
@@ -101,4 +101,6 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
  */
 int report_unmatched_refs(struct ref **sought, int nr_sought);
 
+int fetch_pack_fsck_objects(void);
+
 #endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH
  2024-05-27 15:41   ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
                       ` (2 preceding siblings ...)
  2024-05-27 15:41     ` [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
@ 2024-05-27 15:41     ` Xing Xin via GitGitGadget
  2024-05-28 12:05       ` Patrick Steinhardt
  3 siblings, 1 reply; 24+ messages in thread
From: Xing Xin via GitGitGadget @ 2024-05-27 15:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, blanet, Xing Xin

From: Xing Xin <xingxin.xx@bytedance.com>

This commit adds a new option `UNBUNDLE_FSCK_FOLLOW_FETCH` to
`unbundle_fsck_flags`, this new flag is currently used in the _fetch_
process by

- `transport.c:fetch_refs_from_bundle` for fetching directly from a
  bundle.
- `bundle-uri.c:unbundle_from_file` for unbundling bundles downloaded
  from bundle-uri.

So we now have a relatively consistent logic for checking objects during
fetching. Add tests for the above two situations are added.

Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
 bundle-uri.c                |  2 +-
 bundle.c                    | 10 +++++++++-
 bundle.h                    |  1 +
 t/t5558-clone-bundle-uri.sh | 36 +++++++++++++++++++++++++++++++-----
 t/t5607-clone-bundle.sh     | 23 +++++++++++++++++++++++
 transport.c                 |  2 +-
 6 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/bundle-uri.c b/bundle-uri.c
index 80f02aac6f1..0da3e5a61b9 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -373,7 +373,7 @@ static int unbundle_from_file(struct repository *r, const char *file)
 	 * the prerequisite commits.
 	 */
 	if ((result = unbundle(r, &header, bundle_fd, NULL,
-			       VERIFY_BUNDLE_QUIET, UNBUNDLE_FSCK_ALWAYS)))
+			       VERIFY_BUNDLE_QUIET, UNBUNDLE_FSCK_FOLLOW_FETCH)))
 		return 1;
 
 	/*
diff --git a/bundle.c b/bundle.c
index a922d592782..c7344543aa4 100644
--- a/bundle.c
+++ b/bundle.c
@@ -17,6 +17,7 @@
 #include "list-objects-filter-options.h"
 #include "connected.h"
 #include "write-or-die.h"
+#include "fetch-pack.h"
 
 static const char v2_bundle_signature[] = "# v2 git bundle\n";
 static const char v3_bundle_signature[] = "# v3 git bundle\n";
@@ -616,6 +617,7 @@ int unbundle(struct repository *r, struct bundle_header *header,
 	     enum unbundle_fsck_flags fsck_flags)
 {
 	struct child_process ip = CHILD_PROCESS_INIT;
+	int fsck_objects = 0;
 
 	if (verify_bundle(r, header, flags))
 		return -1;
@@ -628,13 +630,19 @@ int unbundle(struct repository *r, struct bundle_header *header,
 
 	switch (fsck_flags) {
 	case UNBUNDLE_FSCK_ALWAYS:
-		strvec_push(&ip.args, "--fsck-objects");
+		fsck_objects = 1;
+		break;
+	case UNBUNDLE_FSCK_FOLLOW_FETCH:
+		fsck_objects = fetch_pack_fsck_objects();
 		break;
 	case UNBUNDLE_FSCK_NEVER:
 	default:
 		break;
 	}
 
+	if (fsck_objects)
+		strvec_push(&ip.args, "--fsck-objects");
+
 	if (extra_index_pack_args) {
 		strvec_pushv(&ip.args, extra_index_pack_args->v);
 		strvec_clear(extra_index_pack_args);
diff --git a/bundle.h b/bundle.h
index cfa9daddda6..c46488422ce 100644
--- a/bundle.h
+++ b/bundle.h
@@ -33,6 +33,7 @@ int create_bundle(struct repository *r, const char *path,
 enum unbundle_fsck_flags {
 	UNBUNDLE_FSCK_NEVER = 0,
 	UNBUNDLE_FSCK_ALWAYS,
+	UNBUNDLE_FSCK_FOLLOW_FETCH,
 };
 
 enum verify_bundle_flags {
diff --git a/t/t5558-clone-bundle-uri.sh b/t/t5558-clone-bundle-uri.sh
index a5b04d6f187..3df4d44e78f 100755
--- a/t/t5558-clone-bundle-uri.sh
+++ b/t/t5558-clone-bundle-uri.sh
@@ -19,13 +19,30 @@ test_expect_success 'fail to clone from non-bundle file' '
 
 test_expect_success 'create bundle' '
 	git init clone-from &&
-	git -C clone-from checkout -b topic &&
+	(
+		cd clone-from &&
+		git checkout -b topic &&
+
+		test_commit A &&
+		git bundle create A.bundle topic &&
+
+		test_commit B &&
+		git bundle create B.bundle topic &&
+
+		cat >data <<-EOF &&
+		tree $(git rev-parse HEAD^{tree})
+		parent $(git rev-parse HEAD)
+		author A U Thor
+		committer A U Thor
 
-	test_commit -C clone-from A &&
-	git -C clone-from bundle create A.bundle topic &&
+		commit: this is a commit with bad emails
 
-	test_commit -C clone-from B &&
-	git -C clone-from bundle create B.bundle topic
+		EOF
+		git hash-object --literally -t commit -w --stdin <data >commit &&
+		git branch bad $(cat commit) &&
+		git bundle create bad.bundle bad &&
+		git update-ref -d refs/heads/bad
+	)
 '
 
 test_expect_success 'clone with path bundle' '
@@ -36,6 +53,15 @@ test_expect_success 'clone with path bundle' '
 	test_cmp expect actual
 '
 
+test_expect_success 'clone with bad bundle' '
+	git -c fetch.fsckObjects=true clone --bundle-uri="clone-from/bad.bundle" \
+		clone-from clone-bad 2>err &&
+	# Unbundle fails, but clone can still proceed.
+	test_grep "missingEmail" err &&
+	git -C clone-bad for-each-ref --format="%(refname)" >refs &&
+	! grep "refs/bundles/" refs
+'
+
 test_expect_success 'clone with path bundle and non-default hash' '
 	test_when_finished "rm -rf clone-path-non-default-hash" &&
 	GIT_DEFAULT_HASH=sha256 git clone --bundle-uri="clone-from/B.bundle" \
diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh
index 0d1e92d9963..423b35ac237 100755
--- a/t/t5607-clone-bundle.sh
+++ b/t/t5607-clone-bundle.sh
@@ -138,6 +138,29 @@ test_expect_success 'fetch SHA-1 from bundle' '
 	git fetch --no-tags foo/tip.bundle "$(cat hash)"
 '
 
+test_expect_success 'clone bundle with fetch.fsckObjects' '
+	test_create_repo bundle-fsck &&
+	(
+		cd bundle-fsck &&
+		test_commit first &&
+		cat >data <<-EOF &&
+		tree $(git rev-parse HEAD^{tree})
+		parent $(git rev-parse HEAD)
+		author A U Thor
+		committer A U Thor
+
+		commit: this is a commit with bad emails
+
+		EOF
+		git hash-object --literally -t commit -w --stdin <data >commit &&
+		git branch bad $(cat commit) &&
+		git bundle create bad.bundle bad
+	) &&
+	test_must_fail git -c fetch.fsckObjects=true \
+		clone bundle-fsck/bad.bundle bundle-fsck-clone 2>err &&
+	test_grep "missingEmail" err
+'
+
 test_expect_success 'git bundle uses expected default format' '
 	git bundle create bundle HEAD^.. &&
 	cat >expect <<-EOF &&
diff --git a/transport.c b/transport.c
index 6799988f10c..a140d4b03c0 100644
--- a/transport.c
+++ b/transport.c
@@ -184,7 +184,7 @@ static int fetch_refs_from_bundle(struct transport *transport,
 	if (!data->get_refs_from_bundle_called)
 		get_refs_from_bundle_inner(transport);
 	ret = unbundle(the_repository, &data->header, data->fd,
-		       &extra_index_pack_args, 0, UNBUNDLE_FSCK_ALWAYS);
+		       &extra_index_pack_args, 0, UNBUNDLE_FSCK_FOLLOW_FETCH);
 	transport->hash_algo = data->header.hash_algo;
 	return ret;
 }
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
  2024-05-20 17:19       ` Junio C Hamano
@ 2024-05-27 16:04         ` Xing Xin
  0 siblings, 0 replies; 24+ messages in thread
From: Xing Xin @ 2024-05-27 16:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, blanet via GitGitGadget, git, Xing Xin

At 2024-05-21 01:19:02, "Junio C Hamano" <gitster@pobox.com> wrote:
>"Xing Xin" <bupt_xingxin@163.com> writes:
>
>> Personally I think data from bundles and data received via network
>> should be treated equally.
>
>Yup, that is not personal ;-) but is universally accepted as a good
>discipline.  In the case of bundle-uri, the bundle came over the
>network so it is even more true that they should be treated the
>same.
>
>> For "fetch-pack" we now have some configs
>> such as  "fetch.fsckobjects" and "transfer.fsckobjects" to decide the
>> behavior, these configs are invisible when we are fetching bundles.
>
>When fetching over network, transport.c:fetch_refs_via_pack() calls
>fetch_pack.c:fetch_pack(), which eventually calls get_pack() and the
>configuration variables are honored there.  It appears that the
>transport layer is unaware of the .fsckobjects configuration knobs.
>
>When fetching from a bundle, transport.c:fetch_refs_from_bundle()
>calls bundle.c:unbundle().  This function has three callers, i.e.
>"git bundle unbundle", normal fetching from a bundle, and more
>recently added bundle-uri codepaths.  
>
>I think one reasonable approach to take is to add an extra parameter
>that takes one of three values: (never, use-config, always), and
>conditionally add "--fsck-objects" to the command line of the
>index-pack.  Teach "git bundle unbundle" the "--fsck-objects" option
>so that it can pass 'never' or 'always' from the command line, and
>pass 'use-config' from the code paths for normal fetching from a
>budnle and bundle-uri.
>
>To implement use-config, you'd probably need to refactor a small
>part of fetch-pack.c:get_pack()
>
>	if (fetch_fsck_objects >= 0
>	    ? fetch_fsck_objects
>	    : transfer_fsck_objects >= 0
>	    ? transfer_fsck_objects
>	    : 0)
>		fsck_objects = 1;
>
>into a public function (to support a caller like unbundle() that
>comes from sideways, the new function may also need to call
>fetch_pack_setup() to prime them).
>
>A patch series may take a structure like so:
>
> * define enum { UNBUNDLE_FSCK_NEVER, UNBUNDLE_FSCK_ALWAYS } in
>   bundle.h, have bundle.c:unbundle() accept a new parameter of that
>   type, and conditionally add "--fsck-objects" to its call to
>   "index-pack".  "git bundle unbundle" can pass 'never' to its
>   invocation to unbundle() as an easy way to test it.  For the
>   other two callers, we can start by passing 'always'.
>
> * (optional) teach "git bundle unbundle" a new "--fsck-objects"
>   option to allow passing 'always' to its call to unbundle().  With
>   that, add tests to feed it a bundle with questionable objects in
>   it and make sure that unbundling notices.

I just submitted a new series mainly focusing on the unbundle handling during
fetches. I would like to submit a new one for teaching "git bundle unbundle" a
"--fsck-objects" option after this to make changes more targeted.

> * refactor fetch-pack.c:get_pack() to make the fetch-then-transfer
>   configuration logic available to external callers.
>
> * Add UNBUNDLE_FSCK_USE_CONFIG to the enum, enhance unbundle() to

I tend to use `UNBUNDLE_FSCK_FOLLOW_FETCH` because this option is only
used in fetches, though the current implementation is indeed reading configs.

>   react to the value by calling the helper function you introduced
>   in the previous step.

The new patch series is constructed right as you suggested, thanks a lot for
your help. 

Xing Xin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 1/4] bundle-uri: verify oid before writing refs
  2024-05-27 15:41     ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
@ 2024-05-28 11:55       ` Patrick Steinhardt
  0 siblings, 0 replies; 24+ messages in thread
From: Patrick Steinhardt @ 2024-05-28 11:55 UTC (permalink / raw)
  To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin

[-- Attachment #1: Type: text/plain, Size: 1895 bytes --]

On Mon, May 27, 2024 at 03:41:54PM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
[snip]
> 5. The `verify_bundle` will call `parse_object`, within which the
>    `prepare_packed_git` or `reprepare_packed_git` is eventually called,
>    which means that the `raw_object_store->packed_git` data gets filled
>    in and ``packed_git_initialized` is set. This also means consecutive

s/``/`/

[snip]
> This commit fixes this bug by dropping the `REF_SKIP_OID_VERIFICATION`
> flag when writing bundle refs, so we can:
> 
> 1. Ensure that the bundle refs we are writing are pointing to valid
>    objects.
> 2. Ensure all the tips from bundle refs can be correctly parsed.

I think one angle that your explanation doesn't cover is why exactly
dropping the flag fixes the observed issue.

> And a set of negotiation related tests for bundle-uri are added.

s/And/Add/

[snip]
> +#########################################################################
> +# Clone negotiation related tests begin here
> +
> +test_expect_success 'negotiation: bundle with part of wanted commits' '
> +	test_when_finished rm -rf trace*.txt &&
> +	GIT_TRACE_PACKET="$(pwd)/trace-packet.txt" \
> +	git clone --no-local --bundle-uri="clone-from/A.bundle" \
> +		clone-from nego-bundle-part &&
> +	git -C nego-bundle-part for-each-ref --format="%(refname)" >refs &&
> +	grep "refs/bundles/" refs >actual &&
> +	cat >expect <<-\EOF &&
> +	refs/bundles/topic
> +	EOF
> +	test_cmp expect actual &&
> +	# Ensure that refs/bundles/topic are sent as "have".
> +	grep "clone> have $(git -C clone-from rev-parse A)" trace-packet.txt
> +'

As far as I can see there is no test that verifies the case where the
bundle contains refs, but misses the objects to satisfy the refs. Can we
craft such a bundle and exercise this new failure mode?

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
  2024-05-27 15:41     ` [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
@ 2024-05-28 12:03       ` Patrick Steinhardt
  2024-05-28 17:10         ` Junio C Hamano
  0 siblings, 1 reply; 24+ messages in thread
From: Patrick Steinhardt @ 2024-05-28 12:03 UTC (permalink / raw)
  To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin

[-- Attachment #1: Type: text/plain, Size: 2849 bytes --]

On Mon, May 27, 2024 at 03:41:56PM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
> 
> Currently we can use "transfer.fsckObjects" or "fetch.fsckObjects" to
> control whether to enable checks for broken objects during fetching. But
> these configs are only acknowledged by `fetch-pack.c:get_pack` and do
> not make sense when fetching from bundles or using bundle-uris.

Do they not make sense, or are they not effective? I assume you mean the
latter, right?

> This commit exposed the fetch-then-transfer configuration logic by

s/exposed/exposes/

> adding a new function `fetch_pack_fsck_objects` in fetch-pack.h. In next
> commit, this new function will be used by `unbundle` in fetching
> scenarios.
> 
> Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
> ---
>  fetch-pack.c | 18 ++++++++++++------
>  fetch-pack.h |  2 ++
>  2 files changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 7d2aef21add..81a64be6951 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -954,12 +954,7 @@ static int get_pack(struct fetch_pack_args *args,
>  		strvec_push(&cmd.args, alternate_shallow_file);
>  	}
>  
> -	if (fetch_fsck_objects >= 0
> -	    ? fetch_fsck_objects
> -	    : transfer_fsck_objects >= 0
> -	    ? transfer_fsck_objects
> -	    : 0)
> -		fsck_objects = 1;

This statement is really weird to read, but that is certainly not the
fault of this patch, but...

> +	fsck_objects = fetch_pack_fsck_objects();
>  
>  	if (do_keep || args->from_promisor || index_pack_args || fsck_objects) {
>  		if (pack_lockfiles || fsck_objects)
> @@ -2046,6 +2041,17 @@ static const struct object_id *iterate_ref_map(void *cb_data)
>  	return &ref->old_oid;
>  }
>  
> +int fetch_pack_fsck_objects(void)
> +{
> +	fetch_pack_setup();
> +
> +	return fetch_fsck_objects >= 0
> +	       ? fetch_fsck_objects
> +	       : transfer_fsck_objects >= 0
> +	       ? transfer_fsck_objects
> +	       : 0;
> +}

... can we maybe rewrite it to something more customary here? The
following is way easier to read, at least for me.

	int fetch_pack_fsck_objects(void)
	{
		fetch_pack_setup();
		if (fetch_fsck_objects >= 0 ||
		    transfer_fsck_objects >= 0)
			return 1;
		return 0;
	}

>  struct ref *fetch_pack(struct fetch_pack_args *args,
>  		       int fd[],
>  		       const struct ref *ref,
> diff --git a/fetch-pack.h b/fetch-pack.h
> index 6775d265175..38956d9b748 100644
> --- a/fetch-pack.h
> +++ b/fetch-pack.h
> @@ -101,4 +101,6 @@ void negotiate_using_fetch(const struct oid_array *negotiation_tips,
>   */
>  int report_unmatched_refs(struct ref **sought, int nr_sought);
>  
> +int fetch_pack_fsck_objects(void);

Let's add a comment here saying what this function does.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling
  2024-05-27 15:41     ` [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling Xing Xin via GitGitGadget
@ 2024-05-28 12:03       ` Patrick Steinhardt
  0 siblings, 0 replies; 24+ messages in thread
From: Patrick Steinhardt @ 2024-05-28 12:03 UTC (permalink / raw)
  To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin

[-- Attachment #1: Type: text/plain, Size: 694 bytes --]

On Mon, May 27, 2024 at 03:41:55PM +0000, Xing Xin via GitGitGadget wrote:
[snip]
> diff --git a/bundle.h b/bundle.h
> index 021adbdcbb3..cfa9daddda6 100644
> --- a/bundle.h
> +++ b/bundle.h
> @@ -30,6 +30,11 @@ int create_bundle(struct repository *r, const char *path,
>  		  int argc, const char **argv, struct strvec *pack_options,
>  		  int version);
>  
> +enum unbundle_fsck_flags {
> +	UNBUNDLE_FSCK_NEVER = 0,
> +	UNBUNDLE_FSCK_ALWAYS,
> +};
> +
>  enum verify_bundle_flags {
>  	VERIFY_BUNDLE_VERBOSE = (1 << 0),
>  	VERIFY_BUNDLE_QUIET = (1 << 1),

Wouldn't this have been a natural fit for the new flag, e.g. via
something like `VERIFY_BUNDLE_FSCK`?

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH
  2024-05-27 15:41     ` [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
@ 2024-05-28 12:05       ` Patrick Steinhardt
  0 siblings, 0 replies; 24+ messages in thread
From: Patrick Steinhardt @ 2024-05-28 12:05 UTC (permalink / raw)
  To: Xing Xin via GitGitGadget; +Cc: git, Karthik Nayak, blanet, Xing Xin

[-- Attachment #1: Type: text/plain, Size: 1191 bytes --]

On Mon, May 27, 2024 at 03:41:57PM +0000, Xing Xin via GitGitGadget wrote:
> From: Xing Xin <xingxin.xx@bytedance.com>
> diff --git a/t/t5607-clone-bundle.sh b/t/t5607-clone-bundle.sh
> index 0d1e92d9963..423b35ac237 100755
> --- a/t/t5607-clone-bundle.sh
> +++ b/t/t5607-clone-bundle.sh
> @@ -138,6 +138,29 @@ test_expect_success 'fetch SHA-1 from bundle' '
>  	git fetch --no-tags foo/tip.bundle "$(cat hash)"
>  '
>  
> +test_expect_success 'clone bundle with fetch.fsckObjects' '
> +	test_create_repo bundle-fsck &&
> +	(
> +		cd bundle-fsck &&
> +		test_commit first &&
> +		cat >data <<-EOF &&
> +		tree $(git rev-parse HEAD^{tree})
> +		parent $(git rev-parse HEAD)
> +		author A U Thor
> +		committer A U Thor
> +
> +		commit: this is a commit with bad emails
> +
> +		EOF
> +		git hash-object --literally -t commit -w --stdin <data >commit &&
> +		git branch bad $(cat commit) &&
> +		git bundle create bad.bundle bad
> +	) &&
> +	test_must_fail git -c fetch.fsckObjects=true \
> +		clone bundle-fsck/bad.bundle bundle-fsck-clone 2>err &&
> +	test_grep "missingEmail" err
> +'

Do we also want to have a test for `transfer.fsckObjects`?

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
  2024-05-28 12:03       ` Patrick Steinhardt
@ 2024-05-28 17:10         ` Junio C Hamano
  2024-05-28 17:24           ` Junio C Hamano
  2024-05-29  5:52           ` Patrick Steinhardt
  0 siblings, 2 replies; 24+ messages in thread
From: Junio C Hamano @ 2024-05-28 17:10 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, blanet, Xing Xin

Patrick Steinhardt <ps@pks.im> writes:

>> +int fetch_pack_fsck_objects(void)
>> +{
>> +	fetch_pack_setup();
>> +
>> +	return fetch_fsck_objects >= 0
>> +	       ? fetch_fsck_objects
>> +	       : transfer_fsck_objects >= 0
>> +	       ? transfer_fsck_objects
>> +	       : 0;
>> +}
>
> ... can we maybe rewrite it to something more customary here? The
> following is way easier to read, at least for me.
>
> 	int fetch_pack_fsck_objects(void)
> 	{
> 		fetch_pack_setup();
> 		if (fetch_fsck_objects >= 0 ||
> 		    transfer_fsck_objects >= 0)
> 			return 1;
> 		return 0;
> 	}

But do they mean the same thing?  In a repository where

    [fetch] fsckobjects = no

is set, no matter what transfer.fsckobjects says (or left unspecified),
we want to return "no, we are not doing fsck".

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
  2024-05-28 17:10         ` Junio C Hamano
@ 2024-05-28 17:24           ` Junio C Hamano
  2024-05-29  5:52             ` Patrick Steinhardt
  2024-05-29  5:52           ` Patrick Steinhardt
  1 sibling, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2024-05-28 17:24 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, blanet, Xing Xin

Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>>> +int fetch_pack_fsck_objects(void)
>>> +{
>>> +	fetch_pack_setup();
>>> +
>>> +	return fetch_fsck_objects >= 0
>>> +	       ? fetch_fsck_objects
>>> +	       : transfer_fsck_objects >= 0
>>> +	       ? transfer_fsck_objects
>>> +	       : 0;
>>> +}
>>
>> ... can we maybe rewrite it to something more customary here? The
>> following is way easier to read, at least for me.
>>
>> 	int fetch_pack_fsck_objects(void)
>> 	{
>> 		fetch_pack_setup();
>> 		if (fetch_fsck_objects >= 0 ||
>> 		    transfer_fsck_objects >= 0)
>> 			return 1;
>> 		return 0;
>> 	}
>
> But do they mean the same thing?  In a repository where
>
>     [fetch] fsckobjects = no
>
> is set, no matter what transfer.fsckobjects says (or left unspecified),
> we want to return "no, we are not doing fsck".

The original before it was made into a helper function was written
as a cascade of ?: operators, because it had to be a single
expression.  As the body of a helper function, we now can sprinkle
multiple return statements in it.  I think the way that is easiest
to understand is

	/* the most specific, if specified */
	if (fetch_fsck_objects >= 0)
		return fetch_fsck_objects;
	/* the less specific, catch-all for both directions */
	if (transfer_fsck_objects >= 0)
        	return transfer_fsck_objects;
	/* the fallback hardcoded default */
	return 0;

without the /* comments */.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
  2024-05-28 17:10         ` Junio C Hamano
  2024-05-28 17:24           ` Junio C Hamano
@ 2024-05-29  5:52           ` Patrick Steinhardt
  1 sibling, 0 replies; 24+ messages in thread
From: Patrick Steinhardt @ 2024-05-29  5:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, blanet, Xing Xin

[-- Attachment #1: Type: text/plain, Size: 1278 bytes --]

On Tue, May 28, 2024 at 10:10:46AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> +int fetch_pack_fsck_objects(void)
> >> +{
> >> +	fetch_pack_setup();
> >> +
> >> +	return fetch_fsck_objects >= 0
> >> +	       ? fetch_fsck_objects
> >> +	       : transfer_fsck_objects >= 0
> >> +	       ? transfer_fsck_objects
> >> +	       : 0;
> >> +}
> >
> > ... can we maybe rewrite it to something more customary here? The
> > following is way easier to read, at least for me.
> >
> > 	int fetch_pack_fsck_objects(void)
> > 	{
> > 		fetch_pack_setup();
> > 		if (fetch_fsck_objects >= 0 ||
> > 		    transfer_fsck_objects >= 0)
> > 			return 1;
> > 		return 0;
> > 	}
> 
> But do they mean the same thing?  In a repository where
> 
>     [fetch] fsckobjects = no
> 
> is set, no matter what transfer.fsckobjects says (or left unspecified),
> we want to return "no, we are not doing fsck".

Oh, of course they don't. This here would be a faithful conversion:

	int fetch_pack_fsck_objects(void)
	{
		fetch_pack_setup();
		if (fetch_fsck_objects >= 0)
			return fetch_fsck_objects;
		if (transfer_fsck_objects >= 0)
			return transfer_fsck_objects;
		return 0;
	}

Still easier to read in my opinion.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic
  2024-05-28 17:24           ` Junio C Hamano
@ 2024-05-29  5:52             ` Patrick Steinhardt
  0 siblings, 0 replies; 24+ messages in thread
From: Patrick Steinhardt @ 2024-05-29  5:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Xing Xin via GitGitGadget, git, Karthik Nayak, blanet, Xing Xin

[-- Attachment #1: Type: text/plain, Size: 1787 bytes --]

On Tue, May 28, 2024 at 10:24:35AM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> >>> +int fetch_pack_fsck_objects(void)
> >>> +{
> >>> +	fetch_pack_setup();
> >>> +
> >>> +	return fetch_fsck_objects >= 0
> >>> +	       ? fetch_fsck_objects
> >>> +	       : transfer_fsck_objects >= 0
> >>> +	       ? transfer_fsck_objects
> >>> +	       : 0;
> >>> +}
> >>
> >> ... can we maybe rewrite it to something more customary here? The
> >> following is way easier to read, at least for me.
> >>
> >> 	int fetch_pack_fsck_objects(void)
> >> 	{
> >> 		fetch_pack_setup();
> >> 		if (fetch_fsck_objects >= 0 ||
> >> 		    transfer_fsck_objects >= 0)
> >> 			return 1;
> >> 		return 0;
> >> 	}
> >
> > But do they mean the same thing?  In a repository where
> >
> >     [fetch] fsckobjects = no
> >
> > is set, no matter what transfer.fsckobjects says (or left unspecified),
> > we want to return "no, we are not doing fsck".
> 
> The original before it was made into a helper function was written
> as a cascade of ?: operators, because it had to be a single
> expression.  As the body of a helper function, we now can sprinkle
> multiple return statements in it.  I think the way that is easiest
> to understand is
> 
> 	/* the most specific, if specified */
> 	if (fetch_fsck_objects >= 0)
> 		return fetch_fsck_objects;
> 	/* the less specific, catch-all for both directions */
> 	if (transfer_fsck_objects >= 0)
>         	return transfer_fsck_objects;
> 	/* the fallback hardcoded default */
> 	return 0;
> 
> without the /* comments */.

Ah, right, didn't see this mail. My revised version looks the same as
yours, except for the added comments.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-05-29  5:52 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-15  3:01 [PATCH] bundle-uri: refresh packed_git if unbundle succeed blanet via GitGitGadget
2024-05-17  5:00 ` Patrick Steinhardt
2024-05-17 16:14   ` Junio C Hamano
2024-05-20 11:48     ` Xing Xin
2024-05-20 17:19       ` Junio C Hamano
2024-05-27 16:04         ` Xing Xin
2024-05-20  9:41   ` Xing Xin
2024-05-17  7:36 ` Karthik Nayak
2024-05-20 10:19   ` Xing Xin
2024-05-20 12:36 ` [PATCH v2] bundle-uri: verify oid before writing refs blanet via GitGitGadget
2024-05-21 15:41   ` Karthik Nayak
2024-05-27 15:41   ` [PATCH v3 0/4] object checking related additions and fixes for bundles in fetches blanet via GitGitGadget
2024-05-27 15:41     ` [PATCH v3 1/4] bundle-uri: verify oid before writing refs Xing Xin via GitGitGadget
2024-05-28 11:55       ` Patrick Steinhardt
2024-05-27 15:41     ` [PATCH v3 2/4] unbundle: introduce unbundle_fsck_flags for fsckobjects handling Xing Xin via GitGitGadget
2024-05-28 12:03       ` Patrick Steinhardt
2024-05-27 15:41     ` [PATCH v3 3/4] fetch-pack: expose fsckObjects configuration logic Xing Xin via GitGitGadget
2024-05-28 12:03       ` Patrick Steinhardt
2024-05-28 17:10         ` Junio C Hamano
2024-05-28 17:24           ` Junio C Hamano
2024-05-29  5:52             ` Patrick Steinhardt
2024-05-29  5:52           ` Patrick Steinhardt
2024-05-27 15:41     ` [PATCH v3 4/4] unbundle: introduce new option UNBUNDLE_FSCK_FOLLOW_FETCH Xing Xin via GitGitGadget
2024-05-28 12:05       ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).