All of lore.kernel.org
 help / color / mirror / Atom feed
* "bad revision" fetch error when fetching missing objects from partial clones
@ 2021-05-07 12:56 Bagas Sanjaya
  2021-05-11 18:44 ` Jeff Hostetler
  0 siblings, 1 reply; 12+ messages in thread
From: Bagas Sanjaya @ 2021-05-07 12:56 UTC (permalink / raw)
  To: Git Users; +Cc: brian m . carlson

Hi,

I have a copy of Gitea application repository [1] on my local Git
server on my computer. I was playing with partial clones using that
repository as remote.

I began with blobless clone the repo by:

$ git clone https://<myhost>/bagas/gitea.git gitea --filter=blob:none

Then I tried to fetch missing objects. First, I gathered list of them:

$ git rev-list --objects --all --missing=print | grep -o -P '^\?\K\w+' > .git/missing.list

I had asked how to properly fetch objects from the list above before on
this list, and brian m. carlson (CC'ed) suggested [2] that I should
use xargs:

$ xargs git fetch origin < .git/missing.list

I expected that I received all missing objects. However, the error message
I got was something like below, repeated:

> remote: ...<skipped>
> Receiving objects: 100% (64/64), 154.49 KiB | 2.97 MiB/s, done.
> remote: ...<skipped>
> Receiving objects: 100% (37/37), 168.35 KiB | 4.95 MiB/s, done.
> Resolving deltas: 100% (5/5), done.
> Resolving deltas: 100% (49/49), completed with 47 local objects.
> fatal: bad revision 'd5e9cd36ab21839af3d116eff3221c53f6ca7fd6'
> error: https://<myhost>/bagas/gitea.git did not send all necessary objects

Then I repeated the object list gathering and fetching above, and the error
still occured. Even I tried fetching only one of problematic object above
and still errored.

When I inspected that object with `git cat-file -p` from my other, full
clone version (which I used for submitting changes upstream), the object
is properly displayed (in this case as source code in Go).

Note that I have uploadpack.allowfilter=true config on the server.

Am I missing something?

[1]: https://github.com/go-gitea/gitea
[2]: https://lore.kernel.org/git/YD7bczBsIR5rkqfc@camp.crustytoothpaste.net/

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-07 12:56 "bad revision" fetch error when fetching missing objects from partial clones Bagas Sanjaya
@ 2021-05-11 18:44 ` Jeff Hostetler
  2021-05-13  9:57   ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Hostetler @ 2021-05-11 18:44 UTC (permalink / raw)
  To: Bagas Sanjaya, Git Users; +Cc: brian m . carlson



On 5/7/21 8:56 AM, Bagas Sanjaya wrote:
> Hi,
> 
> I have a copy of Gitea application repository [1] on my local Git
> server on my computer. I was playing with partial clones using that
> repository as remote.
> 
> I began with blobless clone the repo by:
> 
> $ git clone https://<myhost>/bagas/gitea.git gitea --filter=blob:none
> 
> Then I tried to fetch missing objects. First, I gathered list of them:
> 
> $ git rev-list --objects --all --missing=print | grep -o -P '^\?\K\w+' > 
> .git/missing.list
> 
> I had asked how to properly fetch objects from the list above before on
> this list, and brian m. carlson (CC'ed) suggested [2] that I should
> use xargs:
> 
> $ xargs git fetch origin < .git/missing.list
> 
> I expected that I received all missing objects. However, the error message
> I got was something like below, repeated:
> 
>> remote: ...<skipped>
>> Receiving objects: 100% (64/64), 154.49 KiB | 2.97 MiB/s, done.
>> remote: ...<skipped>
>> Receiving objects: 100% (37/37), 168.35 KiB | 4.95 MiB/s, done.
>> Resolving deltas: 100% (5/5), done.
>> Resolving deltas: 100% (49/49), completed with 47 local objects.
>> fatal: bad revision 'd5e9cd36ab21839af3d116eff3221c53f6ca7fd6'
>> error: https://<myhost>/bagas/gitea.git did not send all necessary 
>> objects
> 
> Then I repeated the object list gathering and fetching above, and the error
> still occured. Even I tried fetching only one of problematic object above
> and still errored.
> 
> When I inspected that object with `git cat-file -p` from my other, full
> clone version (which I used for submitting changes upstream), the object
> is properly displayed (in this case as source code in Go).
> 
> Note that I have uploadpack.allowfilter=true config on the server.
> 
> Am I missing something?
> 
> [1]: https://github.com/go-gitea/gitea
> [2]: 
> https://lore.kernel.org/git/YD7bczBsIR5rkqfc@camp.crustytoothpaste.net/
> 

I'm not sure why you're getting that error, but you might
also try enabling `uploadpack.allowAnySHA1InWant`
(or one of the other "allow*SHS1InWant" values) and see if
that helps.

https://git-scm.com/docs/git-config#Documentation/git-config.txt-uploadpackallowAnySHA1InWant

Jeff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-11 18:44 ` Jeff Hostetler
@ 2021-05-13  9:57   ` Jeff King
  2021-05-13 10:53     ` Jeff King
  2021-05-15  6:52     ` Bagas Sanjaya
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2021-05-13  9:57 UTC (permalink / raw)
  To: Jeff Hostetler; +Cc: Taylor Blau, Bagas Sanjaya, Git Users, brian m . carlson

On Tue, May 11, 2021 at 02:44:52PM -0400, Jeff Hostetler wrote:

> > > remote: ...<skipped>
> > > Receiving objects: 100% (64/64), 154.49 KiB | 2.97 MiB/s, done.
> > > remote: ...<skipped>
> > > Receiving objects: 100% (37/37), 168.35 KiB | 4.95 MiB/s, done.
> > > Resolving deltas: 100% (5/5), done.
> > > Resolving deltas: 100% (49/49), completed with 47 local objects.
> > > fatal: bad revision 'd5e9cd36ab21839af3d116eff3221c53f6ca7fd6'
> > > error: https://<myhost>/bagas/gitea.git did not send all necessary
> > > objects
> [...]
> 
> I'm not sure why you're getting that error, but you might
> also try enabling `uploadpack.allowAnySHA1InWant`
> (or one of the other "allow*SHS1InWant" values) and see if
> that helps.

That shouldn't be necessary these days, since the v2 protocol allows
arbitrary objects to be fetched.

I think it is actually a bug with pack-objects not sending the object,
but it only seems to trigger with bitmaps. This works:

  git init repo
  cd repo

  echo content >file
  git add file
  git commit -m base

  git config uploadpack.allowfilter true
  git clone --no-local --bare --filter=blob:none . clone

  cd clone
  git fetch origin $(git rev-parse HEAD:file)

But if I add a "git repack -adb" in the parent repository before the "cd
clone", then I get:

  remote: Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
  fatal: bad revision 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
  error: /home/peff/tmp/repo/. did not send all necessary objects

So presumably this is a bug in the bitmap-aware filtering code that is
not present in the regular filter-traversing code. But what really
puzzles me is that the result seems totally broken. Yet the test
coverage in t5310 passes, and nobody has noticed on major sites like
GitHub (which supports partial clones and most certainly has bitmaps
enabled).

So I think it will require some digging. My _guess_ is that it has to do
with the "never filter out an object that was explicitly requested" rule
not being consistently followed. But we'll see.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-13  9:57   ` Jeff King
@ 2021-05-13 10:53     ` Jeff King
  2021-05-14  7:27       ` Jeff King
  2021-05-15  6:52     ` Bagas Sanjaya
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2021-05-13 10:53 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Patrick Steinhardt, Taylor Blau, Bagas Sanjaya, Git Users,
	brian m . carlson

[+cc Patrick, who also worked in this area recently and is probably
interested to hear about this bug]

On Thu, May 13, 2021 at 05:57:57AM -0400, Jeff King wrote:

> I think it is actually a bug with pack-objects not sending the object,
> but it only seems to trigger with bitmaps. This works:
> 
>   git init repo
>   cd repo
> 
>   echo content >file
>   git add file
>   git commit -m base
> 
>   git config uploadpack.allowfilter true
>   git clone --no-local --bare --filter=blob:none . clone
> 
>   cd clone
>   git fetch origin $(git rev-parse HEAD:file)
> 
> But if I add a "git repack -adb" in the parent repository before the "cd
> clone", then I get:
> 
>   remote: Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
>   fatal: bad revision 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
>   error: /home/peff/tmp/repo/. did not send all necessary objects
> 
> So presumably this is a bug in the bitmap-aware filtering code that is
> not present in the regular filter-traversing code. But what really
> puzzles me is that the result seems totally broken. Yet the test
> coverage in t5310 passes, and nobody has noticed on major sites like
> GitHub (which supports partial clones and most certainly has bitmaps
> enabled).
> 
> So I think it will require some digging. My _guess_ is that it has to do
> with the "never filter out an object that was explicitly requested" rule
> not being consistently followed. But we'll see.

OK, I think I've unraveled the mysteries.

It is indeed a problem with the "never filter out an object that was
explicitly requested" rule. But really, that rule is stronger: it is
"always include an explicitly requested object". I.e., it must override
even the usual "you said commit X was uninteresting, so we did not
include objects reachable from X" logic.

And that is what was happening here. In that final fetch, the client
says: "I have the commit pointed to by HEAD, but I need the blob from
HEAD:file". So the server's bitmap traversal, even before we hit the
blob:none filtering, has cleared the bit for HEAD:file's object! So
there is nothing to filter, but the filtering code must _add it back in_
to the result.

And indeed, if we are adding it back in anyway, we do not need to
exclude it when filtering in the first place. So I think the fix looks
something like this:

diff --git a/pack-bitmap.c b/pack-bitmap.c
index d90e1d9d8c..8458eedbdb 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -797,8 +797,6 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	for (i = 0, init_type_iterator(&it, bitmap_git, type);
 	     i < to_filter->word_alloc && ewah_iterator_next(&mask, &it);
 	     i++) {
-		if (i < tips->word_alloc)
-			mask &= ~tips->words[i];
 		to_filter->words[i] &= ~mask;
 	}
 
@@ -810,11 +808,18 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	for (i = 0; i < eindex->count; i++) {
 		uint32_t pos = i + bitmap_git->pack->num_objects;
 		if (eindex->objects[i]->type == type &&
-		    bitmap_get(to_filter, pos) &&
-		    !bitmap_get(tips, pos))
+		    bitmap_get(to_filter, pos))
 			bitmap_unset(to_filter, pos);
 	}
 
+	/*
+	 * Add back in any bits that were explicitly asked for (we may have
+	 * cleared them as part of our filtering just now, or they may
+	 * have been dropped by because they were reachable from the "have"
+	 * side of the bitmap).
+	 */
+	bitmap_or(to_filter, tips);
+
 	bitmap_free(tips);
 }
 
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index b02838750e..d34f086228 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -214,6 +214,16 @@ test_expect_success 'partial clone from bitmapped repository' '
 	)
 '
 
+test_expect_success 'follow-up fetch from bitmapped repository' '
+	test_config uploadpack.allowfilter true &&
+	(
+		cd partial-clone.git &&
+		blob=$(git rev-parse HEAD:file-1.t) &&
+		git fetch origin "$blob" &&
+		git cat-file -t "$blob"
+	)
+'
+
 test_expect_success 'setup further non-bitmapped commits' '
 	test_commit_bulk --id=further 10
 '

That fixes the example I showed above, as well as the included test in
t5310. Some follow-on thoughts:

  - this covers blob:none, and indeed any type-exclusion filter, as they
    share this code. But we probably need a similar fix for other
    filters (like the blob-limit filter). I think this could even be
    hoisted out to the filter_bitmap() function, and then we only have
    to do it in one place.

  - alternatively, the actual bitmap traversal could be smart enough to
    say "this is an explicitly-requested tip object; make sure we
    _don't_ mark it as uninteresting". Which is exactly what the
    non-bitmap traversal code does (I think this is the USER_GIVEN
    stuff). And we even try to build on that code by passing the
    list_objects_filter struct along, but:

      - that is only for fill-in traversal; an on-disk bitmap means we
	might touch those bits without calling into the list-objects
	code at all.

      - we don't feed the list-objects code all the information it needs
	anyway. We do two separate traversals, one for the "haves"
	bitmap and one for the "wants", and then AND-NOT the former into
	the latter. So when we do that "haves" traversal, those objects
	are _not_ being marked as UNINTERESTING. They are considered
	positive tips from the perspective of the traversal code.

     So I think we really do need to manipulate the bits in the bitmap
     to get the right answer. But it seems like there's a big
     optimization opportunity we're missing here. If the other side says
     "I don't have this commit, but I need this blob" (as our examples
     do), then there is no point in figuring out the "haves" bitmap at
     all! We know that it will not be used. In practice I'm not sure how
     big a deal this is (if the other side asked for any commit, we'd
     need it, and in theory it's quick to compute because of the on-disk
     bitmaps). At any rate, I think we should focus on correctness first
     here.

  - as to the mystery of why nobody has noticed this breakage: it's
    because the usual code to fetch-on-demand an object that we are
    missing _doesn't_ send any other "we already have this commit" tips.
    It just says "hey, give me these objects". So a workaround in the
    meantime is:

      git -c fetch.negotiationAlgorithm=noop fetch origin ...

I'm going to sleep on it, but I think the fix above is the right
direction. I just need to look at the other filters, and add a bit of
polish.

-Peff

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-13 10:53     ` Jeff King
@ 2021-05-14  7:27       ` Jeff King
  2021-05-17  6:03         ` Patrick Steinhardt
  2021-05-17 16:25         ` Derrick Stolee
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2021-05-14  7:27 UTC (permalink / raw)
  To: Jeff Hostetler
  Cc: Jonathan Tan, Patrick Steinhardt, Taylor Blau, Bagas Sanjaya,
	Git Users, brian m . carlson

[+cc Jonathan Tan for partial clone wisdom]

On Thu, May 13, 2021 at 06:53:36AM -0400, Jeff King wrote:

> > So I think it will require some digging. My _guess_ is that it has to do
> > with the "never filter out an object that was explicitly requested" rule
> > not being consistently followed. But we'll see.
> 
> OK, I think I've unraveled the mysteries.

Nope. This part is wrong:

> It is indeed a problem with the "never filter out an object that was
> explicitly requested" rule. But really, that rule is stronger: it is
> "always include an explicitly requested object". I.e., it must override
> even the usual "you said commit X was uninteresting, so we did not
> include objects reachable from X" logic.

The rule really is the softer "don't filter out explicitly-requested
objects". It's just that the non-bitmap traversal code is way less
careful about finding uninteresting objects.

Here's the same scenario failing without using bitmaps at all:

  # same as before; repo with one commit
  git init repo
  cd repo
  
  echo content >file
  git add file
  git commit -m base

  # and now we make a partial clone omitting the blob
  git config uploadpack.allowfilter true
  git clone --no-local --bare --filter=blob:none . clone

  # but here's the twist. Now we make a second commit...
  echo more >file
  git commit -am more

  # ...and then we fetch both the object we need _and_ that second
  # commit. That causes pack-objects to traverse from base..more.
  # The boundary is at "base", so we mark its tree and blob as
  # UNINTERESTING, and thus we _don't_ send them.
  cd clone
  git fetch origin $(git rev-parse HEAD:file) HEAD

So I guess the first question is: is this supposed to work? Without
bitmaps, it often will. Because we walk commits first, and then only
mark trees uninteresting at the boundary; so if there were more commits
here, and we were asking to get a blob from one of the middle ones, it
would probably work. But fundamentally the client is lying to the server
here (as all partial clones must); it is saying "I have that first
commit", but of course we don't have all of the reachable objects.

If this is supposed to work, I think we need to teach the traversal code
to "add back" all of the objects that were explicitly given when a
filter is in use (either explicitly, or perhaps just clearing or
avoiding the UNINTERESTING flag on user-given objects in the first
place). And my earlier patch does that for the bitmap side, but not the
regular traversal.

Alternatively, part of that fundamental "partial clones are lying about
their reachability" property is that we assume they can fetch the
objects they need on-demand. And indeed, the on-demand fetch we'd do
will use the noop negotiation algorithm, and will succeed. So should we,
after receiving an empty pack from the other side (because we claimed to
have objects reachable from the commit), then do a follow-up on-demand
fetch? If so, then why isn't that kicking in (I can guess there might be
some limits to on-demand fetching during a fetch itself, in order to
avoid infinite recursion)?

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-13  9:57   ` Jeff King
  2021-05-13 10:53     ` Jeff King
@ 2021-05-15  6:52     ` Bagas Sanjaya
  1 sibling, 0 replies; 12+ messages in thread
From: Bagas Sanjaya @ 2021-05-15  6:52 UTC (permalink / raw)
  To: Jeff King, Jeff Hostetler; +Cc: Taylor Blau, Git Users, brian m . carlson

On 13/05/21 16.57, Jeff King wrote:
> I think it is actually a bug with pack-objects not sending the object,
> but it only seems to trigger with bitmaps. This works:
> 
>    git init repo
>    cd repo
> 
>    echo content >file
>    git add file
>    git commit -m base
> 
>    git config uploadpack.allowfilter true
>    git clone --no-local --bare --filter=blob:none . clone
> 
>    cd clone
>    git fetch origin $(git rev-parse HEAD:file)
> 
> But if I add a "git repack -adb" in the parent repository before the "cd
> clone", then I get:
> 
>    remote: Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
>    fatal: bad revision 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
>    error: /home/peff/tmp/repo/. did not send all necessary objects
> 
> So presumably this is a bug in the bitmap-aware filtering code that is
> not present in the regular filter-traversing code. But what really
> puzzles me is that the result seems totally broken. Yet the test
> coverage in t5310 passes, and nobody has noticed on major sites like
> GitHub (which supports partial clones and most certainly has bitmaps
> enabled).

I can reproduce using your test case above, thanks.

Anyway, for the moment being, I clone using "git clone --depth <N>" and
then deepen the clone by "git fetch --deepen <N>" iteratively until
complete. I do that because of quota constraints on my network.

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-14  7:27       ` Jeff King
@ 2021-05-17  6:03         ` Patrick Steinhardt
  2021-05-17  6:31           ` Jeff King
  2021-05-17 16:25         ` Derrick Stolee
  1 sibling, 1 reply; 12+ messages in thread
From: Patrick Steinhardt @ 2021-05-17  6:03 UTC (permalink / raw)
  To: Jeff King
  Cc: Jeff Hostetler, Jonathan Tan, Taylor Blau, Bagas Sanjaya,
	Git Users, brian m . carlson

[-- Attachment #1: Type: text/plain, Size: 4436 bytes --]

On Fri, May 14, 2021 at 03:27:12AM -0400, Jeff King wrote:
> [+cc Jonathan Tan for partial clone wisdom]
> 
> On Thu, May 13, 2021 at 06:53:36AM -0400, Jeff King wrote:
> 
> > > So I think it will require some digging. My _guess_ is that it has to do
> > > with the "never filter out an object that was explicitly requested" rule
> > > not being consistently followed. But we'll see.
> > 
> > OK, I think I've unraveled the mysteries.
> 
> Nope. This part is wrong:
> 
> > It is indeed a problem with the "never filter out an object that was
> > explicitly requested" rule. But really, that rule is stronger: it is
> > "always include an explicitly requested object". I.e., it must override
> > even the usual "you said commit X was uninteresting, so we did not
> > include objects reachable from X" logic.
> 
> The rule really is the softer "don't filter out explicitly-requested
> objects". It's just that the non-bitmap traversal code is way less
> careful about finding uninteresting objects.
> 
> Here's the same scenario failing without using bitmaps at all:
> 
>   # same as before; repo with one commit
>   git init repo
>   cd repo
>   
>   echo content >file
>   git add file
>   git commit -m base
> 
>   # and now we make a partial clone omitting the blob
>   git config uploadpack.allowfilter true
>   git clone --no-local --bare --filter=blob:none . clone
> 
>   # but here's the twist. Now we make a second commit...
>   echo more >file
>   git commit -am more
> 
>   # ...and then we fetch both the object we need _and_ that second
>   # commit. That causes pack-objects to traverse from base..more.
>   # The boundary is at "base", so we mark its tree and blob as
>   # UNINTERESTING, and thus we _don't_ send them.
>   cd clone
>   git fetch origin $(git rev-parse HEAD:file) HEAD
> 
> So I guess the first question is: is this supposed to work? Without
> bitmaps, it often will. Because we walk commits first, and then only
> mark trees uninteresting at the boundary; so if there were more commits
> here, and we were asking to get a blob from one of the middle ones, it
> would probably work. But fundamentally the client is lying to the server
> here (as all partial clones must); it is saying "I have that first
> commit", but of course we don't have all of the reachable objects.

I'm not really in a position to jugde about partial clones given that I
got next to no experience with them, so I'm going to skip commenting on
this part.

> If this is supposed to work, I think we need to teach the traversal code
> to "add back" all of the objects that were explicitly given when a
> filter is in use (either explicitly, or perhaps just clearing or
> avoiding the UNINTERESTING flag on user-given objects in the first
> place). And my earlier patch does that for the bitmap side, but not the
> regular traversal.

I think adding these objects back in after computations is the easiest
thing we can do here. For bitmaps, it should be relatively easy to do.
The only thing I wonder about in your patch is whether we should really
do this in a specific filter, or if we can't just do it at the end after
all filters have been computed.

For the traversal-based code, it feels like the current design with the
`NOT_USER_GIVEN` flag is making this harder than necessary. I really
wonder whether we should again refactor the code to use a positive
`USER_GIVEN` flag again: it's much more explicit, there's only one site
where we need to add it and ultimately it's a lot easier to reason
about. Furthermore, we can easily avoid marking such an object as
`UNINTERESTING` if we see that it was explicitly added to the set while
still marking its referenced objects as `UNINTERESTING` if need be.

Patrick

> Alternatively, part of that fundamental "partial clones are lying about
> their reachability" property is that we assume they can fetch the
> objects they need on-demand. And indeed, the on-demand fetch we'd do
> will use the noop negotiation algorithm, and will succeed. So should we,
> after receiving an empty pack from the other side (because we claimed to
> have objects reachable from the commit), then do a follow-up on-demand
> fetch? If so, then why isn't that kicking in (I can guess there might be
> some limits to on-demand fetching during a fetch itself, in order to
> avoid infinite recursion)?
> 
> -Peff

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-17  6:03         ` Patrick Steinhardt
@ 2021-05-17  6:31           ` Jeff King
  0 siblings, 0 replies; 12+ messages in thread
From: Jeff King @ 2021-05-17  6:31 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Jeff Hostetler, Jonathan Tan, Taylor Blau, Bagas Sanjaya,
	Git Users, brian m . carlson

On Mon, May 17, 2021 at 08:03:22AM +0200, Patrick Steinhardt wrote:

> > If this is supposed to work, I think we need to teach the traversal code
> > to "add back" all of the objects that were explicitly given when a
> > filter is in use (either explicitly, or perhaps just clearing or
> > avoiding the UNINTERESTING flag on user-given objects in the first
> > place). And my earlier patch does that for the bitmap side, but not the
> > regular traversal.
> 
> I think adding these objects back in after computations is the easiest
> thing we can do here. For bitmaps, it should be relatively easy to do.
> The only thing I wonder about in your patch is whether we should really
> do this in a specific filter, or if we can't just do it at the end after
> all filters have been computed.

We definitely should do it at the end for all filters; my patch was just
illustrating the fix for this specific problem. It was actually while
cleaning it up to work on all filters that I dug further and found the
non-bitmap behavior.

> For the traversal-based code, it feels like the current design with the
> `NOT_USER_GIVEN` flag is making this harder than necessary. I really
> wonder whether we should again refactor the code to use a positive
> `USER_GIVEN` flag again: it's much more explicit, there's only one site
> where we need to add it and ultimately it's a lot easier to reason
> about. Furthermore, we can easily avoid marking such an object as
> `UNINTERESTING` if we see that it was explicitly added to the set while
> still marking its referenced objects as `UNINTERESTING` if need be.

That's my gut feeling, too. But I'm a little afraid that it would be
opening up a can of worms that we attempted to fix with 99c9aa9579
(revision: mark non-user-given objects instead, 2018-10-05) in the first
place.

For the purposes of this particular fix, though, we might be able to
skip this question either way with the "add back in" strategy. The
objects found in revs->pending right before prepare_revision_walk() make
up the "user given" list, so we could possibly just make us of that
(possibly stashing it away as necessary).

Thanks for the response. I'll be curious to hear the thoughts of folks
with Jonathan Tan who were involved in the original partial-clone design
(i.e., is this behavior of filters a surprise to them, or just a subtle
corner of the initial design).

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-14  7:27       ` Jeff King
  2021-05-17  6:03         ` Patrick Steinhardt
@ 2021-05-17 16:25         ` Derrick Stolee
  2021-05-18  8:11           ` Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: Derrick Stolee @ 2021-05-17 16:25 UTC (permalink / raw)
  To: Jeff King, Jeff Hostetler
  Cc: Jonathan Tan, Patrick Steinhardt, Taylor Blau, Bagas Sanjaya,
	Git Users, brian m . carlson

On 5/14/2021 3:27 AM, Jeff King wrote:
> [+cc Jonathan Tan for partial clone wisdom]
> 
> On Thu, May 13, 2021 at 06:53:36AM -0400, Jeff King wrote:
> 
>>> So I think it will require some digging. My _guess_ is that it has to do
>>> with the "never filter out an object that was explicitly requested" rule
>>> not being consistently followed. But we'll see.
>>
>> OK, I think I've unraveled the mysteries.
> 
> Nope. This part is wrong:
> 
>> It is indeed a problem with the "never filter out an object that was
>> explicitly requested" rule. But really, that rule is stronger: it is
>> "always include an explicitly requested object". I.e., it must override
>> even the usual "you said commit X was uninteresting, so we did not
>> include objects reachable from X" logic.
> 
> The rule really is the softer "don't filter out explicitly-requested
> objects". It's just that the non-bitmap traversal code is way less
> careful about finding uninteresting objects.
> 
> Here's the same scenario failing without using bitmaps at all:
> 
...
>   # ...and then we fetch both the object we need _and_ that second
>   # commit. That causes pack-objects to traverse from base..more.
>   # The boundary is at "base", so we mark its tree and blob as
>   # UNINTERESTING, and thus we _don't_ send them.
>   cd clone
>   git fetch origin $(git rev-parse HEAD:file) HEAD

This is the critical reason why this isn't failing in practice: it
is very rare to ask for a commit and a blob at the same time. Usually,
a blob request comes when something like 'git checkout' is actually
navigating to a commit, and then it only asks for missing blobs.

> So I guess the first question is: is this supposed to work? Without
> bitmaps, it often will. Because we walk commits first, and then only
> mark trees uninteresting at the boundary; so if there were more commits
> here, and we were asking to get a blob from one of the middle ones, it
> would probably work. But fundamentally the client is lying to the server
> here (as all partial clones must); it is saying "I have that first
> commit", but of course we don't have all of the reachable objects.

It _should_ work. We should be specifying the blob:none filter, so when
we say "we have these commits" it should apply that filter to those
commits for the "haves".

If there was room to adjust the protocol more, it would perhaps be
helpful to annotate the haves in this way ("have-filtered" or something,
similar to how the "shallow" advertisement works).

> If this is supposed to work, I think we need to teach the traversal code
> to "add back" all of the objects that were explicitly given when a
> filter is in use (either explicitly, or perhaps just clearing or
> avoiding the UNINTERESTING flag on user-given objects in the first
> place). And my earlier patch does that for the bitmap side, but not the
> regular traversal.

This gets more complicated if we were in a treeless clone, for example.
We could explicitly ask for a tree and really need all of its reachable
trees and blobs. It's not enough to just isolate that single object.

I wouldn't spend too much time optimizing for the treeless clone case,
as I believe the client will self-recover by asking for those reachable
trees when it walks to them.

> Alternatively, part of that fundamental "partial clones are lying about
> their reachability" property is that we assume they can fetch the
> objects they need on-demand. And indeed, the on-demand fetch we'd do
> will use the noop negotiation algorithm, and will succeed. So should we,
> after receiving an empty pack from the other side (because we claimed to
> have objects reachable from the commit), then do a follow-up on-demand
> fetch? If so, then why isn't that kicking in (I can guess there might be
> some limits to on-demand fetching during a fetch itself, in order to
> avoid infinite recursion)?

The client has an opportunity to be more robust here, especially in
the case of a promisor remote. A simple retry would be a good band-aid
for now.

A more involved case might be to isolate objects that we have
identified as reachable from a local object, and set those aside
for a separate request. Here, we could ask for the blob in a
different request than for the remote's HEAD reference.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-17 16:25         ` Derrick Stolee
@ 2021-05-18  8:11           ` Jeff King
  2021-05-18 10:14             ` Bagas Sanjaya
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2021-05-18  8:11 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Jeff Hostetler, Jonathan Tan, Patrick Steinhardt, Taylor Blau,
	Bagas Sanjaya, Git Users, brian m . carlson

On Mon, May 17, 2021 at 12:25:02PM -0400, Derrick Stolee wrote:

> > Here's the same scenario failing without using bitmaps at all:
> > 
> ...
> >   # ...and then we fetch both the object we need _and_ that second
> >   # commit. That causes pack-objects to traverse from base..more.
> >   # The boundary is at "base", so we mark its tree and blob as
> >   # UNINTERESTING, and thus we _don't_ send them.
> >   cd clone
> >   git fetch origin $(git rev-parse HEAD:file) HEAD
> 
> This is the critical reason why this isn't failing in practice: it
> is very rare to ask for a commit and a blob at the same time. Usually,
> a blob request comes when something like 'git checkout' is actually
> navigating to a commit, and then it only asks for missing blobs.

Yes, I think that may be part of it. But more fundamentally, the request
done by "git checkout" does not send any "have" lines at all, so it
could never trigger this bug, even if it did try to ask for a commit.

> > So I guess the first question is: is this supposed to work? Without
> > bitmaps, it often will. Because we walk commits first, and then only
> > mark trees uninteresting at the boundary; so if there were more commits
> > here, and we were asking to get a blob from one of the middle ones, it
> > would probably work. But fundamentally the client is lying to the server
> > here (as all partial clones must); it is saying "I have that first
> > commit", but of course we don't have all of the reachable objects.
> 
> It _should_ work. We should be specifying the blob:none filter, so when
> we say "we have these commits" it should apply that filter to those
> commits for the "haves".

I guess my "should it work" was more about: are filter requests that
feed arbitrary combinations to git-fetch, including "haves", something
we want to support?

I think the world is a better place if we do. But what I'm wondering is
if there was some intentional choice to avoid triggering this
(especially in the way that the on-demand fetcher uses the "noop"
negotiation algorithm).

And yes, we do specify the blob:none filter. BTW, I saw an interesting
side-behavior here.  If the server _stops_ supporting filters and then
we try to fetch with it, we will happily say "oh well, the other side
doesn't support filters, so I won't bother sending my filter spec". And
then all hell breaks loose, as the other side has no clue that we are a
partial clone.

In practice I think it's an unlikely failure mode for a server you
partial-cloned from to turn off filters, so it's probably not that
important. I hit it because a test script used test_config to enable
them, and then the follow-on test I added to run git-fetch got quite
confused. A more likely scenario is that you might see it a
misconfigured load-balanced pool of servers.

I do wonder how hitting a third-party server should work, though. E.g.,
I partial clone from A, and then ask B to fetch some related history
built on top. Do I tell B that I'm a partial clone and might be missing
some objects? Or do I behave as normal, and expect to fault in objects
that it assumes I have (e.g., a delta base)? And if the latter, does
that work (if it does, then why doesn't the same logic kick in for this
fetch?).

Anyway, that's maybe orthogonal to the bug at hand (and my questions
above are all sincere; it might well work just fine, but I haven't dug
into it further).

> > If this is supposed to work, I think we need to teach the traversal code
> > to "add back" all of the objects that were explicitly given when a
> > filter is in use (either explicitly, or perhaps just clearing or
> > avoiding the UNINTERESTING flag on user-given objects in the first
> > place). And my earlier patch does that for the bitmap side, but not the
> > regular traversal.
> 
> This gets more complicated if we were in a treeless clone, for example.
> We could explicitly ask for a tree and really need all of its reachable
> trees and blobs. It's not enough to just isolate that single object.

Good point. That really requires walking any objects that were listed to
"add back", but _only_ the ones that would have been filtered. In the
treeless clone case, that's easier. But if I say "don't give me any
trees deeper than X", how do I even know which ones those are?

> I wouldn't spend too much time optimizing for the treeless clone case,
> as I believe the client will self-recover by asking for those reachable
> trees when it walks to them.

Yes, I think as long as the on-demand fetch kicks in, then it becomes an
optimization problem, not a correctness one. So perhaps the first fix
should focus on that, even for the blob case being discussed. Then it
_works_, just with an extra round-trip for the on-demand fetch.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-18  8:11           ` Jeff King
@ 2021-05-18 10:14             ` Bagas Sanjaya
  2021-05-18 14:04               ` Derrick Stolee
  0 siblings, 1 reply; 12+ messages in thread
From: Bagas Sanjaya @ 2021-05-18 10:14 UTC (permalink / raw)
  To: Jeff King, Derrick Stolee
  Cc: Jeff Hostetler, Jonathan Tan, Patrick Steinhardt, Taylor Blau,
	Git Users, brian m . carlson

On 18/05/21 15.11, Jeff King wrote:
> In practice I think it's an unlikely failure mode for a server you
> partial-cloned from to turn off filters, so it's probably not that
> important. I hit it because a test script used test_config to enable
> them, and then the follow-on test I added to run git-fetch got quite
> confused. A more likely scenario is that you might see it a
> misconfigured load-balanced pool of servers.
> 
> I do wonder how hitting a third-party server should work, though. E.g.,
> I partial clone from A, and then ask B to fetch some related history
> built on top. Do I tell B that I'm a partial clone and might be missing
> some objects? Or do I behave as normal, and expect to fault in objects
> that it assumes I have (e.g., a delta base)? And if the latter, does
> that work (if it does, then why doesn't the same logic kick in for this
> fetch?).
> 

My server is running Gitea (compiled from main branch [1]). The server
itself runs Git 2.31.1. I can also reproduce the issue using Jeff's
test case [2] without Gitea. So this is not Gitea's issue, this is Git's
issue.

Even my server setup is just application server + separate database
server and not load-balanced pool ones.

[1]: https://github.com/go-gitea/gitea/tree/main
[2]: https://lore.kernel.org/git/YJz4JTsFjTtL7mE2@coredump.intra.peff.net/

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: "bad revision" fetch error when fetching missing objects from partial clones
  2021-05-18 10:14             ` Bagas Sanjaya
@ 2021-05-18 14:04               ` Derrick Stolee
  0 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2021-05-18 14:04 UTC (permalink / raw)
  To: Bagas Sanjaya, Jeff King
  Cc: Jeff Hostetler, Jonathan Tan, Patrick Steinhardt, Taylor Blau,
	Git Users, brian m . carlson

On 5/18/2021 6:14 AM, Bagas Sanjaya wrote:
> On 18/05/21 15.11, Jeff King wrote:
>> In practice I think it's an unlikely failure mode for a server you
>> partial-cloned from to turn off filters, so it's probably not that
>> important. I hit it because a test script used test_config to enable
>> them, and then the follow-on test I added to run git-fetch got quite
>> confused. A more likely scenario is that you might see it a
>> misconfigured load-balanced pool of servers.
>>
>> I do wonder how hitting a third-party server should work, though. E.g.,
>> I partial clone from A, and then ask B to fetch some related history
>> built on top. Do I tell B that I'm a partial clone and might be missing
>> some objects? Or do I behave as normal, and expect to fault in objects
>> that it assumes I have (e.g., a delta base)? And if the latter, does
>> that work (if it does, then why doesn't the same logic kick in for this
>> fetch?).
>>
> 
> My server is running Gitea (compiled from main branch [1]). The server
> itself runs Git 2.31.1. I can also reproduce the issue using Jeff's
> test case [2] without Gitea. So this is not Gitea's issue, this is Git's
> issue.
> 
> Even my server setup is just application server + separate database
> server and not load-balanced pool ones.

I think the "third party" thing is mostly about if we have forks
hosted in different places. Imagine that git/git is hosted on
GitHub with partial clone enabled, but git-for-windows/git was
hosted on Azure Repos, which doesn't have partial clone.

What happens when we partial clone git/git from GitHub, but then
add git-for-windows/git as a remote and fetch from it?

This is a separate discussion. I think it really points out the
issues that arise because partial clone makes the remote a more
critical resource. This removes some decentralization from Git,
as requested by the user. These concerns about the remote changing
the availability of filters or about other remotes hosted
elsewhere are unlikely to come up in a centralized workflow (i.e.
a set of users working against a common remote without forks).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-05-18 14:04 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-07 12:56 "bad revision" fetch error when fetching missing objects from partial clones Bagas Sanjaya
2021-05-11 18:44 ` Jeff Hostetler
2021-05-13  9:57   ` Jeff King
2021-05-13 10:53     ` Jeff King
2021-05-14  7:27       ` Jeff King
2021-05-17  6:03         ` Patrick Steinhardt
2021-05-17  6:31           ` Jeff King
2021-05-17 16:25         ` Derrick Stolee
2021-05-18  8:11           ` Jeff King
2021-05-18 10:14             ` Bagas Sanjaya
2021-05-18 14:04               ` Derrick Stolee
2021-05-15  6:52     ` Bagas Sanjaya

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.