git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Jeff King <peff@peff.net>
Cc: Jeff Hostetler <git@jeffhostetler.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	Taylor Blau <me@ttaylorr.com>,
	Bagas Sanjaya <bagasdotme@gmail.com>,
	Git Users <git@vger.kernel.org>,
	"brian m . carlson" <sandals@crustytoothpaste.net>
Subject: Re: "bad revision" fetch error when fetching missing objects from partial clones
Date: Mon, 17 May 2021 08:03:22 +0200	[thread overview]
Message-ID: <YKIHKjs7n3v4vwNt@ncase> (raw)
In-Reply-To: <YJ4mUJ+EEAnudI3G@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 4436 bytes --]

On Fri, May 14, 2021 at 03:27:12AM -0400, Jeff King wrote:
> [+cc Jonathan Tan for partial clone wisdom]
> 
> On Thu, May 13, 2021 at 06:53:36AM -0400, Jeff King wrote:
> 
> > > So I think it will require some digging. My _guess_ is that it has to do
> > > with the "never filter out an object that was explicitly requested" rule
> > > not being consistently followed. But we'll see.
> > 
> > OK, I think I've unraveled the mysteries.
> 
> Nope. This part is wrong:
> 
> > It is indeed a problem with the "never filter out an object that was
> > explicitly requested" rule. But really, that rule is stronger: it is
> > "always include an explicitly requested object". I.e., it must override
> > even the usual "you said commit X was uninteresting, so we did not
> > include objects reachable from X" logic.
> 
> The rule really is the softer "don't filter out explicitly-requested
> objects". It's just that the non-bitmap traversal code is way less
> careful about finding uninteresting objects.
> 
> Here's the same scenario failing without using bitmaps at all:
> 
>   # same as before; repo with one commit
>   git init repo
>   cd repo
>   
>   echo content >file
>   git add file
>   git commit -m base
> 
>   # and now we make a partial clone omitting the blob
>   git config uploadpack.allowfilter true
>   git clone --no-local --bare --filter=blob:none . clone
> 
>   # but here's the twist. Now we make a second commit...
>   echo more >file
>   git commit -am more
> 
>   # ...and then we fetch both the object we need _and_ that second
>   # commit. That causes pack-objects to traverse from base..more.
>   # The boundary is at "base", so we mark its tree and blob as
>   # UNINTERESTING, and thus we _don't_ send them.
>   cd clone
>   git fetch origin $(git rev-parse HEAD:file) HEAD
> 
> So I guess the first question is: is this supposed to work? Without
> bitmaps, it often will. Because we walk commits first, and then only
> mark trees uninteresting at the boundary; so if there were more commits
> here, and we were asking to get a blob from one of the middle ones, it
> would probably work. But fundamentally the client is lying to the server
> here (as all partial clones must); it is saying "I have that first
> commit", but of course we don't have all of the reachable objects.

I'm not really in a position to jugde about partial clones given that I
got next to no experience with them, so I'm going to skip commenting on
this part.

> If this is supposed to work, I think we need to teach the traversal code
> to "add back" all of the objects that were explicitly given when a
> filter is in use (either explicitly, or perhaps just clearing or
> avoiding the UNINTERESTING flag on user-given objects in the first
> place). And my earlier patch does that for the bitmap side, but not the
> regular traversal.

I think adding these objects back in after computations is the easiest
thing we can do here. For bitmaps, it should be relatively easy to do.
The only thing I wonder about in your patch is whether we should really
do this in a specific filter, or if we can't just do it at the end after
all filters have been computed.

For the traversal-based code, it feels like the current design with the
`NOT_USER_GIVEN` flag is making this harder than necessary. I really
wonder whether we should again refactor the code to use a positive
`USER_GIVEN` flag again: it's much more explicit, there's only one site
where we need to add it and ultimately it's a lot easier to reason
about. Furthermore, we can easily avoid marking such an object as
`UNINTERESTING` if we see that it was explicitly added to the set while
still marking its referenced objects as `UNINTERESTING` if need be.

Patrick

> Alternatively, part of that fundamental "partial clones are lying about
> their reachability" property is that we assume they can fetch the
> objects they need on-demand. And indeed, the on-demand fetch we'd do
> will use the noop negotiation algorithm, and will succeed. So should we,
> after receiving an empty pack from the other side (because we claimed to
> have objects reachable from the commit), then do a follow-up on-demand
> fetch? If so, then why isn't that kicking in (I can guess there might be
> some limits to on-demand fetching during a fetch itself, in order to
> avoid infinite recursion)?
> 
> -Peff

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2021-05-17  6:03 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-07 12:56 Bagas Sanjaya
2021-05-11 18:44 ` Jeff Hostetler
2021-05-13  9:57   ` Jeff King
2021-05-13 10:53     ` Jeff King
2021-05-14  7:27       ` Jeff King
2021-05-17  6:03         ` Patrick Steinhardt [this message]
2021-05-17  6:31           ` Jeff King
2021-05-17 16:25         ` Derrick Stolee
2021-05-18  8:11           ` Jeff King
2021-05-18 10:14             ` Bagas Sanjaya
2021-05-18 14:04               ` Derrick Stolee
2021-05-15  6:52     ` Bagas Sanjaya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YKIHKjs7n3v4vwNt@ncase \
    --to=ps@pks.im \
    --cc=bagasdotme@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=sandals@crustytoothpaste.net \
    --subject='Re: "bad revision" fetch error when fetching missing objects from partial clones' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).