* partial clone: promisor fetch during push (pack-objects) @ 2021-02-26 12:01 Robert Coup 2021-02-26 20:54 ` Jonathan Tan 0 siblings, 1 reply; 4+ messages in thread From: Robert Coup @ 2021-02-26 12:01 UTC (permalink / raw) To: git For a partially-cloned repository, push (pack-objects) does a fetch for missing objects to do delta compression with. Test-case to reproduce, with annotated output: https://gist.github.com/rcoup/5593a0627cca52f226580c72743d8e33 (git v2.30.1.602.g966e671106) My use case is partially-cloning some large repositories with the majority of the blobs not fetched locally, writing some blobs to replace some specific paths I don't currently have cloned, committing & pushing (while leaving most trees as-is). I didn't expect git to attempt promisor fetches during a push :) I happened to see it because allowAnySha1InWant wasn't enabled, so it printed errors but happily continued and completed the push (after a fetch attempt for each object). My current workaround is setting `pack.window=0` during push. Looks from builtin/pack_objects.c:prepare_pack() that `pack.depth=0` should skip it too, but that didn't seem to work. Invoking pack-objects directly with any --missing= value still tries to fetch. And regardless, it carries on if the fetches fail. The fetches happen at the end of builtin/pack_objects.c:check_object(). 1. Feels to me that fetching from a promisor remote is never going to be quicker than skipping delta compression during a push. Maybe there's a case for doing it during other pack invocations to minimise size though? 2. Seems like a bug that check_object() doesn't honour fetch_if_missing and skip the call to prefetch_to_pack(). 3. But push doesn't pass --missing= to pack-objects anyway, so that wouldn't actually solve the original issue. Should it? Thanks! Rob :) ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: partial clone: promisor fetch during push (pack-objects) 2021-02-26 12:01 partial clone: promisor fetch during push (pack-objects) Robert Coup @ 2021-02-26 20:54 ` Jonathan Tan 2021-03-01 11:57 ` Robert Coup 0 siblings, 1 reply; 4+ messages in thread From: Jonathan Tan @ 2021-02-26 20:54 UTC (permalink / raw) To: robert.coup; +Cc: git, Jonathan Tan > For a partially-cloned repository, push (pack-objects) does a fetch > for missing objects to do delta compression with. > > Test-case to reproduce, with annotated output: > https://gist.github.com/rcoup/5593a0627cca52f226580c72743d8e33 > (git v2.30.1.602.g966e671106) I found it difficult to understand the gist (there were what I think are unrelated packings and unpackagings of objects, for example), but what I gleaned is that you did a bare partial clone (thus, no checkout, so no blobs get fetched at all), wrote a new blob to the repo, manually created a tree with the ID of that blob and the IDs of some missing blobs taken from the current HEAD tree, committed, and then pushed that commit. > My use case is partially-cloning some large repositories with the > majority of the blobs not fetched locally, writing some blobs to > replace some specific paths I don't currently have cloned, committing > & pushing (while leaving most trees as-is). I didn't expect git to > attempt promisor fetches during a push :) > > I happened to see it because allowAnySha1InWant wasn't enabled, so it > printed errors but happily continued and completed the push (after a > fetch attempt for each object). > > My current workaround is setting `pack.window=0` during push. Looks > from builtin/pack_objects.c:prepare_pack() that `pack.depth=0` should > skip it too, but that didn't seem to work. Ah, thanks for this. So the cause is not that we are unnecessarily pushing the missing objects, but because they appear in the window when we do delta compression (just like you said at the start). > Invoking pack-objects directly with any --missing= value still tries > to fetch. And regardless, it carries on if the fetches fail. The > fetches happen at the end of builtin/pack_objects.c:check_object(). To clarify, the "end" here is the call to prefetch_to_pack(), which indeed bypasses fetch_if_missing (set by some --missing= arguments). > 1. Feels to me that fetching from a promisor remote is never going to > be quicker than skipping delta compression during a push. Maybe > there's a case for doing it during other pack invocations to minimise > size though? It makes sense to me that we shouldn't fetch in this situation. > 2. Seems like a bug that check_object() doesn't honour > fetch_if_missing and skip the call to prefetch_to_pack(). We do need to fetch if we're trying to pack a missing object. I think the real bug here is that we shouldn't prefetch if the to_pack entry has preferred_base set (which means that it will not go into the final packfile and is just available for delta compression). > 3. But push doesn't pass --missing= to pack-objects anyway, so that > wouldn't actually solve the original issue. Should it? If we fix the bug I described above, I think it's still OK if push doesn't pass --missing=. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: partial clone: promisor fetch during push (pack-objects) 2021-02-26 20:54 ` Jonathan Tan @ 2021-03-01 11:57 ` Robert Coup 2021-03-01 20:55 ` Jonathan Tan 0 siblings, 1 reply; 4+ messages in thread From: Robert Coup @ 2021-03-01 11:57 UTC (permalink / raw) To: Jonathan Tan; +Cc: git Hi Jonathan, On Fri, 26 Feb 2021 at 20:55, Jonathan Tan <jonathantanmy@google.com> wrote: > So the cause is not that we are unnecessarily > pushing the missing objects, but because they appear in the window when > we do delta compression (just like you said at the start). Thanks for persevering :) > > > 2. Seems like a bug that check_object() doesn't honour > > fetch_if_missing and skip the call to prefetch_to_pack(). > > We do need to fetch if we're trying to pack a missing object. I think > the real bug here is that we shouldn't prefetch if the to_pack entry has > preferred_base set (which means that it will not go into the final > packfile and is just available for delta compression). That makes sense, I had a feeling it wouldn't be quite as simple as my suggestion. > > > 3. But push doesn't pass --missing= to pack-objects anyway, so that > > wouldn't actually solve the original issue. Should it? > > If we fix the bug I described above, I think it's still OK if push > doesn't pass --missing=. I'll have a go at a patch. Any suggestions on how to approach writing/expanding a test to cover this? Thanks, Rob :) ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: partial clone: promisor fetch during push (pack-objects) 2021-03-01 11:57 ` Robert Coup @ 2021-03-01 20:55 ` Jonathan Tan 0 siblings, 0 replies; 4+ messages in thread From: Jonathan Tan @ 2021-03-01 20:55 UTC (permalink / raw) To: robert.coup; +Cc: jonathantanmy, git > > > 3. But push doesn't pass --missing= to pack-objects anyway, so that > > > wouldn't actually solve the original issue. Should it? > > > > If we fix the bug I described above, I think it's still OK if push > > doesn't pass --missing=. > > I'll have a go at a patch. Any suggestions on how to approach > writing/expanding a test to cover this? I think the main thing is to test that such a push doesn't also fetch. The most straightforward and reliable way I can think of is to run "git count-objects -v" before and after the push. There are other ways like using GIT_TRACE_PACKET and then grep-ping for the absence of "fetch> done" (or something like that) (but I don't like this approach because grep-ping for absence is not robust in the face of typos) or checking the number of files in .git/objects/pack before and after (but that is not robust if Git decides to GC in between). ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-03-01 20:59 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-02-26 12:01 partial clone: promisor fetch during push (pack-objects) Robert Coup 2021-02-26 20:54 ` Jonathan Tan 2021-03-01 11:57 ` Robert Coup 2021-03-01 20:55 ` Jonathan Tan
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.