git.vger.kernel.org archive mirror
From: Derrick Stolee <stolee@gmail.com>
To: Tao Klerks <tao@klerks.biz>, git@vger.kernel.org
Subject: Re: Partial Clone, and a strange slow rev-list call on fetch
Date: Wed, 2 Jun 2021 07:18:46 -0400
Message-ID: <71e60d80-44c1-225d-3cf4-26740de2ac6d@gmail.com>
In-Reply-To: <CAPMMpogCz4o3ZGYHnux_6w+uFyxV-FR0R1hFNeg1COiv0qd_0g@mail.gmail.com>

On 6/2/21 12:56 AM, Tao Klerks wrote:
> Hi folks,
> 
> I'm learning to use Partial Clone, and finding a behavior that I don't
> know how to interpret or investigate:
> 
> Under some circumstances, doing a plain "git fetch <remote>" on a
> filtered repo results in a very long (6-30 min?) wait, during which I
> can see the following command being executed in the background:
> 
> /usr/libexec/git-core/git rev-list --objects --stdin
> --exclude-promisor-objects --not --all --quiet --alternate-refs
> 
> So far, I have noted this happening under two distinct circumstances:
> * Anytime I try to fetch on a filtered repo with a git 2.23 client -
> shorter pause
> * When I try to fetch with a recent (2.31) client in a repo where one
> large packfile has no *.promisor file (but the others do, and the
> remote I am fetching from has promisor=true) - looong pause

This makes me think that there was a bug fix for this situation,
but the fix requires doing extra work. To help track this down,
could you re-run the scenario with GIT_TRACE2_PERF=1? That will
give the full Git process stack as we reach that rev-list call.
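A sketch of that request as a small shell helper. It sets `GIT_TRACE2_PERF` to an absolute path, which makes Git append the perf trace to that file (with `GIT_TRACE2_PERF=1` the trace goes to stderr instead), and then greps the file for the rev-list call. The helper name `trace_fetch` and the trace file path are illustrative assumptions. Note that trace2 records processes and their arguments, not the data piped into rev-list's stdin.

```shell
#!/bin/sh
# Hypothetical helper: trace_fetch REMOTE ABSOLUTE_TRACE_FILE
# Fetch from REMOTE with trace2 "perf" tracing enabled, appending the trace
# to ABSOLUTE_TRACE_FILE, then print the trace lines that mention rev-list.
trace_fetch() {
    remote="$1"
    trace_file="$2"   # an absolute path makes Git append the trace to this file
    GIT_TRACE2_PERF="$trace_file" git fetch "$remote"
    grep 'rev-list' "$trace_file"
}
```

For example, `trace_fetch origin /tmp/fetch-trace2.perf` from inside the repository; the nesting depth column in the perf output shows which parent process started rev-list.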

> Can anyone explain what this rev-list call intends, and/or any hints
> as to how I could see what the stdin content being fed to it from the
> parent process actually is?
> 
> For background, I ended up in the "missing promisor file" situation by
> trying to be (too?) clever about the blobs present in my clone: I
> cloned unfiltered shallow to a certain depth with certain refspecs,
> then added the promisor and filter config, and finally fetched with
> "--unshallow". This produced exactly the blob-population state I
> intended, but meant the original first packfile had no ".promisor"
> file.

This is the critical point: you first cloned without a filter,
and then converted the remote to a promisor remote without
marking the pack-files you received from that remote as promisor
pack-files. That means Git needs to do extra work to discover
which objects are reachable from promisor packs and which are
not, and that extra work is what is slowing you down.
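One way to check whether a repository is in this state is to look for pack-files that have no matching `.promisor` file. A minimal sketch, assuming the standard `objects/pack/pack-<hash>.pack` layout and that a `.promisor` file next to a pack is what marks it as a promisor pack; `list_non_promisor_packs` is a hypothetical helper name, and packs in alternate object directories are not examined.

```shell
#!/bin/sh
# Hypothetical helper: list_non_promisor_packs [GIT_DIR]
# Print every pack-file in GIT_DIR/objects/pack (GIT_DIR defaults to ".git")
# that has no ".promisor" file with the same basename next to it.
list_non_promisor_packs() {
    pack_dir="${1:-.git}/objects/pack"
    for pack in "$pack_dir"/pack-*.pack; do
        [ -e "$pack" ] || continue           # no packs: the glob stayed literal
        if [ ! -e "${pack%.pack}.promisor" ]; then
            printf '%s\n' "$pack"
        fi
    done
}
```

For example, run `list_non_promisor_packs "$(git rev-parse --git-dir)"` from inside the repository.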

Partial clone is designed for the case where every remote is a
promisor remote and always has been. Any deviation from that
norm ventures into uncharted territory and will hit friction
like this. A similar issue arises when you have multiple
remotes and one of them is a promisor remote but another is not.

The general advice right now is to use partial clone only if you
will use it for all remotes across the entire existence of the
repo.

Part of the difficulty here is that once you download that first
pack-file from the remote, Git has no way of knowing whether the
pack came from that remote or was created in some other way, so
we cannot be sure it is safe to "upgrade" the remote in an
automated process.
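Since only the user knows where an existing pack came from, any such "upgrade" has to be a manual decision. A minimal sketch of that manual step, assuming every object in the affected packs really did come from the promisor remote (if not, Git will assume the remote can supply objects it may not have) and assuming Git only checks for the presence of a `.promisor` file next to a pack when deciding whether it is a promisor pack. The helper name `mark_packs_as_promisor` is hypothetical, and the marker files it creates are left empty.

```shell
#!/bin/sh
# Hypothetical helper: mark_packs_as_promisor [GIT_DIR]
# Create an empty ".promisor" file next to each pack in GIT_DIR/objects/pack
# (GIT_DIR defaults to ".git") that lacks one. Only safe if every object in
# those packs came from the promisor remote.
mark_packs_as_promisor() {
    pack_dir="${1:-.git}/objects/pack"
    for pack in "$pack_dir"/pack-*.pack; do
        [ -e "$pack" ] || continue           # no packs: the glob stayed literal
        promisor="${pack%.pack}.promisor"
        if [ ! -e "$promisor" ]; then
            : > "$promisor"                  # empty marker; existing ones are left untouched
            printf 'marked %s\n' "$pack"
        fi
    done
}
```

For example, `mark_packs_as_promisor "$(git rev-parse --git-dir)"`, ideally after backing up the repository.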

This does make me wonder what happens when Git repacks objects
created locally and then starts fetching from a promisor remote.

There are some challenges here, for sure, and most likely some
potential gains as well, but it is unlikely to be a seamless
experience for what you are trying to do.

Thanks,
-Stolee

Thread overview: 5+ messages
2021-06-02  4:56 Tao Klerks
2021-06-02 11:18 ` Derrick Stolee [this message]
2021-06-03 21:10   ` Tao Klerks
2021-06-04 13:21     ` Derrick Stolee
2021-06-05  6:35       ` Tao Klerks