git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Subject: Re: [PATCH] checkout: send commit provenance during prefetch
Date: Wed, 16 Dec 2020 09:50:01 -0500	[thread overview]
Message-ID: <ad2579d6-c00a-c0a8-ae9b-11b168be1940@gmail.com> (raw)
In-Reply-To: <20201215200207.1083655-1-jonathantanmy@google.com>

On 12/15/2020 3:02 PM, Jonathan Tan wrote:
> In a partial clone, whenever Git needs a missing object, it will fetch
> it from the repo's promisor remote(s), sometimes as part of a bulk
> prefetch. Currently, if the server handling such fetches does not accept
> requests for objects unless the objects are reachable, it needs to
> compute reachability from all refs.
>
> This can be improved by sending the commit from which the prefetch
> request came - a server that understands this would then only need to
> check that this commit is reachable, and check that the object is
> reachable from that commit.

You're right that there are _two_ important checks here:

1. the commit is reachable.
2. the blob is reachable from that commit.

This does have the potential of greatly reducing the amount of
parsed trees. It would be nice to see how much of a gain that
really provides. I suspect that it only helps for commits that
are far from the ref tips, depending on your tree-walk algorithm.

This additional constraint of "the blob is reachable from the
provided commit" should not be considered a limitation to the
server, since a reachability bitmap from the tip commits might
be able to provide faster responses than walking trees from the
provided commit.

Perhaps it would be worth testing a different mechanism for
using your reachability bitmaps? I'm guessing that your algorithm
uses this pattern:

1. Construct a reachability bitmap containing all objects reachable
   from the allowed refs.
2. Check if the blobs have bits on in that bitmap.

Instead, you could do this slightly-more-complicated thing:

1. Walk commits from the tips until finding reachability bitmaps
   or commits inside those discovered bitmaps. (Take unions as
   you find new bitmaps. Stop walking when a commit is in the
   union.)

2. "approve" blobs that have bits on in those bitmaps.

3. While there are unapproved blobs, walk trees from the walked
   commits, adding found objects to the bitmap. Stop when all blobs
   are approved or when all objects are walked.

Basically, you could probably respond more quickly to some of these
requests without needing a provenance commit, especially in the
cases where the provenance commit would be more helpful (older commits,
by my guess).

Perhaps you've already tried these things and have discovered that
the provenance commit is faster more often. That would definitely be
a helpful justification of the feature.

> Therefore, teach the partial clone fetching mechanism to support a
> "provenance" argument, and plumb the commit provenance from checkout to
> the partial clone fetching mechanism.

In a full patch, it would be good to document that this could be sent
across the wire, even if Git's server implementation ignores it.
 
> In the future, other commands can be similarly upgraded. Other possible
> future improvements include better diagnostic messages when a prefetch
> fails.

So, this is only in the case of 'git checkout'. I know that the batch
request logic for partial clone is also called by things like "git diff"
or "git log --follow" when doing rename detection. Those might require
sending _multiple_ provenance commits, so be sure to make that be a
possible outcome of the option.

> @@ -56,11 +56,14 @@ test_expect_success 'verify that .promisor file contains refs fetched' '
>  
>  # checkout master to force dynamic object fetch of blobs at HEAD.
>  test_expect_success 'verify checkout with dynamic object fetch' '
> +	rm -f trace &&
>  	git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
>  	test_line_count = 4 observed &&
> -	git -C pc1 checkout master &&
> +	GIT_TRACE_PACKET="$(pwd)/trace" git -C pc1 checkout master &&
>  	git -C pc1 rev-list --quiet --objects --missing=print HEAD >observed &&
> -	test_line_count = 0 observed
> +	test_line_count = 0 observed &&
> +	HEAD_HASH="$(git -C pc1 rev-parse HEAD)" &&
> +	grep "server-option=provenance=$HEAD_HASH" trace
>  '

This is about as good as we can do for testing the option here. I'm
assuming we have some tests already that check the Git server is
still providing good answers over HTTP (and ignoring this option).

tl;dr: I'm not against this idea. Just hopefully the server-side options
have been fully explored to justify the feature is worth it.

Thanks,
-Stolee

  reply	other threads:[~2020-12-16 14:51 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-15 20:02 [PATCH] checkout: send commit provenance during prefetch Jonathan Tan
2020-12-16 14:50 ` Derrick Stolee [this message]
2020-12-16 18:50 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad2579d6-c00a-c0a8-ae9b-11b168be1940@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).