All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Jacob Vosmaer <jacob@gitlab.com>,
	git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>
Subject: Re: [PATCH 1/1] builtin/pack-objects.c: avoid iterating all refs
Date: Thu, 21 Jan 2021 11:26:13 +0100	[thread overview]
Message-ID: <874kja8u2i.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <YAiKQ0M0/14Q13Ee@coredump.intra.peff.net>


On Wed, Jan 20 2021, Jeff King wrote:

> On Wed, Jan 20, 2021 at 09:50:19AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> To elaborate on other things that aren't really your problem & Taylor's
>> E-Mail downthread we originally added this because:
>> 
>>     If new annotated tags have been introduced then we can also include
>>     them in the packfile, saving the client from needing to request them
>>     through a second connection.
>> 
>> I've barked up this whole tag fetch tree before 97716d217c (fetch: add a
>> --prune-tags option and fetch.pruneTags config, 2018-02-09) but really
>> not this specific area.
>> 
>> I wonder if longer term simply moving this to be work the client does
>> wouldn't make more sense. I.e. if we just delete this for_each_ref()
>> loop.
>>     
>> We're just saving a client from saying they "want" a tag. I.e. with the
>> whole thing removed we need this to make t5503-tagfollow.sh pass (see
>> [1] at the end for the dialog, the tag is 3253df4d...):
>
> Isn't this an ordering problem, though? The client cannot mention
> auto-followed tags in a "want", because they first need to "want" the
> main history, receive the pack, and then realize they want the others.
>
> So we are not just saving the client from sending a "want", but making a
> second full connection to do so. That seems to be an optimization worth
> continuing to have.

Correct. Suppose a history COMMIT(TAG) history like:

    mainline: A(X) -> B(Y) -> C
    topic:    A(X) -> B(Y) -> D(Z)

You only want the "mainline" history and its tags in your fetch. Now a
server will give you tags X & Y for free, ignoring Z.

The note in protocol-capabilities.txt supposes that you'll need to only
get A/B/C in one fetch, then do another one where you see from the OIDs
you have and advertised (peeled) OIDs for the tags and know you just
need X/Y, not Z.

So we could also just fetch Z, then do our own walk on the client to see
if it's reachable, and throw it away if not. Although I suppose that'll
need a list of "bad" tags on the client so we don't repeat the process
the next time around, hrm...

>>     diff --git a/t/t5503-tagfollow.sh b/t/t5503-tagfollow.sh
>>     index 6041a4dd32..1ddc430aef 100755
>>     --- a/t/t5503-tagfollow.sh
>>     +++ b/t/t5503-tagfollow.sh
>>     @@ -134,6 +134,7 @@ test_expect_success 'fetch B, S (commit and tag : 1 connection)' '
>>      test_expect_success 'setup expect' '
>>      cat - <<EOF >expect
>>      want $B
>>     +want $T
>>      want $S
>>      EOF
>>      '
>>     @@ -146,6 +147,7 @@ test_expect_success 'new clone fetch master and tags' '
>>                     cd clone2 &&
>>                     git init &&
>>                     git remote add origin .. &&
>>     +               git config --add remote.origin.fetch "+refs/tags/*:refs/tags/*" &&
>>                     GIT_TRACE_PACKET=$UPATH git fetch &&
>>                     test $B = $(git rev-parse --verify origin/master) &&
>>                     test $S = $(git rev-parse --verify tag2) &&
>> 
>> We're also saving the client the work of having to go through
>> refs/tags/* and figure out whether there are tags there that aren't on
>> our main history.
>
> You seem to be against auto-following at all. And certainly I can see an
> argument that it is not worth the trouble it causes. But it is the
> default behavior, and I suspect many people are relying on it. Fetching
> every tag indiscriminately is going to grab a bunch of extra unwanted
> objects in some repos. An obvious case is any time "clone
> --single-branch --depth" is used.

I wonder if the use-cases for --depth in the wild aren't 99.9%
--depth=1, in which case the peeled commit on the main branch being
advertised by a tag would alredy give you this information.

I.e. in the common case you don't get a tag, but if you happen to land
on a release you can see that the commit at the tip is tagged. I wonder
if doing that shortcut already in the client (so not send include-tag)
is a useful micro-optimization.

> Maybe I'm not quite understanding what you're proposing.

Not much really, just that looking deeper into this area might be a
useful exercise. I.e. it's a case of server<->client cooperation where
we offlod the work to one or the other based on an old design decision,
maybe that trade-off is not optimal in most cases anymore.

  reply	other threads:[~2021-01-21 10:27 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 14:33 [PATCH 0/1] builtin/pack-objects.c: avoid iterating all refs Jacob Vosmaer
2021-01-19 14:33 ` [PATCH 1/1] " Jacob Vosmaer
2021-01-19 23:08   ` Taylor Blau
2021-01-19 23:33     ` Jeff King
2021-01-19 23:54       ` Taylor Blau
2021-01-19 23:15   ` Jeff King
2021-01-20  8:50   ` Ævar Arnfjörð Bjarmason
2021-01-20 15:02     ` Taylor Blau
2021-01-20 16:32       ` Ævar Arnfjörð Bjarmason
2021-01-20 19:53     ` Jeff King
2021-01-21 10:26       ` Ævar Arnfjörð Bjarmason [this message]
2021-01-21 15:34         ` Jeff King
2021-01-19 23:26 ` [PATCH 0/1] " Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874kja8u2i.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jacob@gitlab.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.