All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jacob Vosmaer <jacob@gitlab.com>
Cc: git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
	Jeff King <peff@peff.net>
Subject: Re: [PATCH 1/1] builtin/pack-objects.c: avoid iterating all refs
Date: Wed, 20 Jan 2021 09:50:19 +0100	[thread overview]
Message-ID: <87lfco801g.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20210119143348.27535-2-jacob@gitlab.com>


On Tue, Jan 19 2021, Jacob Vosmaer wrote:

> In git-pack-objects, we iterate over all the tags if the --include-tag
> option is passed on the command line. For some reason this uses
> for_each_ref which is expensive if the repo has many refs. We should
> use a prefix iterator instead.
>
> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---
>  builtin/pack-objects.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 2a00358f34..2b32bc93bd 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3740,7 +3740,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	}
>  	cleanup_preferred_base();
>  	if (include_tag && nr_result)
> -		for_each_ref(add_ref_tag, NULL);
> +		for_each_fullref_in("refs/tags/", add_ref_tag, NULL, 0);
>  	stop_progress(&progress_state);
>  	trace2_region_leave("pack-objects", "enumerate-objects",
>  			    the_repository);

Not really a new problem, but it would be nice to also have a test in
t5305-include-tag.sh to check what happens with "tags" outside the
refs/tags/ namespace. I.e. if this & the existing prefix in
add_ref_tag() were dropped.

To elaborate on other things that aren't really your problem & Taylor's
E-Mail downthread we originally added this because:

    If new annotated tags have been introduced then we can also include
    them in the packfile, saving the client from needing to request them
    through a second connection.

I've barked up this whole tag fetch tree before 97716d217c (fetch: add a
--prune-tags option and fetch.pruneTags config, 2018-02-09) but really
not this specific area.

I wonder if longer term simply moving this to be work the client does
wouldn't make more sense. I.e. if we just delete this for_each_ref()
loop.
    
We're just saving a client from saying they "want" a tag. I.e. with the
whole thing removed we need this to make t5503-tagfollow.sh pass (see
[1] at the end for the dialog, the tag is 3253df4d...):

    diff --git a/t/t5503-tagfollow.sh b/t/t5503-tagfollow.sh
    index 6041a4dd32..1ddc430aef 100755
    --- a/t/t5503-tagfollow.sh
    +++ b/t/t5503-tagfollow.sh
    @@ -134,6 +134,7 @@ test_expect_success 'fetch B, S (commit and tag : 1 connection)' '
     test_expect_success 'setup expect' '
     cat - <<EOF >expect
     want $B
    +want $T
     want $S
     EOF
     '
    @@ -146,6 +147,7 @@ test_expect_success 'new clone fetch master and tags' '
                    cd clone2 &&
                    git init &&
                    git remote add origin .. &&
    +               git config --add remote.origin.fetch "+refs/tags/*:refs/tags/*" &&
                    GIT_TRACE_PACKET=$UPATH git fetch &&
                    test $B = $(git rev-parse --verify origin/master) &&
                    test $S = $(git rev-parse --verify tag2) &&

We're also saving the client the work of having to go through
refs/tags/* and figure out whether there are tags there that aren't on
our main history.

I think that since f0a24aa56e (git-pack-objects: Automatically pack
annotated tags if object was packed, 2008-03-03) was written it's become
clear that in the wild that's almost nobody's use-case. I.e. people with
a set of refs/heads/* branches and a refs/tags/* set that doesn't refer
to those branches.

Or if they do, I think it's such an obscure use-case that if we were
designing this today we could pass that work of figuring out if there's
such tags in refs/tags/* off to the client.

It seems we might just be able to delete this code on the server-side,
per protocol-capabilities.txt:

    Clients MUST be prepared for the case where a server has ignored
    include-tag and has not actually sent tags in the pack.  In such
    cases the client SHOULD issue a subsequent fetch to acquire the tags
    that include-tag would have otherwise given the client.

I.e. in the case where the server isn't playing along and I haven't set
"+refs/tags/*:refs/tags/*". But as the test shows we don't do that
following ourselves unless refs/tags/* is in the refspec (and then it's
not really "following", we're just getting all the tags).

1. From t5503-tagfollow.sh:
    $ grep -C5 3253df4d1cf6fb138b52b1938473bcfec1483223 UPLOAD_LOG 
    packet:  upload-pack< ref-prefix refs/tags/
    packet:  upload-pack< ref-prefix refs/tags/
    packet:  upload-pack< 0000
    packet:  upload-pack> 8e10cf4e007ad7e003463c30c34b1050b039db78 refs/heads/master
    packet:        fetch< 8e10cf4e007ad7e003463c30c34b1050b039db78 refs/heads/master
    packet:  upload-pack> 3253df4d1cf6fb138b52b1938473bcfec1483223 refs/tags/tag1 peeled:c06aaaf9c64f9ef1f209bcb6d7bdba68a5e27fab
    packet:        fetch< 3253df4d1cf6fb138b52b1938473bcfec1483223 refs/tags/tag1 peeled:c06aaaf9c64f9ef1f209bcb6d7bdba68a5e27fab
    packet:  upload-pack> ddfa4a33562179aca1ace2bcc662244a17d0b503 refs/tags/tag2 peeled:8e10cf4e007ad7e003463c30c34b1050b039db78
    packet:  upload-pack> 0000
    packet:        fetch< ddfa4a33562179aca1ace2bcc662244a17d0b503 refs/tags/tag2 peeled:8e10cf4e007ad7e003463c30c34b1050b039db78
    packet:        fetch< 0000
    packet:        fetch> command=fetch
    --
    packet:        fetch> 0001
    packet:        fetch> thin-pack
    packet:        fetch> include-tag
    packet:        fetch> ofs-delta
    packet:        fetch> want 8e10cf4e007ad7e003463c30c34b1050b039db78
    packet:        fetch> want 3253df4d1cf6fb138b52b1938473bcfec1483223
    packet:        fetch> want ddfa4a33562179aca1ace2bcc662244a17d0b503
    packet:        fetch> done
    packet:        fetch> 0000
    packet:  upload-pack< object-format=sha1
    packet:  upload-pack< command=fetch
    --
    packet:  upload-pack< 0001
    packet:  upload-pack< thin-pack
    packet:  upload-pack< include-tag
    packet:  upload-pack< ofs-delta
    packet:  upload-pack< want 8e10cf4e007ad7e003463c30c34b1050b039db78
    packet:  upload-pack< want 3253df4d1cf6fb138b52b1938473bcfec1483223
    packet:  upload-pack< want ddfa4a33562179aca1ace2bcc662244a17d0b503
    packet:  upload-pack< done
    packet:  upload-pack< 0000
    packet:  upload-pack> packfile
    packet:        fetch< packfile
    

  parent reply	other threads:[~2021-01-20  9:00 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-19 14:33 [PATCH 0/1] builtin/pack-objects.c: avoid iterating all refs Jacob Vosmaer
2021-01-19 14:33 ` [PATCH 1/1] " Jacob Vosmaer
2021-01-19 23:08   ` Taylor Blau
2021-01-19 23:33     ` Jeff King
2021-01-19 23:54       ` Taylor Blau
2021-01-19 23:15   ` Jeff King
2021-01-20  8:50   ` Ævar Arnfjörð Bjarmason [this message]
2021-01-20 15:02     ` Taylor Blau
2021-01-20 16:32       ` Ævar Arnfjörð Bjarmason
2021-01-20 19:53     ` Jeff King
2021-01-21 10:26       ` Ævar Arnfjörð Bjarmason
2021-01-21 15:34         ` Jeff King
2021-01-19 23:26 ` [PATCH 0/1] " Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lfco801g.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jacob@gitlab.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.