From: Taylor Blau <me@ttaylorr.com>
To: Jacob Vosmaer <jacob@gitlab.com>
Cc: git@vger.kernel.org, avarab@gmail.com, peff@peff.net, me@ttaylorr.com
Subject: Re: [PATCH v2 1/1] builtin/pack-objects.c: avoid iterating all refs
Date: Wed, 20 Jan 2021 09:49:20 -0500
Message-ID: <YAhC8Gsp4H17e28n@nand.local>
In-Reply-To: <20210120124514.49737-2-jacob@gitlab.com>

Hi Jacob,

On Wed, Jan 20, 2021 at 01:45:14PM +0100, Jacob Vosmaer wrote:
> In git-pack-objects, we iterate over all the tags if the --include-tag
> option is passed on the command line. For some reason this uses
> for_each_ref, which is expensive if the repo has many refs. We should
> use for_each_tag_ref instead.

I don't think it's worth sending another version, but I would have liked
to see: "... because we can save time by only iterating over some of the
refs" at the end of this paragraph.
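To make the "only iterating over some of the refs" point concrete, here is a toy model in C. It is not git's refs code: toy_for_each_ref(), toy_for_each_ref_in(), struct toy_counts and count_cb() are hypothetical names, and the model assumes all refnames live in one array sorted by strcmp(). Because every name sharing a prefix sorts into one contiguous run, a prefix-scoped walk can binary-search to the start of that run and stop at its end, so the number of callback invocations tracks the number of tags rather than the total number of refs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type, loosely modeled on git's each_ref_fn. */
typedef int (*toy_ref_fn)(const char *refname, void *cb_data);

/* Hypothetical counters filled in by count_cb(). */
struct toy_counts {
	size_t visited; /* how many times the callback ran */
	size_t tags;    /* how many of those saw a "refs/tags/" name */
};

static int count_cb(const char *refname, void *cb_data)
{
	struct toy_counts *c = cb_data;

	c->visited++;
	/* A whole-ref walk forces the callback to do this filtering itself. */
	if (!strncmp(refname, "refs/tags/", strlen("refs/tags/")))
		c->tags++;
	return 0;
}

/* Whole-ref walk: the callback runs once per ref, tag or not. */
static int toy_for_each_ref(const char **refs, size_t nr,
			    toy_ref_fn fn, void *cb_data)
{
	for (size_t i = 0; i < nr; i++) {
		int ret = fn(refs[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/*
 * Prefix walk (passes full refnames to fn). Assumes refs[] is sorted
 * by strcmp(): binary-search for the first name >= prefix, then stop
 * at the first name that no longer starts with prefix.
 */
static int toy_for_each_ref_in(const char **refs, size_t nr,
			       const char *prefix,
			       toy_ref_fn fn, void *cb_data)
{
	size_t len = strlen(prefix);
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		if (strcmp(refs[mid], prefix) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	for (size_t i = lo; i < nr && !strncmp(refs[i], prefix, len); i++) {
		int ret = fn(refs[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}
```

In this model, the repository described above (500,000 refs, 2,000 of them tags) costs 500,000 callback invocations under the whole-ref walk but only 2,000 under the prefix walk, plus a binary search of about 19 comparisons.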

> Because the add_ref_tag callback will now only visit tags, we
> simplified it a bit.
>
> The motivation for this change is that we observed performance issues
> with a repository on gitlab.com that has 500,000 refs but only 2,000
> tags. The fetch traffic on that repo is dominated by CI, and when we
> changed CI to fetch with 'git fetch --no-tags' we saw a dramatic
> change in the CPU profile of git-pack-objects. This led us to this
> particular ref walk. More details in:
> https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598
>
> Signed-off-by: Jacob Vosmaer <jacob@gitlab.com>
> ---
>  builtin/pack-objects.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 2a00358f34..ad52c91bdb 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2803,13 +2803,11 @@ static void add_tag_chain(const struct object_id *oid)
>  	}
>  }
>
> -static int add_ref_tag(const char *path, const struct object_id *oid, int flag, void *cb_data)
> +static int add_ref_tag(const char *tag, const struct object_id *oid, int flag, void *cb_data)
>  {
>  	struct object_id peeled;
>
> -	if (starts_with(path, "refs/tags/") && /* is a tag? */
> -	    !peel_ref(path, &peeled)    && /* peelable? */
> -	    obj_is_packed(&peeled)) /* object packed? */
> +	if (!peel_ref(tag, &peeled) && obj_is_packed(&peeled))
>  		add_tag_chain(oid);
>  	return 0;
>  }
> @@ -3740,7 +3738,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
>  	}
>  	cleanup_preferred_base();
>  	if (include_tag && nr_result)
> -		for_each_ref(add_ref_tag, NULL);
> +		for_each_tag_ref(add_ref_tag, NULL);

OK. Seeing another caller (builtin/pack-objects.c:compute_write_order())
that passes a callback to for_each_tag_ref() makes me feel more
comfortable about using it here.

Thanks for investigating and resolving this in a way which cleans up the
surrounding code.
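One detail that is easy to miss with prefix-scoped iteration is which name the callback receives. As far as I can tell from refs.c of this era, for_each_tag_ref() goes through the for_each_ref_in() path, which trims the "refs/tags/" prefix before invoking the callback (so it sees a name like "v1.0", which fits the parameter's new name `tag`), while for_each_fullref_in() passes the full refname. Whether peel_ref() resolves the trimmed form is worth confirming. The toy model below (hypothetical names throughout, not git's code) shows how a callback that looks names up in a table keyed by full refname behaves under each convention.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type, loosely modeled on git's each_ref_fn. */
typedef int (*toy_ref_fn)(const char *refname, void *cb_data);

/* Hypothetical stand-in for a ref store keyed by full refname. */
struct toy_repo {
	const char **refs; /* full names, e.g. "refs/tags/v1.0" */
	size_t nr;
	size_t resolved;   /* callback names that resolved */
};

/* Exact-match lookup: only full refnames are found. */
static int toy_resolve(const struct toy_repo *repo, const char *name)
{
	for (size_t i = 0; i < repo->nr; i++)
		if (!strcmp(repo->refs[i], name))
			return 0;
	return -1;
}

/* Counts how many of the names it is handed can be resolved. */
static int count_resolvable(const char *refname, void *cb_data)
{
	struct toy_repo *repo = cb_data;

	if (!toy_resolve(repo, refname))
		repo->resolved++;
	return 0;
}

/*
 * Visit refs under prefix. With trim set, fn sees the name minus the
 * prefix ("v1.0"), as for_each_ref_in()/for_each_tag_ref() are
 * assumed to do; otherwise fn sees the full name, as
 * for_each_fullref_in() is assumed to do.
 */
static int toy_for_each_ref_under(const struct toy_repo *repo,
				  const char *prefix, int trim,
				  toy_ref_fn fn, void *cb_data)
{
	size_t len = strlen(prefix);

	for (size_t i = 0; i < repo->nr; i++) {
		const char *name = repo->refs[i];
		int ret;

		if (strncmp(name, prefix, len))
			continue;
		ret = fn(trim ? name + len : name, cb_data);
		if (ret)
			return ret;
	}
	return 0;
}
```

Under these assumptions, the full-name walk resolves both tags while the trimmed walk resolves neither, because "v1.0" is not a key in a table of full refnames.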

>  	stop_progress(&progress_state);
>  	trace2_region_leave("pack-objects", "enumerate-objects",
>  			    the_repository);
> --
> 2.30.0

This version looks good to me, thanks for digging!

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor


Thread overview: 11+ messages
2021-01-20 12:45 [PATCH v2 0/1] builtin/pack-objects.c: avoid iterating all refs Jacob Vosmaer
2021-01-20 12:45 ` [PATCH v2 1/1] " Jacob Vosmaer
2021-01-20 14:49   ` Taylor Blau [this message]
2021-01-20 16:18     ` Jeff King
2021-01-20 16:19       ` Taylor Blau
2021-01-20 18:49         ` Jacob Vosmaer
2021-01-20 19:45         ` Jeff King
2021-01-20 21:46           ` Jacob Vosmaer
2021-01-20 21:52             ` Taylor Blau
2021-01-21  2:54             ` Jeff King
2021-01-22 16:46               ` Jacob Vosmaer
