All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH] bitmaps: don't recurse into trees already in the bitmap
Date: Tue, 15 Jun 2021 10:17:04 -0400	[thread overview]
Message-ID: <471cb9be-bb72-6a37-ede8-f9421d9d3ebe@gmail.com> (raw)
In-Reply-To: <YMdGGL6v8LrbcAJP@coredump.intra.peff.net>

On 6/14/2021 8:05 AM, Jeff King wrote:
> On Mon, Jun 14, 2021 at 03:27:04AM -0400, Jeff King wrote:
...
> We can use the same optimization we do for commits here: when we are
> about to open a tree, see if it's in the bitmap (either the one we are
> building, or the "seen" bitmap which covers the UNINTERESTING side of
> the bitmap when doing a set-difference).

I was surprised that we were not already doing this. As Git can handle
larger repos now than it could a few years ago, I expect this
optimization to be immediately valuable, and critical in years to come.

> But here are numbers from some other real-world repositories (that are
> not public). This one's tree is comparable in size to linux.git, but has
> ~16k refs (and so less complete bitmap coverage):
> 
>   Test                         HEAD^               HEAD
>   -------------------------------------------------------------------------
>   5310.4: simulated clone      38.34(39.86+0.74)   33.95(35.53+0.76) -11.5%
>   5310.5: simulated fetch      2.29(6.31+0.35)     2.20(5.97+0.41) -3.9%
>   5310.7: rev-list (commits)   0.99(0.86+0.13)     0.96(0.85+0.11) -3.0%
>   5310.8: rev-list (objects)   11.32(11.04+0.27)   6.59(6.37+0.21) -41.8%
> 
> And here's another with a very large tree (~340k entries), and a fairly
> large number of refs (~10k):
> 
>   Test                         HEAD^               HEAD
>   -------------------------------------------------------------------------
>   5310.3: simulated clone      53.83(54.71+1.54)   39.77(40.76+1.50) -26.1%
>   5310.4: simulated fetch      19.91(20.11+0.56)   19.79(19.98+0.67) -0.6%
>   5310.6: rev-list (commits)   0.54(0.44+0.11)     0.51(0.43+0.07) -5.6%
>   5310.7: rev-list (objects)   24.32(23.59+0.73)   9.85(9.49+0.36) -59.5%
> 
> This patch provides substantial improvements in these larger cases, and
> have any drawbacks for smaller ones (the cost of the bitmap check is
> quite small compared to an actual tree traversal).

These many-refs scenarios make sense as something that is difficult to
verify using a single fork of an open-source project, but is common in
many closed-source projects that do not use forking to reduce the ref
count per repo.

I was impressed by how little code was required for this change. LGTM.

Thanks,
-Stolee

  reply	other threads:[~2021-06-15 14:24 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-14  7:27 [PATCH] bitmaps: don't recurse into trees already in the bitmap Jeff King
2021-06-14 12:05 ` Jeff King
2021-06-15 14:17   ` Derrick Stolee [this message]
2021-06-16 12:31     ` Patrick Steinhardt
2021-06-18 12:59       ` Jeff King
2021-06-18 13:35         ` Patrick Steinhardt
2021-06-18 14:10           ` Jeff King
2021-06-22 10:47           ` Patrick Steinhardt
2021-06-22 19:39             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=471cb9be-bb72-6a37-ede8-f9421d9d3ebe@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.