From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH] bitmaps: don't recurse into trees already in the bitmap
Date: Tue, 15 Jun 2021 10:17:04 -0400 [thread overview]
Message-ID: <471cb9be-bb72-6a37-ede8-f9421d9d3ebe@gmail.com> (raw)
In-Reply-To: <YMdGGL6v8LrbcAJP@coredump.intra.peff.net>
On 6/14/2021 8:05 AM, Jeff King wrote:
> On Mon, Jun 14, 2021 at 03:27:04AM -0400, Jeff King wrote:
...
> We can use the same optimization we do for commits here: when we are
> about to open a tree, see if it's in the bitmap (either the one we are
> building, or the "seen" bitmap which covers the UNINTERESTING side of
> the bitmap when doing a set-difference).
I was surprised that we were not already doing this. As Git can handle
larger repos now than it could a few years ago, I expect this
optimization to be immediately valuable, and critical in years to come.
> But here are numbers from some other real-world repositories (that are
> not public). This one's tree is comparable in size to linux.git, but has
> ~16k refs (and so less complete bitmap coverage):
>
> Test HEAD^ HEAD
> -------------------------------------------------------------------------
> 5310.4: simulated clone 38.34(39.86+0.74) 33.95(35.53+0.76) -11.5%
> 5310.5: simulated fetch 2.29(6.31+0.35) 2.20(5.97+0.41) -3.9%
> 5310.7: rev-list (commits) 0.99(0.86+0.13) 0.96(0.85+0.11) -3.0%
> 5310.8: rev-list (objects) 11.32(11.04+0.27) 6.59(6.37+0.21) -41.8%
>
> And here's another with a very large tree (~340k entries), and a fairly
> large number of refs (~10k):
>
> Test HEAD^ HEAD
> -------------------------------------------------------------------------
> 5310.3: simulated clone 53.83(54.71+1.54) 39.77(40.76+1.50) -26.1%
> 5310.4: simulated fetch 19.91(20.11+0.56) 19.79(19.98+0.67) -0.6%
> 5310.6: rev-list (commits) 0.54(0.44+0.11) 0.51(0.43+0.07) -5.6%
> 5310.7: rev-list (objects) 24.32(23.59+0.73) 9.85(9.49+0.36) -59.5%
>
> This patch provides substantial improvements in these larger cases, and
> have any drawbacks for smaller ones (the cost of the bitmap check is
> quite small compared to an actual tree traversal).
These many-refs scenarios make sense as something that is difficult to
verify using a single fork of an open-source project, but is common in
many closed-source projects that do not use forking to reduce the ref
count per repo.
I was impressed by how little code was required for this change. LGTM.
Thanks,
-Stolee
next prev parent reply other threads:[~2021-06-15 14:24 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-14 7:27 [PATCH] bitmaps: don't recurse into trees already in the bitmap Jeff King
2021-06-14 12:05 ` Jeff King
2021-06-15 14:17 ` Derrick Stolee [this message]
2021-06-16 12:31 ` Patrick Steinhardt
2021-06-18 12:59 ` Jeff King
2021-06-18 13:35 ` Patrick Steinhardt
2021-06-18 14:10 ` Jeff King
2021-06-22 10:47 ` Patrick Steinhardt
2021-06-22 19:39 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=471cb9be-bb72-6a37-ede8-f9421d9d3ebe@gmail.com \
--to=stolee@gmail.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.