All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: "Patrick Steinhardt" <ps@pks.im>,
	git@vger.kernel.org,
	"Felipe Contreras" <felipe.contreras@gmail.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Chris Torek" <chris.torek@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: Re: [PATCH v2 3/3] connected: implement connectivity check using bitmaps
Date: Fri, 2 Jul 2021 17:21:48 -0400	[thread overview]
Message-ID: <YN+DbDhMrqOnJbcO@coredump.intra.peff.net> (raw)
In-Reply-To: <YN9QbEaWgP09PfeD@nand.local>

On Fri, Jul 02, 2021 at 01:44:12PM -0400, Taylor Blau wrote:

> I would add that these effects change with the size of the bitmap.
> Let's just consider the "count the number of objects in a bitmapped
> commit". On my local copy of the kernel, I see a relatively modest
> improvement:
> 
>     $ tip=2ab38c17aac10bf55ab3efde4c4db3893d8691d2
>     $ hyperfine \
>       'GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip' \
>       'GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip' \
>       --warmup=3
>     Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip
>       Time (mean ± σ):      21.5 ms ±   5.6 ms    [User: 8.7 ms, System: 12.7 ms]
>       Range (min … max):    12.4 ms …  34.2 ms    170 runs
> 
>     Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip
>       Time (mean ± σ):      10.6 ms ±   1.6 ms    [User: 7.1 ms, System: 3.5 ms]
>       Range (min … max):     4.5 ms …  11.9 ms    258 runs
> 
> but on my copy of the kernel's fork network repo (that containing all of
> torvalds/linux's objects, as well as all of its fork's objects, too),
> the magnitude of the effect is much bigger:
> 
>     Benchmark #1: GIT_READ_COMMIT_TABLE=0 git.compile rev-list --count --objects --use-bitmap-index $tip
>       Time (mean ± σ):     332.3 ms ±  12.6 ms    [User: 210.4 ms, System: 121.8 ms]
>       Range (min … max):   322.7 ms … 362.4 ms    10 runs
> 
>     Benchmark #2: GIT_READ_COMMIT_TABLE=1 git.compile rev-list --count --objects --use-bitmap-index $tip
>       Time (mean ± σ):     260.0 ms ±   9.3 ms    [User: 191.0 ms, System: 69.0 ms]
>       Range (min … max):   250.4 ms … 272.8 ms    11 runs
> 
> That's a more modest 1.28x improvement (versus 2.03x in just linux.git),
> but the overall magnitude is much bigger.

Thanks, this is much more compelling. 70ms is a lot of startup time to
save. I am a little surprised that a no-traversal bitmap query like this
would still take 300ms. I wonder if 2ab38c17aac actually got a bitmap in
your second example (and if not, then there are probably cases where the
relative speedup would be even more impressive).

> This clouded up some of my timings in p5310, which made me think that it
> would be a good idea to `git config pack.writeReverseIndex true` in the
> setup for those tests, but an even better direction would be to change
> the default of pack.writeReverseIndex to true everywhere.

Yes, I'd be in favor of that. IMHO the reason to make it configurable at
all was not because it's ever a bad idea, but just to phase it in and
get experience with it (and to give an escape hatch for debugging it).

It's probably _less_ useful for local clones that are not serving
fetches. But every push is already generating the same thing in-memory,
so it seems like a good tradeoff to just use it everywhere.

> >   - there's some extra complexity in the file format and code to read
> >     and write these (and still fall back to the old system when they're
> >     absent). I don't think it's a deal-breaker, as it's really not that
> >     complicated a feature.
> 
> I agree with both of these. The complexity is manageable, I think,
> especially since I dropped support for the extended offset table (having
> a bitmap file that is >2GiB seems extremely unlikely to me, and it's
> possible to add support for it in the future) and
> fanout table (there are usually less than <1k commits with bitmaps, so
> a 256-entry fanout table doesn't seem to help much in benchmarking).
> 
> So what's left of the format is really just:
> 
>   - a table of object id's
>   - a table of (uint32_t, uint32_t) tuples describing the (short) offset
>     of the bitmap, and an index position of the xor'd bitmap (if one
>     exists).

Yeah, that really seems quite simple. I'd have to judge after seeing the
cleaned up code, but I suspect it's not going to be a burden.

> I'll be offline for the next ~week and a half for my wedding, but I'll
> post some patches to the list shortly after I get back.

Yep, no rush. Thanks for looking into this.

-Peff

  reply	other threads:[~2021-07-02 21:21 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-28  5:33 [PATCH v2 0/3] Speed up connectivity checks via bitmaps Patrick Steinhardt
2021-06-28  5:33 ` [PATCH v2 1/3] p5400: add perf tests for git-receive-pack(1) Patrick Steinhardt
2021-06-28  7:49   ` Ævar Arnfjörð Bjarmason
2021-06-29  6:18     ` Patrick Steinhardt
2021-06-29 12:09       ` Ævar Arnfjörð Bjarmason
2021-06-28  5:33 ` [PATCH v2 2/3] receive-pack: skip connectivity checks on delete-only commands Patrick Steinhardt
2021-06-28  8:00   ` Ævar Arnfjörð Bjarmason
2021-06-28  8:06     ` Ævar Arnfjörð Bjarmason
2021-06-29  6:26     ` Patrick Steinhardt
2021-06-30  1:31   ` Jeff King
2021-06-30  1:35     ` Jeff King
2021-06-30 13:52     ` Patrick Steinhardt
2021-06-28  5:33 ` [PATCH v2 3/3] connected: implement connectivity check using bitmaps Patrick Steinhardt
2021-06-28 20:23   ` Taylor Blau
2021-06-29 22:44     ` Taylor Blau
2021-06-30  2:04       ` Jeff King
2021-06-30  3:07         ` Taylor Blau
2021-06-30  5:45           ` Jeff King
2021-07-02 17:44             ` Taylor Blau
2021-07-02 21:21               ` Jeff King [this message]
2021-06-30  1:51   ` Jeff King
2021-07-20 14:26     ` Patrick Steinhardt
2021-08-02  9:37 ` [PATCH v3 0/4] Speed up connectivity checks Patrick Steinhardt
2021-08-02  9:38   ` [PATCH v3 1/4] connected: do not sort input revisions Patrick Steinhardt
2021-08-02 12:49     ` Ævar Arnfjörð Bjarmason
2021-08-03  8:50       ` Patrick Steinhardt
2021-08-04 11:01         ` Ævar Arnfjörð Bjarmason
2021-08-02 19:00     ` Junio C Hamano
2021-08-03  8:55       ` Patrick Steinhardt
2021-08-03 21:47         ` Junio C Hamano
2021-08-02  9:38   ` [PATCH v3 2/4] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-02 12:53     ` Ævar Arnfjörð Bjarmason
2021-08-02  9:38   ` [PATCH v3 3/4] revision: avoid loading object headers multiple times Patrick Steinhardt
2021-08-02 12:55     ` Ævar Arnfjörð Bjarmason
2021-08-05 10:12       ` Patrick Steinhardt
2021-08-02 19:40     ` Junio C Hamano
2021-08-03  9:07       ` Patrick Steinhardt
2021-08-06 14:17         ` Patrick Steinhardt
2021-08-02  9:38   ` [PATCH v3 4/4] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt
2021-08-02 20:01     ` Junio C Hamano
2021-08-03  9:16       ` Patrick Steinhardt
2021-08-03 21:56         ` Junio C Hamano
2021-08-05 11:01           ` Patrick Steinhardt
2021-08-05 16:16             ` Junio C Hamano
2021-08-04 10:51         ` Ævar Arnfjörð Bjarmason
2021-08-05 11:25   ` [PATCH v4 0/6] Speed up connectivity checks Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 1/6] revision: separate walk and unsorted flags Patrick Steinhardt
2021-08-05 18:47       ` Junio C Hamano
2021-08-05 11:25     ` [PATCH v4 2/6] connected: do not sort input revisions Patrick Steinhardt
2021-08-05 18:44       ` Junio C Hamano
2021-08-06  6:00         ` Patrick Steinhardt
2021-08-06 16:50           ` Junio C Hamano
2021-08-05 11:25     ` [PATCH v4 3/6] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 4/6] revision: avoid loading object headers multiple times Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 5/6] commit-graph: split out function to search commit position Patrick Steinhardt
2021-08-05 11:25     ` [PATCH v4 6/6] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt
2021-08-09  8:00 ` [PATCH v5 0/5] Speed up connectivity checks Patrick Steinhardt
2021-08-09  8:02   ` Patrick Steinhardt
2021-08-09  8:11 ` Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 1/5] revision: separate walk and unsorted flags Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 2/5] connected: do not sort input revisions Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 3/5] revision: stop retrieving reference twice Patrick Steinhardt
2021-08-09  8:11   ` [PATCH v5 4/5] commit-graph: split out function to search commit position Patrick Steinhardt
2021-08-09  8:12   ` [PATCH v5 5/5] revision: avoid hitting packfiles when commits are in commit-graph Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YN+DbDhMrqOnJbcO@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=chris.torek@gmail.com \
    --cc=felipe.contreras@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=ps@pks.im \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.