All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: Derrick Stolee <stolee@gmail.com>, git@vger.kernel.org
Subject: Re: [PATCH] bitmaps: don't recurse into trees already in the bitmap
Date: Fri, 18 Jun 2021 10:10:00 -0400	[thread overview]
Message-ID: <YMypONmXt142dhbb@coredump.intra.peff.net> (raw)
In-Reply-To: <YMyhCTaHmm6oNFpB@xps>

On Fri, Jun 18, 2021 at 03:35:05PM +0200, Patrick Steinhardt wrote:

> > More than once I've spent many head-scratching hours trying to determine
> > why some real-world repo performs better or worse than expected. :)
> 
> I couldn't agree more. I've also had my fair share of weird performance
> characteristics in completely unexpected ways. Unfortunately, it has
> made me become quite cautious about bitmaps given that they've already
> caused their fair share of perfomance regressions.
> 
> But your work here actually encourages me to give it another try soonish
> and see what kind of repo shapes make them explode this time.

Not really for immediate work, but I just wanted to note while I'm
thinking about it: the two big things that can actually make bitmaps
_worse_ than a regular traversal are:

  - loading the bitmaps is inherently linear in the file size. We could
    do better with an mmap-able index of bitmapped commits in the file.
    This would be easy to add as an extension to the file. This matters
    most for really small queries (the absolute time to parse the bitmap
    file isn't that high, but if the non-bitmap query can be answered in
    a few milliseconds, it becomes relatively more important).

  - we produce a definite answer of reachability for the bitmaps by
    filling in a bitmap for the negative (UNINTERESTING) tips, then one
    for the regular tips, and then taking a set difference. Whereas a
    normal traversal walks both sides together until it hits a merge
    base, and then actually looks into the trees in the boundary
    commits. This normal-traversal technique may look at fewer objects
    overall for some cases (especially if you have a lot of negative
    tips, like with "--not --all" and not-so-good bitmap coverage). But
    it does mean we may report an object as in the traversal when it's
    actually not (if it was added/removed in part of the uninteresting
    history, and so not present at the boundary).

    We could possibly teach the bitmap traversal to behave more like
    a normal one. I suspect this would clear up most of the regressions
    you'd see using bitmaps with rev-list.

There is one other related case I've run into. If you put a bitmap on
every commit, you'd _think_ that would make everything faster, since
you'd never traverse at all. But there's an inflection point where you
spend more time going through each bitmap twiddling bits (which remember
is O(nr_of_objects * nr_bitmapped_commits) bits across all of them). And
most of those bits will already be "1", so there's a point where the
saved traversal is less than the cost of OR-ing bits.

Anyway, this is all big-picture stuff that I don't expect you to work on
or even respond to. I'm mostly just writing up some thoughts that I
don't think have made it to the list in a coherent form. I'm not sure if
I'll eventually work on some of them or not (but anybody who is
interested can be my guest :) ).

-Peff

  reply	other threads:[~2021-06-18 14:10 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-14  7:27 [PATCH] bitmaps: don't recurse into trees already in the bitmap Jeff King
2021-06-14 12:05 ` Jeff King
2021-06-15 14:17   ` Derrick Stolee
2021-06-16 12:31     ` Patrick Steinhardt
2021-06-18 12:59       ` Jeff King
2021-06-18 13:35         ` Patrick Steinhardt
2021-06-18 14:10           ` Jeff King [this message]
2021-06-22 10:47           ` Patrick Steinhardt
2021-06-22 19:39             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YMypONmXt142dhbb@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.