* rev-list --use-bitmap-index
@ 2021-03-05  8:18 Bryan Turner
  2021-03-05  8:24 ` Bryan Turner
  2021-03-05  8:46 ` Jeff King
  0 siblings, 2 replies; 4+ messages in thread
From: Bryan Turner @ 2021-03-05  8:18 UTC (permalink / raw)
  To: Git Users

The documentation for --use-bitmap-index notes that, when it is used
with --objects, trees and blobs won't have their paths printed, but it
appears to change a whole lot more than that. In my testing, it
appears that --date-order, --format, --parents, and maybe more are
effectively ignored.

It appears this changed in 2.26.0. The release notes for that version
include this blurb, which seems like it might be relevant, but I'm not
sure:
* The object reachability bitmap machinery and the partial cloning
  machinery were not prepared to work well together, because some
  object-filtering criteria that partial clones use inherently rely
  on object traversal, but the bitmap machinery is an optimization
  to bypass that object traversal. There however are some cases
  where they can work together, and they were taught about them.

I have a repository with a bitmap:
$ git repack -abdfln --keep-unreachable
Marked 2 islands, done.
Enumerating objects: 3603142, done.
Propagating island marks: 100% (2576295/2576295), done.
Counting objects: 100% (3603142/3603142), done.
Delta compression using up to 20 threads
Compressing objects: 100% (2898179/2898179), done.
Writing objects: 100% (3603142/3603142), done.
Reusing bitmaps: 291, done.
Selecting bitmap commits: 293052, done.
Building bitmaps: 100% (363/363), done.

Here's some output from Git 2.25.1:
$ /opt/git/2.25.1/bin/git rev-list --boundary --ignore-missing \
    --date-order --parents --use-bitmap-index \
    c6abb83d2798415fa9fe0ebd683623620076b412 \
    1c55e675a66cb98955232e1bd230119fd97a5467 \
    634396036782682e7cd8c955070dfb30546ed58c -- | head
c6abb83d2798415fa9fe0ebd683623620076b412
2c7281b151d0079acc3f9b2c67d4667e1c9bf6d9
634396036782682e7cd8c955070dfb30546ed58c
1c55e675a66cb98955232e1bd230119fd97a5467
2c7281b151d0079acc3f9b2c67d4667e1c9bf6d9
d672894d3b2413b62034cb3cdb3470e5dee0001c
76250ec85aadff2ff451ec13efdadb8ccfd6b239
d672894d3b2413b62034cb3cdb3470e5dee0001c
013343e1900330429bcd1e31bb2ae7261fc1e3af
3e1e27621aa5f1d49286e23d77199004a835699e
3e1e27621aa5f1d49286e23d77199004a835699e
b944291d204cb7f3d5eb7678360b16435c53b2f3
b745a7b9bd9434eefb411d5f2a80a7187e3e8b93
1c55e675a66cb98955232e1bd230119fd97a5467
7f2c871e0d239e87bef7a1505ae928ae3a09a402
76250ec85aadff2ff451ec13efdadb8ccfd6b239
04f561866a9c015c14c69a0294b753ced5e084f2
013343e1900330429bcd1e31bb2ae7261fc1e3af
d907528818d010a360113790e227ebbcd8a61395
b745a7b9bd9434eefb411d5f2a80a7187e3e8b93
b944291d204cb7f3d5eb7678360b16435c53b2f3
7f2c871e0d239e87bef7a1505ae928ae3a09a402
c2ec4d3d76d865a9b701eb8be822d31252278a76

Changing to Git 2.26.0, I see this:
$ /opt/git/2.26.0/bin/git rev-list --boundary --ignore-missing \
    --date-order --parents --use-bitmap-index \
    c6abb83d2798415fa9fe0ebd683623620076b412 \
    1c55e675a66cb98955232e1bd230119fd97a5467 \
    634396036782682e7cd8c955070dfb30546ed58c -- | head
634396036782682e7cd8c955070dfb30546ed58c
1c55e675a66cb98955232e1bd230119fd97a5467
7f2c871e0d239e87bef7a1505ae928ae3a09a402
c2ec4d3d76d865a9b701eb8be822d31252278a76
899053a9043045fcfeb7f9254f2700d286c60a63
f1adcf64a8c06cb12f4e3e876040ee596fb3c0ca
16792db59ffdbbefe4a27a11a9831eac39be69a0
b844c3d11d09c2aec3428ce61bef02fdd097b9f9
802918fb139ef96cae5259822d22a36478c5e7b1
3a6105686ab302093648733dbf5fada3b44db72b

No parents now, and the commits aren't in the same order. I've tested
with 2.30.1 and it produces the same output as 2.26.0. If I remove the
bitmap, all versions produce the same output as 2.25.1 does, with
parents and in the expected order. (I should note, the bitmap is
perfectly up-to-date; I did the repack right before running these
rev-list commands. I've also tried the rev-list without several of the
options in place, like --boundary, and it behaves the same. This
command line is assembled automatically, so I'm just including it here
how the system produced it.)

Is this expected? If so, perhaps the --use-bitmap-index documentation
should be updated to indicate that it has unexpected interactions with
a whole lot more than just --objects? Or perhaps I'm doing something
wrong/unexpected here? What sorts of traversals are --use-bitmap-index
expected to be used for?

Best regards,
Bryan Turner


* Re: rev-list --use-bitmap-index
  2021-03-05  8:18 rev-list --use-bitmap-index Bryan Turner
@ 2021-03-05  8:24 ` Bryan Turner
  2021-03-05  8:46 ` Jeff King
  1 sibling, 0 replies; 4+ messages in thread
From: Bryan Turner @ 2021-03-05  8:24 UTC (permalink / raw)
  To: Git Users

On Fri, Mar 5, 2021 at 12:18 AM Bryan Turner <bturner@atlassian.com> wrote:
>
> The documentation for --use-bitmap-index notes that, when it is used
> with --objects, trees and blobs won't have their paths printed, but it
> appears to change a whole lot more than that. In my testing, it
> appears that --date-order, --format, --parents, and maybe more are
> effectively ignored.
>
> It appears this changed in 2.26.0. The release notes for that version
> include this blurb, which seems like it might be relevant, but I'm not
> sure:
> * The object reachability bitmap machinery and the partial cloning
>   machinery were not prepared to work well together, because some
>   object-filtering criteria that partial clones use inherently rely
>   on object traversal, but the bitmap machinery is an optimization
>   to bypass that object traversal. There however are some cases
>   where they can work together, and they were taught about them.
>
> I have a repository with a bitmap:
> $ git repack -abdfln --keep-unreachable
> Marked 2 islands, done.
> Enumerating objects: 3603142, done.
> Propagating island marks: 100% (2576295/2576295), done.
> Counting objects: 100% (3603142/3603142), done.
> Delta compression using up to 20 threads
> Compressing objects: 100% (2898179/2898179), done.
> Writing objects: 100% (3603142/3603142), done.
> Reusing bitmaps: 291, done.
> Selecting bitmap commits: 293052, done.
> Building bitmaps: 100% (363/363), done.
>
> Here's some output from Git 2.25.1:
> $ /opt/git/2.25.1/bin/git rev-list --boundary --ignore-missing
> --date-order --parents --use-bitmap-index
> c6abb83d2798415fa9fe0ebd683623620076b412
> 1c55e675a66cb98955232e1bd230119fd97a5467
> 634396036782682e7cd8c955070dfb30546ed58c -- | head
> c6abb83d2798415fa9fe0ebd683623620076b412
> 2c7281b151d0079acc3f9b2c67d4667e1c9bf6d9
> 634396036782682e7cd8c955070dfb30546ed58c
> 1c55e675a66cb98955232e1bd230119fd97a5467
> 2c7281b151d0079acc3f9b2c67d4667e1c9bf6d9
> d672894d3b2413b62034cb3cdb3470e5dee0001c
> 76250ec85aadff2ff451ec13efdadb8ccfd6b239
> d672894d3b2413b62034cb3cdb3470e5dee0001c
> 013343e1900330429bcd1e31bb2ae7261fc1e3af
> 3e1e27621aa5f1d49286e23d77199004a835699e
> 3e1e27621aa5f1d49286e23d77199004a835699e
> b944291d204cb7f3d5eb7678360b16435c53b2f3
> b745a7b9bd9434eefb411d5f2a80a7187e3e8b93
> 1c55e675a66cb98955232e1bd230119fd97a5467
> 7f2c871e0d239e87bef7a1505ae928ae3a09a402
> 76250ec85aadff2ff451ec13efdadb8ccfd6b239
> 04f561866a9c015c14c69a0294b753ced5e084f2
> 013343e1900330429bcd1e31bb2ae7261fc1e3af
> d907528818d010a360113790e227ebbcd8a61395
> b745a7b9bd9434eefb411d5f2a80a7187e3e8b93
> b944291d204cb7f3d5eb7678360b16435c53b2f3
> 7f2c871e0d239e87bef7a1505ae928ae3a09a402
> c2ec4d3d76d865a9b701eb8be822d31252278a76

Apologies, it looks like Gmail helpfully jumped in and ruined my
output for me. Let me try this with shorter hashes:
c6abb83 2c7281b
6343960 1c55e67
2c7281b d672894 76250ec
d672894 013343e 3e1e276
3e1e276 b944291 b745a7b
1c55e67 7f2c871
76250ec 04f5618
013343e d907528
b745a7b b944291
7f2c871 c2ec4d3

>
> Changing to Git 2.26.0, I see this:
> $ /opt/git/2.26.0/bin/git rev-list --boundary --ignore-missing
> --date-order --parents --use-bitmap-index
> c6abb83d2798415fa9fe0ebd683623620076b412
> 1c55e675a66cb98955232e1bd230119fd97a5467
> 634396036782682e7cd8c955070dfb30546ed58c -- | head
> 634396036782682e7cd8c955070dfb30546ed58c
> 1c55e675a66cb98955232e1bd230119fd97a5467
> 7f2c871e0d239e87bef7a1505ae928ae3a09a402
> c2ec4d3d76d865a9b701eb8be822d31252278a76
> 899053a9043045fcfeb7f9254f2700d286c60a63
> f1adcf64a8c06cb12f4e3e876040ee596fb3c0ca
> 16792db59ffdbbefe4a27a11a9831eac39be69a0
> b844c3d11d09c2aec3428ce61bef02fdd097b9f9
> 802918fb139ef96cae5259822d22a36478c5e7b1
> 3a6105686ab302093648733dbf5fada3b44db72b

All of these are on their own lines, as shown.

>
> No parents now, and the commits aren't in the same order. I've tested
> with 2.30.1 and it produces the same output as 2.26.0. If I remove the
> bitmap, all versions produce the same output as 2.25.1 does, with
> parents and in the expected order. (I should note, the bitmap is
> perfectly up-to-date; I did the repack right before running these
> rev-list commands. I've also tried the rev-list without several of the
> options in place, like --boundary, and it behaves the same. This
> command line is assembled automatically, so I'm just including it here
> how the system produced it.)
>
> Is this expected? If so, perhaps the --use-bitmap-index documentation
> should be updated to indicate that it has unexpected interactions with
> a whole lot more than just --objects? Or perhaps I'm doing something
> wrong/unexpected here? What sorts of traversals are --use-bitmap-index
> expected to be used for?
>
> Best regards,
> Bryan Turner


* Re: rev-list --use-bitmap-index
  2021-03-05  8:18 rev-list --use-bitmap-index Bryan Turner
  2021-03-05  8:24 ` Bryan Turner
@ 2021-03-05  8:46 ` Jeff King
  2021-03-06 23:00   ` Bryan Turner
  1 sibling, 1 reply; 4+ messages in thread
From: Jeff King @ 2021-03-05  8:46 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Git Users

On Fri, Mar 05, 2021 at 12:18:15AM -0800, Bryan Turner wrote:

> The documentation for --use-bitmap-index notes that, when it is used
> with --objects, trees and blobs won't have their paths printed, but it
> appears to change a whole lot more than that. In my testing, it
> appears that --date-order, --format, --parents, and maybe more are
> effectively ignored.

Yes, quite a few options won't work with bitmaps. The order you get is
not a traversal order at all, but mostly just the order of objects
within the pack (and then with any extra traversal we had to do tacked
onto the end!). Likewise something like "--boundary", as that implies
that we actually walked the graph. We probably _could_ support --format
and --parents, but don't.

Probably the documentation should be strengthened to say that
--use-bitmap-index implies thinking about the resulting objects as a set
result, rather than a traversal. Or maybe that's getting too into the
weeds.
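
To make the "set result, rather than a traversal" distinction concrete,
here is a rough Python sketch. It is a toy model under stated
assumptions, not git's implementation: the commit names, timestamps,
pack order, and precomputed bitmaps are all hypothetical, and the
newest-first walk only loosely stands in for a date-ordered traversal.

```python
# Toy model only; git's real bitmap code is considerably more involved.
# Hypothetical commits: name -> (commit timestamp, parents).
COMMITS = {
    "A": (40, ["B", "C"]),
    "B": (30, ["D"]),
    "C": (20, ["D"]),
    "D": (10, []),
}

# Hypothetical on-disk pack order, unrelated to commit timestamps.
PACK_ORDER = ["D", "B", "A", "C"]

# Hypothetical precomputed reachability bitmaps:
# bit i set means PACK_ORDER[i] is reachable from that tip.
BITMAPS = {"A": 0b1111, "B": 0b0011}


def date_order_walk(tip):
    """Walk the graph from tip, newest commit first, listing parents."""
    seen, frontier, lines = {tip}, [tip], []
    while frontier:
        # Always emit the newest commit that has been discovered so far.
        frontier.sort(key=lambda c: COMMITS[c][0], reverse=True)
        commit = frontier.pop(0)
        parents = COMMITS[commit][1]
        lines.append(" ".join([commit] + parents))
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return lines


def bitmap_style_listing(tip):
    """Read the answer off the bitmap: pack order, no walk, no parents."""
    bits = BITMAPS[tip]
    return [name for i, name in enumerate(PACK_ORDER) if (bits >> i) & 1]


print(date_order_walk("A"))       # ['A B C', 'B D', 'C D', 'D']
print(bitmap_style_listing("A"))  # ['D', 'B', 'A', 'C']
```

Both calls describe the same four commits, but only the walk knows about
ordering and parents; the bitmap listing has nothing to hang them on.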

> It appears this changed in 2.26.0. The release notes for that version
> include this blurb, which seems like it might be relevant, but I'm not
> sure:

It has always been the case that those options wouldn't work with
bitmaps. But v2.26 did let us use bitmaps in more cases.

The blurb you mentioned is a bit of a red herring; it only applies when
--filter is used. The interesting commit for your example below is
4eb707ebd6 (rev-list: allow commit-only bitmap traversals, 2020-02-14).

The "--use-bitmap-index" option is really "if you can use bitmaps to
speed things up, do so". So prior to v2.26 it was simply being ignored
in your example (and you got no speedup benefit from specifying it).

That "use it if you can" behavior should probably likewise be
documented. Callers need to be prepared to receive either result (and
hence asking for stuff like --boundary does not make any sense at all).
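
One way a caller might cope with "either result" is to reduce whatever
comes back to a plain set of commit IDs before using it. The sketch
below is only an illustration: commit_ids() is a hypothetical helper,
the object IDs are made up, and it assumes that boundary commits are
printed with a leading "-" and that parents (if any) follow the commit
on the same line.

```python
def commit_ids(rev_list_output: str) -> frozenset:
    """Reduce rev-list output to an unordered set of commit IDs.

    Keeps only the first field of each line (dropping any parents) and
    skips lines starting with "-", which this sketch treats as boundary
    commits rather than part of the result.
    """
    ids = set()
    for line in rev_list_output.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("-"):
            continue
        ids.add(fields[0])
    return frozenset(ids)


# Made-up sample outputs: one from a real walk, one in bitmap style.
walked = "aaa111 bbb222\nbbb222 ccc333\n-ccc333\n"
from_bitmap = "bbb222\naaa111\n"

print(commit_ids(walked) == commit_ids(from_bitmap))  # True
```

A caller that needs more than the set (order, parents, boundaries) has
to get it from a real traversal, which is another way of saying it
should not pass --use-bitmap-index.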

> Is this expected? If so, perhaps the --use-bitmap-index documentation
> should be updated to indicate that it has unexpected interactions with
> a whole lot more than just --objects? Or perhaps I'm doing something
> wrong/unexpected here? What sorts of traversals are --use-bitmap-index
> expected to be used for?

The interesting traversals IMHO are:

  - with --objects, quickly getting the result set (but without paths,
    and without any ordering)

  - with --count (with or without --objects), because we avoid quite a
    bit of work by counting bits rather than walking the graph

  - with the new --disk-usage, which likewise avoids a bunch of work
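
The "counting bits rather than walking the graph" point can be sketched
in Python as follows. This is a toy model with a hypothetical object
graph and hypothetical names; it only illustrates why a precomputed
bitmap makes counting cheap and does not mirror git's data structures.

```python
# Hypothetical object graph: object -> objects it references directly.
EDGES = {
    "commit2": ["commit1", "tree2"],
    "commit1": ["tree1"],
    "tree2": ["blob_a", "blob_b"],
    "tree1": ["blob_a"],
    "blob_a": [],
    "blob_b": [],
}

# One bit position per object.
BIT = {name: 1 << i for i, name in enumerate(EDGES)}


def count_by_walking(tip):
    """Count reachable objects the slow way: visit every one of them."""
    seen, stack = set(), [tip]
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(EDGES[obj])
    return len(seen)


def build_bitmap(tip):
    """Compute a reachability bitmap; in this model, done once up front."""
    bits = BIT[tip]
    for child in EDGES[tip]:
        bits |= build_bitmap(child)
    return bits


# Bitmaps prepared ahead of time for a couple of commits.
BITMAPS = {name: build_bitmap(name) for name in ("commit1", "commit2")}


def count_by_bitmap(include, exclude=()):
    """Count objects by OR-ing and masking bitmaps, then counting set bits."""
    wanted = 0
    for tip in include:
        wanted |= BITMAPS[tip]
    for tip in exclude:
        wanted &= ~BITMAPS[tip]
    return bin(wanted).count("1")


print(count_by_walking("commit2"))                 # 6
print(count_by_bitmap(["commit2"]))                # 6
print(count_by_bitmap(["commit2"], ["commit1"]))   # 3
```

The last call plays the role of "commit2 but not commit1": the answer
falls out of two bitwise operations and a bit count, with no object
visited at query time.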

Asking about just commits via bitmaps isn't that big a speed improvement
these days, because commit graphs make the cost to actually traverse
each commit way cheaper (see the numbers in the commit I mentioned
above).

So the behavior you're seeing is expected, but probably not all that
useful (and you should likely just drop --use-bitmap-index).

-Peff


* Re: rev-list --use-bitmap-index
  2021-03-05  8:46 ` Jeff King
@ 2021-03-06 23:00   ` Bryan Turner
  0 siblings, 0 replies; 4+ messages in thread
From: Bryan Turner @ 2021-03-06 23:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Users

On Fri, Mar 5, 2021 at 12:47 AM Jeff King <peff@peff.net> wrote:
>
> On Fri, Mar 05, 2021 at 12:18:15AM -0800, Bryan Turner wrote:
>
> > The documentation for --use-bitmap-index notes that, when it is used
> > with --objects, trees and blobs won't have their paths printed, but it
> > appears to change a whole lot more than that. In my testing, it
> > appears that --date-order, --format, --parents, and maybe more are
> > effectively ignored.
>
> Yes, quite a few options won't work with bitmaps. The order you get is
> not a traversal order at all, but mostly just the order of objects
> within the pack (and then with any extra traversal we had to do tacked
> onto the end!). Likewise something like "--boundary", as that implies
> that we actually walked the graph. We probably _could_ support --format
> and --parents, but don't.
>
> Probably the documentation should be strengthened to say that
> --use-bitmap-index implies thinking about the resulting objects as a set
> result, rather than a traversal. Or maybe that's getting too into the
> weeds.

I can't speak for anyone else, but to me that actually feels a lot
less "in the weeds" than the current documentation.
- The existing documentation talks about "speeding up the traversal",
but your comment here suggests what it actually does is _eliminate_
the traversal, producing output that's totally different (in terms of
ordering and content) from what a traversal would have produced.
- The existing documentation talks about one specific flag, --objects,
where the output can change dramatically depending on whether
--use-bitmap-index is applied, but it doesn't hint at all that there
are other flags that are also affected, or that may just be outright
ignored.

Documenting that --use-bitmap-index produces a set, rather than a
traversal, might be more clear.

>
> > It appears this changed in 2.26.0. The release notes for that version
> > include this blurb, which seems like it might be relevant, but I'm not
> > sure:
>
> It has always been the case that those options wouldn't work with
> bitmaps. But v2.26 did let us use bitmaps in more cases.
>
> The blurb you mentioned is a bit of a red herring; it only applies when
> --filter is used. The interesting commit for your example below is
> 4eb707ebd6 (rev-list: allow commit-only bitmap traversals, 2020-02-14).
>
> The "--use-bitmap-index" option is really "if you can use bitmaps to
> speed things up, do so". So prior to v2.26 it was simply being ignored
> in your example (and you got no speedup benefit from specifying it).
>
> That "use it if you can" behavior should probably likewise be
> documented. Callers need to be prepared to receive either result (and
> hence asking for stuff like --boundary does not make any sense at all).
>
> > Is this expected? If so, perhaps the --use-bitmap-index documentation
> > should be updated to indicate that it has unexpected interactions with
> > a whole lot more than just --objects? Or perhaps I'm doing something
> > wrong/unexpected here? What sorts of traversals are --use-bitmap-index
> > expected to be used for?
>
> The interesting traversals IMHO are:
>
>   - with --objects, quickly getting the result set (but without paths,
>     and without any ordering)
>
>   - with --count (with or without --objects), because we avoid quite a
>     bit of work by counting bits rather than walking the graph
>
>   - with the new --disk-usage, which likewise avoids a bunch of work
>
> Asking about just commits via bitmaps isn't that big a speed improvement
> these days, because commit graphs make the cost to actually traverse
> each commit way cheaper (see the numbers in the commit I mentioned
> above).
>
> So the behavior you're seeing is expected, but probably not all that
> useful (and you should likely just drop --use-bitmap-index).

Thanks for all the details, Jeff, and for taking the time to provide
such a thorough answer. I had figured there must be some potential
downsides to --use-bitmap-index--otherwise, if it was just a simple
"go faster" knob I'd expect it would have long ago been enabled by
default--but trying to figure out what they are is tricky. And as they
start to materialize, the next challenge is to figure out, given those
downsides, where the upsides are useful.

Based on your list here, there are a couple places where I think I
could see some benefits, in other commands that I run, but it's clear
it's not a general-use option.

-b

>
> -Peff

